WordPress to Jekyll: converting gallery shortcodes
As I move along with my Jekyll/Octopress transition, I’m working to make the move as clean as possible. I’m importing my WordPress database rather than starting fresh, and I’ll be sharing tidbits of discoveries as I go. These posts will only be of interest to people making similar transitions, but they’ll also be serving as notes for myself and Google search results for people up against the same conundrums.
I have heavily extended the Octopress Rakefile and built an almost entirely new WordPress import module. The importer now…
- converts my Download Monitor shortcodes to actual download links, respecting the format parameters and including descriptions, versions and titles.
- generates an .htaccess file with redirects from my old permalink structure for every post it imports.
- replaces shortcodes from my gist plugin with Octopress formatting
- replaces [caption] and <img> tags, maintaining classes, alt and title attributes and alignment settings
- replaces YouTube shortcodes with YouTube embed code
- updates multiple formats of code blocks to standard fenced code with language specifier where it finds one
- replaces video and audio shortcodes with Octopress format and HTML5 embed, respectively.
- strips out some extra markup I used to compensate for elements of my WordPress theme
- gathers slug, redirect alias, tags, categories and data from a custom “series” plugin as YAML front matter
- locates WordPress gallery shortcodes and replaces them with all of the included attachments as an unordered list of thumbnails linked to their full size images
It’s that last item that I’ll share today. The input is any content that includes [gallery] code (with optional extra parameters). The output is Markdown, with some extra Kramdown syntax. I think it might also work with Maruku, but you may have to adjust depending on your chosen Jekyll Markdown interpreter.
Basically, if it detects [gallery] codes in the post, it runs a query for all “attachment” posts with the current post as the parent. It passes those to a function that replaces the single [gallery] code with a full Markdown list of the images, using WordPress’ automatically generated thumbnails as the visible image, and linking them to the full-size upload. If the 150x150 thumbnail doesn’t exist, it uses sips to create it. If you want to alter the thumbnail process, see the sips commands in the thumbnail_image function.
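To make the path handling concrete, here is a minimal sketch of how an attachment URL could map to the full-size image and its 150x150 thumbnail. The domain, the sample URL and the helper name `gallery_paths` are made up for illustration; the regular expressions follow the ones in the replacer, with the thumbnail pattern anchored to the end of the filename.

```ruby
# Hypothetical helper (not part of the importer): derive site-relative
# image and thumbnail paths from a WordPress attachment URL.
# `domain` is an assumed site root such as 'http://example.com'.
def gallery_paths(url, domain)
  # drop the domain and wp-content prefix, plus any size suffix like -300x200
  image = url.sub(/^#{Regexp.escape(domain)}\/wp-content/, '').sub(/-\d+x\d+\./, '.')
  # WordPress names its default thumbnail <name>-150x150.<ext>
  thumb = image.sub(/^(.*?)(\.[^.\/]{3,4})$/, '\1-150x150\2')
  { image: image, thumb: thumb }
end

paths = gallery_paths('http://example.com/wp-content/uploads/2012/01/desk-300x200.jpg',
                      'http://example.com')
puts paths[:image] # /uploads/2012/01/desk.jpg
puts paths[:thumb] # /uploads/2012/01/desk-150x150.jpg
```

Whichever size WordPress stored in the attachment URL, both paths end up pointing at the same base file, which is what lets the importer check for (or create) the thumbnail next to the original.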
This snippet is added as part of the WordPress module in lib/jekyll/migrators/wordpress.rb. I’ve rebuilt it completely in a new module. Because so much of the code is specific to my own plugins and content, I probably won’t post the entire file, but I’ll pull out the useful bits for incorporation into your own.
To kick it off, here’s the code for the [gallery] replacer:
require 'fileutils'
require 'shellwords'

# create a 150x150 thumbnail of the passed image using `sips`
# img is an absolute path to the base image file
def thumbnail_image(img)
  img = img.strip
  return_dir = Dir.pwd
  Dir.chdir(File.dirname(img))
  src = Shellwords.escape(img) # escape so paths with spaces survive the shell
  width = %x{sips -g pixelWidth #{src} | tail -n 1}.gsub(/pixelWidth: /, '').strip.to_i
  height = %x{sips -g pixelHeight #{src} | tail -n 1}.gsub(/pixelHeight: /, '').strip.to_i
  thumb_name = img.gsub(/^(.*?)(\..{3,4})$/, "\\1-150x150\\2").strip
  thumb = File.expand_path(thumb_name)
  FileUtils.cp(File.expand_path(img), thumb)
  # scale the shorter side to 150px, then crop to a 150x150 square
  type = width > height ? '--resampleHeight' : '--resampleWidth'
  dest = Shellwords.escape(thumb)
  %x{sips #{type} 150 #{dest} && sips -c 150 150 #{dest}}
  Dir.chdir(return_dir)
  return thumb_name
end
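If you want a different thumbnail size, one option is to build the sips command from a size parameter. This is only a sketch: the helper name `sips_thumbnail_command` and its `size` argument are hypothetical, and it returns the command string instead of running it so you can inspect it first.

```ruby
require 'shellwords'

# Hypothetical helper: build the sips command for a square thumbnail of
# `size` pixels. `width` and `height` are the source image's dimensions.
def sips_thumbnail_command(path, width, height, size = 150)
  file = Shellwords.escape(path)
  # scale the shorter side to `size`, then crop to a size x size square
  flag = width > height ? '--resampleHeight' : '--resampleWidth'
  "sips #{flag} #{size} #{file} && sips -c #{size} #{size} #{file}"
end

puts sips_thumbnail_command('/tmp/my photo.jpg', 1200, 800, 200)
```

Note that changing the size also means changing the `-150x150` suffix and the `width`/`height` attributes used in the gallery list.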
# `content` is a passed string containing post_content
# `attachments` is an array of hashes containing 'title' and 'url' for each attachment on the post
# replace_galleries uses kramdown syntax for attributes and classes, adjust as needed
def replace_galleries(content, attachments)
  assets_dir = Dir.pwd + "/source"
  counter = 1 # shared across galleries so reference ids stay unique
  content.gsub!(/\[gallery.*?\]/) do |gall|
    images = "\n" # unordered list of thumbnails linked to images as references
    imagerefs = "\n" # block of reference definitions
    attachments.each do |att|
      image = att['url'].gsub(/^#{@domain}\/wp-content/, '').sub(/-\d+x\d+\./, '.')
      thumb = image.sub(/(.*?)(\..{3,4})/, '\\1-150x150\\2')
      # bail out and leave the post untouched if the uploads directory is missing locally
      return content unless File.directory?(assets_dir + File.dirname(thumb))
      unless File.exist?(assets_dir + thumb)
        # thumbnail_image returns an absolute path; strip the prefix to keep it site-relative
        thumb = thumbnail_image(assets_dir + image).sub(assets_dir, '')
        puts thumb
      end
      title = att['title']
      images += %Q{* [![#{title}][img#{counter}thumb]{: width="150" height="150"}][img#{counter}]\n}
      imagerefs += %Q{[img#{counter}thumb]: #{thumb}\n}
      imagerefs += %Q{[img#{counter}]: #{image} '#{title}'\n}
      counter += 1
    end
    images + "{:.gallery}\n" + imagerefs + "\n"
  end
  content
end
# `content` is the post_content field for the row of the WordPress database query being looped
# `px` is a variable containing the table prefix in the WordPress database
# `db` is a Sequel.mysql object
if content =~ /\[gallery/
  attachments = []
  gquery =
    "SELECT
      posts.guid AS `attachment`,
      posts.post_title AS `title`
    FROM
      #{px}posts AS `posts`
    WHERE
      posts.post_parent = '#{post[:ID]}' AND
      posts.post_type = 'attachment'"
  db[gquery].each do |a|
    attachments << { 'url' => a[:attachment], 'title' => a[:title] }
  end
  replace_galleries(content, attachments) unless attachments.empty?
end
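For reference, the following sketch shows roughly what the replacer emits for a post with two attachments. It skips the filesystem checks and thumbnail generation entirely; the attachment data and the helper name `gallery_markdown` are invented for the example, and the paths are assumed to be already resolved.

```ruby
# Hypothetical, filesystem-free version of the list builder, for illustration only.
# `attachments` is an array of hashes with 'title', 'image' and 'thumb' keys.
def gallery_markdown(attachments)
  images = "\n"    # unordered list of linked thumbnails
  imagerefs = "\n" # reference definitions for thumbnails and full-size images
  attachments.each_with_index do |att, i|
    n = i + 1
    images += %Q{* [![#{att['title']}][img#{n}thumb]{: width="150" height="150"}][img#{n}]\n}
    imagerefs += %Q{[img#{n}thumb]: #{att['thumb']}\n}
    imagerefs += %Q{[img#{n}]: #{att['image']} '#{att['title']}'\n}
  end
  # the {:.gallery} line is kramdown syntax that puts a class on the list
  images + "{:.gallery}\n" + imagerefs + "\n"
end

sample = [
  { 'title' => 'Desk', 'image' => '/uploads/2012/01/desk.jpg',
    'thumb' => '/uploads/2012/01/desk-150x150.jpg' },
  { 'title' => 'Lamp', 'image' => '/uploads/2012/01/lamp.jpg',
    'thumb' => '/uploads/2012/01/lamp-150x150.jpg' }
]
puts gallery_markdown(sample)
```

Each list item is a reference-style image wrapped in a reference-style link, so the list itself stays readable and all of the URLs collect in one block underneath it.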