about:benjie

Random learnings and other thoughts from an unashamed geek

From Wordpress to Octopress


I had a few issues going from Wordpress to Octopress (which I did using exitwp) - I’ve outlined some of my solutions below.

Crossed out code blocks

The markdown produced by pandoc (an exitwp dependency) uses ~~~~ in places to denote a code block. However, rdiscount (Octopress’ markdown parser) recognises this instead as strike-through. A simple bash command fixes this (though it does lose some of the formatting details).

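Fix tildes generated by pandoc (exitwptildes.sh)
# Note: -i '' is the BSD/macOS form of sed's in-place flag; GNU sed takes -i with no separate argument.
# Opening fences with attributes: keep the attributes in an HTML comment, then open a ``` block.
for I in *; do sed -i '' -e 's/^~~~~ {\(.*\)}$/\
<!-- \1 -->\
```/g' "$I"; done
# Bare ~~~~ lines become ```.
for I in *; do sed -i '' -e 's/^~~~~$/\
```/g' "$I"; done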

Line breaks in links

For some reason exitwp added some line breaks into the markup for the links. And for some other reason, Octopress’ style sheet (screen.css) sets white-space: pre-wrap on <a> tags. These two issues combine to give you broken links.

This next command should fix these newline issues. (It finds a [ with no matching ] on the same line and then removes the newline at the end of said line, being careful not to have two newlines next to one another.)

Fix newlines in links
for I in source/*.markdown source/*/*.markdown; do sed -E -i '' -e ':a' -e 'N' -e '$!ba' -e 's/\n?(\[[^]]*)\n([\t ]*)/\
\2\1 /g' "$I"; done

NOTE: You may need to run the above more than once.

Turns out that the above broke some image embeds - no worries - here’s the solution:

Fix image embeds that were broken by previous command
for I in *; do sed -i '' -e ':a' -e 'N' -e '$!ba' -e 's/\!\n\[/\![/g' "$I"; done

Ordered/Unordered Lists

exitwp does not insert a blank line before lists, and rdiscount does not always recognise a list unless there is a blank line above it. Simply find the first entry (e.g. ‘1.’) and put a blank line above it.
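As a rough sketch of automating that step, a small awk filter could insert the missing blank line wherever a line starting with `1. ` sits directly under a line of text. The function name `add_blank_before_lists` is made up for this example, and it only handles ordered lists that start at `1. `. It is just as dumb as the sed commands in this post: it will also touch a `1. ` line inside a code block, so check the result.

```shell
# Hypothetical helper (not part of exitwp or Octopress): reads markdown on
# stdin and writes it to stdout, adding a blank line above any line that
# starts with "1. " when the line before it contains visible text.
add_blank_before_lists() {
  awk '
    # prev is empty on the first line, so nothing is inserted at the top.
    /^1\. / && prev ~ /[^ \t]/ { print "" }
    { print; prev = $0 }
  '
}

# Possible usage on one post (the file name is a placeholder):
#   add_blank_before_lists < post.markdown > post.tmp && mv post.tmp post.markdown
```

Writing to a temporary file and moving it into place avoids relying on an in-place flag.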

Tildes

rdiscount doesn’t do strikethrough with a single tilde, so a single tilde does not need escaping. However, exitwp escapes them anyway, so text like ~/Documents/ will become \~/Documents/. The fix is to remove the added backslashes.
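A minimal sketch of undoing that escaping, assuming every `\~` in the exported posts is an unwanted escape (which may not hold if a post deliberately contains a backslash before a tilde). The function name `unescape_tildes` is invented for this example. It works as a stdin-to-stdout filter so it does not depend on sed's in-place flag, which differs between BSD and GNU sed.

```shell
# Hypothetical helper: replace every backslash-tilde pair with a plain tilde.
# Assumption: no post needs a literal "\~" to survive.
unescape_tildes() {
  sed -e 's/\\~/~/g'
}

# Possible usage on one post (the file name is a placeholder):
#   unescape_tildes < post.markdown > post.tmp && mv post.tmp post.markdown
```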

Random quotes

rdiscount and exitwp seem to disagree on indentation - causing rdiscount to render indentation (sometimes) as a quote rather than continuation of the text.

URLs with brackets in

exitwp (or pandoc) does not correctly escape brackets within links with a backslash, so captions etc. may be broken. Simply add in the backslashes.
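Finding the affected lines by hand is tedious, so here is a hedged sketch of a search that lists candidates for manual fixing. It assumes "brackets" here means parentheses inside the URL part of a markdown link, e.g. `[b](http://x/Foo_(bar))`, and it does not cover square brackets in captions. The function name `find_parens_in_link_targets` is made up for this example, and it only reports lines; it does not change anything.

```shell
# Hypothetical helper: print (with line numbers) every line where a markdown
# link target "](...)" contains an opening parenthesis before its first
# closing one. Assumption: these are the links that need backslashes added.
find_parens_in_link_targets() {
  grep -nE '\]\([^)]*\(' "$@"
}

# Possible usage (the file name is a placeholder):
#   find_parens_in_link_targets post.markdown
```

Each reported line can then be edited by hand to add the backslashes.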

/blog/archives/

I was looking for a line in _config.yml to configure the archives. Turns out it’s actually the file source/blog/archives/index.html that needs to move - I moved this to source/archives/index.html

Change /blog/archives/ to just /archives/
mv source/blog/archives source/
rmdir source/blog/

I then updated the navigation in source/_includes/custom/navigation.html to match.

Hosting on S3

I followed these instructions from Ian Wootten (which reference these instructions by Jerome Bernard) - thanks!

Disclaimer

The code in this post worked for me, but it is quite dumb - you will need to check that the changes it makes are good. This is where git diff comes in handy - make sure to commit before running these commands!
