Monday 18 April 2011

Turning a Blogger blog into a book

I had an adventure printing out one of my travel blogs for a friend who cannot read it online at the moment. Essentially I wanted to generate a PDF of my blog with posts in chronological order.


The best solution is the service Blogger partners with, blog2print. I am, however, a cheapskate, and besides, I didn't want to wait for postal delivery. I needed it now.


Ignoring the chronological-order requirement for the moment, I tried the obvious solution: printing to PDF from a browser. No go: Blogger sites print badly. Chrome generated only the first entry. Firefox mangled the layout, putting an almost blank page at the beginning and more elsewhere. Konqueror cut off content near the bottom of each page, as did Opera.


Next I tried the web2pdf conversion service. This didn't generate any extra blank areas, but put page breaks right across photographs. No good.


More searching turned up various browser plugins and tools, none of which worked well. There were also suggestions to customise my blog's CSS for print media, which looked like too much work but may be worth pursuing when I have time.


Eventually I settled on this: install a local instance of Wordpress on my machine, run the Atom XML export from Blogger (which I always keep on my machine anyway, as a backup) through a Blogger-to-Wordpress converter, import the resulting XML, then print the Wordpress page using the browser's print function. The output is acceptable. Now I only need Wordpress to display the posts in chronological order, and for that I've found a plugin that works with Wordpress 3.1, called default-sort-ascend.
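
For checking the ordering before printing, the Blogger Atom export can also be read directly. Below is a minimal Python sketch that lists the posts oldest first. It assumes Blogger tags each post entry with the category term `http://schemas.google.com/blogger/2008/kind#post` (comments, settings and the template carry other terms) and that every `published` timestamp includes a time-zone offset. The helper names `parse_published` and `posts_oldest_first` are hypothetical, not part of any tool mentioned above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Atom namespace used by the Blogger export file.
ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: Blogger marks posts (as opposed to comments, settings and the
# template) with this category term.
POST_KIND = "http://schemas.google.com/blogger/2008/kind#post"


def parse_published(stamp):
    """Parse an RFC 3339 timestamp such as 2011-04-18T20:10:00.000+10:00."""
    # Older Pythons' fromisoformat() does not accept a trailing "Z".
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def posts_oldest_first(atom_xml):
    """Return (published, title) pairs for every post entry, oldest first."""
    root = ET.fromstring(atom_xml)
    posts = []
    for entry in root.iter(ATOM + "entry"):
        terms = {c.get("term") for c in entry.findall(ATOM + "category")}
        if POST_KIND not in terms:
            continue  # skip comments, settings and template entries
        published = entry.findtext(ATOM + "published")
        if not published:
            continue  # cannot place an undated entry in the timeline
        title = entry.findtext(ATOM + "title", default="(untitled)")
        posts.append((published, title))
    # Sorting on parsed, offset-aware datetimes handles mixed time zones.
    posts.sort(key=lambda post: parse_published(post[0]))
    return posts
```

To use it, read the export in binary mode (`open("blog-export.xml", "rb").read()`, with a hypothetical file name) so the parser honours the file's own encoding declaration, then print each `(published, title)` pair.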


Another advantage of Wordpress is that the reader has control over the number of posts to show per page. I can therefore set it to a large number to get the whole blog. In Blogger, only the blog admin can change this.


There is an online service called ljbook, but this requires me to make my Wordpress site public. I'd rather do it all on my machine privately.


There are various Wordpress plugins for printing one post or a group of posts as PDF, but I suspect they do no better than the browser's own print-to-PDF function.


One tip I want to explore when I have more time is a toolchain that converts the Atom XML to DocBook using PHP and XSLT, then renders the DocBook to PDF with FOP. Apparently this can produce very good quality output.
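
That toolchain boils down to two command-line steps. This is a hedged Python sketch, not a tested recipe: it swaps the PHP+XSLT step for the standalone `xsltproc` processor, refers to a hypothetical `atom2docbook.xsl` stylesheet that would still have to be written or found, and uses `docbook-fo.xsl` as a placeholder for the DocBook XSL "fo" stylesheet, whose installed path varies. It also passes the DocBook XSL parameter `paper.type` to FOP to request A4 pages. By default it only prints the commands.

```python
import shlex
import subprocess


def book_pipeline(atom_export, atom_to_docbook_xsl="atom2docbook.xsl",
                  docbook_fo_xsl="docbook-fo.xsl",
                  docbook_xml="book.xml", pdf="book.pdf"):
    """Build the two commands: Atom -> DocBook via XSLT, DocBook -> PDF via FOP.

    Hypothetical names: atom2docbook.xsl does not exist yet; docbook-fo.xsl
    stands for the DocBook XSL "fo" stylesheet installed on your system.
    """
    to_docbook = ["xsltproc", "-o", docbook_xml, atom_to_docbook_xsl, atom_export]
    to_pdf = ["fop", "-xml", docbook_xml, "-xsl", docbook_fo_xsl,
              # paper.type is a DocBook XSL parameter; A4 avoids Letter output.
              "-param", "paper.type", "A4",
              "-pdf", pdf]
    return [to_docbook, to_pdf]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(book_pipeline("blog-export.xml"))
```

Keeping the command construction separate from execution makes it easy to inspect, or paste into a shell, before committing to a long FOP run.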


At this point I'd like to insert a gripe about Chrome 11: I can see no way to make it generate A4 PDF output. The "paper" size dropdown is disabled, so I can only get Letter-size PDFs. Printing A4 to a physical printer does work; it takes that from my environment settings. Someone please prove me wrong; otherwise I'm surprised that Google could miss this.


The next adventure, which isn't really to do with blog import, came when I tried to print the PDF. This one really is my own problem, because I have an old Lexmark E312 Postscript printer. When I printed the PDF from Okular, the printer would abort on a complicated page. I fared a little better with Acroread by selecting Level 2 Postscript, but eventually a too-complex page would stymie the printer. I don't think the printer lacks memory; I have 12MB installed, which should be sufficient for a page at a time. More likely it only handles Level 2 Postscript and, even then, splutters on any slight deviation from the standard, or perhaps has some implementation deficiencies.


Eventually my solution was to generate and print the Postscript a page at a time from the PDF using pdftops from Poppler. I don't really understand why this works. I surmise it's because pdftops makes sure to generate only standard Postscript and filters out any quirks in the PDF. Thank goodness there's more than one way to do things on Linux.
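
The page-at-a-time workaround can be scripted. Below is a minimal Python sketch, assuming Poppler's `pdfinfo` and `pdftops` plus a CUPS-style `lpr` are installed. The `-level2` flag is an assumption aimed at a Level 2 only printer, not necessarily the option used here, and the helper names and `page-NNN.ps` file names are hypothetical.

```python
import re
import subprocess


def page_count(pdfinfo_output):
    """Extract the page count from the text printed by Poppler's pdfinfo."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in pdfinfo output")
    return int(match.group(1))


def page_commands(pdf, pages, printer=None):
    """Build one pdftops call and one lpr call per page, in page order."""
    commands = []
    for n in range(1, pages + 1):
        ps = f"page-{n:03d}.ps"  # hypothetical per-page file name
        # -f/-l pick the first and last page to convert; -level2 (assumption)
        # restricts the output to Level 2 Postscript.
        commands.append(["pdftops", "-f", str(n), "-l", str(n), "-level2", pdf, ps])
        destination = ["-P", printer] if printer else []
        commands.append(["lpr", *destination, ps])
    return commands


def print_one_page_at_a_time(pdf, printer=None):
    """Run pdfinfo, then convert and submit each page as its own print job."""
    info = subprocess.run(["pdfinfo", pdf], capture_output=True,
                          text=True, check=True).stdout
    for cmd in page_commands(pdf, page_count(info), printer):
        subprocess.run(cmd, check=True)
```

Submitting each page as a separate job means the printer only ever has to digest one page's Postscript, which matches the workaround described above.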