Converting double-page scans from a book into a useful PDF ebook

This is the situation:
I'm using Kubuntu 12.04 LTS, so the versions of the programs I used are
Obtain PDFs with only left or right pages
With the two facing pages together on one PDF page, we need a way to crop this PDF page and separate the facing pages again (a process we will refer to as cropping). We do this using a tool called "Ghostscript". First of all, note that in PostScript dimensions are given in
In GIMP the origin of the coordinate system is in the upper left corner, so to be useful for PostScript one has to subtract these values from the page height of 707 pt, resulting in this table
gs -o left.pdf -sDEVICE=pdfwrite -c "[/CropBox [32 31 592 707] /PAGES pdfmark" -f thescan.pdf
gs -o right.pdf -sDEVICE=pdfwrite -c "[/CropBox [544 31 1104 707] /PAGES pdfmark" -f thescan.pdf

The above always worked for me, but in some rare cases you might need other solutions. Check out this source for more information.
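The coordinate flip described above can be sketched as a small shell helper. This is only an illustration: the function names gimp_to_ps_y and cropbox are made up here, the GIMP values 0 and 676 are back-computed from the left-page CropBox rather than taken from the original measurements, and the sketch assumes the GIMP coordinates are already in points and are whole numbers.

```shell
#!/bin/sh
# Convert a GIMP y coordinate (origin in the upper left corner) into a
# PostScript y coordinate (origin in the lower left corner).
# Usage: gimp_to_ps_y PAGE_HEIGHT GIMP_Y
gimp_to_ps_y() {
    echo $(( $1 - $2 ))
}

# Build the pdfmark string for a crop rectangle measured in GIMP.
# Usage: cropbox PAGE_HEIGHT LEFT TOP RIGHT BOTTOM
# (hypothetical helper, integer points only)
cropbox() {
    height=$1; left=$2; top=$3; right=$4; bottom=$5
    lly=$(gimp_to_ps_y "$height" "$bottom")   # lower edge in PostScript
    ury=$(gimp_to_ps_y "$height" "$top")      # upper edge in PostScript
    echo "[/CropBox [$left $lly $right $ury] /PAGES pdfmark"
}

# Assumed GIMP selection for the left page: x from 32 to 592, y from 0 to 676.
cropbox 707 32 0 592 676
# prints: [/CropBox [32 31 592 707] /PAGES pdfmark
```

The printed string can be passed to gs after -c, for example gs -o left.pdf -sDEVICE=pdfwrite -c "$(cropbox 707 32 0 592 676)" -f thescan.pdf.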
Assembly of the final PDF

The exact procedure depends on whether you have just one PDF with scanned pages or multiple PDFs with scans.

Only one scan PDF

For successful duplex printing of your PDF, the first page has to be on the right (in the finished "book"). So we have to remove the first page of the PDF containing only the left pages. The last page of the PDF containing the right pages has to be removed to avoid having blank left pages in the final book.
When you think about it for a minute or so you will realize that no pages are lost: Consider a typical book. On the very first page of your scanned PDF the left page is always blank, only the right one is useful. And the last filled page will always be a left page.
gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOutputFile=left2.pdf -dFirstPage=2 left.pdf
gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOutputFile=right2.pdf -dLastPage=20 right.pdf
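The value 20 for -dLastPage is the page count of right.pdf minus one (21 pages in my case). If you do not want to count by hand, the number can probably be derived from the output of pdftk's dump_data operation, which contains a NumberOfPages line. The helper name pages_minus_one below is made up for this sketch.

```shell
#!/bin/sh
# Read the output of "pdftk FILE dump_data" on stdin and print the page
# count minus one, i.e. the value to pass to -dLastPage when the last
# page should be dropped. (hypothetical helper)
pages_minus_one() {
    awk '/^NumberOfPages:/ { print $2 - 1 }'
}

# Demonstration with a fake dump_data line for a 21-page PDF:
printf 'NumberOfPages: 21\n' | pages_minus_one
# prints: 20

# Intended use (not executed here, requires pdftk and gs):
#   last=$(pdftk right.pdf dump_data | pages_minus_one)
#   gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
#      -sOutputFile=right2.pdf -dLastPage="$last" right.pdf
```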
What still has to be done is to combine (interleave) these two files, with the first page coming from the right PDF. I did this as follows:
mkdir x
cd x
pdftk ../left2.pdf burst output %04d_B.pdf
pdftk ../right2.pdf burst output %04d_A.pdf
pdftk *.pdf cat output out.pdf

This requires some explanation. The first two lines create an empty scratch directory for the following commands. Any existing PDF files in the directory would interfere with what we are going to do! The first pdftk command writes all pages in the PDF left2.pdf into files with names 0001_B.pdf, 0002_B.pdf, ..., 0020_B.pdf (here left2.pdf has 20 pages total). The second pdftk command does the same for right2.pdf, using the suffix _A. The last pdftk command then compiles the final PDF file ready for print. The trick is that *.pdf sorts the pages like 0001_A.pdf (first page from the right PDF), 0001_B.pdf (first page left PDF), 0002_A.pdf (second page right PDF), 0002_B.pdf (second page left PDF), ...
The file out.pdf should be what you want, ready for duplex-printing.
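If you want to convince yourself of the sort order before bursting real PDFs, it can be tried with empty dummy files. This is just a sketch; the temporary directory and the variable names are chosen for the demonstration and are not part of the procedure.

```shell
#!/bin/sh
# Create empty stand-ins for two burst left pages (_B) and two burst
# right pages (_A) in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/0001_B.pdf" "$demo_dir/0002_B.pdf" \
      "$demo_dir/0001_A.pdf" "$demo_dir/0002_A.pdf"

# The shell expands *.pdf in sorted order, which is the order in which
# pdftk would receive the files.
order=$(cd "$demo_dir" && echo *.pdf)
echo "$order"
# prints: 0001_A.pdf 0001_B.pdf 0002_A.pdf 0002_B.pdf

rm -r "$demo_dir"
```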
Multiple scan PDFs

Begin with cropping the individual PDF files. Usually you will need different CropBox settings. This results in multiple PDFs containing the left and right pages of the different input files. I assume you called them left1.pdf, right1.pdf, left2.pdf, right2.pdf, ... In the single-PDF procedure, the A and the B in the filenames ensured that the right page (with the A) comes first. Here the first left page is removed only after bursting, so the left pages get the A and the right pages the B. After cropping the six scan PDF files, my working directory looks like:
cl@clnb:/tmp/y$ ls -l
total 198424
-rw-rw-r-- 1 cl cl 13306002 Jun 24 14:03 left1.pdf
-rw-rw-r-- 1 cl cl  8775308 Jun 24 15:59 left2.pdf
-rw-rw-r-- 1 cl cl 20898172 Jun 24 16:03 left3.pdf
-rw-rw-r-- 1 cl cl 18837880 Jun 24 16:08 left4.pdf
-rw-rw-r-- 1 cl cl 14141272 Jun 24 16:12 left5.pdf
-rw-rw-r-- 1 cl cl 25616583 Jun 24 16:14 left6.pdf
-rw-rw-r-- 1 cl cl 13306004 Jun 24 14:04 right1.pdf
-rw-rw-r-- 1 cl cl  8775310 Jun 24 15:59 right2.pdf
-rw-rw-r-- 1 cl cl 20898174 Jun 24 16:03 right3.pdf
-rw-rw-r-- 1 cl cl 18837882 Jun 24 16:08 right4.pdf
-rw-rw-r-- 1 cl cl 14141274 Jun 24 16:11 right5.pdf
-rw-rw-r-- 1 cl cl 25616585 Jun 24 16:14 right6.pdf
cl@clnb:/tmp/y$

This time we have to remove the first page of left1.pdf and the last page of right6.pdf without disturbing the page order of the other pages. Plus we have to maintain the correct ordering of the pages from all the PDF files!
mkdir x
cd x
pdftk ../left1.pdf burst output 1_%04d_A.pdf
pdftk ../left2.pdf burst output 2_%04d_A.pdf
pdftk ../left3.pdf burst output 3_%04d_A.pdf
pdftk ../left4.pdf burst output 4_%04d_A.pdf
pdftk ../left5.pdf burst output 5_%04d_A.pdf
pdftk ../left6.pdf burst output 6_%04d_A.pdf
pdftk ../right1.pdf burst output 1_%04d_B.pdf
pdftk ../right2.pdf burst output 2_%04d_B.pdf
pdftk ../right3.pdf burst output 3_%04d_B.pdf
pdftk ../right4.pdf burst output 4_%04d_B.pdf
pdftk ../right5.pdf burst output 5_%04d_B.pdf
pdftk ../right6.pdf burst output 6_%04d_B.pdf

The prefixes 1_, 2_, ... make sure that the pages from different PDFs aren't mixed up. Of course, when you have more than 9 scan PDFs you will need prefixes of the type 01_, 02_, ...
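With many scan PDFs, typing one line per file gets tedious. The repetition could be generated by a loop, roughly as sketched below. The function name burst_commands is invented for this example, it assumes the files are named left1.pdf, right1.pdf, ... as above, and it always uses two-digit prefixes (01_, 02_, ...). It only prints the commands, so you can inspect them before running anything.

```shell
#!/bin/sh
# Print the pdftk burst commands for N scan PDFs (dry run).
# Usage: burst_commands N
# (hypothetical helper; prefixes are zero-padded to two digits)
burst_commands() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        prefix=$(printf '%02d' "$i")
        # Left pages get the A suffix, right pages the B suffix.
        echo "pdftk ../left$i.pdf burst output ${prefix}_%04d_A.pdf"
        echo "pdftk ../right$i.pdf burst output ${prefix}_%04d_B.pdf"
        i=$((i + 1))
    done
}

burst_commands 2
# prints:
# pdftk ../left1.pdf burst output 01_%04d_A.pdf
# pdftk ../right1.pdf burst output 01_%04d_B.pdf
# pdftk ../left2.pdf burst output 02_%04d_A.pdf
# pdftk ../right2.pdf burst output 02_%04d_B.pdf
```

Once the printed commands look right, they can be executed from inside the scratch directory with burst_commands 6 | sh. Keep in mind that with two-digit prefixes the burst files are named 01_0001_A.pdf and so on, not 1_0001_A.pdf.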
Delete the first left page (from the first scan PDF) and the last right page (from the last scan PDF):
rm 1_0001_A.pdf 6_0055_B.pdf

In your case the filename for the last page will be different, use ls to find it out!
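Instead of reading the name off the ls output, the last right page of a given scan could also be picked out by a small helper like the one below. The name last_right_page is made up for this sketch, and the dummy files exist only for the demonstration. If no file matches, the function prints the unexpanded pattern, so check the result before passing it to rm.

```shell
#!/bin/sh
# Print the last (highest-numbered) burst right page for a given prefix.
# Usage: last_right_page PREFIX   (run inside the scratch directory)
# (hypothetical helper)
last_right_page() {
    last=
    # The glob expands in sorted order, so the final loop value is the
    # highest page number.
    for f in "$1"_*_B.pdf; do
        last=$f
    done
    echo "$last"
}

# Demonstration with empty dummy files in a throwaway directory:
demo_dir=$(mktemp -d)
touch "$demo_dir/6_0001_B.pdf" "$demo_dir/6_0002_A.pdf" "$demo_dir/6_0002_B.pdf"
(cd "$demo_dir" && last_right_page 6)
# prints: 6_0002_B.pdf
rm -r "$demo_dir"
```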
What is left is the assembly of the final PDF file:
pdftk *.pdf cat output out.pdf