Researching the state of PDF manipulation tools in the world of Free Software (2): PDFedit
Yes, pinotree, PDFedit is one of the two applications I discovered a few weeks ago when I searched Google for PDF manipulation tools... :-) (I'm really curious whether you already know about the other one -- but blogging about that is still a few days away. Today is about PDFedit.)
[image:2656 align=right hspace=6 vspace=6 border=1 size=thumbnail] I'm not aware of any "official" packages in the major distros yet. So getting your hands on PDFedit requires you to start up your compiler.... Wait. There is also a klik package. If you have the klik client installed, use this link: klik://pdfedit or visit the pdfedit page on the klik server (see also this wiki page with some more links).
PDFedit is already quite an advanced application. It has a GUI, and the GUI is based on Qt (click thumbnail to see more). Part of its internal PDF processing engine is based on Xpdf. However, when I say "GUI", don't expect a "displays-PDF-pages-and-lets-you-click-on-a-shape-or-glyph-shown-and-you-can-drag-it-to-a-different-place" type of tool. Such things do exist on Windows or Mac, but PDFedit is not (yet?) one of them.
That said, when I tested it, PDFedit was able to parse even the most complicated PDFs I had lying around on my hard disk. When it opens a file, it presents a tree structure that gives you access to all the objects, dictionaries, streams, metadata and key:value pairs inside. You already need to know a bit about PDF in order to start "editing" it.
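To give a rough idea of what such a tree of objects and key:value pairs looks like, here is a minimal Python sketch. It has nothing to do with PDFedit's own code: the sample bytes are a hand-written fragment in PDF syntax (not a complete file), the function name `list_objects` is made up for illustration, and the regular expressions only cope with flat, simple dictionaries like the ones shown.

```python
import re

# A tiny hand-written fragment in PDF syntax. This is an assumption made for
# illustration only; it is not a complete, valid PDF file.
SAMPLE = b"""\
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
4 0 obj
<< /Producer (Acrobat Distiller) /Title (Contract) >>
endobj
"""

# "<number> <generation> obj ... endobj" delimits one indirect object.
OBJ_RE = re.compile(rb"(\d+) (\d+) obj\s*(.*?)\s*endobj", re.S)

# A key is a /Name; the value here may be a (string), an [array],
# an indirect reference "n g R", another /Name, or an integer.
KEY_RE = re.compile(
    rb"/([A-Za-z]+)\s+(\([^)]*\)|\[[^\]]*\]|\d+ \d+ R|/\w+|\d+)"
)


def list_objects(data):
    """Map (object number, generation) to that object's key:value pairs."""
    tree = {}
    for num, gen, body in OBJ_RE.findall(data):
        pairs = {k.decode(): v.decode() for k, v in KEY_RE.findall(body)}
        tree[(int(num), int(gen))] = pairs
    return tree


if __name__ == "__main__":
    for (num, gen), pairs in sorted(list_objects(SAMPLE).items()):
        print(f"object {num} {gen}")
        for key, value in pairs.items():
            print(f"    /{key}: {value}")
```

Real PDFs nest dictionaries, compress streams and escape parentheses inside strings, so a proper parser is needed for anything beyond a toy like this.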
You can search for text strings, add to the text, replace words, or correct typos.
Actually, that is what I did at the beginning of December, when my first encounter with PDFedit was hardly 48 hours old. I had gotten from our office this important PDF file to take to a customer when I visited him. It was kind of a contract, prepared by our secretary and my boss, to be signed by him. I was already on-site when I discovered a really embarrassing typo in the name used for the customer's IT director. (No, I won't translate it -- but believe me, it *was* embarrassing.) There was no chance in hell of getting access to the original Microsoft Word document and running it through Acrobat Distiller again to create a new version. But luckily, I had a Knoppix with me. And an internet connection. I fetched the klik-ed PDFedit program, opened the problem file with it, and ... made the problem go away. I even found a second mistake, an incorrect date. And, for the fun of it, I removed all traces of its MS Word origins from the PDF metadata and pretended it was made by "D**ka PDF Exporter". It was done in less than 2 minutes. And luckily, it worked flawlessly for me. Then I handed the PDF to the customer on a USB stick and asked him to print it out. The day was saved...
[Here's a paraphrase of a phone call to my boss while I travelled back that day, late in the evening. He asked how the day was, and I asked him to fetch and open said document and read aloud the customer's name underneath the signature line. I could hear him gasp, and not just once. ;-)
He asked, hesitating: "Did you let him sign?" -- "Yes, of course; why not?"
"Did he retain a copy?" -- "Sure. As he should!"
"Did he say anything? Did he notice?" -- "No, not that I know of..."
He was really nervous. "When did *you* notice it?" -- "Hmm, just about 15 minutes before he signed?"
"What do we do if he realizes it?" -- "He won't."
"But it's obvious.... I'll phone him tomorrow morning, and offer my deep apologies." -- "Don't. He won't notice a thing."
The next morning, I showed him what I had done, and how. (He will never again trust any PDF I send him, that's for sure.) But now back to a short description of PDFedit features...]
Of course, PDFedit can do more than just change a typo here and there.
Advanced users and PDF-savvy people will go down to the bare metal and use the application to change raw PDF objects. Beginners will probably prefer to use the predefined GUI functions. The best thing is this: functions can easily be added -- in PDFedit everything is based on a scripting language (QSA by Trolltech).
It can even create completely new PDF files. With the help of scripting, PDFedit can be used as a PDF creator. Start with an arbitrary empty PDF file (call it a "template" if you want) created by any drawing tool. Then use PDFedit to add more objects to it. (I don't say this is easy, or intuitive. But it is also a great way to learn more about PDFs.) Creating PDFs with sophisticated text layout is limited by the fact that (for now) PDFedit does not support fonts other than the "13 standard" ones.
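If you have no drawing tool at hand, an empty "template" can also be written by hand. The following Python sketch is my own illustration, not something PDFedit ships: the function name `build_empty_pdf`, the output filename and the default A4 page size in points are all assumptions. It writes a single blank page with a cross-reference table, which should be enough for a PDF reader to open.

```python
def build_empty_pdf(width=595, height=842):
    """Return the bytes of a one-page, blank PDF (default size: A4 in points)."""
    # Three indirect objects: the catalog, the page tree, and one empty page.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << >> "
        b"/MediaBox [0 0 %d %d] >>" % (width, height),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        # Remember the byte offset of each object for the xref table.
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    # Cross-reference table: one 20-byte entry per object, plus the
    # mandatory free entry for object 0.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    # The trailer names the root object and points back at the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    # "template.pdf" is just an example filename.
    with open("template.pdf", "wb") as handle:
        handle.write(build_empty_pdf())
```

A file like this contains nothing but the page itself, so it makes a reasonably clean starting point for adding objects afterwards.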
So please, if you have not yet looked at PDFedit, go get it via klik://pdfedit (unless you are a Gentoo user, and Lucky You can "emerge" it....).
PDFedit is developed by a group of people in the Czech Republic. If you visit the PDFedit website, you'll notice immediately -- but the link to the English version is also easy to find.
What is really good is their quite exhaustive documentation. Go and look for yourself:
- PDFedit Scripting API: http://pdfedit.petricek.net/pdfedit.appendix
- PDFedit Design Document: http://pdfedit.petricek.net/pdfedit.design_doc
- PDFedit User Documentation: http://pdfedit.petricek.net/pdfedit.user_doc
- PDFedit Screenshots: http://pdfedit.petricek.net/pdfedit.ss_e
In a Wiki, they collect even more stuff:
- Howto use PDFedit without installation (the trick is "klik")
- How to add your script to PDFedit (step by step example)
- PDFedit screenshots
- How to add accented text
- How to create PDF to whatever conversion filters (for developers)
- Howto convert PDF to XML
- PDFedit Toolbar List
My first impression was that the PDFedit developers are a group of students, and that their PDFedit work was somehow closely related to a university project. That was confirmed by someone (can't remember: tsdgeos? pinotree?) in IRC/#kpdf when I asked (around the beginning of December) whether anybody already knew about PDFedit. The "someone" said the project seemed to have been stalled for a few months, ostensibly because the developers were no longer getting academic bonus points for it.
However, I don't think the project is stalled. On November 8th they moved their CVS repository to SourceForge. On December 13th they released version 0.2.3 (which added XML export from PDF). Very recently, on the 18th of January, they submitted PDFedit as an entry to KDE Appsy. The last commit to the Changelog is barely 2 hours old, indicating they are working on a 0.2.4 release :-)
Oh, and it's GPL-licensed, of course. The developers don't mention targeting Windows with it (despite the Qt foundation); they seem to be focused exclusively on Linux and *BSD.
All in all, PDFedit looks like a very promising project.
(Next in this series: another new, exciting PDF manipulation tool for Linux, which hardly anyone has ever heard of...)