Looking for a DjVu document

Thursday, 25 January 2007  |  pinotree

<img src="http://www.okular.org/screenies/okular-backend-djvu-1.thumb.png" align="right" width="110" height="86" hspace="10" /> As you might know, okular supports a number of file formats. One of them is [w:DjVu|DjVu], as you can see in the screenshot.

Its implementation works quite nicely, although page pixmap generation is still synchronous and we cannot extract text from DjVu documents yet. We are working on both problems and hope to fix them soon.

In the implementation I wrote, I was able to extract almost every kind of metadata a document can contain: for example the table of contents, hyperlinks, and even text or line annotations (you did not know a DjVu document could have annotations, did you? ;-) ). What I still have to implement is extracting the author, year, title, etc. from the metadata. This is not because it's particularly difficult, but because I still lack a simple test document that contains this kind of information.

So, basically, I'm asking whether any of you has documents with this information :) Checking whether a DjVu document has it is really simple: use djvused, a small DjVuLibre utility (usually shipped with DjVuLibre itself, or in a separate package such as djvulibre-bin on Debian/Ubuntu), like this:

    djvused -e 'output-all' mydocument.djvu | grep '(metadata'

If you get any output, then that document might be a nice candidate! If the document is not private, you could send it to me. There's no real prize, just a big "Thanks!" and your name in the commit log of the feature :-)
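If you have a whole folder of DjVu files, the same djvused check can be scripted. What follows is only a minimal Python sketch, assuming djvused is installed and on your PATH. The helper names (`has_metadata`, `dump_all`) are made up for illustration, and the test simply mirrors the grep: any line containing `(metadata` counts as a hit.

```python
import subprocess
import sys
from pathlib import Path


def has_metadata(djvused_output: str) -> bool:
    """Mimic `grep '(metadata'`: True if any line contains '(metadata'."""
    return any("(metadata" in line for line in djvused_output.splitlines())


def dump_all(path: Path) -> str:
    """Run `djvused -e 'output-all' <path>` and return its standard output."""
    result = subprocess.run(
        ["djvused", "-e", "output-all", str(path)],
        capture_output=True,
        text=True,
        errors="replace",  # don't crash on odd bytes in the dump
        check=True,        # raise if djvused fails, e.g. on a non-DjVu file
    )
    return result.stdout


def main(paths: list[str]) -> None:
    for p in paths:
        try:
            output = dump_all(Path(p))
        except FileNotFoundError:
            # Raised when the djvused executable itself cannot be found.
            print("djvused not found; is DjVuLibre installed?", file=sys.stderr)
            return
        except subprocess.CalledProcessError as err:
            print(f"{p}: djvused failed (exit code {err.returncode})", file=sys.stderr)
            continue
        if has_metadata(output):
            print(f"{p}: contains metadata, nice candidate!")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under any name you like (say, `find_metadata.py`), it can be run as `python find_metadata.py *.djvu`, and it prints only the files worth sending.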