Jan 4, 2008

Creole support

I have been working on a wiki parser in Qt, and so far most of the basic cases work. I tried two approaches: one with a series of regexps, the other with a tokenizing parser.

The regexp approach gave some very nice, quick results, but the corner cases proved too much for QRegExp and my utter lack of regexp-fu. Next I tried a tokenizing parser. This was amusing since I have not touched parsers in almost 10 years, so it was mostly a refresher course. Progress has been slower, but things are more reliable, and the tests pass and stay passing, whereas the regexp approach felt kinda fragile. I am utterly useless with lex/yacc, so I am still writing the parser by hand, but I have encapsulated things so that the grammar portion can be cleaned up at a later date while the rest remains unchanged. Ideally, once things take off here, I can get more people interested who know more about parsers.
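To give a rough idea of what a hand-written tokenizer looks like, here is a minimal sketch in plain C++ (no Qt) for a tiny subset of Creole inline markup. This is not the actual parser code: the names `TokenKind`, `Token`, and `tokenize` are made up for illustration, and it only knows about `**` (bold) and `//` (italic).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a small subset of Creole inline markup.
enum class TokenKind { Text, Bold, Italic };

struct Token {
    TokenKind kind;
    std::string text; // only filled in for Text tokens
};

// Split one line into tokens: "**" becomes a Bold marker, "//" becomes an
// Italic marker, and everything else is collected into Text tokens.
std::vector<Token> tokenize(const std::string &input)
{
    std::vector<Token> tokens;
    std::string pending; // plain text seen since the last marker

    // Emit the collected plain text, if any, as one Text token.
    auto flush = [&]() {
        if (!pending.empty()) {
            tokens.push_back({TokenKind::Text, pending});
            pending.clear();
        }
    };

    for (std::size_t i = 0; i < input.size(); ++i) {
        const bool hasNext = i + 1 < input.size();
        if (hasNext && input[i] == '*' && input[i + 1] == '*') {
            flush();
            tokens.push_back({TokenKind::Bold, ""});
            ++i; // skip the second '*'
        } else if (hasNext && input[i] == '/' && input[i + 1] == '/') {
            flush();
            tokens.push_back({TokenKind::Italic, ""});
            ++i; // skip the second '/'
        } else {
            pending += input[i];
        }
    }
    flush();
    return tokens;
}
```

Calling `tokenize("a **b** c")` yields five tokens: text, bold marker, text, bold marker, text. Note that this sketch would also treat the `//` in `http://` as an italic marker, which is exactly the kind of corner case that makes the real thing harder than it looks.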

Once the XHTML backend is done, I can start on the QTextDocument backend. One nice thing about this approach is that one can basically provide a WYSIWYG wiki editor to any QTextEdit widget with one line of code. Damn, it feels good to be a KDE developer ;)
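For the curious, the backend split could look roughly like the sketch below. This is only an illustration in plain C++, not the real code: `WikiBackend`, `XhtmlBackend`, and `emitSample` are hypothetical names, and I am assuming the parser drives the backend through a few callbacks. A QTextDocument backend would implement the same interface and apply character formats instead of writing tags.

```cpp
#include <string>

// Hypothetical backend interface: the parser calls these as it recognizes
// markup, and never needs to know what kind of output is being produced.
class WikiBackend {
public:
    virtual ~WikiBackend() = default;
    virtual void text(const std::string &s) = 0;
    virtual void setBold(bool on) = 0;
    virtual void setItalic(bool on) = 0;
};

// One possible backend: builds an XHTML fragment in a string.
class XhtmlBackend : public WikiBackend {
public:
    void text(const std::string &s) override
    {
        // Escape the characters that are special in XHTML text content.
        for (char c : s) {
            switch (c) {
            case '&': m_out += "&amp;"; break;
            case '<': m_out += "&lt;"; break;
            case '>': m_out += "&gt;"; break;
            default:  m_out += c; break;
            }
        }
    }
    void setBold(bool on) override { m_out += on ? "<strong>" : "</strong>"; }
    void setItalic(bool on) override { m_out += on ? "<em>" : "</em>"; }

    const std::string &result() const { return m_out; }

private:
    std::string m_out;
};

// Stand-in for the parser: issues the calls it would make for the
// input "a **b** <c>".
void emitSample(WikiBackend &backend)
{
    backend.text("a ");
    backend.setBold(true);
    backend.text("b");
    backend.setBold(false);
    backend.text(" <c>");
}
```

Passing an `XhtmlBackend` to `emitSample` and reading `result()` gives `a <strong>b</strong> &lt;c&gt;`. The point of the design is that swapping the output format means swapping the backend object, while the parser stays untouched.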

Comments

FYI the markup I am using is documented at: http://www.wikicreole.org.


By Ian Reinhart Geiser at Fri, 01/04/2008 - 13:37

no Mediawiki then...


By Jarosław Staniek at Fri, 01/04/2008 - 15:14

It's close enough that most markup differences won't be noticed. I toyed with MediaWiki support (there is actually no reason why we can't drop out the grammar part of the parser and replace it with a MediaWiki one), but Creole was far simpler and had fewer inconsistencies. Markup is a moot point for WYSIWYG anyway, but the main advantage of wiki markup over XML or HTML is that it's MUCH MUCH easier to diff and merge without trashing the output.


By Ian Reinhart Geiser at Fri, 01/04/2008 - 17:44

Since there are already parts of KDE that depend on Boost, it might make sense to take a look at Spirit (http://www.boost.org/libs/spirit/index.html). It's a C++ parser generator framework with EBNF-like syntax.


By Christian Loose at Sat, 01/05/2008 - 10:34