
Improving KDE's support for Korean (and other CJK languages)

Saturday, 17 January 2015 | Eike Hein

The Hunminjeongeum (or 훈민정음). This document from 1446 first introduced the modern Korean writing system to the Korean people and is now inscribed in UNESCO's Memory of the World Register. (Photo: Jeon Han, CC BY-SA 2.0)

Lately, in addition to my usual work on things like Plasma, I've been hacking away at bugs that get in the way of using the Korean language and writing system on KDE/Qt systems (I took up studying Korean as a new hobby). As a bonus, many of these fixes also help users of the other CJK (Chinese, Japanese, Korean) languages, or of languages other than English in general.

Localization defects come in many shapes and forms. They tend to be fun puzzles to solve, and they often cut vertically through the whole stack. Here are a few examples from my plate over the past half year:

Kate: Input method / IME / ibus support doesn't seem to work well in KF5 version (and the resulting Qt bug: ibus plugin badly maps text attributes to QTextCharFormat)

On Linux systems that use IBus for complex writing system input (which covers most of our desktops), Qt 5 generated QInputMethodEvents with malformed formatting attributes, causing pre-edit text (such as incomplete Hangul blocks) to turn invisible in some applications. The fix repairs text entry in Korean, and in any other language that relies on complex in-line composition, in KWrite/Kate, KDevelop and other applications built on KDE's KTextEditor framework (and likely in others as well).
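The post doesn't spell out exactly what the malformed attributes looked like. Purely as an illustration, and assuming the trouble comes from overlapping per-attribute formats, here is a minimal Python sketch of one way to turn a list of possibly overlapping IBus-style attributes (underline, foreground, background, each with a start and end index) into non-overlapping ranges that each carry one merged format, the kind of list a receiving widget can apply unambiguously. The `IBusAttr` and `FormatRange` types and the attribute-kind strings are hypothetical stand-ins, not the IBus or Qt API, and this is not necessarily how the actual Qt fix works.

```python
from dataclasses import dataclass

# Hypothetical attribute kinds, loosely modelled on IBus's underline,
# foreground and background attributes (not the real IBus constants).
UNDERLINE, FOREGROUND, BACKGROUND = "underline", "foreground", "background"


@dataclass
class IBusAttr:
    kind: str
    value: object
    start: int  # first character covered (inclusive)
    end: int    # one past the last character covered (exclusive)


@dataclass
class FormatRange:
    start: int
    length: int
    fmt: dict  # stands in for a merged QTextCharFormat


def to_format_ranges(attrs):
    """Split the pre-edit text at every attribute boundary and merge all
    attributes covering each piece into a single format, so the returned
    ranges never overlap."""
    bounds = sorted({a.start for a in attrs} | {a.end for a in attrs})
    ranges = []
    for lo, hi in zip(bounds, bounds[1:]):
        merged = {}
        for a in attrs:
            if a.start <= lo and hi <= a.end:
                merged[a.kind] = a.value  # later attributes win on conflicts
        if merged:
            ranges.append(FormatRange(lo, hi - lo, merged))
    return ranges


# Three characters of pre-edit text: all of it underlined, and the last
# character (say, the Hangul block still being composed) highlighted.
attrs = [
    IBusAttr(UNDERLINE, "single", 0, 3),
    IBusAttr(BACKGROUND, "#cce0ff", 2, 3),
]
for r in to_format_ranges(attrs):
    print(r)
```

Splitting at every boundary costs a few extra ranges, but it guarantees that each character is covered by at most one format, so no receiver has to guess how overlapping formats should combine.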

Amarok: Podcast save location containing Korean characters are garbled

A fun mess: a chain of QString::arg() calls against a template string went awry because one call substituted a URL containing percent-encoded Korean into the template, and subsequent calls then tripped over the unexpected % place markers it introduced. Your Korean podcasts will now work nicely again in Amarok.
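Qt documents that each QString::arg() call replaces the lowest-numbered %N place marker remaining in the string, so text inserted by one call is scanned again by the next. The small Python model below (the helpers `arg` and `multi_arg` are hypothetical names, not Qt API, and the model ignores details such as %L1 locale markers) shows why percent-encoded Korean is a hazard in such chains, and how the multi-argument overload, which substitutes everything in one pass, sidesteps it. It is a simplified sketch, not Amarok's actual code or fix.

```python
import re
from urllib.parse import quote

# A place marker: '%' followed by one or two digits, e.g. %1 or %95.
MARKER = re.compile(r"%(\d{1,2})")


def _marker_numbers(template):
    return {int(m.group(1)) for m in MARKER.finditer(template)} - {0}


def arg(template, value):
    """Model of one QString::arg() call: replace every occurrence of the
    lowest-numbered marker currently in the string with `value`."""
    numbers = _marker_numbers(template)
    if not numbers:
        return template  # Qt prints a warning in this case
    lowest = min(numbers)
    return MARKER.sub(
        lambda m: value if int(m.group(1)) == lowest else m.group(0), template)


def multi_arg(template, *values):
    """Model of QString::arg(a1, a2, ...): markers are matched against the
    original template in one pass, so inserted values are never re-scanned."""
    mapping = dict(zip(sorted(_marker_numbers(template)), values))
    return MARKER.sub(lambda m: mapping.get(int(m.group(1)), m.group(0)), template)


# The same kind of example Qt's documentation uses for the difference:
print(arg(arg("%1 %2", "%1f"), "Hello"))   # Hellof %2
print(multi_arg("%1 %2", "%1f", "Hello"))  # %1f Hello

# Percent-encoded Hangul is full of digits: "한" becomes "%ED%95%9C", which
# this rule reads as markers %95 and %9. A chained call that finds no lower
# marker left will happily rewrite the URL itself:
encoded = quote("한")
print(encoded)                              # %ED%95%9C
print(arg(arg("%1", encoded), "x"))         # %ED%95xC
```

Passing all values to a single multi-argument arg() call (or otherwise never feeding substituted text back in as a template) keeps user data such as URLs from being interpreted as markers.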

KCodecs and kdelibs: KCharsets does not support CP949 encoding

Code page 949 is a superset of the EUC-KR character encoding. Introduced by Microsoft, it supports additional Korean characters not covered by EUC-KR and remains in use on some Korean websites and IRC networks (unfortunately; please switch to UTF-8!). QTextCodec gained support for CP949 in the Qt 4.x series, but our KCharsets was never synced up with those enhancements, which hid the codec from encoding selection UIs throughout KDE's products. The practical impact is somewhat reduced by Qt 5's support for ICU, which is smart enough to handle CP949 transparently in EUC-KR mode, but the situation was still confusing for users (and left things broken on non-ICU builds of Qt).
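To make the superset relationship concrete, here is a small Python check that uses the standard library's `euc_kr` and `cp949` codecs as a stand-in for QTextCodec (the helper names are made up for illustration). It counts how many of the 11,172 precomposed modern Hangul syllables each codec represents as a plain two-byte code, confirms that the shared syllables use identical bytes in both, and shows that the bytes CP949 produces for one of its extra syllables are not valid EUC-KR, which is roughly what happens when CP949 text is read with the wrong codec. Python's `euc_kr` encoder may fall back to longer make-up sequences for syllables outside its base set, so the count looks only at two-byte encodings.

```python
# Precomposed modern Hangul syllables occupy U+AC00..U+D7A3 (11,172 code points).
HANGUL_SYLLABLES = [chr(cp) for cp in range(0xAC00, 0xD7A4)]


def two_byte_syllables(codec):
    """Return the syllables that `codec` encodes as a single two-byte code."""
    result = []
    for ch in HANGUL_SYLLABLES:
        try:
            encoded = ch.encode(codec)
        except UnicodeEncodeError:
            continue
        if len(encoded) == 2:
            result.append(ch)
    return result


def decodes_as(data, codec):
    """True if `data` is valid in `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


euc_kr = set(two_byte_syllables("euc_kr"))
cp949 = set(two_byte_syllables("cp949"))
print(len(euc_kr), len(cp949))  # 2350 11172

# Syllables shared by both codecs are encoded with the same bytes ...
assert all(ch.encode("euc_kr") == ch.encode("cp949") for ch in euc_kr)

# ... but bytes for a CP949-only syllable are not valid EUC-KR.
extra = min(cp949 - euc_kr)
data = extra.encode("cp949")
print(extra, data.hex(), decodes_as(data, "euc_kr"), decodes_as(data, "cp949"))
```

The 2,350 syllables are those of KS X 1001, on which EUC-KR is based; CP949 adds the remaining 8,822 in code ranges that EUC-KR decoders reject.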

This last one is interesting because patches to address it were actually supplied by the community back in 2010, but they sat around unloved until recently despite not being very complicated. Developers are often reluctant to engage with tickets like this because they feel out of their depth, or simply struggle with the setup needed to reproduce the problem. I worry this may create a vicious cycle in which users who don't have the time or energy to educate developers about the problem space stop reporting bugs altogether.

If you use Korean (or another CJK language) with KDE, please do report the bugs you run into. I'll be keeping an eye out. If you're a developer struggling with Korean or CJK support in your application, consider getting in touch, too.