Mixed character set text
Not long ago, the various language spheres engaged in computing maintained separate, private standards for encoding the characters and symbols used in their written communication. This made it hard or impossible to author documents (or provide a user interface) mixing characters from multiple spheres. Unicode and its various encodings have mostly addressed this problem (though not to everyone's satisfaction, which will keep some narrower encodings alive for some time yet), and that has been wonderful for information sharing across the globe.
However, we have a situation similar to the old incompatible-encodings mess on the presentation layer. Font files typically only include glyphs for one or a small number of writing systems, e.g. Latin and its offshoots, or the Korean Hangul alphabet and the hanja sometimes used by Korean speakers. There are efforts to create more comprehensive "Unicode fonts" (e.g. Google's Noto family), but it's likely they'll remain few: type designers tend to stick to the characters they're intimately familiar with, and most typefaces are designed by small, localized teams sitting in this or that country. Trying to design glyphs for an alphabet you haven't used is a tall order (though it would be awesome to see more designers try to rise to the challenge).
Even monolingual documents increasingly source from more than one character set: emoji are a thing, and they're getting bigger every day. Right now, few fonts provide their own emoji glyphs; they're usually stored separately (more on that in a moment).
Here's a screenshot of the font settings for the Plasma Desktop workspace:
Applications run inside Plasma Desktop inherit these settings. For example, Konversation, KDE's IRC client, will default to using that "General" font to display chat messages. Konversation also has a config knob to deviate from the system default and use a custom font, of course; it looks very similar.
Now consider a mixed-language, mixed-alphabet chat: English, written using the Latin alphabet, will use the glyphs from the wonderful Fira Sans. But Fira Sans doesn't contain any glyphs for the Hangul alphabet, so what will happen to a message containing any Korean text? Or any emoji, for that matter?
Enter glyph substitution: The font stack on Linux - and another platform we'll look at alongside, Windows - is smart enough to handle the case of a missing requested glyph, and supports sourcing it from a different font instead.
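To make the idea concrete, here is a toy model of per-character fallback in Python. It is only a sketch: the font names and the FONT_COVERAGE table are made up for illustration (real stacks read coverage from the font files), and the helpers pick_font and font_runs are hypothetical, not part of any font library.

```python
# Toy model of glyph fallback. Font names and coverage are assumptions
# made for illustration; only the Unicode ranges themselves are real.
FONT_COVERAGE = {
    "MyLatinFont": range(0x0020, 0x007F),    # printable Basic Latin (ASCII)
    "MyHangulFont": range(0xAC00, 0xD7A4),   # assigned Hangul syllables
    "MyEmojiFont": range(0x1F600, 0x1F650),  # Emoticons block
}


def pick_font(char, requested, fallbacks):
    """Return the first family in [requested, *fallbacks] that covers char."""
    for family in [requested, *fallbacks]:
        if ord(char) in FONT_COVERAGE.get(family, ()):
            return family
    return None  # nothing covers it; a real stack would draw a placeholder


def font_runs(text, requested, fallbacks):
    """Split text into consecutive (family, substring) runs."""
    runs = []
    for char in text:
        family = pick_font(char, requested, fallbacks)
        if runs and runs[-1][0] == family:
            # Same font as the previous character: extend the current run.
            runs[-1] = (family, runs[-1][1] + char)
        else:
            runs.append((family, char))
    return runs
```

Calling font_runs("hi 안녕 😀", "MyLatinFont", ["MyHangulFont", "MyEmojiFont"]) splits the message into Latin, Hangul and emoji runs, each tagged with the font that would supply its glyphs.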
How this fallback lookup happens is governed by fontconfig configuration files. Fontconfig allows one to specify one or more aliases for a font family, and in case of missing glyphs, the system will try and locate them in one of the families specified as aliases.
MyLatinFont can have MyHangulFont as an alias in the system, and Konversation will wind up displaying that Korean text using the Hangul glyphs from MyHangulFont, even when nominally configured to use MyLatinFont. This is roughly the same mechanism used to make generic family names like Sans Serif work and map them to actual, specific fonts.
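As a rough illustration of what such an alias looks like on disk, the Python sketch below embeds a small fontconfig snippet and reads the mapping back out. The alias, family and accept elements are fontconfig's own; the font names are the placeholders from above, and read_aliases is a hypothetical helper that ignores the prefer and default elements real config files also use.

```python
import xml.etree.ElementTree as ET

# A minimal alias: when MyLatinFont is requested, MyHangulFont is
# accepted as a further candidate after it. Font names are placeholders.
SNIPPET = """
<fontconfig>
  <alias>
    <family>MyLatinFont</family>
    <accept>
      <family>MyHangulFont</family>
    </accept>
  </alias>
</fontconfig>
"""


def read_aliases(xml_text):
    """Map each aliased family to its <accept> fallbacks, in file order."""
    root = ET.fromstring(xml_text.strip())
    aliases = {}
    for alias in root.iter("alias"):
        source = alias.findtext("family")  # direct child: the aliased name
        fallbacks = [f.text for f in alias.findall("accept/family")]
        aliases.setdefault(source, []).extend(fallbacks)
    return aliases
```

For the snippet above, read_aliases(SNIPPET) yields a dictionary mapping MyLatinFont to a one-element list containing MyHangulFont.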
In practice, this means your typical desktop-y Linux distribution maintains a large set of configuration files in /etc/fonts/ trying to provide reasonable fallbacks, and trying to anticipate what font packages you might install. Windows actually works almost exactly the same way - there's an alias mechanism, controlled by the registry, with some default mappings hardcoded into Uniscribe and others getting installed by Windows' language packs.
The problem with this is the lack of user control. Neither Plasma Desktop nor Windows provides any graphical means to influence glyph substitution behavior. You get to set one font per UI role, and what happens in case of missing glyphs is left to distro-maintained config, or your hand-written config files (on my system, I maintain an elaborate ~/.config/fontconfig/fonts.conf for this reason). There's no easy way to pick "your Korean font" or "your emoji font" if your font settings point to Latin-focused fonts.
This only affects a few users now, but they are users in the very interesting position of bridging multiple language spheres, a use case I feel we should care deeply about in KDE. And with the rise of Unicode-based emoji, it's a growing problem.
Mulling a solution
So I've been wondering how we could improve the font settings in Plasma Desktop to address this problem. One approach would be adding an "Advanced" sub-dialog exposing the fontconfig alias machinery in a very straightforward way, loading in the system-level mappings and allowing user-local variation of them. But this would clearly result in a very user-unfriendly, highly complex interface.
I'm thinking a better fix would be to enhance our font management UI to let users create virtual user fonts: We add a Create Font button bringing up a dialog that lets you specify a name, a base font, and an ordered list of fallback fonts. Once saved, this virtual font becomes available in the font settings of both the workspace and applications, under the user-chosen name.
This would allow users to easily compose a MyChatFont sourcing from different real fonts, for different character sets including emoji. Implementation-side, it would (mostly) work by writing out aliases to the user-level fonts.conf (which we already write things like rendering settings to today).
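A minimal sketch of the alias-writing part, in Python, might look like the following. Everything here is an assumption for illustration: VirtualFont, to_alias and render_fonts_conf are hypothetical names rather than existing KDE or fontconfig API, and using prefer to list the base and fallback fonts under the virtual name is just one plausible mapping.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class VirtualFont:
    """Hypothetical model of what the proposed Create Font dialog collects."""
    name: str                                       # user-chosen name
    base: str                                       # primary real font
    fallbacks: list = field(default_factory=list)   # ordered fallback fonts


def to_alias(font):
    """Build an <alias> that resolves the virtual name to real families."""
    alias = ET.Element("alias")
    ET.SubElement(alias, "family").text = font.name
    prefer = ET.SubElement(alias, "prefer")
    # Order matters: base font first, then fallbacks as the user ranked them.
    for family in [font.base, *font.fallbacks]:
        ET.SubElement(prefer, "family").text = family
    return alias


def render_fonts_conf(fonts):
    """Return the text of a fonts.conf holding one alias per virtual font."""
    root = ET.Element("fontconfig")
    for font in fonts:
        root.append(to_alias(font))
    header = '<?xml version="1.0"?>\n<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n'
    return header + ET.tostring(root, encoding="unicode") + "\n"
```

For example, render_fonts_conf([VirtualFont("MyChatFont", "MyLatinFont", ["MyHangulFont", "MyEmojiFont"])]) produces a document with a single alias for MyChatFont. A real implementation would have to merge these aliases into the existing user-level file rather than overwrite it, since that file already holds other settings.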
There's a bunch of traits to this approach that I find compelling: Users are already familiar with the idiom of picking a font from the list, and this builds on it by simply letting them expand the list with custom entries. Reverting settings is as easy as switching back to a real font. It's easy to set up different sets of settings (in the form of different user fonts) and use them in different apps. It stays out of the hair of the distro's alias config as much as possible. And it's extensible should means of font composition other than aliases (e.g. based on language-tagging) come along.
Putting it into action
I'm hoping to take a stab at implementing this pretty soon, to try it out in practice. But I'd love to hear your feedback on the general problem and the proposed solution - if you have any experience with how other workspaces tackle this problem (I haven't found any attempt yet) or are in possession of a piece of the solution already, please get in touch!