
Linux desktop shell IPC: Wayland vs. D-Bus, and the lack of agreement on when to use them

Sunday, 11 October 2020  |  eike hein

On the Linux desktop today, we have two dominant IPC technologies in use between applications and the desktop environment: Wayland and D-Bus. While created for different reasons, both are generically extensible and can be used to exchange data, synchronize state and send requests and signals between peers. A large number of desktop use cases are implemented using either technology, and some use cases are already distributed across both of them. The status quo is mostly the result of organic growth, with individual implementation choices down to tech friction or the lack thereof.

For some use cases the choice of which to use is not obvious. This is one of the factors still slowing down the standardization, and hence the adoption, of Wayland-based sessions.

The overlap

While the semantics of Wayland and D-Bus are really pretty different, both enable the exchange of structured, typed data between peers along with RPC-shaped uses. D-Bus supports many more connection topologies than just one process talking to another, but for applications interacting with the desktop environment via protocols both parties agree on, the difference is small and usually hidden by toolkit abstractions.

More importantly, both technologies were created to serve the same key users - desktop environments and application toolkits - and widely adopted protocols layered over both transports deal in some of the same primitives coming from that ecosystem. For example, .desktop file names as stable application ids are referenced in both the Wayland xdg-shell protocol and freedesktop.org's notifications protocol.
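To make the overlap concrete, here is a minimal Python sketch of the same application id travelling over both transports. On the Wayland side it hand-encodes an `xdg_toplevel.set_app_id` request in the Wayland wire format: an 8-byte header (object id, then message size and opcode packed into one 32-bit word) followed by a length-prefixed, NUL-terminated string padded to 32 bits. On the D-Bus side it builds the argument tuple of the notification spec's `Notify` method with the `desktop-entry` hint, using plain Python values in place of D-Bus types. The object id `12`, the app id `org.kde.konsole` and the message texts are arbitrary example values, and nothing is sent over a real socket or bus.

```python
import struct

# Request opcodes are the 0-based position of the request in the protocol XML;
# set_app_id is assumed here to be the fourth request of xdg_toplevel.
XDG_TOPLEVEL_SET_APP_ID = 3


def encode_wl_string(value: str) -> bytes:
    """Wayland string argument: uint32 length (incl. NUL), bytes, NUL, pad to 4."""
    data = value.encode("utf-8") + b"\0"
    padding = (-len(data)) % 4
    return struct.pack("=I", len(data)) + data + b"\0" * padding


def encode_wl_request(object_id: int, opcode: int, payload: bytes) -> bytes:
    """Prefix a payload with the header: object id, then (size << 16) | opcode."""
    size = 8 + len(payload)
    return struct.pack("=II", object_id, (size << 16) | opcode) + payload


def notify_args(app_id: str, summary: str, body: str) -> tuple:
    """Arguments for org.freedesktop.Notifications.Notify (susssasa{sv}i)."""
    hints = {"desktop-entry": app_id}  # the .desktop file name without suffix
    # app_name, replaces_id, app_icon, summary, body, actions, hints, timeout
    return ("Konsole", 0, "", summary, body, [], hints, -1)


app_id = "org.kde.konsole"
wayland_msg = encode_wl_request(12, XDG_TOPLEVEL_SET_APP_ID, encode_wl_string(app_id))
dbus_args = notify_args(app_id, "Command finished", "make exited with status 0")

object_id, size_opcode = struct.unpack_from("=II", wayland_msg)
print(object_id, size_opcode >> 16, size_opcode & 0xFFFF)  # 12 28 3
print(dbus_args[6]["desktop-entry"])  # org.kde.konsole
```

The wire format uses the host's native byte order, which is why the sketch packs with `=` rather than a fixed endianness.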

The current usage split between the two protocols is a result of tech friction and little planning: some things are easier to talk about on a Wayland connection, others over D-Bus. It comes down to the protocols we already have in hand, their adoption in various codebases and a spectrum of opinions.

Tech friction

Wayland is the designated successor to the venerable X windowing system. Born from the same community, it's certainly informed by some of X's successes, but also by many of the pain points experienced by implementers of X over the years. A lot of the advances in Wayland relate to the particular problems of windowing and presentation, but its heritage also did much to set the scene for revisiting what really belongs in the core windowing system and what doesn't. D-Bus and even its own direct predecessors did not exist for much of X's long and storied history. Conversely, in the Wayland world it has become a lot harder (in terms of the scrutiny applied by the community) to get a desktop feature into a widely adopted spec than it was in X, which for a long time was the only widely adopted transport medium in place.

D-Bus is a far more generic IPC/RPC technology supporting a wider variety of connection patterns between parties. Service activation through the bus, multicast signals open to any participant, pervasive introspection of interfaces - you won't find much of this in Wayland, and D-Bus is the latest in a chain of technologies driven by genuine needs for such capabilities.

There's a third element to the discussion: the rise of the standards ecosystem, broadly promoting interoperability between desktop environments and the portability of apps between them. On a timeline, D-Bus arrived a decent number of years before Wayland, so D-Bus has a head start in being the medium of choice for specs and fd.o standards referenced in protocol and service designs.

In the end:

  • Wayland is the natural transport between the application and the compositor, one of the key services of the desktop environment. Windowing being the goal, a lot of the protocols in place now make it easy for the two parties to converse at the level of windows. In xdg-shell, the application can also let the compositor know its application id as part of window metadata. This is one of several established ways for the compositor to know what application it's talking to. (A more recent aid is mapping from the client to a cgroup set up by the application launcher, e.g. Flatpak or Plasma, which can improve the security of this type of authentication. In both Wayland and D-Bus, you sometimes see this need - and much more - addressed by inserting a protocol proxy like Sommelier or xdg-dbus-proxy, respectively.)
  • D-Bus is the transport used for many other app-to-service interactions, e.g. notifications. On D-Bus it's often convenient to find the application id of a participant, but it's hard to relate the conversation to the windowing system. For example, efforts to relate a D-Bus notification event to a particular window to jump to are fairly recent and not yet widely implemented.
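The cgroup-based identification mentioned above can be sketched in a few lines of Python. The idea: the compositor learns a client's pid from the peer credentials of its socket, reads `/proc/<pid>/cgroup`, and derives an app id from the scope unit the launcher placed the app in. This sketch assumes cgroup v2 (a single `0::<path>` line) and a scope naming convention along the lines of `app[-<launcher>]-<app id>-<random>.scope`, similar to what systemd's desktop integration documentation describes; real launchers may name things differently, so treat the parsing rules as assumptions.

```python
from typing import Optional


def app_id_from_cgroup(cgroup_file_contents: str) -> Optional[str]:
    """Guess an application id from the contents of /proc/<pid>/cgroup.

    Assumes cgroup v2 and scopes named app[-<launcher>]-<app id>-<random>.scope,
    with any "-" inside the app id escaped as \\x2d. Returns None otherwise.
    """
    for line in cgroup_file_contents.splitlines():
        hierarchy, sep, path = line.partition("::")
        if hierarchy != "0" or not sep:
            continue  # not the unified (v2) hierarchy line
        unit = path.rsplit("/", 1)[-1]
        if not (unit.startswith("app-") and unit.endswith(".scope")):
            return None  # not launched into an app scope
        name = unit[len("app-"):-len(".scope")]
        name, sep, _random = name.rpartition("-")  # drop the random suffix
        if not sep or not name:
            return None
        parts = name.split("-")  # optional launcher prefix, then the app id
        if len(parts) > 2:
            return None
        return parts[-1].replace("\\x2d", "-")
    return None


sample = ("0::/user.slice/user-1000.slice/user@1000.service/app.slice/"
          "app-flatpak-org.kde.konsole-12345.scope\n")
print(app_id_from_cgroup(sample))  # org.kde.konsole
```

Unlike an app id the client announces about itself over xdg-shell, this value is set by the launcher before the app runs, which is why it can make the identification harder to spoof.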

The decision vacuum

There's a particular snag in the timeline: with D-Bus arriving on the scene a lot later than X, many interactions between apps and the desktop environment were spec'd into the X medium in the past. Examples include various forms of window or, more broadly, application state (e.g. requesting user attention), as well as requesting focus/activation from the window manager. No one thought to move onto D-Bus what was already widely adopted and working, although more comprehensive replacements sometimes gravitated towards D-Bus instead and managed to gain traction.

With X being replaced, a lot of the stuff in ICCCM and EWMH/NetWM is now up for grabs for either Wayland- or D-Bus-based specs.

Current trends: Plasma vs. Gnome vs. wlroots/Sway

Wayland is the natural choice for a compositor service to talk to apps. D-Bus is the most popular option for other desktop environment services to talk to apps. In some desktop environments, the compositor and other services may live in the same process (Gnome Shell in particular), others distribute services over few or many processes.

In Plasma, the compositor and the desktop shell are two separate processes. The compositor implements the window management policy, and the desktop shell process draws your wallpaper, your panels and your menus. The notifications service lives in the Plasma desktop shell process as well. This architecture is a straight port-over from the X way of doing things, but it remains advantageous - for example, a crash in the shell won't bring your compositor (and then your apps) down with it. In fact, we're aiming for still more process isolation in future generations of Plasma.

The compositor and the shell being in different processes requires them to synchronize state. The shell also needs to send requests to the compositor. In Plasma, this communication happens through a set of Plasma-specific Wayland protocols. This is a good example of a friction-driven implementation choice: as the shell and the compositor mainly want to converse about windows, using Wayland posed the least friction. On the other hand, in cases of the compositor talking to support processes such as the screen configuration service or the System Settings app, the choice of transport varies - Wayland protocols in some cases, D-Bus interfaces in others (notably virtual desktops configuration). Overall, we don't have a clear IPC usage policy at the moment.
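The state synchronization between a separate shell and compositor can be pictured as the shell keeping a read-only mirror of the compositor's window list, updated purely from events, while changes go out as requests. The Python sketch below is hypothetical: the event names and the request tuple are made up for illustration and are not taken from Plasma's actual protocols.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WindowInfo:
    app_id: str = ""
    title: str = ""
    minimized: bool = False


@dataclass
class ShellWindowMirror:
    """Shell-side copy of the compositor's window list, updated only by events."""
    windows: Dict[int, WindowInfo] = field(default_factory=dict)

    def handle_event(self, event: str, window_id: int, value=None) -> None:
        # Hypothetical event names; a real protocol defines its own.
        if event == "window_added":
            self.windows[window_id] = WindowInfo(app_id=value)
        elif event == "title_changed":
            self.windows[window_id].title = value
        elif event == "minimized_changed":
            self.windows[window_id].minimized = value
        elif event == "window_removed":
            self.windows.pop(window_id, None)
        else:
            raise ValueError(f"unknown event {event!r}")

    def request_minimize(self, window_id: int) -> tuple:
        # The shell does not edit its mirror directly; it asks the compositor,
        # which stays authoritative and answers with a minimized_changed event.
        return ("minimize", window_id)


mirror = ShellWindowMirror()
mirror.handle_event("window_added", 1, "org.kde.konsole")
mirror.handle_event("title_changed", 1, "~ : bash")
request = mirror.request_minimize(1)  # would be sent to the compositor
mirror.handle_event("minimized_changed", 1, True)  # the compositor's answer
```

Keeping the compositor as the single source of truth fits the crash-isolation argument above: if the shell process dies, it only loses its mirror, which it can rebuild from the compositor's events once it reconnects.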

For other desktop environments I cannot speak with any true authority, and I especially make no claims about their implementation policies. Perhaps owing to the single-process shell architecture, I see our friends at Gnome using D-Bus-based protocols in a few places where we use Wayland-based ones, e.g. for screen configuration (edit: the lack of friction between D-Bus and XDG desktop portals has been pointed out to me as another factor; makes sense).

The wlroots community (consisting of Sway and lots of other users) seems to broadly prefer Wayland over D-Bus to run protocols through. It's easy to surmise why - wlroots' raison d'être is to enable building shells on top of Wayland, and in particular shells built up of interoperating pieces made by distinct authors. Enabling this interoperability through the community's Wayland-based tooling must come naturally, and the wlroots community is no doubt leading this particular effort at the moment. Some of these protocols are great, and it's likely Plasma will implement more of them in the future.

Unresolved cases

An example where the lack of a clear choice has been slowing the desktop community down is focus/activation requests. Consider the following common usage pattern: The user clicks a link in a chat app. It opens in the default browser. For convenience, the user might want the browser window to be raised to the front now, the click being an instruction to the system to form a smooth workflow from one app to the other.

It might come as a surprise, but 12 years in, there's no good, widely-adopted solution to making this work in Wayland-based desktops.

It should not come as a surprise, however, that this is an example handled in the past by an X-based protocol that's now out of the picture. The X way of doing things wasn't very robust - it put a lot of trust into the application posting a legitimate request to be activated, and it required the window manager to implement complicated heuristics to filter out illegitimate or just ill-timed requests. These heuristics are collectively known as "focus stealing prevention", e.g. sticking to the current window while the user is typing into it. For now, Wayland-based desktops by and large are skating by on even poorer semantics - giving focus to any new window while relying on heuristics, with partial solutions to the problem of already-existing windows not pervasively implemented across apps.
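As a rough illustration of the kind of heuristic a window manager has to apply, here is a hypothetical Python sketch of a timestamp-based focus stealing prevention check. It loosely mirrors the X-era idea of comparing the time of the user action that triggered a request (conceptually similar to EWMH's `_NET_WM_USER_TIME`) with the user's latest input into the focused window. The function name, parameters and the 5-second staleness limit are assumptions; real window managers such as KWin use considerably more involved logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActivationRequest:
    window: str
    # Timestamp (ms) of the user input that led to the request, if the app
    # supplied one. The app supplies it, so the window manager must trust it.
    user_time: Optional[int]


def allow_activation(req: ActivationRequest, focused: Optional[str],
                     last_input_in_focused: int, now: int,
                     max_age_ms: int = 5000) -> bool:
    """Simplified, hypothetical focus stealing prevention policy."""
    if focused is None or req.window == focused:
        return True  # nothing to steal focus from
    if req.user_time is None:
        return False  # no evidence the user asked for this
    if now - req.user_time > max_age_ms:
        return False  # ill-timed: the triggering action is too old
    if last_input_in_focused > req.user_time:
        return False  # the user kept typing in the focused window since
    return True


# Click in the chat app at t=10000 ms, browser asks for focus at t=10800 ms.
click = ActivationRequest("browser", user_time=10_000)
print(allow_activation(click, "chat", last_input_in_focused=10_000, now=10_800))
```

The weak point is visible in the data model: `user_time` comes from the application, so a misbehaving app can simply lie about it.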

It's a good example of a case where the community really doesn't want to miss the chance to get it right this time. In trying to do so, the community has also looked at a lot of more or less related desktop features: for example, the X convention of apps placing attention hints on their windows to blink them in the taskbar, to be cancelled by the window manager when the window is raised and focused. Another is applications posting notification events through D-Bus carrying a link back to a particular window, e.g. to jump from a chat message alert bubble to the specific conversation window.
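The attention-hint convention described above amounts to a small piece of window manager bookkeeping, roughly the role `_NET_WM_STATE_DEMANDS_ATTENTION` plays in EWMH. The following Python sketch models it; the class and method names are hypothetical.

```python
from typing import Optional, Set


class AttentionTracker:
    """Window-manager-side bookkeeping for X-style attention hints."""

    def __init__(self) -> None:
        self.focused: Optional[str] = None
        self.demanding: Set[str] = set()

    def demand_attention(self, window: str) -> None:
        # An app sets the hint on its window; ignored if it already has focus.
        if window != self.focused:
            self.demanding.add(window)

    def focus(self, window: str) -> None:
        # Raising and focusing the window cancels its hint.
        self.focused = window
        self.demanding.discard(window)

    def taskbar_should_blink(self, window: str) -> bool:
        return window in self.demanding


tracker = AttentionTracker()
tracker.focus("browser")
tracker.demand_attention("chat")  # chat window starts blinking in the taskbar
tracker.focus("chat")             # user switches to it; the hint is cleared
```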

Over the last couple of years, many different approaches have been proposed and tried, and we still don't have the One True Spec.

There seems to be fairly solid agreement on the need for a token exchange mechanism to clear the focus handover in the compositor - the chat app requests a token from the compositor, hands it to the browser handling the link (e.g. through an environment variable) and the browser may use it to request focus from the compositor. But there remains a lot of division over whether the activation request (along with other use cases) should be layered over the D-Bus-based notification spec, or whether all of this should remain solidly Wayland territory:
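The token exchange described above can be sketched as a toy in-process model in Python. Everything here is hypothetical: the `Compositor` class, its methods and the environment variable name `ACTIVATION_TOKEN` are made up for illustration and do not correspond to any specific proposal. The compositor only hands out tokens to the window that currently has focus, tokens are single-use, and redeeming one only works while the issuing window still holds focus.

```python
import os
import secrets
from typing import Dict, Optional

TOKEN_ENV_VAR = "ACTIVATION_TOKEN"  # hypothetical variable name


class Compositor:
    """Toy compositor that issues single-use focus handover tokens."""

    def __init__(self) -> None:
        self.focused: Optional[str] = None
        self._tokens: Dict[str, str] = {}  # token -> window that requested it

    def request_token(self, requesting_window: str) -> Optional[str]:
        # Only the focused window (the one the user is interacting with)
        # may hand its focus on.
        if requesting_window != self.focused:
            return None
        token = secrets.token_hex(16)
        self._tokens[token] = requesting_window
        return token

    def activate(self, window: str, token: Optional[str]) -> bool:
        issuer = self._tokens.pop(token, None) if token else None
        # Valid only if unused and the issuer still holds focus.
        if issuer is None or issuer != self.focused:
            return False
        self.focused = window
        return True


def browser_env(token: str) -> Dict[str, str]:
    """Environment for launching the browser, carrying the token along."""
    env = dict(os.environ)
    env[TOKEN_ENV_VAR] = token
    return env


compositor = Compositor()
compositor.focused = "chat"
token = compositor.request_token("chat")   # the chat app asks on link click
env = browser_env(token)                   # e.g. for subprocess.Popen(..., env=env)
granted = compositor.activate("browser", env.get(TOKEN_ENV_VAR))
print(granted, compositor.focused)  # True browser
```

A real design would likely also bind the token to the input event that triggered it and expire it after a while; this toy only checks that the token is unused and that its issuer still has focus.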

  • The argument for D-Bus is that a mechanism to relate notification events to particular windows (by including a window handle in the metadata) is needed anyway, and once that's in place, it's easy to add in the focus token. Using special notification events to inform about window-specific goings-on is an extensible pattern potentially spanning all of activation and attention-seeking, along with things like taskbar job progress meters and others. The event spec also already has fields useful for accessibility - imagine a system where a blind user gets an "app X wants attention for Y" readout, without needing to craft a new protocol to tell the compositor specifics about Y. On the other hand, going this route clearly also requires some spec work for existing notification service implementations to be able to do the right thing when processing these special events.
  • The argument for Wayland is that the above requires a lot of translation back and forth (or rather, import and export of handles and tokens), and, in the case of multi-process service architectures, expensive chaining of IPC. Plus it requires a Wayland-based desktop environment to use D-Bus for a core use case, while an X11-based desktop environment could do without a second transport for the same features. For some projects, this is an unattractive expansion of their current dependencies.
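To picture the D-Bus route, here is a Python sketch of the `Notify` arguments an app might send for an attention event, plus the kind of readout an accessibility tool could derive from fields the spec already has. The `desktop-entry` and `urgency` hints and the `default` action come from the notification spec; the `x-window-handle` and `x-activation-token` hint names are hypothetical, and the `wayland:` prefix on the window handle is an assumption modeled on the window identifier strings used by XDG desktop portals. Plain Python values stand in for D-Bus types.

```python
from typing import Any, Dict, Tuple

# Hypothetical hint names; the notification spec does not define these.
WINDOW_HANDLE_HINT = "x-window-handle"
ACTIVATION_TOKEN_HINT = "x-activation-token"


def attention_notification(app_name: str, app_id: str, reason: str,
                           window_handle: str, token: str) -> Tuple[Any, ...]:
    """Notify() arguments for an 'app wants attention' event (susssasa{sv}i)."""
    hints: Dict[str, Any] = {
        "desktop-entry": app_id,  # defined by the spec
        "urgency": 1,             # byte: 0 low, 1 normal, 2 critical
        WINDOW_HANDLE_HINT: window_handle,
        ACTIVATION_TOKEN_HINT: token,
    }
    summary = f"{app_name} wants attention"
    actions = ["default", "Show window"]  # identifier/label pairs
    return (app_name, 0, "", summary, reason, actions, hints, -1)


def spoken_readout(notify_args: Tuple[Any, ...]) -> str:
    """What a screen reader could announce, using only existing spec fields."""
    _app, _replaces_id, _icon, summary, body, *_rest = notify_args
    return f"{summary}: {body}"


args = attention_notification("Konversation", "org.kde.konversation",
                              "New message from Alice", "wayland:1a2b", "tok123")
print(spoken_readout(args))  # Konversation wants attention: New message from Alice
```

The sketch also shows where the Wayland camp's objection bites: the notification service receiving this still has to import the window handle and pass the token on to the compositor, which is an extra IPC hop when those are separate processes.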

In conclusion

I suspect the above example of focus/activation requests will ultimately be addressed by a token exchange via Wayland, and the notification spec way of doing things will be implemented alongside it as well, rather than picking one way of doing things. And perhaps that's fine.

But it's worth stopping for a moment and being conscious of what's going on. We would all benefit from some commonly agreed-upon guidelines on where the scopes of Wayland and D-Bus end in our application platform, and where they overlap. Where does the windowing system start and end? Where should new protocols go? We also want to be smart in spec'ing out how the two mediums relate to each other, and in making translations from one to the other safe and robust.

In addition to our regular comm tools of forges and merge requests, a natural venue for these kinds of discussions is the Linux App Summit, an event co-hosted by KDE and Gnome. LAS 2020 is just one month away. Perhaps it's a good chance for a sit-down about these things.