
PipeWire: The Linux audio/video bus
March 2, 2021
This article was contributed by Ahmed S. Darwish
For more than a decade, PulseAudio
has been serving the Linux desktop as its predominant audio
mixing and routing daemon — and its audio API. Unfortunately,
PulseAudio’s internal architecture does not fit the growing
sandboxed-applications use case, even though there have been attempts to amend that. PipeWire, a new daemon created (in part)
out of these attempts, will replace
PulseAudio in the upcoming Fedora 34 release. It is a transition that
deserves a closer look.
Speaking of transitions, Fedora 8’s own switch to PulseAudio in
late 2007 was not a smooth one. Longtime Linux users still remember the
daemon being branded as the software that will break your audio. After a
bumpy start, PulseAudio
emerged as the winner of the
Linux sound-server struggles. It provided a native client audio API,
but also supported
applications that used the common audio APIs of the time — including the raw Linux ALSA
sound API, which typically allows only one application to
access the sound card. PulseAudio mixed the different applications’ audio streams and provided
a central point for audio management,
fine-grained configuration, and seamless routing to Bluetooth, USB, or
HDMI. It positioned itself as the Linux desktop equivalent of the user-mode
audio engine for Windows Vista
and the macOS CoreAudio daemon.
Cracks at PulseAudio
By 2015, PulseAudio was still enjoying its status as the de facto Linux
audio daemon, but cracks were beginning to develop. The gradual shift to
sandboxed desktop applications was proving fatal to its design: with
PulseAudio, an application
can snoop on other applications’ audio, have unmediated access to the
microphone, or load server modules that can interfere with other
applications. Attempts were made at fixing PulseAudio, mainly through an
access-control
layer and a per-client memfd-backed
transport. This was all necessary but
not yet sufficient for isolating clients’ audio.
Around that time, David Henningsson, one of the core PulseAudio developers, resigned
from the project. He cited frustrations over the daemon’s poor fit for the
sandboxed-applications use case, and its intermixing of mechanism and policy for
audio-routing decisions. At the end of his message, he wondered if the
combination of these problems might be the birth pangs of a new and much-needed Linux
audio daemon:
In software nothing is impossible, but to re-architecture PulseAudio to
support all of these requirements in a good way (rather than to “build
another layer on top” […]) would be very difficult, so my judgment
is that it would be easier to write something new from scratch.
And I do think it would be possible to write something that took the
best from PulseAudio, JACK, and AudioFlinger, and get something that
would work well for both mobile and desktop; for pro-audio, gaming,
low-power music playback, etc. […] I think we, as an open source
community, could have great use for such a sound server.
PulseVideo to Pinos
Meanwhile, GStreamer co-creator Wim Taymans was
asked to work on a Linux service to mediate web browsers’ access to camera
devices. Initially, he called the project PulseVideo. The idea behind
the name was simple: similar to the way PulseAudio was created to mediate access to
ALSA sound devices, PulseVideo was created to mediate and multiplex access
to the Video4Linux2 camera device nodes.
A bit later, Taymans discovered a similarly named PulseVideo prototype
created by William Manley, and helped in upstreaming
the GStreamer features required by its code. To avoid conflicts with the
PulseAudio name, and due to scope extension beyond just camera
access, Taymans later renamed the project to Pinos — in a reference to his
town of residence in Spain.
Pinos was built on top of GStreamer pipelines, using some of the
infrastructure that was earlier refined for Manley’s prototype. D-Bus with
file-descriptor passing was used for interprocess communication. At the
GStreamer 2015 conference, Taymans described the
Pinos architecture [PDF] to attendees and gave a demo of multiple applications
accessing the system camera feed in parallel.
Due to its flexible, pipeline-based, file-descriptor-passing architecture, Pinos
also supported media broadcasting in the other direction: applications could
“upload” a media stream by passing a memfd or dma-buf
file descriptor. The media stream could then be further processed and
distributed to other applications and system multimedia
sinks
like ALSA sound devices.
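The mechanism underneath this is standard Unix plumbing: ancillary data on a
Unix-domain socket. As a minimal sketch of the SCM_RIGHTS technique — this
is illustrative only, not actual Pinos code — a client can hand a buffer's
file descriptor to the daemon like so:

    /* Minimal sketch: pass a memfd/dma-buf file descriptor to a daemon
     * over a Unix-domain socket via SCM_RIGHTS. Not actual Pinos code. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int send_buffer_fd(int sock, int buffer_fd)
    {
        char dummy = 'B';      /* at least one byte of real data is needed */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        char ctrl[CMSG_SPACE(sizeof(int))] = { 0 };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;       /* kernel duplicates the fd */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &buffer_fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }

The receiving process gets its own file descriptor for the same underlying
buffer, so the media data itself never needs to be copied through the socket.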
While only discussed in passing, the ability to send streams in both
directions and across applications allowed Pinos to act as a generic
audio/video bus — efficiently funneling media between isolated, and possibly
sandboxed, user processes. The scope of Pinos (if properly extended) could
thus overlap with, and possibly replace, PulseAudio. Taymans was explicitly
asked that question, and he answered: “Replacing PulseAudio is not
an easy task; it’s not on the agenda […] but
[Pinos] is very broad, so it could do more later.”
As the PulseAudio deficiencies discussed in the earlier section became more
problematic, “could do more later” was not a far-off target.
PipeWire
By 2016, Taymans started rethinking the foundations of Pinos, extending its
scope to become the standard Linux audio/video daemon. This included the
“plenty of tiny buffers” low-latency audio use case typically covered by JACK. There were two main areas that
needed to be addressed.
First, the hard dependency on GStreamer elements and pipelines for the
core daemon and client libraries proved problematic. GStreamer has
plenty of behind-the-scenes logic to achieve its flexibility. When a
GStreamer pipeline was processed within the context of Pinos realtime
threads, that flexibility came at the cost of implicit memory allocations,
thread creation, and locking; all of these are well known to hurt the
predictability needed for hard realtime code.
To achieve part of the GStreamer pipelines’ flexibility while still
satisfying hard realtime requirements, Taymans created a simpler multimedia
pipeline framework and called it SPA — the Simple
Plugin API [PDF]. The
framework is designed to be safely executed from realtime
threads (e.g. Pinos media processing threads), with a specific time budget
that should never be exceeded. SPA performs no memory allocations; instead,
those are the sole responsibility of the application using the SPA
framework.
Each SPA node has a well-defined set of states. There is a state for
configuring the node’s ports, formats, and buffers, which is done by the
main (non-realtime) thread; a state in which the host allocates all the
necessary buffers required by the node after its configuration; and a
separate state where the actual processing is done in the realtime
threads. During streaming, if any of
the media pipeline nodes change state (e.g. due to an event), the realtime
threads can be notified so that control is switched back to the main thread
for reconfiguration.
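In rough C terms, the contract between the host and a node can be pictured
as below; the names are illustrative and not the actual SPA API:

    /* Illustrative sketch of the state split described above; the real
     * SPA API differs. Configuration and allocation happen on the main
     * thread; only processing runs on the realtime threads. */
    enum node_state {
        NODE_CONFIGURE,   /* main thread: negotiate ports and formats */
        NODE_ALLOCATE,    /* main thread: host allocates all buffers */
        NODE_STREAMING,   /* realtime threads: processing only */
    };

    struct node {
        enum node_state state;
        void **buffers;        /* pre-allocated by the host, never the node */
        unsigned int n_buffers;
    };

    /* Runs on a realtime thread within a fixed time budget: no memory
     * allocations, no locks. A non-zero return asks the main thread to
     * take over and reconfigure. */
    static int node_process(struct node *n)
    {
        if (n->state != NODE_STREAMING)
            return 1;
        /* ... consume/fill the pre-allocated n->buffers ... */
        return 0;
    }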
Second, D-Bus was replaced as the IPC protocol. Instead, a native
fully asynchronous protocol that was inspired by Wayland — without the XML
serialization part — was implemented over Unix-domain sockets. Taymans
wanted a protocol that is simple and hard-realtime safe.
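Wayland’s wire format, the inspiration here, frames each message with an
object ID, an opcode, and a size, with file descriptors carried out of band.
A hypothetical header in that spirit might look as follows; PipeWire’s
actual wire encoding is an internal implementation detail:

    /* Hypothetical message framing in the spirit of Wayland's wire
     * format; PipeWire's real protocol encoding differs. */
    #include <stdint.h>

    struct msg_header {
        uint32_t object_id;  /* which client-side proxy is targeted */
        uint32_t opcode;     /* method or event number on that object */
        uint32_t size;       /* payload bytes that follow the header */
        uint32_t n_fds;      /* fds passed out of band via SCM_RIGHTS */
    };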
By the time the SPA framework was integrated and a native IPC protocol
was developed, the project had long outgrown its original purpose: from a
D-Bus daemon for sharing camera access to a full realtime-capable
audio/video bus. It was thus renamed again, to PipeWire — reflecting its new status as
a prominent pipeline-based engine for multimedia sharing and processing.
Lessons learned
From the start, the PipeWire developers applied an essential set of lessons from
existing audio daemons like JACK, PulseAudio, and the Chromium
OS Audio Server (CRAS). Unlike PulseAudio’s intentional division of
the Linux audio landscape into consumer-grade versus professional realtime
audio, PipeWire was designed from the start to handle both. To avoid the
PulseAudio sandboxing limitations, security was baked in: a per-client
permissions bitfield is attached to every PipeWire node (each of which
wraps one or more SPA nodes). This security-aware design allowed easy and
safe integration with Flatpak portals; the sandboxed-application
permissions interface has since been promoted to a freedesktop XDG
standard.
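Conceptually, the bitfield works like classic Unix permission bits applied
per client and per node. A sketch follows, with made-up flag names;
PipeWire defines its own flags in its permission API:

    /* Conceptual sketch of per-client, per-node permission bits; the
     * flag names here are hypothetical, not PipeWire's. */
    #include <stdbool.h>
    #include <stdint.h>

    #define PERM_READ   (1u << 0)  /* client may see the node's state */
    #define PERM_WRITE  (1u << 1)  /* client may modify the node */
    #define PERM_EXEC   (1u << 2)  /* client may call the node's methods */

    struct node_perm {
        uint32_t node_id;       /* the node this entry refers to */
        uint32_t permissions;   /* bitwise OR of the PERM_* flags */
    };

    static bool can_write(const struct node_perm *p)
    {
        return (p->permissions & PERM_WRITE) != 0;
    }

A sandboxed client holding only PERM_READ on a node, for example, can
monitor it but can neither reroute it nor interfere with other clients.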
Like CRAS and PulseAudio, but unlike JACK, PipeWire uses timer-based
audio scheduling. A dynamically reconfigurable timer is used for
scheduling wake-ups to fill the audio buffer instead of depending on a
constant rate of sound card interrupts. Besides the power-saving benefits,
this allows the audio daemon to provide dynamic latency: higher for
power-saving and consumer-grade audio like music playback; low for
latency-sensitive workloads like professional audio.
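On Linux, such scheduling maps naturally onto timerfd, where the timer is
just another file descriptor in the daemon’s event loop. A minimal sketch,
assuming a single stream and a hypothetical fill_next_buffer() helper:

    /* Minimal sketch of timer-based scheduling with timerfd: wake up
     * just-in-time to refill the audio buffer, and reprogram the timer
     * whenever the latency target changes. Error handling omitted. */
    #include <stdint.h>
    #include <sys/timerfd.h>
    #include <unistd.h>

    static void schedule_wakeup(int tfd, long ns_until_refill)
    {
        struct itimerspec its = {
            /* one-shot: fire shortly before the buffer would underrun */
            .it_value = { .tv_sec = 0, .tv_nsec = ns_until_refill },
        };
        timerfd_settime(tfd, 0, &its, NULL);
    }

    int main(void)
    {
        int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
        uint64_t expirations;

        schedule_wakeup(tfd, 10 * 1000 * 1000);   /* e.g. 10 ms from now */
        for (;;) {
            read(tfd, &expirations, sizeof(expirations)); /* block */
            /* fill_next_buffer();  hypothetical: write out more audio */
            schedule_wakeup(tfd, 10 * 1000 * 1000);
        }
    }

Lengthening the interval lets the CPU sleep longer for playback workloads;
shortening it tightens latency for professional audio, with no change to
the sound card configuration.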
Similar to CRAS, but unlike PulseAudio, PipeWire is not modeled on top of
audio-buffer rewinding. When timer-based audio scheduling is used with
huge buffers (as in PulseAudio), support for rewriting the sound card’s
buffer is needed to provide a low-latency response to unpredictable
events, like a new audio stream or a change in a stream’s volume: the big
buffer already sent to the audio device must be revoked and a new one
submitted.
This has resulted in significant code complexity and
corner cases [PDF]. Both PipeWire and CRAS limit the maximum
latency/buffering to much lower values — thus eliminating the need for
buffer rewinding altogether.
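Some back-of-the-envelope arithmetic shows why capping the buffer makes
rewinding unnecessary; the buffer sizes below are illustrative, not actual
PipeWire or PulseAudio defaults:

    /* Latency implied by a given buffer size at a 48 kHz sample rate:
     * latency_ms = frames / rate * 1000 (illustrative numbers). */
    #include <stdio.h>

    int main(void)
    {
        const double rate = 48000.0;
        const double frames[] = { 64.0, 1024.0, 96000.0 };

        for (int i = 0; i < 3; i++)  /* pro audio .. desktop .. ~2 s */
            printf("%6.0f frames -> %7.1f ms\n",
                   frames[i], frames[i] / rate * 1000.0);
        return 0;
    }

Reacting to a volume change with seconds of audio already queued requires
rewriting that queue; with buffering capped in the tens of milliseconds,
simply letting the existing buffer drain is acceptable.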
Like JACK, PipeWire chose an
external-session-manager setup. Professional audio users typically build
their own audio pipelines in a session-manager application like
Catia or QjackCtl, then let the audio
daemon execute the final result. This has the benefit of separating policy
(how the media pipeline is built) from mechanism (how the audio daemon
executes the pipeline). At GUADEC 2018, developers explicitly asked
Taymans to let GNOME, and possibly other external
daemons, take control of that part of the audio stack. Several system
integrators had already run into problems because PulseAudio embeds
audio-routing policy decisions deep within the code of its internal
modules. This was also one of the pain points mentioned in Henningsson’s
resignation e-mail.
Finally, following the trend of multiple influential system daemons
created in the last decade, PipeWire makes extensive use of
Linux-kernel-only APIs. This includes memfd, eventfd, timerfd, signalfd,
epoll, and dma-buf — all of which make the file descriptor the primary
identifier for events and shared buffers in the system. PipeWire’s
support for importing dma-buf file descriptors was key in implementing
efficient Wayland screen
capture and recording. For large 4K and 8K screens, the CPU does not
need to touch any of the massive GPU buffers: GNOME mutter (or similar
applications) passes a dma-buf descriptor that can then be integrated into
PipeWire’s SPA pipelines for further processing and capturing.
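memfd illustrates the pattern well: an anonymous region of memory
identified purely by a file descriptor, which can be sealed against
resizing before being shared. A minimal sketch, not taken from PipeWire’s
sources:

    /* Minimal sketch: create a sealed memfd suitable for sharing with
     * another process (not PipeWire code). Needs a recent glibc. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static int create_shared_buffer(size_t size)
    {
        int fd = memfd_create("media-buffer",
                              MFD_CLOEXEC | MFD_ALLOW_SEALING);

        if (fd < 0)
            return -1;
        if (ftruncate(fd, size) < 0 ||
            /* stop either side from resizing the region later */
            fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW) < 0) {
            close(fd);
            return -1;
        }
        return fd;  /* hand to the peer via SCM_RIGHTS, as shown earlier */
    }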
Adoption
The native PipeWire API has been declared stable since the project’s major 0.3
release. Existing raw ALSA applications are supported through a
PipeWire ALSA
plugin. JACK applications are supported through a re-implementation of the
JACK client libraries; when both the native and the PipeWire JACK
libraries are installed in parallel, the pw-jack tool runs a JACK
application against the PipeWire versions. PulseAudio applications are
supported through a
pipewire-pulse daemon
that listens to PulseAudio’s own socket and implements its native
communication protocol. This way, containerized desktop applications that
use their own copy of the native PulseAudio client libraries are still
supported. WebRTC, the communication
framework (and code) used
by all major browsers, includes native
PipeWire support for Wayland screen sharing — mediated through a
Flatpak portal.
The graph below shows a PipeWire media pipeline, generated using pw-dot
and then slightly beautified, on an Arch Linux system. A combination of
PipeWire-native and PulseAudio-native applications is shown:
On the left, both GNOME Cheese and a GStreamer
pipeline instance
created with
gst-launch-1.0
are accessing the same camera feed in parallel. In the
middle, Firefox is sharing the system screen (for a Jitsi meeting) using WebRTC and Flatpak
portals. On the right, the Spotify music player (a PulseAudio app) is
playing audio, which is routed to the system’s default ALSA sink — with
GNOME Settings (another PulseAudio app) live-monitoring the Left/Right
channel status of that sink.
On the Linux distributions side of things, Fedora has been shipping the
PipeWire daemon (only for Wayland screen capture) since its Fedora
27 release. Debian offers PipeWire packages, but
replacing PulseAudio or JACK is “an unsupported use case.” Arch Linux
provides PipeWire in its
central repository and officially offers extra packages for replacing both
PulseAudio and JACK, if desired. Similarly, Gentoo provides extensive documentation
for replacing both daemons. The upcoming Fedora 34 will be the first
distribution release to have PipeWire fully replace PulseAudio by default,
out of the box.
Overall, this is a critical period in the Linux multimedia scene. While
open source is a story about technology, it’s also a story about the people
hard at work creating it. Notably, both PulseAudio and JACK developers
agree that PipeWire and its author are on the right track. The upcoming
Fedora 34 release should provide a litmus test for PipeWire’s adoption
across Linux distributions.