Zero-copy network transmission with io_uring

By Jonathan Corbet
December 30, 2021

When the goal is to push bits over the network as fast as the hardware can
go, any overhead hurts. The cost of copying data to be transmitted
from user space into the kernel can be especially painful; it adds latency,
takes valuable CPU time, and can be hard on cache performance. So it is
unsurprising that the developers working with io_uring, which is all about
performance, have turned their attention to zero-copy network transmission.
This patch set from Pavel Begunkov, now in its second revision, looks to be
significantly faster than the MSG_ZEROCOPY option
supported by current kernels.

As a reminder: io_uring is a relatively new API for asynchronous I/O (and
related operations); it was first merged less than three years ago. User
space sets up a pair of circular buffers shared with the kernel; the first
buffer is used to
submit operations to the kernel, while the second receives the results when
operations complete. A suitably busy process that keeps the submission
ring full can perform an indefinite number of operations without needing to
make any system calls, which clearly improves performance. Io_uring also
implements the concept of "fixed" buffers and files; these are held open,
mapped, and ready for I/O within the kernel, saving the setup and teardown
overhead that is otherwise incurred by every operation. It all adds up to
a significantly faster way for I/O-intensive applications to work.
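
The ring arrangement can be pictured with a small model. What follows is
only a sketch of a single-producer, single-consumer ring built on
free-running head and tail indices and a power-of-two mask; it is not the
kernel's actual data structure, and struct ring, ring_push(), and
ring_pop() are names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u                 /* must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)

/* Hypothetical stand-in for a submission or completion ring. */
struct ring {
    uint32_t head;                      /* advanced by the consumer */
    uint32_t tail;                      /* advanced by the producer */
    uint64_t entries[RING_ENTRIES];
};

/* Producer side: queue one entry; returns false if the ring is full. */
bool ring_push(struct ring *r, uint64_t value)
{
    assert(r);
    if (r->tail - r->head == RING_ENTRIES)
        return false;
    r->entries[r->tail & RING_MASK] = value;
    r->tail++;
    return true;
}

/* Consumer side: take the oldest entry; returns false if the ring is empty. */
bool ring_pop(struct ring *r, uint64_t *value)
{
    assert(r && value);
    if (r->head == r->tail)
        return false;
    *value = r->entries[r->head & RING_MASK];
    r->head++;
    return true;
}
```

A ring that is genuinely shared between two contexts would also need
memory barriers around the index updates; the model leaves them out to
keep the indexing visible.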

One thing that io_uring still does not have is zero-copy networking,
even though the networking subsystem supports zero-copy operation
via the MSG_ZEROCOPY socket option. In theory, adding that
support is simply a matter of wiring up the integration between the two
subsystems. In practice, naturally, there are a few more details to deal
with.

A zero-copy networking implementation must have a way to inform
applications when any given operation is truly complete; the application
cannot reuse a buffer containing data to be transmitted if the kernel
is still working on it. There is a subtle point that is relevant here:
the completion of a send()
call (for example) does not imply that the associated buffer is no longer
in use. The operation "completes" when the data has been accepted into the
networking subsystem for transmission; the higher layers may well be done
with it, but the buffer itself may still be sitting in a network
interface's transmission queue. A zero-copy operation is only truly done
with its data buffers when the hardware has done its work and, for many
protocols, when the remote peer has acknowledged receipt of the data. That
can happen long after the operation that initiated the transfer has
completed.

So there needs to be a mechanism by which the kernel can inform applications
that a given buffer can be reused. MSG_ZEROCOPY handles this by
returning notifications via the error queue associated with the socket,
which is a bit awkward, but it works.
Io_uring, instead, already has a completion-notification mechanism in
place, so the "really complete" notifications fit in naturally. But there
are still a few complications resulting from the need to accurately tell an
application which buffers can be reused.

An application doing zero-copy networking with io_uring will start by
registering at least one completion context, using the
IORING_REGISTER_TX_CTX registration operation. The context itself is a
simple structure:

    struct io_uring_tx_ctx_register {
	__u64 tag;
    };

The tag is a caller-chosen value used to identify this particular
context in future zero-copy operations on the associated ring. There can
be a maximum of 1024 contexts associated with the ring; user
space should register all of them with a single
IORING_REGISTER_TX_CTX operation, passing the structures as an
array. An attempt to register a second set of contexts will fail unless an
intervening IORING_UNREGISTER_TX_CTX operation has been done to
remove the first set.
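
The registration rules described above can be modeled in a few lines. This
is a user-space sketch of the rules only, not the patch set's code: struct
ctx_entry, struct ring_ctx_table, ctx_register(), and ctx_unregister() are
hypothetical names, the -1 return value is a placeholder for whatever error
the kernel would report, and rejecting an empty array is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TX_CTX 1024u        /* per-ring limit stated in the article */

/* Hypothetical mirror of one registered context. */
struct ctx_entry {
    uint64_t value;             /* caller-chosen identifying value */
};

/* Hypothetical per-ring state: at most one registered set at a time. */
struct ring_ctx_table {
    struct ctx_entry slots[MAX_TX_CTX];
    size_t count;               /* 0 means nothing is registered */
};

/* Register a whole array at once; returns 0 on success, -1 on refusal. */
int ctx_register(struct ring_ctx_table *t, const struct ctx_entry *arr,
                 size_t n)
{
    assert(t && arr);
    if (t->count != 0 || n == 0 || n > MAX_TX_CTX)
        return -1;              /* already registered, empty, or too many */
    for (size_t i = 0; i < n; i++)
        t->slots[i] = arr[i];
    t->count = n;
    return 0;
}

/* Drop the registered set so that a new one can be installed. */
void ctx_unregister(struct ring_ctx_table *t)
{
    assert(t);
    t->count = 0;
}
```

Later requests would refer to a context by its position in the registered
array, so the order in which the array is filled matters.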

Zero-copy writes are initiated with the new IORING_OP_SENDZC
operation. As usual, a set of buffers is passed to be written out to the
socket (which must also be provided, obviously). Additionally, each
zero-copy write must have a context associated with it, stored in the
submission queue entry's user_data field. The context is
specified as an index into the array of contexts that was
registered previously (not as the tag associated with the context).
These writes will use the kernel's zero-copy mechanism when
possible and will "complete" in the usual way, with the usual result in the
completion ring, perhaps while the supplied
buffers are still in use.

To know that the kernel is done with the buffers, the application must wait
for the second notification informing it of that fact.
These notifications are not (by default) sent for every zero-copy
operation that is submitted; instead, they are batched into "generations".
Each completion context has a sequence number that starts at zero.
Multiple operations can be associated with each generation; the
notification for that generation is sent once all of the associated
operations have truly completed.

It is up to user space to tell the kernel when to move on to a new
generation; that is done by setting the IORING_SENDZC_FLUSH flag
in a zero-copy write request. The flag itself lives in the ioprio
field of the submission queue entry. The presence of this flag indicates
that the request being submitted is the last of the current generation; the
next request will begin the new generation. Thus, if a separate
notification is needed for each write request, IORING_SENDZC_FLUSH
should be set on every request.
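
The generation rule is simple enough to model directly. The sketch below
shows bookkeeping that an application might keep for one completion context
so that it knows which generation each request joined; struct gen_tracker
and gen_submit() are hypothetical names, and the boolean argument stands in
for setting IORING_SENDZC_FLUSH on the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-context bookkeeping kept by the application. */
struct gen_tracker {
    uint32_t current;   /* generation that new requests join; starts at 0 */
};

/*
 * Record one zero-copy send. Returns the generation the request belongs
 * to; if `flush` is set, the request closes that generation and the next
 * request starts a new one.
 */
uint32_t gen_submit(struct gen_tracker *g, bool flush)
{
    assert(g);
    uint32_t gen = g->current;
    if (flush)
        g->current++;
    return gen;
}
```

Three requests submitted with the flush flag set only on the third would
all land in generation zero, and a fourth would start generation one.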

When a given generation completes, the notification will show up in the
completion ring. The user_data field will contain the context's
tag (the caller-chosen value supplied at registration), while the res
field will hold the generation number. Once
the notification arrives, the application will be able to safely reuse the
buffers associated with that generation.
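
On the receiving side, an application might handle such a notification
along these lines. This is a sketch under stated assumptions: struct
inflight_buf and release_generation() are invented for illustration, and
the application is assumed to have recorded, for each buffer it sent, the
context's identifying value and the generation the send belonged to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record the application keeps for each buffer it has sent. */
struct inflight_buf {
    uint64_t ctx_value;     /* identifying value of the context used */
    uint32_t generation;    /* generation the send belonged to */
    bool     busy;          /* true while the kernel may still read it */
};

/*
 * Apply one notification: `user_data` identifies the context and `res`
 * carries the generation number. Returns how many buffers were released.
 */
size_t release_generation(struct inflight_buf *bufs, size_t n,
                          uint64_t user_data, int32_t res)
{
    assert(bufs || n == 0);
    size_t released = 0;
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].busy && bufs[i].ctx_value == user_data &&
            bufs[i].generation == (uint32_t)res) {
            bufs[i].busy = false;   /* safe to reuse from here on */
            released++;
        }
    }
    return released;
}
```

A linear scan keeps the sketch short; an application with many buffers in
flight would more likely keep a per-generation list.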

The end result seems to be quite good; benchmarks included in the cover
letter suggest that io_uring's zero-copy operations can perform more than
200% better than MSG_ZEROCOPY. Much of that improvement likely
comes from the ability to use fixed buffers and files with io_uring,
cutting out much of the per-operation overhead. Most applications won't
see that kind of improvement, of course; they are not so heavily dominated
by the cost of network transmission. If your business is providing the
world with cat videos, though, zero-copy networking with io_uring is likely
to be appealing.

For now, the new zero-copy operations are meticulously undocumented.
Begunkov has posted a
test application that can be read to see how the new interface
is meant to be used. There have not been many comments on this version
(the second) of this series. Perhaps that will change after the holidays,
but it seems likely that this work is getting close to ready for inclusion.
