Finding the average of two unsigned integers, rounding toward zero, sounds easy:

```c
unsigned average(unsigned a, unsigned b) { return (a + b) / 2; }
```

However, this gives the wrong answer in the face of integer overflow: For example, if unsigned integers are 32 bits wide, then it says that `average(0x80000000U, 0x80000000U)` is zero.
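To make the failure mode concrete, here is a small check of my own (the function names are mine, not from the article), comparing the naive formula against a computation widened to 64 bits, assuming a 32-bit `unsigned` as in the article:

```c
// Naive version: the 32-bit sum 0x80000000 + 0x80000000 wraps to 0,
// so the computed "average" is 0 instead of 0x80000000.
unsigned naive_average(unsigned a, unsigned b) { return (a + b) / 2; }

// Reference: do the sum in 64 bits, where it cannot overflow.
unsigned reference_average(unsigned a, unsigned b)
{
    return (unsigned)(((unsigned long long)a + b) / 2);
}
```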

If you know which number is the higher one (which is usually the case), then you can calculate the width of the interval and halve it:

```c
unsigned average(unsigned low, unsigned high) { return low + (high - low) / 2; }
```
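A quick sanity check of my own (the function name is mine): since `high - low` always fits in an `unsigned` when `low <= high`, no intermediate value can overflow, even at the top of the range:

```c
// Safe for any low <= high: the difference high - low fits in
// unsigned, so neither the subtraction nor the addition overflows.
unsigned average_lo_hi(unsigned low, unsigned high)
{
    return low + (high - low) / 2;
}
```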

There’s another algorithm that doesn’t depend on knowing which value is higher, the U.S. patent for which expired in 2016:

```c
unsigned average(unsigned a, unsigned b) { return (a / 2) + (b / 2) + (a & b & 1); }
```

The trick here is to pre-divide the values before adding. The result may be too low if the original addition would have carried from bit 0 into bit 1, which happens when bit 0 is set in both terms, so we detect that case and make the necessary adjustment.

And then there’s the technique in the style known as SWAR, which stands for “SIMD within a register”:

```c
unsigned average(unsigned a, unsigned b) { return (a & b) + (a ^ b) / 2; }
```
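Both overflow-proof formulas can be checked against a 64-bit reference. The SWAR version works because `a + b == 2*(a & b) + (a ^ b)`: the AND picks out the bit positions that generate a carry, and the XOR picks out the positions that don’t. This little harness is mine, not from the article:

```c
// Expired-patent version: pre-divide, then fix up the lost carry
// from bit 0 when both operands are odd.
unsigned average_patent(unsigned a, unsigned b)
{
    return (a / 2) + (b / 2) + (a & b & 1);
}

// SWAR version: a & b holds the carry bits (already worth double,
// so added un-halved); a ^ b holds the carry-free bits.
unsigned average_swar(unsigned a, unsigned b)
{
    return (a & b) + (a ^ b) / 2;
}

// 64-bit reference that cannot overflow.
unsigned reference_average(unsigned a, unsigned b)
{
    return (unsigned)(((unsigned long long)a + b) / 2);
}
```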

If your compiler supports integers larger than the size of an `unsigned`, say because `unsigned` is a 32-bit type but the native register size is 64 bits, or because the compiler supports multiword arithmetic, then you can cast to the larger data type:

```c
unsigned average(unsigned a, unsigned b)
{
    // Assume "unsigned" is a 32-bit type and
    // "unsigned long long" is a 64-bit type.
    return ((unsigned long long)a + b) / 2;
}
```

The result looks something like this for processors with native 64-bit registers. (I follow each processor’s natural calling convention for what is in the upper 32 bits of 64-bit registers.)

```
// x86-64: Assume ecx = a, edx = b, upper 32 bits unknown
    mov     eax, ecx        ; rax = ecx zero-extended to 64-bit value
    mov     edx, edx        ; rdx = edx zero-extended to 64-bit value
    add     rax, rdx        ; 64-bit addition: rax = rax + rdx
    shr     rax, 1          ; 64-bit shift: rax = rax >> 1
                            ; result is zero-extended
                            ; Answer in eax

// AArch64 (ARM 64-bit): Assume w0 = a, w1 = b, upper 32 bits unknown
    uxtw    x0, w0          ; x0 = w0 zero-extended to 64-bit value
    uxtw    x1, w1          ; x1 = w1 zero-extended to 64-bit value
    add     x0, x1          ; 64-bit addition: x0 = x0 + x1
    ubfx    x0, x0, 1, 32   ; Extract bits 1 through 32 from result
                            ; (shift + zero-extend in one instruction)
                            ; Answer in x0

// Alpha AXP: Assume a0 = a, a1 = b, both in canonical form
    insll   a0, #0, a0      ; a0 = a0 zero-extended to 64-bit value
    insll   a1, #0, a1      ; a1 = a1 zero-extended to 64-bit value
    addq    a0, a1, v0      ; 64-bit addition: v0 = a0 + a1
    srl     v0, #1, v0      ; 64-bit shift: v0 = v0 >> 1
    addl    zero, v0, v0    ; Force canonical form
                            ; Answer in v0

// MIPS64: Assume a0 = a, a1 = b, sign-extended
    dext    a0, a0, 0, 32   ; Zero-extend a0 to 64-bit value
    dext    a1, a1, 0, 32   ; Zero-extend a1 to 64-bit value
    daddu   v0, a0, a1      ; 64-bit addition: v0 = a0 + a1
    dsrl    v0, v0, #1      ; 64-bit shift: v0 = v0 >> 1
    sll     v0, #0, v0      ; Sign-extend result
                            ; Answer in v0

// Power64: Assume r3 = a, r4 = b, zero-extended
    add     r3, r3, r4      ; 64-bit addition: r3 = r3 + r4
    rldicl  r3, r3, 63, 32  ; Extract bits 63 through 32 from result
                            ; (shift + zero-extend in one instruction)
                            ; Answer in r3

// Itanium Ia64: Assume r32 = a, r33 = b, upper 32 bits unknown
    extr    r32 = r32, 0, 32    // zero-extend r32 to 64-bit value
    extr    r33 = r33, 0, 32 ;; // zero-extend r33 to 64-bit value
    add.i8  r8 = r32, r33 ;;    // 64-bit addition: r8 = r32 + r33
    shr     r8 = r8, 1          // 64-bit shift: r8 = r8 >> 1
```

Note that we must ensure that the upper 32 bits of the 64-bit registers are zero, so that any leftover values in bit 32 and above don’t infect the sum. The instructions that zero out the upper 32 bits can be elided if you know ahead of time that they are already zero. That is common on x86-64 and AArch64, since those architectures naturally zero-extend 32-bit values to 64-bit values, but not common on Alpha AXP and MIPS64, because those architectures naturally *sign*-extend 32-bit values to 64-bit values.

I find it amusing that the PowerPC, patron saint of ridiculous instructions, has an instruction whose name practically announces its ridiculousness: `rldicl`. (It stands for “rotate left doubleword immediate then clear left”.)

For 32-bit processors with compiler support for multiword arithmetic, you end up with something like this:

```
// x86-32
    mov     eax, a          ; eax = a
    xor     ecx, ecx        ; Zero-extend to 64 bits
    add     eax, b          ; Get low 32 bits in eax, set carry on overflow
    adc     ecx, ecx        ; Get high 32 bits in ecx
                            ; ecx:eax = 64-bit result
    shrd    eax, ecx, 1     ; Multiword shift right
                            ; Answer in eax

// ARM 32-bit: Assume r0 = a, r1 = b
    mov     r2, #0          ; r2 = 0
    adds    r0, r0, r1      ; Get low 32 bits in r0, set carry on overflow
    adc     r1, r2, #0      ; Get high 32 bits in r1
                            ; r1:r0 = 64-bit result
    lsrs    r1, r1, #1      ; Shift high 32 bits right one place
                            ; Bottom bit goes into carry
    rrx     r0, r0          ; Rotate bottom 32 bits right one place
                            ; Carry bit goes into top bit
                            ; Answer in r0

// SH-3: Assume r4 = a, r5 = b
// (MSVC 13.10.3343 code generation here is not great)
    clrt                    ; Clear T flag
    mov     #0, r3          ; r3 = 0, zero-extends high 32 bits of a
    addc    r5, r4          ; r4 = r4 + r5 + T, overflow goes into T bit
    mov     #0, r2          ; r2 = 0, zero-extends high 32 bits of b
    addc    r3, r2          ; r2 = r2 + r3 + T, calculate high 32 bits
                            ; r2:r4 = 64-bit result
    mov     #31, r3         ; Prepare for left shift
    shld    r3, r2          ; r2 = r2 << r3
    shlr    r4              ; r4 = r4 >> 1
    mov     r2, r0          ; r0 = r2
    or      r4, r0          ; r0 = r0 | r4
                            ; Answer in r0

// MIPS: Assume a0 = a, a1 = b
    addu    v0, a0, a1      ; v0 = a0 + a1
    sltu    a0, v0, a0      ; a0 = 1 if overflow occurred
    sll     a0, a0, 31      ; Move to bit 31
    srl     v0, v0, #1      ; Shift low 32 bits right one place
    or      v0, v0, a0      ; Combine the two parts
                            ; Answer in v0

// PowerPC: Assume r3 = a, r4 = b
// (gcc 4.8.5 -O3 code generation here is not great)
    mr      r9, r3              ; r9 = r3 (low 32 bits of 64-bit a)
    mr      r11, r4             ; r11 = r4 (low 32 bits of 64-bit b)
    li      r8, #0              ; r8 = 0 (high 32 bits of 64-bit a)
    li      r10, #0             ; r10 = 0 (high 32 bits of 64-bit b)
    addc    r11, r11, r9        ; r11 = r11 + r9, set carry on overflow
    adde    r10, r10, r8        ; r10 = r10 + r8, high 32 bits of 64-bit result
    rlwinm  r3, r10, 31, 1, 31  ; r3 = r10 >> 1
    rlwinm  r9, r11, 31, 0, 0   ; r9 = r11 << 31
    or      r3, r3, r9          ; Combine the two parts
                                ; Answer in r3

// RISC-V: Assume a0 = a, a1 = b
    add     a1, a0, a1      ; a1 = a0 + a1
    sltu    a0, a1, a0      ; a0 = 1 if overflow occurred
    slli    a0, a0, 31      ; Shift to bit 31
    srli    a1, a1, 1       ; a1 = a1 >> 1
    or      a0, a0, a1      ; Combine the two parts
                            ; Answer in a0
```

Or if you have access to SIMD registers that are larger than the native register size, you can do the math there. (Although crossing the boundary between general-purpose registers and SIMD registers and back may end up being too expensive.)

```c
// x86-32
unsigned average(unsigned a, unsigned b)
{
    auto a128 = _mm_cvtsi32_si128(a);
    auto b128 = _mm_cvtsi32_si128(b);
    auto sum = _mm_add_epi64(a128, b128);
    auto avg = _mm_srli_epi64(sum, 1);
    return _mm_cvtsi128_si32(avg);
}
```

```
    movd    xmm0, a         ; Load a into bottom 32 bits of 128-bit register
    movd    xmm1, b         ; Load b into bottom 32 bits of 128-bit register
    paddq   xmm1, xmm0      ; Add as 64-bit integers
    psrlq   xmm1, 1         ; Shift 64-bit integer right one place
    movd    eax, xmm1       ; Extract bottom 32 bits of result
```

```c
// 32-bit ARM (A32) has an "average" instruction built in
unsigned average(unsigned a, unsigned b)
{
    auto a64 = vdup_n_u32(a);
    auto b64 = vdup_n_u32(b);
    auto avg = vhadd_u32(a64, b64); // hadd = halving add (average)
    return vget_lane_u32(avg, 0);
}
```

```
    vdup.32     d16, r0         ; Broadcast r0 into both halves of d16
    vdup.32     d17, r1         ; Broadcast r1 into both halves of d17
    vhadd.u32   d16, d16, d17   ; d16 = average of d16 and d17
    vmov.32     r0, d16[0]      ; Extract result
```

But you could do even better, if only you had access to better intrinsics.

On processors that support add-with-carry, you can view the sum of two register-sized integers as an (`N` + 1)-bit result, where the bonus bit `N` is the carry bit. If the processor also supports rotate-right-through-carry, you can shift that (`N` + 1)-bit result right one place, recovering the correct average without losing the bit that overflowed.
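The same idea can be sketched in portable C by recovering the carry from the wrapped sum; this sketch is mine (assuming a 32-bit `unsigned`), not code from the article:

```c
// Portable approximation of "add, then rotate right through carry":
// the wrapped sum holds the low 32 bits of the 33-bit result, and
// (sum < a) recovers the carry, which becomes bit 31 after the shift.
unsigned average_rcr(unsigned a, unsigned b)
{
    unsigned sum = a + b;        /* low 32 bits of the 33-bit sum */
    unsigned carry = sum < a;    /* 1 if the addition wrapped */
    return (sum >> 1) | (carry << 31);
}
```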

```
// x86-32
    mov     eax, a
    add     eax, b          ; Add, overflow goes into carry bit
    rcr     eax, 1          ; Rotate right one place through carry

// x86-64
    mov     rax, a
    add     rax, b          ; Add, overflow goes into carry bit
    rcr     rax, 1          ; Rotate right one place through carry

// 32-bit ARM (A32)
    mov     r0, a
    adds    r0, b           ; Add, overflow goes into carry bit
    rrx     r0              ; Rotate right one place through carry

// SH-3
    clrt                    ; Clear T flag
    mov     a, r0
    addc    b, r0           ; r0 = r0 + b + T, overflow goes into T bit
    rotcr   r0              ; Rotate right one place through carry
```

While there is an intrinsic for the operation “add two values and report the carry along with the result”, there isn’t one for “rotate right through carry”, so we can get only halfway there:

```c
unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
    unsigned sum;
    auto carry = _addcarry_u32(0, a, b, &sum);
    return _rotr1_carry(sum, carry); // missing intrinsic!
#elif defined(__clang__)
    unsigned carry;
    auto sum = __builtin_addc(a, b, 0, &carry);
    return __builtin_rotateright1throughcarry(sum, carry); // missing intrinsic!
#else
#error Unsupported compiler.
#endif
}
```

We’ll have to fake it, alas. Here’s one way:

```c
unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
    unsigned sum;
    auto carry = _addcarry_u32(0, a, b, &sum);
    return (sum >> 1) | ((unsigned)carry << 31);
#elif defined(__clang__)
    unsigned carry;
    auto sum = __builtin_addc(a, b, 0, &carry);
    return (sum >> 1) | (carry << 31);
#else
#error Unsupported compiler.
#endif
}
```

