Finding the average of two unsigned integers without overflow


Raymond

Finding the average of two unsigned integers, rounding toward zero, sounds easy:

```
unsigned average(unsigned a, unsigned b)
{
    return (a + b) / 2;
}
```

However, this gives the wrong answer in the face of integer overflow: For example, if unsigned integers are 32 bits wide, then it says that `average(0x80000000U, 0x80000000U)` is zero.

If you know which number is the higher number (which is often the case), then you can calculate the width and halve it:

```
unsigned average(unsigned low, unsigned high)
{
    return low + (high - low) / 2;
}
```

There’s another algorithm that doesn’t rely on knowing which value is higher, the U.S. patent for which expired in 2016:

```
unsigned average(unsigned a, unsigned b)
{
    return (a / 2) + (b / 2) + (a & b & 1);
}
```

The trick here is to pre-divide the values before adding. The result will be too low if the original addition contained a carry from bit 0 to bit 1, which happens if bit 0 is set in both of the terms, so we detect that case and make the necessary adjustment.

And then there’s the technique in the style known as SWAR, which stands for “SIMD within a register”.

```
unsigned average(unsigned a, unsigned b)
{
    return (a & b) + (a ^ b) / 2;
}
```

If your compiler supports integers larger than the size of an `unsigned`, say because `unsigned` is a 32-bit value but the native register size is 64 bits, or because the compiler supports multiword arithmetic, then you can cast to the larger data type:

```
unsigned average(unsigned a, unsigned b)
{
    // Assume "unsigned" is a 32-bit type and
    // "unsigned long long" is a 64-bit type.
    return ((unsigned long long)a + b) / 2;
}
```

The results would look something like this for processors with native 64-bit registers. (I follow the processor’s natural calling convention for what is in the upper 32 bits of 64-bit registers.)

```
// x86-64: Assume ecx = a, edx = b, upper 32 bits unknown
mov     eax, ecx        ; rax = ecx zero-extended to 64-bit value
mov     edx, edx        ; rdx = edx zero-extended to 64-bit value
add     rax, rdx        ; 64-bit addition: rax = rax + rdx
shr     rax, 1          ; 64-bit shift:    rax = rax >> 1
                        ; result is zero-extended

// AArch64 (ARM 64-bit): Assume w0 = a, w1 = b, upper 32 bits unknown
uxtw    x0, w0          ; x0 = w0 zero-extended to 64-bit value
uxtw    x1, w1          ; x1 = w1 zero-extended to 64-bit value
add     x0, x0, x1      ; 64-bit addition: x0 = x0 + x1
ubfx    x0, x0, 1, 32   ; Extract bits 1 through 32 from result
                        ; (shift + zero-extend in one instruction)

// Alpha AXP: Assume a0 = a, a1 = b, both in canonical form
insll   a0, #0, a0      ; a0 = a0 zero-extended to 64-bit value
insll   a1, #0, a1      ; a1 = a1 zero-extended to 64-bit value
addq    a0, a1, v0      ; 64-bit addition: v0 = a0 + a1
srl     v0, #1, v0      ; 64-bit shift:    v0 = v0 >> 1
addl    zero, v0, v0    ; Force canonical form

// MIPS64: Assume a0 = a, a1 = b, sign-extended
dext    a0, a0, 0, 32   ; Zero-extend a0 to 64-bit value
dext    a1, a1, 0, 32   ; Zero-extend a1 to 64-bit value
daddu   v0, a0, a1      ; 64-bit addition: v0 = a0 + a1
dsrl    v0, v0, #1      ; 64-bit shift:    v0 = v0 >> 1
sll     v0, #0, v0      ; Sign-extend result

// POWER64: Assume r3 = a, r4 = b, zero-extended
add     r3, r3, r4      ; 64-bit addition: r3 = r3 + r4
rldicl  r3, r3, 63, 32  ; Extract bits 63 through 32 from result
                        ; (shift + zero-extend in one instruction)
                        ; result in r3

// Itanium Ia64: Assume r32 = a, r33 = b, upper 32 bits unknown
extr    r32 = r32, 0, 32    // zero-extend r32 to 64-bit value
extr    r33 = r33, 0, 32 ;; // zero-extend r33 to 64-bit value
add.i8  r8 = r32, r33 ;;    // 64-bit addition: r8 = r32 + r33
shr     r8 = r8, 1          // 64-bit shift:    r8 = r8 >> 1
```

Note that we need to make sure the upper 32 bits of the 64-bit registers are zero, so that any leftover values in bit 32 and beyond don’t infect the sum. The instructions to zero out the upper 32 bits can be elided if you know ahead of time that they are already zero. That is common on x86-64 and AArch64, since those architectures naturally zero-extend 32-bit values to 64-bit values, but not common on Alpha AXP and MIPS64, because those architectures naturally sign-extend 32-bit values to 64-bit values.

I find it amusing that the PowerPC, patron saint of ridiculous instructions, has an instruction whose name almost literally announces its ridiculousness: rldicl. (It stands for “rotate left doubleword by immediate and clear left”.)

For 32-bit processors with compiler support for multiword arithmetic, you end up with something like this:

```
// x86-32
mov     eax, a          ; eax = a
xor     ecx, ecx        ; Zero-extend to 64 bits
add     eax, b          ; Calculate low 32 bits in eax, set carry on overflow
adc     ecx, ecx        ; Calculate high 32 bits in ecx
                        ; ecx:eax = 64-bit result
shrd    eax, ecx, 1     ; Multiword shift right

// ARM 32-bit: Assume r0 = a, r1 = b
mov     r2, #0          ; r2 = 0
adds    r0, r1, r2      ; Calculate low 32 bits in r0, set carry on overflow
adc     r1, r2, #0      ; Calculate high 32 bits in r1
                        ; r1:r0 = 64-bit result
lsrs    r1, r1, #1      ; Shift high 32 bits right one place
                        ; Bottom bit goes into carry
rrx     r0, r0          ; Rotate bottom 32 bits right one place
                        ; Carry bit goes into top bit

// SH-3: Assume r4 = a, r5 = b
; (MSVC 13.10.3343 code generation here is not all that great)
clrt                    ; Clear T flag
mov     #0, r3          ; r3 = 0, zero-extends high 32 bits of a
addc    r5, r4          ; r4 = r4 + r5 + T, overflow goes into T bit
mov     #0, r2          ; r2 = 0, zero-extends high 32 bits of b
addc    r3, r2          ; r2 = r2 + r3 + T, calculate high 32 bits
                        ; r2:r4 = 64-bit result
mov     #31, r3         ; Prepare for left shift
shld    r3, r2          ; r2 = r2 << r3
shlr    r4              ; r4 = r4 >> 1
mov     r2, r0          ; r0 = r2
or      r4, r0          ; r0 = r0 | r4

// MIPS: Assume a0 = a, a1 = b
addu    v0, a0, a1      ; v0 = a0 + a1
sltu    a0, v0, a0      ; a0 = 1 if overflow occurred
sll     a0, a0, 31      ; Move it to bit 31
srl     v0, v0, #1      ; Shift low 32 bits right one place
or      v0, v0, a0      ; Combine the two pieces

// PowerPC: Assume r3 = a, r4 = b
; (gcc 4.8.5 -O3 code generation here is not all that great)
mr      r9, r3          ; r9 = r3 (low 32 bits of 64-bit a)
mr      r11, r4         ; r11 = r4 (low 32 bits of 64-bit b)
li      r8, #0          ; r8 = 0 (high 32 bits of 64-bit a)
li      r10, #0         ; r10 = 0 (high 32 bits of 64-bit b)
addc    r11, r11, r9    ; r11 = r11 + r9, set carry on overflow
adde    r10, r10, r8    ; r10 = r10 + r8, high 32 bits of 64-bit result
rlwinm  r3, r10, 31, 1, 31 ; r3 = r10 >> 1
rlwinm  r9, r11, 31, 0, 0  ; r9 = r11 << 31
or      r3, r3, r9      ; Combine the two parts

// RISC-V: Assume a0 = a, a1 = b
add     a1, a0, a1      ; a1 = a0 + a1
sltu    a0, a1, a0      ; a0 = 1 if overflow occurred
slli    a0, a0, 31      ; Move it to bit 31
srli    a1, a1, 1       ; a1 = a1 >> 1
or      a0, a0, a1      ; Combine the two pieces
```

Or if you have access to SIMD registers that are larger than the native register size, you can do the math there. (Though crossing the boundary from general-purpose register to SIMD register and back may end up being too expensive.)

```
// x86-32
unsigned average(unsigned a, unsigned b)
{
    auto a128 = _mm_cvtsi32_si128(a);
    auto b128 = _mm_cvtsi32_si128(b);
    auto sum = _mm_add_epi64(a128, b128);
    auto avg = _mm_srli_epi64(sum, 1);
    return _mm_cvtsi128_si32(avg);
}

movd    xmm0, a         ; Load a into bottom 32 bits of 128-bit register
movd    xmm1, b         ; Load b into bottom 32 bits of 128-bit register
paddq   xmm1, xmm0      ; Add as 64-bit integers
psrlq   xmm1, 1         ; Shift 64-bit integer right one place
movd    eax, xmm1       ; Extract bottom 32 bits of result

// 32-bit ARM (A32) has a halving-add instruction built in
unsigned average(unsigned a, unsigned b)
{
    auto a64 = vdup_n_u32(a);
    auto b64 = vdup_n_u32(b);
    auto avg = vhadd_u32(a64, b64);
    return vget_lane_u32(avg, 0);
}

vdup.32 d16, r0         ; Broadcast r0 into both halves of d16
vdup.32 d17, r1         ; Broadcast r1 into both halves of d17
vhadd.u32 d16, d16, d17 ; d16 = average of d16 and d17
vmov.32 r0, d16[0]      ; Extract result
```

But you could do still better, if only you had access to better intrinsics.

On processors that support add-with-carry, you can view the sum of two register-sized integers as an (N + 1)-bit result, where the bonus bit N is the carry bit. If the processor also supports rotate-right-through-carry, you can shift the (N + 1)-bit result right one place, recovering the correct average without losing the bit that overflowed.

```
// x86-32
mov     eax, a
add     eax, b          ; Add, overflow goes into carry bit
rcr     eax, 1          ; Rotate right one place through carry

// x86-64
mov     rax, a
add     rax, b          ; Add, overflow goes into carry bit
rcr     rax, 1          ; Rotate right one place through carry

// 32-bit ARM (A32)
mov     r0, a
adds    r0, b           ; Add, overflow goes into carry bit
rrx     r0              ; Rotate right one place through carry

// SH-3
clrt                    ; Clear T flag
mov     a, r0
addc    b, r0           ; r0 = r0 + b + T, overflow goes into T bit
rotcr   r0              ; Rotate right one place through carry
```

While there is an intrinsic for the operation “add two values and report the result as well as the carry”, we don’t have one for “rotate right through carry”, so we can get only halfway there:

```
unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
    unsigned sum;
    auto carry = _addcarry_u32(0, a, b, &sum);
    return _rotr1_carry(sum, carry); // missing intrinsic!
#elif defined(__clang__)
    unsigned carry;
    auto sum = __builtin_addc(a, b, 0, &carry);
    return __builtin_rotateright1throughcarry(sum, carry); // missing intrinsic!
#else
#error Unsupported compiler.
#endif
}
```

We’ll have to fake it, alas. Here’s one way:

```
unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
    unsigned sum;
    auto carry = _addcarry_u32(0, a, b, &sum);
    // Fake the rotate-through-carry: shift right one place and
    // put the carry into the vacated top bit.
    return (sum >> 1) | (carry << 31);
#elif defined(__clang__)
    unsigned carry;
    auto sum = __builtin_addc(a, b, 0, &carry);
    return (sum >> 1) | (carry << 31);
#else
#error Unsupported compiler.
#endif
}
```

