Finding the average of two unsigned integers without overflow

Raymond

Finding the average of two unsigned integers, rounding toward zero, sounds easy:

unsigned average(unsigned a, unsigned b)
{
    return (a + b) / 2;
}

However, this gives the wrong answer in the face of integer overflow: For example, if unsigned integers are 32 bits wide, then it says that average(0x80000000U, 0x80000000U) is zero.
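
To make the failure concrete, here is a small self-contained check (a minimal sketch, assuming unsigned is 32 bits wide; the name average_naive is made up for the demonstration):

#include <cassert>

// The naive version from above, under a made-up name.
unsigned average_naive(unsigned a, unsigned b)
{
    return (a + b) / 2; // wraps around when a + b doesn't fit in 32 bits
}

int main()
{
    // 0x80000000 + 0x80000000 wraps to 0, so the computed "average" is 0
    // instead of the expected 0x80000000.
    assert(average_naive(0x80000000U, 0x80000000U) == 0);
    return 0;
}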

If you know which number is the larger one (which is often the case), then you can calculate the width and halve it:

unsigned average(unsigned low, unsigned high)
{
    return low + (high - low) / 2;
}

There's another algorithm that doesn't rely on knowing which value is larger, the U.S. patent for which expired in 2016:

unsigned average(unsigned a, unsigned b)
{
    return (a / 2) + (b / 2) + (a & b & 1);
}

The trick here is to pre-divide the values before adding. The result will be too low if the original addition would have carried from bit 0 to bit 1, which happens when bit 0 is set in both terms, so we detect that case and make the necessary adjustment.
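
To see the adjustment at work, here is a small check (a sketch assuming 32-bit unsigned; the test values are chosen for illustration and are not from the original):

#include <cassert>

// The pre-divide version from above, repeated so the check stands alone.
unsigned average(unsigned a, unsigned b)
{
    return (a / 2) + (b / 2) + (a & b & 1);
}

int main()
{
    // 5/2 + 7/2 = 2 + 3 = 5; both inputs are odd, so the lost carry is added back: 6.
    assert(average(5, 7) == 6);
    // 4/2 + 7/2 = 2 + 3 = 5; the inputs are not both odd, so no adjustment is needed.
    assert(average(4, 7) == 5);
    // The case that broke the naive version now comes out right.
    assert(average(0x80000000U, 0x80000000U) == 0x80000000U);
    return 0;
}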

And then there's the technique in the style known as SWAR, which stands for "SIMD within a register".

unsigned average(unsigned a, unsigned b)
{
    return (a & b) + (a ^ b) / 2;
}
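
Why this works (an explanatory note, not part of the original): a ^ b adds the two values bit by bit while throwing away the carries, and a & b marks exactly the bit positions that generate a carry into the next position, so a + b = 2*(a & b) + (a ^ b). Halving both sides gives (a + b)/2 = (a & b) + (a ^ b)/2, with the truncation landing entirely in the (a ^ b)/2 term, and no intermediate computation can overflow. A quick check of the identity, using the low/high version from earlier as the reference:

#include <cassert>

int main()
{
    const unsigned samples[] = { 0, 1, 2, 3, 0x7FFFFFFFU, 0x80000000U, 0xFFFFFFFFU };
    for (unsigned a : samples)
        for (unsigned b : samples)
        {
            // Reference result computed with the low/high version from earlier.
            unsigned lo = a < b ? a : b;
            unsigned hi = a < b ? b : a;
            assert((a & b) + (a ^ b) / 2 == lo + (hi - lo) / 2);
        }
    return 0;
}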

If your compiler supports integers larger than the size of an unsigned, say because unsigned is a 32-bit value but the native register size is 64 bits, or because the compiler supports multiword arithmetic, then you can cast to the larger data type:

unsigned average(unsigned a, unsigned b)
{
    // Assume "unsigned" is a 32-bit type and
    // "unsigned long long" is a 64-bit type.
    return ((unsigned long long)a + b) / 2;
}
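
As a quick sanity check (a sketch, assuming unsigned is 32 bits and unsigned long long is wider; the name average_wide is made up so the check stands alone):

#include <cassert>

// The widening version from above, under a made-up name.
unsigned average_wide(unsigned a, unsigned b)
{
    return (unsigned)(((unsigned long long)a + b) / 2);
}

int main()
{
    // The case the naive version got wrong.
    assert(average_wide(0x80000000U, 0x80000000U) == 0x80000000U);
    // The intermediate sum 0x100000000 needs the extra bit the wider type provides.
    assert(average_wide(0xFFFFFFFFU, 1) == 0x80000000U);
    return 0;
}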

The results look something like this for processors with native 64-bit registers. (I follow each processor's natural calling convention for what is in the upper 32 bits of the 64-bit registers.)

// x86-64: Assume ecx = a, edx = b, upper 32 bits unknown
    mov     eax, ecx        ; rax = ecx zero-extended to 64-bit value
    mov     edx, edx        ; rdx = edx zero-extended to 64-bit value
    add     rax, rdx        ; 64-bit addition: rax = rax + rdx
    shr     rax, 1          ; 64-bit shift:    rax = rax >> 1
                            ;                  result is zero-extended
                            ; Answer in eax

// AArch64 (ARM 64-bit): Assume w0 = a, w1 = b, upper 32 bits unknown
    uxtw    x0, w0          ; x0 = w0 zero-extended to 64-bit value
    uxtw    x1, w1          ; x1 = w1 zero-extended to 64-bit value
    add     x0, x0, x1      ; 64-bit addition: x0 = x0 + x1
    ubfx    x0, x0, 1, 32   ; Extract bits 1 through 32 from result
                            ; (shift + zero-extend in one instruction)
                            ; Answer in x0

// Alpha AXP: Assume a0 = a, a1 = b, both in canonical form
    insll   a0, #0, a0      ; a0 = a0 zero-extended to 64-bit value
    insll   a1, #0, a1      ; a1 = a1 zero-extended to 64-bit value
    addq    a0, a1, v0      ; 64-bit addition: v0 = a0 + a1
    srl     v0, #1, v0      ; 64-bit shift:    v0 = v0 >> 1
    addl    zero, v0, v0    ; Force canonical form
                            ; Answer in v0

// MIPS64: Assume a0 = a, a1 = b, sign-extended
    dext    a0, a0, 0, 32   ; Zero-extend a0 to 64-bit value
    dext    a1, a1, 0, 32   ; Zero-extend a1 to 64-bit value
    daddu   v0, a0, a1      ; 64-bit addition: v0 = a0 + a1
    dsrl    v0, v0, #1      ; 64-bit shift:    v0 = v0 >> 1
    sll     v0, #0, v0      ; Sign-extend result
                            ; Answer in v0

// POWER64: Assume r3 = a, r4 = b, zero-extended
    add     r3, r3, r4      ; 64-bit addition: r3 = r3 + r4
    rldicl  r3, r3, 63, 32  ; Extract bits 63 through 32 from result
                            ; (shift + zero-extend in one instruction)
                            ; Result in r3

// Itanium IA-64: Assume r32 = a, r33 = b, upper 32 bits unknown
    extr    r32 = r32, 0, 32    // zero-extend r32 to 64-bit value
    extr    r33 = r33, 0, 32 ;; // zero-extend r33 to 64-bit value
    add.i8  r8 = r32, r33 ;;    // 64-bit addition: r8 = r32 + r33
    shr     r8 = r8, 1          // 64-bit shift:    r8 = r8 >> 1
                                // Answer in r8

Note that we need to make sure the upper 32 bits of the 64-bit registers are zero, so that any leftover values in bits 32 and above don't infect the sum. The instructions to zero out the upper 32 bits can be elided if you know ahead of time that they are already zero. That is commonly the case on x86-64 and AArch64, since those architectures naturally zero-extend 32-bit values to 64-bit values, but not on Alpha AXP and MIPS64, because those architectures naturally sign-extend 32-bit values to 64-bit values.

I find it amusing that the PowerPC, patron saint of ridiculous instructions, has an instruction whose name practically announces its ridiculousness: rldicl. (It stands for "rotate left doubleword by immediate and clear left.")

For 32-bit processors with compiler support for multiword arithmetic, you end up with something like this:

// x86-32
    mov     eax, a          ; eax = a
    xor     ecx, ecx        ; Zero-extend to 64 bits
    add     eax, b          ; Compute low 32 bits in eax, set carry on overflow
    adc     ecx, ecx        ; Compute high 32 bits in ecx
                            ; ecx:eax = 64-bit result
    shrd    eax, ecx, 1     ; Multiword shift right
                            ; Answer in eax

// ARM 32-bit: Assume r0 = a, r1 = b
    mov     r2, #0          ; r2 = 0
    adds    r0, r0, r1      ; Compute low 32 bits in r0, set carry on overflow
    adc     r1, r2, #0      ; Compute high 32 bits in r1
                            ; r1:r0 = 64-bit result
    lsrs    r1, r1, #1      ; Shift high 32 bits right one place
                            ; Bottom bit goes into carry
    rrx     r0, r0          ; Rotate bottom 32 bits right one place
                            ; Carry bit goes into top bit
                            ; Answer in r0

// SH-3: Assume r4 = a, r5 = b
    ; (MSVC 13.10.3343 code generation here isn't really that great)
    clrt                    ; Clear T flag
    mov     #0, r3          ; r3 = 0, zero-extended high 32 bits of a
    addc    r5, r4          ; r4 = r4 + r5 + T, overflow goes into T bit
    mov     #0, r2          ; r2 = 0, zero-extended high 32 bits of b
    addc    r3, r2          ; r2 = r2 + r3 + T, compute high 32 bits
                            ; r2:r4 = 64-bit result
    mov     #31, r3         ; Prepare for left shift
    shld    r3, r2          ; r2 = r2 << r3
    shlr    r4              ; r4 = r4 >> 1
    mov     r2, r0          ; r0 = r2
    or      r4, r0          ; r0 = r0 | r4
                            ; Answer in r0

// MIPS: Assume a0 = a, a1 = b
    addu    v0, a0, a1      ; v0 = a0 + a1
    sltu    a0, v0, a0      ; a0 = 1 if overflow occurred
    sll     a0, 31          ; Move to bit 31
    srl     v0, v0, #1      ; Shift low 32 bits right one place
    or      v0, v0, a0      ; Combine the two halves
                            ; Answer in v0

// PowerPC: Assume r3 = a, r4 = b
    ; (gcc 4.8.5 -O3 code generation here isn't really that great)
    mr      r9, r3          ; r9 = r3 (low 32 bits of 64-bit a)
    mr      r11, r4         ; r11 = r4 (low 32 bits of 64-bit b)
    li      r8, #0          ; r8 = 0 (high 32 bits of 64-bit a)
    li      r10, #0         ; r10 = 0 (high 32 bits of 64-bit b)
    addc    r11, r11, r9    ; r11 = r11 + r9, set carry on overflow
    adde    r10, r10, r8    ; r10 = r10 + r8, high 32 bits of 64-bit result
    rlwinm  r3, r10, 31, 1, 31 ; r3 = r10 >> 1
    rlwinm  r9, r11, 31, 0, 0 ; r9 = r11 << 31
    or      r3, r3, r9      ; Combine the two parts
                            ; Answer in r3

// RISC-V: Assume a0 = a, a1 = b
    add     a1, a0, a1      ; a1 = a0 + a1
    sltu    a0, a1, a0      ; a0 = 1 if overflow occurred
    slli    a0, a0, 31      ; Shift to bit 31
    srli    a1, a1, 1       ; a1 = a1 >> 1
    or      a0, a0, a1      ; Combine the two halves
                            ; Answer in a0

Or if you have access to SIMD registers that are larger than the native register size, you can do the math there. (Though crossing the boundary from general-purpose register to SIMD register and back may turn out to be too expensive.)

// x86-32
#include <emmintrin.h> // SSE2 intrinsics

unsigned average(unsigned a, unsigned b)
{
    auto a128 = _mm_cvtsi32_si128(a);
    auto b128 = _mm_cvtsi32_si128(b);
    auto sum = _mm_add_epi64(a128, b128);
    auto avg = _mm_srli_epi64(sum, 1);
    return _mm_cvtsi128_si32(avg);
}

    movd    xmm0, a         ; Load a into bottom 32 bits of 128-bit register
    movd    xmm1, b         ; Load b into bottom 32 bits of 128-bit register
    paddq   xmm1, xmm0      ; Add as 64-bit integers
    psrlq   xmm1, 1         ; Shift 64-bit integer right one place
    movd    eax, xmm1       ; Extract bottom 32 bits of result

// 32-bit ARM (A32) has an "average" instruction built in
#include <arm_neon.h> // NEON intrinsics

unsigned average(unsigned a, unsigned b)
{
    auto a64 = vdup_n_u32(a);
    auto b64 = vdup_n_u32(b);
    auto avg = vhadd_u32(a64, b64); // hadd = halving add (average)
    return vget_lane_u32(avg, 0);
}

    vdup.32 d16, r0         ; Broadcast r0 into both halves of d16
    vdup.32 d17, r1         ; Broadcast r1 into both halves of d17
    vhadd.u32 d16, d16, d17 ; d16 = average of d16 and d17
    vmov.32 r0, d16[0]      ; Extract result

But you could do even better, if only you had access to better intrinsics.

On processors that support add-with-carry, you can view the sum of two register-sized integers as an (N + 1)-bit result, where the bonus bit N is the carry bit. If the processor also supports rotate-right-through-carry, you can shift that (N + 1)-bit result right one place, recovering the correct average without losing the bit that overflowed.

// x86-32
    mov     eax, a
    add     eax, b          ; Add, overflow goes into carry bit
    rcr     eax, 1          ; Rotate right one place through carry

// x86-64
    mov     rax, a
    add     rax, b          ; Add, overflow goes into carry bit
    rcr     rax, 1          ; Rotate right one place through carry

// 32-bit ARM (A32)
    mov     r0, a
    adds    r0, b           ; Add, overflow goes into carry bit
    rrx     r0              ; Rotate right one place through carry

// SH-3
    clrt                    ; Clear T flag
    mov     a, r0
    addc    b, r0           ; r0 = r0 + b + T, overflow goes into T bit
    rotcr   r0              ; Rotate right one place through carry

While there is an intrinsic for the operation "add two values and report the result as well as the carry", we don't have one for "rotate right through carry", so we can get only halfway there:

unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
    unsigned sum;
    auto carry = _addcarry_u32(0, a, b, &sum);
    return _rotr1_carry(sum, carry); // missing intrinsic!
#elif defined(__clang__)
    unsigned carry;
    auto sum = __builtin_addc(a, b, 0, &carry);
    return __builtin_rotateright1throughcarry(sum, carry); // missing intrinsic!
#else
#error Unsupported compiler.
#endif
}

We'll have to fake it, alas. Here's one way:

unsigned average(unsigned a, unsigned b)
{
#if defined(_MSC_VER)
    unsigned sum;
    auto carry = _addcarry_u32(0, a, b, &sum);
    return (sum >> 1) | ((unsigned)carry << 31);
#elif defined(__clang__)
    unsigned carry;
    auto sum = __builtin_addc(a, b, 0, &carry);
    return (sum >> 1) | (carry << 31);
#endif
}
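
In case it's not obvious why that expression is the right fake: the low 32 bits of the 33-bit sum land in sum and the 33rd bit lands in carry, so shifting the whole thing right one place is sum >> 1 with the carry moved into bit 31. Here is the same picture spelled out portably with a 64-bit temporary (a sketch under the same 32-bit assumption; the function name is made up):

#include <cassert>
#include <cstdint>

// Portable model of "rotate the 33-bit sum right through the carry".
unsigned average_rcr_model(unsigned a, unsigned b)
{
    uint64_t wide = (uint64_t)a + b;              // at most 33 significant bits
    unsigned sum = (unsigned)wide;                // low 32 bits of the sum
    unsigned carry = (unsigned)(wide >> 32);      // the bit that overflowed
    return (sum >> 1) | (carry << 31);            // shift right, carry becomes bit 31
}

int main()
{
    assert(average_rcr_model(0x80000000U, 0x80000000U) == 0x80000000U);
    assert(average_rcr_model(0xFFFFFFFFU, 0xFFFFFFFFU) == 0xFFFFFFFFU);
    assert(average_rcr_model(3, 4) == 3);
    return 0;
}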
