Strange issue with floating point accuracy on ARM64


I'm running into a really strange issue with floating point accuracy on ARM64. I have a very simple piece of C++ code that looks something like this:


float sx = some_float_number_1;
float sy = some_float_number_2;
float ex = some_float_number_3;
float ey = some_float_number_4;
float px = ex;
float py = ey;

float d1 = (ex - sx) * (py - sy);
float d2 = (px - sx) * (ey - sy);

float d = d1 - d2;
float t = (ex - sx) * (py - sy) - (px - sx) * (ey - sy);

//32-bit output: d == t == 0
//64-bit output: d == 0, t != 0



In theory, d and t should be equal, and both should be 0, and that's exactly what happens on 32-bit ARM. But for some odd reason, on 64-bit ARM the output of t is not 0 while d is still correct. I have never seen a bug like this, so I have no idea what could be causing it.



EDIT: Some more info



EDIT2: Here's the disassembly


4c: 52933348 mov w8, #0x999a // #39322
50: 72a82828 movk w8, #0x4141, lsl #16
54: b90683e8 str w8, [sp,#1664]
58: 52933348 mov w8, #0x999a // #39322
5c: 72a82728 movk w8, #0x4139, lsl #16
60: b9067fe8 str w8, [sp,#1660]
64: 52933348 mov w8, #0x999a // #39322
68: 72a838a8 movk w8, #0x41c5, lsl #16
6c: b9067be8 str w8, [sp,#1656]
70: 529999a8 mov w8, #0xcccd // #52429
74: 72a855e8 movk w8, #0x42af, lsl #16
78: b90677e8 str w8, [sp,#1652]
7c: bd467be0 ldr s0, [sp,#1656]
80: bd0673e0 str s0, [sp,#1648]
84: bd4677e0 ldr s0, [sp,#1652]
88: bd066fe0 str s0, [sp,#1644]
8c: bd467be0 ldr s0, [sp,#1656]
90: bd4683e1 ldr s1, [sp,#1664]
94: 1e213800 fsub s0, s0, s1
98: bd466fe1 ldr s1, [sp,#1644]
9c: bd467fe2 ldr s2, [sp,#1660]
a0: 1e223821 fsub s1, s1, s2
a4: 1e210800 fmul s0, s0, s1
a8: bd066be0 str s0, [sp,#1640]
ac: bd4673e0 ldr s0, [sp,#1648]
b0: bd4683e1 ldr s1, [sp,#1664]
b4: 1e213800 fsub s0, s0, s1
b8: bd4677e1 ldr s1, [sp,#1652]
bc: bd467fe2 ldr s2, [sp,#1660]
c0: 1e223821 fsub s1, s1, s2
c4: 1e210800 fmul s0, s0, s1
c8: bd0667e0 str s0, [sp,#1636]
cc: bd466be0 ldr s0, [sp,#1640]
d0: bd4667e1 ldr s1, [sp,#1636]
d4: 1e213800 fsub s0, s0, s1
d8: bd0663e0 str s0, [sp,#1632]
dc: bd467be0 ldr s0, [sp,#1656]
e0: bd4683e1 ldr s1, [sp,#1664]
e4: 1e213800 fsub s0, s0, s1
e8: bd466fe2 ldr s2, [sp,#1644]
ec: bd467fe3 ldr s3, [sp,#1660]
f0: 1e233842 fsub s2, s2, s3
f4: bd4673e4 ldr s4, [sp,#1648]
f8: 1e243821 fsub s1, s1, s4
fc: bd4677e4 ldr s4, [sp,#1652]
100: 1e233883 fsub s3, s4, s3
104: 1e230821 fmul s1, s1, s3
108: 1f020400 fmadd s0, s0, s2, s1
10c: bd065fe0 str s0, [sp,#1628]





Can you give an example? I mean concrete values for which d and t are "completely wrong" (all 4 inputs and the values of d and t).
– geza
Jul 1 at 15:11








It can be any floating point value, as long as the fractional part of the input is not equal to 0. On 64-bit ARM, the value of t is always wrong, but d is always correct.
– bigbadbug
Jul 2 at 7:18





So, by "completely wrong", you mean it is not zero, but a small number? If that's the case, then Eric's answer is correct, and your description is very misleading, as t is not completely wrong, it is just the usual "floating point is imprecise" thing.
– geza
Jul 2 at 8:54








Can you share the disassembly of the 64-bit version? I'm curious what causes the issue exactly.
– geza
Jul 2 at 8:58





I suppose "completely wrong" is a bit of an overstatement. But given that (ex - sx) * (py - sy) is supposed to be equal to (px - sx) * (ey - sy), I find it very strange that the output of t is not equal to 0 on 64-bit ARM. I'm getting the disassembly right now.
– bigbadbug
Jul 2 at 9:15







2 Answers



The C++ standard allows an implementation to evaluate floating-point expressions with more precision than the type nominally has. It requires implementations to discard the excess precision when a value is assigned to an object.



Thus, in assigning to d1 and d2, the excess precision is discarded and does not contribute to d1 - d2, but, in (ex - sx) * (py - sy) - (px - sx) * (ey - sy), the excess precision participates in the evaluation. Note that C++ not only allows excess precision in the evaluation but allows it to be used for parts of the expression and not others.





In particular, a common way to evaluate an expression like a*b - c*d is to compute -c*d with a multiply instruction (that does not use excess precision) and then compute a*b - c*d with a fused multiply-add instruction that uses effectively infinite precision for the multiplication.





Your compiler may have a switch to disable this behavior and always use the nominal precision.





But (ex - sx) * (py - sy) is supposed to be equal to (px - sx) * (ey - sy), since px and py are assigned the same values as ex and ey respectively. Why would the output of (ex - sx) * (py - sy) not be the same as (px - sx) * (ey - sy) when placed in one single expression? Btw, I'm using Clang and it does not seem to have an option to disable excess floating point precision.
– bigbadbug
Jul 2 at 8:51








@bigbadbug: C++ not only allows an implementation to evaluate floating-point expressions with more precision but allows it to use it at some places in the expression and not others. Thus, in (ex - sx) * (py - sy) - (px - sx) * (ey - sy), one of the products may be computed with a simple multiply instruction (which does not use extra precision) while the remaining product is then computed and added/subtracted to the first product with a fused multiply-add or fused multiply-subtract instruction, which uses extra precision.
– Eric Postpischil
Jul 2 at 11:00








@EricPostpischil you are right. I should say that a + b == a + b may be false (ref), so it certainly should not be expected that the two different multiplications in bigbadbug's comment will compare equal.
– M.M
Jul 2 at 11:27







@M.M: because of your (now deleted) previous comment, I've asked this question: stackoverflow.com/questions/51134021/floating-point-equality. It seems that the C++ standard doesn't guarantee that a == a is true. It is only true if the implementation follows IEEE-754 (which is mostly the case nowadays, I suppose).
– geza
Jul 2 at 12:04





Analyzing the disassembly you gave, I think I found the reason: fused multiply-add (it's the fmadd instruction). I haven't analyzed the disassembly fully, but I think that t is evaluated like this:




t = fma((sx - px) * (ey - sy), ex - sx, py - sy);



where the meaning of fma is:




fma(a, b, c) = a + b*c;



So the calculation differs a little bit, because fma rounds only once while evaluating a + b*c (without fma, there is one rounding at x = b*c, then another at a + x).





You must have some compiler switch which turns on this feature, like -ffast-math or -ffp-contract=on.





Turn off FMA, and I think your problem will be solved.



(Now I see that Eric answered the same thing, but I leave my answer here, maybe it helps a little bit to understand the issue)





Thank you. Disabling -ffast-math solved my problem.
– bigbadbug
Jul 3 at 9:38






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

hVIyZ4rVBvGuS 3115 R7yMbRIBr9wOwpooL7upuHFEPgv,JNa4dEhc2yO6iecukl,SB 9ufYFgOb,yrDDk7nglLQQoDDmb 5q44Hdu
9o0a37o 37koQ WvmSuN,KtxZm Vo1VUwAqB2uAI17NPDvmYt o54iRMb

Popular posts from this blog

Rothschild family

Boo (programming language)