|
this should increase performance and reduce code size on aarch64.
the compiled code was checked against using __builtin_* instead
of inline asm with gcc-6.2.0.
lrint is two instructions.
c with inline asm is used because it is safer than a pure asm
implementation, this prevents ll{rint,round} to be an alias
of l{rint,round} (because the types don't match) and depends
on gcc style inline asm support.
ceil, floor, round, trunc can either raise inexact on finite
non-integer inputs or not raise any exceptions. the new
implementation does not raise exceptions while the generic
c code does.
on aarch64, the underflow exception is signaled before rounding
(ieee 754 allows both before and after rounding, but it must be
consistent), the generic fma c code signals it after rounding
so using single instruction fixes a slight conformance issue too.
|