path: root/src/internal/libm.h
Age | Commit message | Author | Lines
2019-04-20 | make new math code compatible with unused variable warning/error | Rich Felker | -3/+6
commit b50d315fd23f0fbc4c11e2583801dd123d933745 introduced fp_force_eval implemented by default with a dead store to a volatile variable. unfortunately this introduces warnings with -Wunused-variable and breaks the ability to use -Werror with the default warning options set by configure when warnings are enabled. we could just call fp_barrier instead, but that results in a spurious load after the store due to volatile semantics. the fix committed here avoids the load. it will still produce warnings without -Wno-unused-but-set-variable, but that's part of our default warning profile, and there are already other locations in the source where an unused variable warning will occur without it.
2019-04-17 | math: new pow | Szabolcs Nagy | -0/+1
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
The underflow exception is signaled if the result is in the subnormal range even if the result is exact.
code size change: +3421 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  pow rthruput: 102.96 ns/call 33.38 ns/call 3.08x
  pow latency: 144.37 ns/call 54.75 ns/call 2.64x
-O3:
  pow rthruput: 98.91 ns/call 32.79 ns/call 3.02x
  pow latency: 138.74 ns/call 53.78 ns/call 2.58x
2019-04-17 | math: new powf | Szabolcs Nagy | -0/+6
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
POWF_SCALE != 1.0 case only matters if TOINT_INTRINSICS is set, which is currently not supported for any target.
SNaN is not supported, it would require an issignalingf implementation.
code size change: -816 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  powf rthruput: 95.14 ns/call 20.04 ns/call 4.75x
  powf latency: 137.00 ns/call 34.98 ns/call 3.92x
-O3:
  powf rthruput: 92.48 ns/call 13.67 ns/call 6.77x
  powf latency: 131.11 ns/call 35.15 ns/call 3.73x
2019-04-17 | math: new exp2f and expf | Szabolcs Nagy | -0/+16
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
In expf TOINT_INTRINSICS is kept, but is unused, it would require support for __builtin_round and __builtin_lround as single instruction.
code size change: +94 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  expf rthruput: 9.19 ns/call 8.11 ns/call 1.13x
  expf latency: 34.19 ns/call 18.77 ns/call 1.82x
  exp2f rthruput: 5.59 ns/call 6.52 ns/call 0.86x
  exp2f latency: 17.93 ns/call 16.70 ns/call 1.07x
-O3:
  expf rthruput: 9.12 ns/call 4.92 ns/call 1.85x
  expf latency: 34.44 ns/call 18.99 ns/call 1.81x
  exp2f rthruput: 5.58 ns/call 4.49 ns/call 1.24x
  exp2f latency: 17.95 ns/call 16.94 ns/call 1.06x
2019-04-17 | math: add configuration macros | Szabolcs Nagy | -0/+5
Musl currently aims to support non-nearest rounding mode and does not support SNaNs. These macros allow marking relevant code paths in case these decisions are changed later (they also help documenting the corner cases involved).
2019-04-17 | math: add macros for static branch prediction hints | Szabolcs Nagy | -0/+9
These have no effect with -Os, so with default settings they are only useful for documenting the expectation. With --enable-optimize=internal,malloc,string,math the libc.so code size increases by 18K on x86_64 and performance varies between -2% and +10%.
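A minimal sketch of what such hint macros can look like, assuming a GCC-compatible compiler that provides `__builtin_expect`. The `sketch_` names are hypothetical and are not taken from libm.h; `sketch_safe_div` exists only to show a call site.

```c
/* Hypothetical hint macros; the names are assumptions, not libm.h's. */
#ifdef __GNUC__
#define sketch_predict_true(x)  __builtin_expect(!!(x), 1)
#define sketch_predict_false(x) __builtin_expect(!!(x), 0)
#else
/* Without the builtin the macros only document the expectation. */
#define sketch_predict_true(x)  (x)
#define sketch_predict_false(x) (x)
#endif

/* Illustrative call site: the zero divisor is marked as the rare path. */
static int sketch_safe_div(int a, int b)
{
	if (sketch_predict_false(b == 0))
		return 0;
	return a / b;
}
```

The hint changes only block layout and branch weighting, never the result, which is consistent with the commit's remark that it mostly documents intent at -Os.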
2019-04-17 | math: add double precision error handling functions | Szabolcs Nagy | -0/+5
2019-04-17 | math: add single precision error handling functions | Szabolcs Nagy | -0/+7
These are supposed to be used in tail call positions when handling special cases in new code. (fp exceptions may be raised "naturally" by the common code path if special casing is more effort.) This implements the error handling apis used in https://github.com/ARM-software/optimized-routines without errno setting.
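The pattern described here can be sketched as follows. This is an illustration under assumptions, not the committed code: the helper names, the constant `0x1p97f`, and the toy caller `sketch_square` are all hypothetical. Each helper computes its special result with an operation that raises the matching fp exception as a side effect, and errno is left alone.

```c
#include <stdint.h>

/* Hypothetical overflow helper: returns +-inf in the default rounding
   mode and raises the overflow exception through the multiplication. */
static float sketch_oflowf(uint32_t sign)
{
	/* volatile keeps the compiler from folding the product away */
	volatile float big = 0x1p97f;
	float y = (sign ? -big : big) * big;
	return y;
}

/* Hypothetical invalid-operation helper: always returns NaN. */
static float sketch_invalidf(float x)
{
	volatile float d = x - x;   /* 0 for finite x, NaN otherwise */
	return d / d;               /* 0/0 or NaN/NaN, both give NaN */
}

/* Toy caller: the special case leaves through the helper in tail position. */
static float sketch_square(float x)
{
	if (x > 0x1p64f || x < -0x1p64f)
		return sketch_oflowf(0);
	return x * x;
}
```

Keeping the special cases in separate out-of-line helpers lets the common path stay short, which appears to be the motivation for using them in tail-call position.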
2019-04-17 | math: add eval_as_float and eval_as_double | Szabolcs Nagy | -0/+17
Previously type casts or assignments were used for handling excess precision, which assumed standard C99 semantics, but since it's a rarely needed obscure detail, it's better to use explicit helper functions to document where we rely on this. It also helps if the code is used outside of the libc in non-C99 compilation mode: with the default excess precision handling of gcc, explicit inline asm barriers are needed for narrowing on FLT_EVAL_METHOD!=0 targets. I plan to use this in new code with the existing style that uses double_t and float_t as much as possible. One ugliness is that it is required for almost every return statement since that does not drop excess precision (the standard changed this in C11 annex F, but that does not help in non-standard compilation modes or with old compilers).
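A minimal sketch of such helpers, assuming a compiler that follows C99 semantics, where an assignment to a narrower type drops excess precision. The `sketch_` names and the `sketch_muladd` caller are hypothetical; the inline asm variant the message mentions for non-C99 modes is not shown.

```c
#include <math.h>   /* float_t */

/* Hypothetical helper: the assignment rounds x to float under C99 rules. */
static inline float sketch_eval_as_float(float x)
{
	float y = x;
	return y;
}

/* Same idea for double. */
static inline double sketch_eval_as_double(double x)
{
	double y = x;
	return y;
}

/* Illustrative caller: float_t may be wider than float, so the result
   is narrowed explicitly at the return statement. */
static float sketch_muladd(float a, float b, float c)
{
	float_t t = (float_t)a * b + c;
	return sketch_eval_as_float(t);
}
```

On targets where FLT_EVAL_METHOD is 0 the helpers compile to nothing; their value is in marking the places that rely on the narrowing.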
2019-04-17 | math: add fp_arch.h with fp_barrier and fp_force_eval | Szabolcs Nagy | -6/+65
C99 has ways to support fenv access, but compilers don't implement it and assume nearest rounding mode and no fp status flag access. (gcc has -frounding-math and then it does not assume nearest rounding mode, but it still assumes the compiled code itself does not change the mode. Even if the C99 mechanism was implemented it is not ideal: it requires all code in the library to be compiled with FENV_ACCESS "on" to make it usable in non-nearest rounding mode, but that limits optimizations more than necessary.) The math functions should give reasonable results in all rounding modes (but the quality may be degraded in non-nearest rounding modes) and the fp status flag settings should follow the spec, so fenv side-effects are important and code transformations that break them should be prevented. Unfortunately compilers don't give any help with this, the best we can do is to add fp barriers to the code using volatile local variables (they create a stack frame and undesirable memory accesses to it) or inline asm (gcc specific, requires target specific fp reg constraints, often creates unnecessary reg moves and multiple barriers are needed to express that an operation has side-effects) or extern call (only useful in tail-call position to avoid stack-frame creation and does not work with lto). We assume that in a math function if an operation depends on the input and the output depends on it, then the operation will be evaluated at runtime when the function is called, producing all the expected fenv side-effects (this is not true in case of lto and in case the operation is evaluated with excess precision that is not rounded away). 
So fp barriers are needed (1) to prevent the move of an operation within a function (in case it may be moved from an unevaluated code path into an evaluated one or if it may be moved across a fenv access), (2) to force the evaluation of an operation for its side-effect when it has no input dependency (it may be constant folded), or (3) to force the evaluation when its output is unused. I believe that fp_barrier and fp_force_eval can take care of these and they should not be needed in hot code paths.
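The volatile-variable approach mentioned above can be sketched as below. This is a generic fallback under assumptions, not the contents of fp_arch.h: the `sketch_` names are hypothetical, only the double variants are shown, and the target-specific inline asm versions are omitted.

```c
/* Hypothetical barrier: the value goes through a volatile store and
   load, so the compiler cannot fold or move the operations around it. */
static inline double sketch_fp_barrier(double x)
{
	volatile double y = x;
	return y;
}

/* Hypothetical force-eval: a store to a volatile keeps the computation
   of x alive even though the value is never read back. */
static inline void sketch_fp_force_eval(double x)
{
	volatile double y;
	y = x;
}

/* Cases (2) and (3): constant operands and an unused result, yet the
   division must still happen at run time and raise inexact. */
static void sketch_raise_inexact(void)
{
	sketch_fp_force_eval(sketch_fp_barrier(1.0) / 3.0);
}
```

The cost is the stack slot and memory traffic the message warns about, which is why such barriers belong on rare paths rather than in hot loops.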
2019-04-17 | math: remove sun copyright from libm.h | Szabolcs Nagy | -23/+0
Nothing is left from the original fdlibm header nor from the bsd modifications to it other than some internal api declarations. Comments are dropped that may be copyrightable content.
2019-04-17 | math: add asuint, asuint64, asfloat and asdouble | Szabolcs Nagy | -33/+15
Code generation for SET_HIGH_WORD slightly changes, but it only affects pow, otherwise the generated code is unchanged.
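For readers unfamiliar with these helpers, here is a sketch of bit-pattern reinterpretation between floating-point and integer types. It uses `memcpy`, which is one well-defined way to do it; the log does not say how libm.h implements them, and the `sketch_` names are hypothetical. The sketch assumes IEEE 754 binary32 and binary64 representations.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for asuint/asfloat/asuint64/asdouble. */
static inline uint32_t sketch_asuint(float f)
{
	uint32_t u;
	memcpy(&u, &f, sizeof u);   /* copy the bits, no value conversion */
	return u;
}

static inline float sketch_asfloat(uint32_t u)
{
	float f;
	memcpy(&f, &u, sizeof f);
	return f;
}

static inline uint64_t sketch_asuint64(double d)
{
	uint64_t u;
	memcpy(&u, &d, sizeof u);
	return u;
}

static inline double sketch_asdouble(uint64_t u)
{
	double d;
	memcpy(&d, &u, sizeof d);
	return d;
}
```

A typical use is reading the high word of a double with `sketch_asuint64(x) >> 32`, which is the kind of operation the older word-access macros covered.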
2019-04-17 | math: move complex math out of libm.h | Szabolcs Nagy | -15/+0
This makes it easier to build musl math code with a compiler that does not support complex types (tcc) and is in general a more sensible factorization of the internal headers.
2018-09-12 | apply hidden visibility to internal math functions | Rich Felker | -24/+24
this makes significant differences to codegen on archs with an expensive PLT-calling ABI; on i386 and gcc 7.3 for example, the sin and sinf functions no longer touch call-saved registers or the stack except for pushing outgoing arguments. performance is likely improved too, but no measurements were taken.
2018-09-12 | move lgamma-related internal declarations to libm.h | Rich Felker | -0/+4
2018-06-14 | add support for m68k 80-bit long double variant | Rich Felker | -0/+11
since x86 and m68k are the only archs with 80-bit long double and each has mandatory endianness, select the variant via endianness. differences are minor: apparently just byte order and representation of infinities. the m68k format is not well-documented anywhere I could find, so if other differences are found they may require additional changes later.
2015-03-11 | math: add dummy implementations of 128 bit long double functions | Szabolcs Nagy | -0/+14
This is in preparation for the aarch64 port only to have the long double math symbols available on ld128 platforms. The implementations should be fixed up later once we have proper tests for these functions. Added bigendian handling for ld128 bit manipulations too.
2014-12-17 | provide CMPLX macros in implementation-internal libm.h | Rich Felker | -0/+12
this avoids assuming the presence of C11 macro definitions in the public complex.h, which need changes potentially incompatible with the way these macros are being used internally.
2013-09-06 | math: remove STRICT_ASSIGN macro | Szabolcs Nagy | -11/+0
gcc did not always drop excess precision according to c99 at assignments before version 4.5, even if -std=c99 was requested, which caused badly broken mathematical functions on i386 when FLT_EVAL_METHOD!=0. but STRICT_ASSIGN was not used consistently, and it is worked around for old compilers with -ffloat-store, so it is no longer needed. the new convention is to get the compiler to respect c99 semantics and, when excess precision is not harmful, to use float_t or double_t or to specialize code using FLT_EVAL_METHOD.
2013-09-05 | math: remove libc.h include from libm.h | Szabolcs Nagy | -2/+0
libc.h is only for weak_alias so include it directly where it is used
2013-09-05 | math: cosmetic cleanup (use explicit union instead of fshape and dshape) | Szabolcs Nagy | -66/+56
2013-09-05 | math: remove *_WORD64 macros from libm.h | Szabolcs Nagy | -16/+0
only fma used these macros and the explicit union is clearer
2013-09-05 | math: remove old longdbl.h | Szabolcs Nagy | -2/+0
2013-09-05 | long double cleanup, initial commit | Szabolcs Nagy | -0/+28
new ldshape union, ld128 support is kept, code that used the old ldshape union was rewritten (IEEEl2bits union of freebsd libm is not touched yet). ld80 __fpclassifyl no longer tries to handle invalid representation.
2012-12-11 | make CMPLX macros available in complex.h in non-c11 mode as well | Szabolcs Nagy | -8/+0
2012-11-13 | math: turn off the STRICT_ASSIGN workaround by default | Szabolcs Nagy | -5/+3
the volatile hack in STRICT_ASSIGN is only needed if assignment is not respected and excess precision is kept. gcc -fexcess-precision=standard and -ffloat-store both respect assignment and musl uses these flags by default. i kept the macro for now so the workaround may be used for bad compilers in the future.
2012-11-13 | complex: add C11 CMPLX macros and replace cpack with them | Szabolcs Nagy | -18/+5
2012-05-06 | add FORCE_EVAL macro to evaluate float expr for their side effect | nsz | -0/+13
updated nextafter* to use FORCE_EVAL, it can be used in many other places in the math code to improve readability.
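One way to write a type-generic macro of this kind is to dispatch on `sizeof` and store the expression into a volatile of the matching type. The sketch below is an assumption about the general shape, not the macro from this commit; `SKETCH_FORCE_EVAL`, `sketch_calls` and `sketch_counted` are hypothetical names, the last two existing only to make the evaluation observable.

```c
/* Hypothetical macro: sizeof picks the branch at compile time, so the
   expression is evaluated exactly once, in the branch that matches. */
#define SKETCH_FORCE_EVAL(x) do {                   \
	if (sizeof(x) == sizeof(float)) {           \
		volatile float v_;                  \
		v_ = (x);                           \
	} else if (sizeof(x) == sizeof(double)) {   \
		volatile double v_;                 \
		v_ = (x);                           \
	} else {                                    \
		volatile long double v_;            \
		v_ = (x);                           \
	}                                           \
} while (0)

/* Counter used only to observe that the expression really ran. */
static int sketch_calls;

static double sketch_counted(double x)
{
	sketch_calls++;
	return x + 1.0;
}
```

A call such as `SKETCH_FORCE_EVAL(x + 0x1p120f);` then keeps the addition, and the inexact flag it raises, even though the sum is discarded.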
2012-03-22 | add creal/cimag macros in complex.h (and use them in the functions defs) | Rich Felker | -8/+0
2012-03-19 | don't inline __rem_pio2l so the code size is smaller | nsz | -0/+1
2012-03-18 | fix loads of missing const in new libm, and some global vars (?!) in powl | Rich Felker | -2/+2
2012-03-16 | fix namespace issues for lgamma, etc. | Rich Felker | -0/+2
standard functions cannot depend on nonstandard symbols
2012-03-13 | first commit of the new libm! | Rich Felker | -0/+186
thanks to the hard work of Szabolcs Nagy (nsz), identifying the best (from correctness and license standpoint) implementations from freebsd and openbsd and cleaning them up! musl should now fully support c99 float and long double math functions, and has near-complete complex math support. tgmath should also work (fully on gcc-compatible compilers, and mostly on any c99 compiler). based largely on commit 0376d44a890fea261506f1fc63833e7a686dca19 from nsz's libm git repo, with some additions (dummy versions of a few missing long double complex functions, etc.) by me. various cleanups still need to be made, including re-adding (if they're correct) some asm functions that were dropped.