summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorLines
2019-07-01add new syscall numbers from linux v5.1Szabolcs Nagy-0/+327
syscall numbers are now synced up across targets (starting from 403 the numbers are the same on all targets other than an arch specific offset) IPC syscalls sem*, shm*, msg* got added where they were missing (except for semop: only semtimedop got added), the new semctl, shmctl, msgctl imply IPC_64, see linux commit 0d6040d4681735dfc47565de288525de405a5c99 arch: add split IPC system calls where needed new 64bit time_t syscall variants got added on 32bit targets, see linux commit 48166e6ea47d23984f0b481ca199250e1ce0730a y2038: add 64-bit time_t syscalls to all 32-bit architectures new async io syscalls got added, see linux commit 2b188cc1bb857a9d4701ae59aa7768b5124e262e Add io_uring IO interface linux commit edafccee56ff31678a091ddb7219aba9b28bc3cb io_uring: add support for pre-mapped user IO buffers a new syscall got added that uses the fd of /proc/<pid> as a stable handle for processes: allows sending signals without pid reuse issues, intended to eventually replace rt_sigqueueinfo, kill, tgkill and rt_tgsigqueueinfo, see linux commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad signal: add pidfd_send_signal() syscall on some targets (arm, m68k, s390x, sh) some previously missing syscall numbers got added as well.
2019-07-01ipc: prefer SYS_ipc when it is definedSzabolcs Nagy-12/+12
Linux v5.1 introduced ipc syscalls on targets where previously only SYS_ipc was available, change the logic such that the ipc code keeps using SYS_ipc which works backward compatibly on older kernels. This changes behaviour on microblaze which had both mechanisms, now SYS_ipc will be used instead of separate syscalls.
2019-07-01mips64: fix syscall numbers of io_pgetevents and rseqSzabolcs Nagy-2/+2
the numbers added in commit d149e69c02eb558114f20ea718810e95538a3b2f add io_pgetevents and rseq syscall numbers from linux v4.18 were incorrect.
2019-07-01elf.h: add NT_ARM_PAC{A,G}_KEYS from linux v5.1Szabolcs Nagy-0/+2
to request or change pointer auth keys for criu via ptrace, new in linux commit d0a060be573bfbf8753a15dca35497db5e968bb0 arm64: add ptrace regsets for ptrauth key management
2019-07-01netinet/in.h: add INADDR_ALLSNOOPERS_GROUP from linux v5.1Szabolcs Nagy-0/+1
RFC 4286: "The IPv4 multicast address for All-Snoopers is 224.0.0.106." from linux commit 4effd28c1245303dce7fd290c501ac2c11052114 bridge: join all-snoopers multicast address
2019-07-01sys/socket.h: add SO_BINDTOIFINDEX from linux v5.1Szabolcs Nagy-0/+1
SO_BINDTOIFINDEX behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. see linux commit f5dd3d0c9638a9d9a02b5964c4ad636f06cf7e2c net: introduce SO_BINDTOIFINDEX sockopt
2019-07-01s390x: drop SO_ definitions from bits/socket.hSzabolcs Nagy-28/+0
the s390x definitions matched the generic ones in sys/socket.h.
2019-07-01netinet/in.h: add IPV6_ROUTER_ALERT_ISOLATE from linux v5.1Szabolcs Nagy-0/+1
restricts router alert packets received by the socket to the socket's namespace only. see linux commit 9036b2fe092a107856edd1a3bad48b83f2b45000 net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE
2019-07-01sys/prctl.h: add PR_SPEC_DISABLE_NOEXEC from linux v5.1Szabolcs Nagy-0/+1
allows specifying that the speculative store bypass disable bit should be cleared on exec. see linux commit 71368af9027f18fe5d1c6f372cfdff7e4bde8b48 x86/speculation: Add PR_SPEC_DISABLE_NOEXEC
2019-07-01fcntl.h: add F_SEAL_FUTURE_WRITE from linux v5.1Szabolcs Nagy-0/+1
needed for android so it can migrate from its ashmem to memfd. allows making the memfd readonly for future users while keeping a writable mmap of it. see linux commit ab3948f58ff841e51feb845720624665ef5b7ef3 mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd
2019-07-01sys/fanotify.h: update for linux v5.1Szabolcs Nagy-1/+33
includes changes from linux v5.1 linux commit 235328d1fa4251c6dcb32351219bb553a58838d2 fanotify: add support for create/attrib/move/delete events linux commit 5e469c830fdb5a1ebaa69b375b87f583326fd296 fanotify: copy event fid info to user linux commit e9e0c8903009477b630e37a8b6364b26a00720da fanotify: encode file identifier for FAN_REPORT_FID as well as earlier changes that were missed. sys/statfs.h is included for fsid_t.
2019-07-01fix deadlock in synccall after threaded forkSamuel Holland-0/+1
synccall may be called by AS-safe functions such as setuid/setgid after fork. although fork() resets libc.threads_minus_one, causing synccall to take the single-threaded path, synccall still takes the thread list lock. This lock may be held by another thread if for example fork() races with pthread_create(). After fork(), the value of the lock is meaningless, so clear it. maintainer's note: commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af and e4235d70672d9751d7718ddc2b52d0b426430768 introduced this regression. the state protected by this lock is the linked list, which is entirely replaced in the child path of fork (next=prev=self), so resetting it is semantically sound.
2019-06-28cap getdents length argument to INT_MAXRich Felker-0/+2
the linux syscall treats this argument as having type int, so passing extremely long buffer sizes would be misinterpreted by the kernel. since "short reads" are always acceptable, just cap it down. patch based on report and suggested change by Florian Weimer.
2019-06-25remove unnecessary and problematic _Noreturn from crt/ldso startupRich Felker-5/+5
after commit a48ccc159a5fa061a18419296100ee48a1cd6cc9 removed the use of _Noreturn on the stage3_func type (which only worked due to it being defined to the "GNU C" attribute in C99 mode), GCC could no longer assume that the ends of __dls2 and __dls2b are unreachable, and produced a warning that a function marked _Noreturn returns. also, since commit 4390383b32250a941ec616e8bff6f568a801b1c0, the _Noreturn declaration for __libc_start_main in crt1/rcrt1 has been not only inconsistent with the definition, but wrong. formally, __libc_start_main does return, via a (hopefully) tail call to a helper function after the barrier. incorrect usage of _Noreturn in the declaration was probably formal UB. the _Noreturn specifiers were not useful in any of these places, so remove them all. now, the only remaining usage of _Noreturn is in public interfaces where _Noreturn is part of their contract.
2019-06-25allow fmemopen with zero sizeRich Felker-1/+1
previously, POSIX erroneously required this to fail with EINVAL despite the traditional glibc implementation, on which the POSIX interface was based, allowing it. the resolution of Austin Group issue 818 removes the requirement to fail.
2019-06-21do not use _Noreturn for a function pointer in dynamic linkerMatthew Maurer-1/+1
_Noreturn is a C11 construct, and may only be used at the site of a function definition.
2019-06-21remove implicit include of sys/sysmacros.h from sys/types.hRich Felker-1/+0
this reverts commit f552c792c7ce5a560f214e1104d93ee5b0833967, which exposed the sysmacros.h macros (device major/minor calculations) for BSD and GNU profiles to mimic an unintentional glibc behavior some code depended on. glibc has deprecated and since removed them as the resolution to bug #19239, so it makes no sense for us to keep this behavior. affected code should all have been fixed by now, and if it's not yet fixed it needs to be for use with modern glibc anyway.
2019-06-14add riscv64 architecture supportRich Felker-0/+1362
Author: Alex Suykov <alex.suykov@gmail.com> Author: Aric Belsito <lluixhi@gmail.com> Author: Drew DeVault <sir@cmpwn.com> Author: Michael Clark <mjc@sifive.com> Author: Michael Forney <mforney@mforney.org> Author: Stefan O'Rear <sorear2@gmail.com> This port has involved the work of many people over several years. I have tried to ensure that everyone with substantial contributions has been credited above; if any omissions are found they will be noted later in an update to the authors/contributors list in the COPYRIGHT file. The version committed here comes from the riscv/riscv-musl repo's commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by me for issues found during final review: - a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc are not safe to use in separate inline asm fragments) - a_cas[_p] is fixed to be a memory barrier - the call from the _start assembly into the C part of crt1/ldso is changed to allow for the possibility that the linker does not place them nearby each other. - DTP_OFFSET is defined correctly so that local-dynamic TLS works - reloc.h LDSO_ARCH logic is simplified and made explicit. - unused, non-functional crti/n asm files are removed. - an empty .sdata section is added to crt1 so that the __global_pointer reference is resolvable. - indentation style errors in some asm files are fixed.
2019-05-26optimize aarch64 dynamic tlsdesc function to spill fewer registersRich Felker-10/+7
with the glibc generation counter model for reusing dynamic tls slots after dlclose, it's really not possible to get away with fewer than 4 working registers. for us however it's always been possible, but tricky, and only became apparent after the switch to installing new dynamic tls at dlopen time. by merging the negated thread pointer into the addend early, the register holding the thread pointer can immediately be reused, bringing the working register count down to three. this allows saving/restoring via a single stp/ldp pair, since the return register x0 does not need to be saved. net reduction of 3 instructions, 2 of which were push/pop.
2019-05-22make powerpc64 vrregset_t logical layout match expected APIRich Felker-1/+4
between v2 and v3 of the powerpc64 port patch, the change was made from a 32x4 array of 32-bit unsigned ints for vrregs[] to a 32-element array of __int128. this mismatches the API applications working with mcontext_t expect from glibc, and seems to have been motivated by a misinterpretation of a comment on how aarch64 did things as a suggestion to do the same on powerpc64.
2019-05-22fix vrregset_t layout and member naming on powerpc64Rich Felker-4/+8
the mistaken layout seems to have been adapted from 32-bit powerpc, where vscr and vrsave are packed into the same 128-bit slot in a way that looks like it relies on non-overlapping-ness of the value bits in big endian. the powerpc64 port accounted for the fact that the 64-bit ABI puts each in its own 128-bit slot, but ordered them incorrectly (matching the bit order used on the 32-bit ABI), and failed to account for vscr being padded according to endianness so that it can be accessed via vector moves. in addition to ABI layout, our definition used different logical member layout/naming from glibc, where vscr is a structure to facilitate access as a 32-bit word or a 128-bit vector. the inconsistency here was unintentional, so fix it.
2019-05-16fix tls offsets when p_vaddr%p_align != 0 on TLS_ABOVE_TP targetsSzabolcs Nagy-4/+6
currently the bfd linker does not seem to create tls segments where p_vaddr%p_align != 0, but this is valid in ELF and then the runtime computed tls offset must satisfy offset%p_align == (base+p_vaddr)%p_align and in case of local exec tls (main executable) the smallest such offset must be used (otherwise it is incompatible with the offset computed by the static linker). the !TLS_ABOVE_TP case is handled correctly (the offset is negative then in the formula). the ldso code for TLS_ABOVE_TP is changed so the static tls offset of each module satisfies the formula.
2019-05-16fix static tls offsets of shared libs on TLS_ABOVE_TP targetsSzabolcs Nagy-4/+2
tls_offset should always point to the end of the allocated static tls area, but this was not handled correctly on "tls variant 1" targets in the dynamic linker: after application tls was allocated, tls_offset was aligned up, potentially wasting tls space. (alignment may be needed at the begining of the tls area, not at the end, but that will be fixed separately as it is unlikely to affect real binaries.) when static tls was allocated for a shared library, tls_offset was only updated with the size of the tls segment which does not include alignment gaps, which can easily happen if the tls size update for one library leaves tls_offset misaligned for the next one. this can cause oob access in __copy_tls or arbitrary breakage at tls access. (the issue was observed on aarch64 with rust binaries)
2019-05-16fix format strings for uid/gid values in putpwent/putgrentRich Felker-2/+2
commit 648c3b4e18b2ce2b6af7d44783e42ca267ea49f5 omitted this change, which is needed to be able to use uid/gid values greater than INT_MAX with these interfaces. it fixes alpine linux bug #10460.
2019-05-12remove unused struct dso members from dynlink.cFangrui Song-1/+0
maintainer's note: commit 9d44b6460ab603487dab4d916342d9ba4467e6b9 removed their use.
2019-05-11improve i386 inline syscall asm on non-broken compilersRich Felker-1/+34
we have to avoid using ebx unconditionally in asm constraints for i386, because gcc 3 and 4 and possibly other simplistic compilers (pcc?) implement PIC via making ebx a fixed-use register, and disallow its use for anything else. rather than hard-coding knowledge of which compilers work (at least gcc 5+ and clang), perform a configure test; this should give us the good codegen on any new compilers we don't yet know about. swapping ebx and edx is kept for 1- and 2-arg syscalls because it avoids having any spills/stack-frame at all in small functions. for 6-arg, if ebx is directly usable, the complex shuffling introduced in commit c8798ef974d21c338a7d8d874a402978ffc6168e can be avoided, and ebp can be loaded the same way ebx is in 5-arg syscalls for compilers that don't support direct use of ebx.
2019-05-10fix regression in i386 inline syscall asm producing invalid codeRich Felker-6/+6
commit 22e5bbd0deadcbd767864bd714e890b70e1fe1df inlined the i386 syscall mechanism, but wrongly assumed memory operands to the 5- and 6-argument syscall asm would be esp-based. however, nothing in the constraints prevented them from being ebx- or ebp-based, and in those cases, ebx and ebp could be clobbered before use of the memory operand was complete. in the 6-argument case, this prevented restoration of the original register values before the end of the asm block, breaking the asm contract since ebx and ebp are not marked as clobbered. (they can't be, because lots of compilers don't accept these registers in constraints or clobbers if PIC or frame pointer is enabled). doing this right is complicated by the fact that, after a single push, no operands which might be memory operands are usable. if they are esp-based, the value of esp has changed, rendering them invalid. introduce some new dances to load the registers. for the 5-arg case, push the operand that may be a memory operand first, and after that, it doesn't matter if the operand is invalid, since we'll just use the newly pushed value. for the 6-arg case, we need to put both operands in memory to begin with, like the old non-inline code prior to commit 22e5bbd0deadcbd767864bd714e890b70e1fe1df accepted, so that there's only one potentially memory-based operand to the asm. this can then be saved with a single push, and after that the values can be read off into the registers they're needed in. there's some size overhead, but still a lot less execution overhead than the old out-of-line code. doing it better depends on a modern compiler that lets you use ebx and ebp in asm constraints without restriction. the failure modes on compilers where this doesn't work are inconsistent and dangerous (on at least some gcc versions 4.x and earlier, wrong codegen!), so this is a delicate matter. it can be addressed later if needed.
2019-05-05make fgetwc set error indicator for stream on encoding errorsRich Felker-2/+8
this is a requirement in POSIX that's omitted, and seemed potentially non-conforming, in the C standard. as such it was omitted here. however, as part of Austin Group issue #1170, the discrepancy was raised with WG14 and determined to be unintended; future versions of the C standard will require the error indicator to be set, as POSIX does.
2019-05-05fix broken posix_fadvise on mips due to missing 7-arg syscall supportRich Felker-0/+25
commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 exposed the breakage at build time by removing support for 7-argument syscalls; however, the external __syscall function provided for mips before did not pass a 7th argument from the stack, so the behavior was just silently broken.
2019-05-05allow archs to provide a 7-argument syscall if neededRich Felker-0/+1
commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 noted that we could add this if needed, and in fact it is needed, but not for one of the archs documented as having a 7th syscall arg register. rather, it's needed for mips (o32), where all but the first 4 arguments are passed on the stack, and the stack can accommodate a 7th.
2019-05-05fix build regression on mips n32 due to typo in new inline syscallRich Felker-1/+1
commit 1bcdaeee6e659f1d856717c9aa562a068f2f3bd4 introduced the regression.
2019-05-05fix passing of 64-bit syscall arguments on microblazeRich Felker-1/+1
this has been wrong since the beginning of the microblaze port: the syscall ABI for microblaze does not align 64-bit arguments on even register boundaries. commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 exposed the problem by introducing references to a nonexistent __syscall7. the ABI is not documented well anywhere, but I was able to confirm against both strace source and glibc source that microblaze is not using the alignment. per the syscall(2) man page, posix_fadvise, ftruncate, pread, pwrite, readahead, sync_file_range, and truncate were all affected and either did not work at all, or only worked by chance, e.g. when the affected argument slots were all zero.
2019-04-23fix regression in s390x SO_PEERSEC definitionRich Felker-0/+1
analogous to commit efda534b212f713fe2b92a62b06e45f656b763ce for powerpc. commit 587f5a53bc3a68d80b239ba515d583df690a96df moved the definition of SO_PEERSEC to bits/socket.h for archs where the SO_* macros differ.
2019-04-20make new math code compatible with unused variable warning/errorRich Felker-3/+6
commit b50d315fd23f0fbc4c11e2583801dd123d933745 introduced fp_force_eval implemented by default with a dead store to a volatile variable. unfortunately introduces warnings with -Wunused-variable and breaks the ability to use -Werror with the default warning options set by configure when warnings are enabled. we could just call fp_barrier instead, but that results in a spurious load after the store due to volatile semantics. the fix committed here avoids the load. it will still produce warnings without -Wno-unused-but-set-variable, but that's part of our default warning profile, and there are already other locations in the source where an unused variable warning will occur without it.
2019-04-17math: new powSzabolcs Nagy-303/+521
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc The underflow exception is signaled if the result is in the subnormal range even if the result is exact. code size change: +3421 bytes. benchmark on x86_64 before, after, speedup: -Os: pow rthruput: 102.96 ns/call 33.38 ns/call 3.08x pow latency: 144.37 ns/call 54.75 ns/call 2.64x -O3: pow rthruput: 98.91 ns/call 32.79 ns/call 3.02x pow latency: 138.74 ns/call 53.78 ns/call 2.58x
2019-04-17math: new exp and exp2Szabolcs Nagy-480/+434
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc TOINT_INTRINSICS and EXP_USE_TOINT_NARROW cases are unused. The underflow exception is signaled if the result is in the subnormal range even if the result is exact (e.g. exp2(-1023.0)). code size change: -1672 bytes. benchmark on x86_64 before, after, speedup: -Os: exp rthruput: 12.73 ns/call 6.68 ns/call 1.91x exp latency: 45.78 ns/call 21.79 ns/call 2.1x exp2 rthruput: 6.35 ns/call 5.26 ns/call 1.21x exp2 latency: 26.00 ns/call 16.58 ns/call 1.57x -O3: exp rthruput: 12.75 ns/call 6.73 ns/call 1.89x exp latency: 45.91 ns/call 21.80 ns/call 2.11x exp2 rthruput: 6.47 ns/call 5.40 ns/call 1.2x exp2 latency: 26.03 ns/call 16.54 ns/call 1.57x
2019-04-17math: new log2Szabolcs Nagy-106/+335
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc code size change: +2458 bytes (+1524 bytes with fma). benchmark on x86_64 before, after, speedup: -Os: log2 rthruput: 16.08 ns/call 10.49 ns/call 1.53x log2 latency: 44.54 ns/call 25.55 ns/call 1.74x -O3: log2 rthruput: 15.92 ns/call 10.11 ns/call 1.58x log2 latency: 44.66 ns/call 26.16 ns/call 1.71x
2019-04-17math: new logSzabolcs Nagy-104/+454
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc Assume __FP_FAST_FMA implies __builtin_fma is inlined as a single instruction. code size change: +4588 bytes (+2540 bytes with fma). benchmark on x86_64 before, after, speedup: -Os: log rthruput: 12.61 ns/call 7.95 ns/call 1.59x log latency: 41.64 ns/call 23.38 ns/call 1.78x -O3: log rthruput: 12.51 ns/call 7.75 ns/call 1.61x log latency: 41.82 ns/call 23.55 ns/call 1.78x
2019-04-17math: new powfSzabolcs Nagy-240/+232
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc POWF_SCALE != 1.0 case only matters if TOINT_INTRINSICS is set, which is currently not supported for any target. SNaN is not supported, it would require an issignalingf implementation. code size change: -816 bytes. benchmark on x86_64 before, after, speedup: -Os: powf rthruput: 95.14 ns/call 20.04 ns/call 4.75x powf latency: 137.00 ns/call 34.98 ns/call 3.92x -O3: powf rthruput: 92.48 ns/call 13.67 ns/call 6.77x powf latency: 131.11 ns/call 35.15 ns/call 3.73x
2019-04-17math: new exp2f and expfSzabolcs Nagy-179/+193
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc In expf TOINT_INTRINSICS is kept, but is unused, it would require support for __builtin_round and __builtin_lround as single instruction. code size change: +94 bytes. benchmark on x86_64 before, after, speedup: -Os: expf rthruput: 9.19 ns/call 8.11 ns/call 1.13x expf latency: 34.19 ns/call 18.77 ns/call 1.82x exp2f rthruput: 5.59 ns/call 6.52 ns/call 0.86x exp2f latency: 17.93 ns/call 16.70 ns/call 1.07x -O3: expf rthruput: 9.12 ns/call 4.92 ns/call 1.85x expf latency: 34.44 ns/call 18.99 ns/call 1.81x exp2f rthruput: 5.58 ns/call 4.49 ns/call 1.24x exp2f latency: 17.95 ns/call 16.94 ns/call 1.06x
2019-04-17math: new log2fSzabolcs Nagy-58/+108
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc code size change: +177 bytes. benchmark on x86_64 before, after, speedup: -Os: log2f rthruput: 11.38 ns/call 5.99 ns/call 1.9x log2f latency: 35.01 ns/call 22.57 ns/call 1.55x -O3: log2f rthruput: 10.82 ns/call 5.58 ns/call 1.94x log2f latency: 35.13 ns/call 21.04 ns/call 1.67x
2019-04-17math: new logfSzabolcs Nagy-54/+109
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc, with minor changes to better fit into musl. code size change: +289 bytes. benchmark on x86_64 before, after, speedup: -Os: logf rthruput: 8.40 ns/call 6.14 ns/call 1.37x logf latency: 31.79 ns/call 24.33 ns/call 1.31x -O3: logf rthruput: 8.43 ns/call 5.58 ns/call 1.51x logf latency: 32.04 ns/call 20.88 ns/call 1.53x
2019-04-17math: add configuration macrosSzabolcs Nagy-0/+5
Musl currently aims to support non-nearest rounding mode and does not support SNaNs. These macros allow marking relevant code paths in case these decisions are changed later (they also help documenting the corner cases involved).
2019-04-17math: add macros for static branch prediction hintsSzabolcs Nagy-0/+9
These don't have an effectw with -Os so not useful with default settings other than documenting the expectation. With --enable-optimize=internal,malloc,string,math the libc.so code size increases by 18K on x86_64 and performance varies in -2% .. +10%.
2019-04-17math: add double precision error handling functionsSzabolcs Nagy-0/+35
2019-04-17math: add single precision error handling functionsSzabolcs Nagy-0/+37
These are supposed to be used in tail call positions when handling special cases in new code. (fp exceptions may be raised "naturally" by the common code path if special casing is more effort.) This implements the error handling apis used in https://github.com/ARM-software/optimized-routines without errno setting.
2019-04-17math: add eval_as_float and eval_as_doubleSzabolcs Nagy-0/+17
Previously type casts or assignments were used for handling excess precision, which assumed standard C99 semantics, but since it's a rarely needed obscure detail, it's better to use explicit helper functions to document where we rely on this. It also helps if the code is used outside of the libc in non-C99 compilation mode: with the default excess precision handling of gcc, explicit inline asm barriers are needed for narrowing on FLT_EVAL_METHOD!=0 targets. I plan to use this in new code with the existing style that uses double_t and float_t as much as possible. One ugliness is that it is required for almost every return statement since that does not drop excess precision (the standard changed this in C11 annex F, but that does not help in non-standard compilation modes or with old compilers).
2019-04-17math: add fp_arch.h with fp_barrier and fp_force_evalSzabolcs Nagy-6/+90
C99 has ways to support fenv access, but compilers don't implement it and assume nearest rounding mode and no fp status flag access. (gcc has -frounding-math and then it does not assume nearest rounding mode, but it still assumes the compiled code itself does not change the mode. Even if the C99 mechanism was implemented it is not ideal: it requires all code in the library to be compiled with FENV_ACCESS "on" to make it usable in non-nearest rounding mode, but that limits optimizations more than necessary.) The math functions should give reasonable results in all rounding modes (but the quality may be degraded in non-nearest rounding modes) and the fp status flag settings should follow the spec, so fenv side-effects are important and code transformations that break them should be prevented. Unfortunately compilers don't give any help with this, the best we can do is to add fp barriers to the code using volatile local variables (they create a stack frame and undesirable memory accesses to it) or inline asm (gcc specific, requires target specific fp reg constraints, often creates unnecessary reg moves and multiple barriers are needed to express that an operation has side-effects) or extern call (only useful in tail-call position to avoid stack-frame creation and does not work with lto). We assume that in a math function if an operation depends on the input and the output depends on it, then the operation will be evaluated at runtime when the function is called, producing all the expected fenv side-effects (this is not true in case of lto and in case the operation is evaluated with excess precision that is not rounded away). So fp barriers are needed (1) to prevent the move of an operation within a function (in case it may be moved from an unevaluated code path into an evaluated one or if it may be moved across a fenv access), (2) force the evaluation of an operation for its side-effect when it has no input dependency (may be constant folded) or (3) when its output is unused. I belive that fp_barrier and fp_force_eval can take care of these and they should not be needed in hot code paths.
2019-04-17math: remove sun copyright from libm.hSzabolcs Nagy-23/+0
Nothing is left from the original fdlibm header nor from the bsd modifications to it other than some internal api declarations. Comments are dropped that may be copyrightable content.
2019-04-17math: add asuint, asuint64, asfloat and asdoubleSzabolcs Nagy-33/+15
Code generation for SET_HIGH_WORD slightly changes, but it only affects pow, otherwise the generated code is unchanged.