path: root/src/thread
Age | Commit message | Author | Lines
2015-06-10 | implement arch-generic version of __unmapself | Rich Felker | -0/+29
this can be used to put off writing an asm version of __unmapself for new archs, or as a permanent solution on archs where it's not practical or even possible to run momentarily with no stack. the concept here is simple: the caller takes a lock on a global shared stack and uses it to make the munmap and exit syscalls. the only trick is unlocking, which must be done after the thread exits, and this is achieved by using the set_tid_address syscall to have the kernel zero and futex-wake the lock word as part of the exit syscall.
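a minimal sketch of the scheme described above, with hypothetical names and the stack switch omitted (it requires asm in practice); the key point is that set_tid_address makes the kernel clear and futex-wake the lock word as part of the exit syscall:

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    static volatile int unmapself_lock;   /* 0 = free, 1 = held */

    void unmapself_sketch(void *base, size_t size)
    {
        /* take the global lock; wait on the futex while another exiting
           thread is still using the shared stack */
        while (__sync_lock_test_and_set(&unmapself_lock, 1))
            syscall(SYS_futex, &unmapself_lock, FUTEX_WAIT, 1, 0);
        /* have the kernel zero and futex-wake the lock word at exit time */
        syscall(SYS_set_tid_address, &unmapself_lock);
        /* real code switches onto a static shared stack here, then: */
        syscall(SYS_munmap, base, size);
        syscall(SYS_exit, 0);
    }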
2015-05-25 | mark mips cancellable syscall code as code | Rich Felker | -0/+3
otherwise disassemblers treat it as data.
2015-05-16 | eliminate costly tricks to avoid TLS access for current locale state | Rich Felker | -6/+0
the code being removed used atomics to track whether any threads might be using a locale other than the current global locale, and whether any threads might have abstract 8-bit (non-UTF-8) LC_CTYPE active, a feature which was never committed (still pending). the motivations were to support early execution prior to setup of the thread pointer, to partially support systems (ancient kernels) where thread pointer setup is not possible, and to avoid high performance cost on archs where accessing the thread pointer may be very slow. since commit 19a1fe670acb3ab9ead0fe31859ca7d4fe40dd54, the thread pointer is always available, so these hacks are no longer needed. removing them greatly simplifies the affected code.
2015-05-16 | in i386 __set_thread_area, don't assume %gs register is initially zero | Rich Felker | -4/+9
commit f630df09b1fd954eda16e2f779da0b5ecc9d80d3 added logic to handle the case where __set_thread_area is called more than once by reusing the GDT slot already in the %gs register, and only setting up a new GDT slot when %gs is zero. this created a hidden assumption that %gs is zero when a new process image starts, which is true in practice on Linux, but does not seem to be documented ABI, and fails to hold under qemu app-level emulation. while it would in theory be possible to zero %gs in the entry point code, this code is shared between static and dynamic binaries, and dynamic binaries must not clobber the value of %gs already setup by the dynamic linker. the alternative solution implemented in this commit simply uses global data to store the GDT index that's selected. __set_thread_area should only be called in the initial thread anyway (subsequent threads get their thread pointer setup by __clone), but even if it were called by another thread, it would simply read and write back the same GDT index that was already assigned to the initial thread, and thus (in the x86 memory model) there is no data race.
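a simplified illustration of the global-data approach; only the entry_number handling is shown, struct user_desc is the Linux set_thread_area ABI, and everything else is hypothetical:

    #include <stdint.h>
    #include <asm/ldt.h>               /* struct user_desc */

    static int gdt_index_sketch = -1;  /* no GDT slot allocated yet */

    void prepare_desc_sketch(struct user_desc *d, uintptr_t tp)
    {
        d->entry_number = gdt_index_sketch;  /* -1 asks the kernel for a new slot */
        d->base_addr = tp;
        /* ... limit and flag fields omitted ... */
    }

    /* after a successful SYS_set_thread_area, the caller stores the slot the
       kernel chose (d->entry_number) back into gdt_index_sketch, so a later
       call simply rewrites the same GDT entry instead of consuming another,
       and never needs to read %gs at all */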
2015-05-06 | fix stack protector crashes on x32 & powerpc due to misplaced TLS canary | Rich Felker | -1/+1
i386, x86_64, x32, and powerpc all use TLS for stack protector canary values in the default stack protector ABI, but the location only matched the ABI on i386 and x86_64. on x32, the expected location for the canary contained the tid, thus producing spurious mismatches (resulting in process termination) upon fork. on powerpc, the expected location contained the stdio_locks list head, so returning from a function after calling flockfile produced spurious mismatches. in both cases, the random canary was not present, and a predictable value was used instead, making the stack protector hardening much less effective than it should be. in the current fix, the thread structure has been expanded to have canary fields at all three possible locations, and archs that use a non-default location must define a macro in pthread_arch.h to choose which location is used. for most archs (which lack TLS canary ABI) the choice does not matter.
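a purely illustrative layout of the fix (member and macro names are not musl's): the thread structure grows a canary slot at every location a supported stack-protector ABI might expect, and each arch's pthread_arch.h picks the one its ABI actually uses:

    #include <stdint.h>

    struct tcb_sketch {
        struct tcb_sketch *self;
        uintptr_t canary;          /* default slot, matching the i386/x86_64 ABI */
        int tid;                   /* the slot the x32 ABI expected overlapped this */
        /* ... */
        void *stdio_locks;         /* the slot the powerpc ABI expected overlapped this */
        uintptr_t canary_alt;      /* extra slot for archs with a different ABI */
    };

    /* an arch whose ABI uses a non-default location would select it in its
       pthread_arch.h, e.g. (hypothetical macro name):
       #define TP_CANARY_FIELD canary_alt */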
2015-05-02 | fix x32 __set_thread_area failure due to junk in upper bits | Rich Felker | -1/+1
the kernel does not properly clear the upper bits of the syscall argument, so we have to do it before the syscall.
2015-04-22 | minor optimization to pthread_spin_trylock | Rich Felker | -2/+4
use CAS instead of swap since it's lighter for most archs, and keep EBUSY in the lock value so that the old value obtained by CAS can be used directly as the return value for pthread_spin_trylock.
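a sketch of the trick (not the verbatim musl source): by storing EBUSY as the "locked" value, the old value returned by the CAS doubles as the trylock result:

    #include <errno.h>

    int spin_trylock_sketch(volatile int *lock)
    {
        /* returns 0 if the lock was acquired, EBUSY if it was already held */
        return __sync_val_compare_and_swap(lock, 0, EBUSY);
    }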
2015-04-22 | optimize spin lock not to dirty cache line while spinning | Rich Felker | -1/+1
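the corresponding lock loop, sketched with a plain inner load so the cache line stays shared while the lock is held (illustrative, not the verbatim source):

    #include <errno.h>

    void spin_lock_sketch(volatile int *lock)
    {
        /* only attempt the atomic write (which dirties the line) once a
           plain read has seen the lock apparently free */
        while (*lock || __sync_val_compare_and_swap(lock, 0, EBUSY))
            ;
    }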
2015-04-21 | fix mmap leak in sem_open failure path for link call | Rich Felker | -0/+1
the leak was found by static analysis (reported by Alexander Monakov), not tested/observed, but seems to have occurred both when failing due to O_EXCL, and in a race condition with O_CREAT but not O_EXCL where a semaphore by the same name was created concurrently.
2015-04-18 | make dlerror state and message thread-local and dynamically-allocated | Rich Felker | -0/+2
this fixes truncation of error messages containing long pathnames or symbol names. the dlerror state was previously required by POSIX to be global. the resolution of bug 97 relaxed the requirements to allow thread-safe implementations of dlerror with thread-local state and message buffer.
2015-04-17 | fix sh build regressions in asm | Rich Felker | -1/+1
even hidden functions need @PLT symbol references; otherwise an absolute address is produced instead of a PC-relative one.
2015-04-17 | fix sh __set_thread_area uninitialized return value | Rich Felker | -1/+2
this caused the dynamic linker/startup code to abort when r0 happened to contain a negative value.
2015-04-14 | use hidden __tls_get_new for tls/tlsdesc lookup fallback cases | Rich Felker | -1/+3
previously, the dynamic tlsdesc lookup functions and the i386 special-ABI ___tls_get_addr (3 underscores) function called __tls_get_addr when the slot they wanted was not already setup; __tls_get_addr would then in turn also see that it's not setup and call __tls_get_new. calling __tls_get_new directly is both more efficient and avoids the issue of calling a non-hidden (public API/ABI) function from asm. for the special i386 function, a weak reference to __tls_get_new is used since this function is not defined when static linking (the code path that needs it is unreachable in static-linked programs).
2015-04-14 | cleanup use of visibility attributes in pthread_cancel.c | Rich Felker | -8/+9
applying the attribute to a weak_alias macro was a hack. instead use a separate declaration to apply the visibility, and consolidate declarations together to avoid having visibility mess all over the file.
2015-04-14 | fix inconsistent visibility for internal syscall symbols | Rich Felker | -0/+5
2015-04-14 | consistently use hidden visibility for cancellable syscall internals | Rich Felker | -30/+96
in a few places, non-hidden symbols were referenced from asm in ways that assumed ld-time binding. while there is no semantic reason these symbols need to be hidden, fixing the references without making them hidden was going to be ugly, and hidden visibility reduces some bloat anyway. in the asm files, .global/.hidden directives have been moved to the top to unclutter the actual code.
2015-04-14 | fix inconsistent visibility for internal __tls_get_new function | Rich Felker | -3/+2
at the point of call it was declared hidden, but the definition was not hidden. for some toolchains this inconsistency produced textrels without ld-time binding.
2015-04-13 | remove remnants of support for running in no-thread-pointer mode | Rich Felker | -11/+5
since 1.1.0, musl has nominally required a thread pointer to be setup. most of the remaining code that was checking for its availability was doing so for the sake of being usable by the dynamic linker. as of commit 71f099cb7db821c51d8f39dfac622c61e54d794c, this is no longer necessary; the thread pointer is now valid before any libc code (outside of dynamic linker bootstrap functions) runs. this commit essentially concludes "phase 3" of the "transition path for removing lazy init of thread pointer" project that began during the 1.1.0 release cycle.
2015-04-13 | allow i386 __set_thread_area to be called more than once | Rich Felker | -1/+5
previously a new GDT slot was requested, even if one had already been obtained by a previous call. instead extract the old slot number from GS and reuse it if it was already set. the formula (GS-3)/8 for the slot number automatically yields -1 (request for new slot) if GS is zero (unset).
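in C terms, the recovery step looks roughly like the following; note that the -1 result for a zero selector relies on an arithmetic right shift (which the asm uses), not on C integer division:

    /* gs is the raw %gs selector value: slot*8 + 3 for an allocated slot,
       or 0 if nothing has been set up yet */
    int gdt_slot_from_gs_sketch(int gs)
    {
        return (gs - 3) >> 3;   /* 0 maps to -1, i.e. "request a new slot"
                                   (assumes arithmetic shift of negatives) */
    }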
2015-04-11 | remove mismatched arguments from vmlock function definitions | Rich Felker | -2/+2
commit f08ab9e61a147630497198fe3239149275c0a3f4 introduced these accidentally as remnants of some work I tried that did not work out.
2015-04-10 | apply vmlock wait to __unmapself in pthread_exit | Rich Felker | -0/+4
2015-04-10 | redesign and simplify vmlock system | Rich Felker | -30/+18
this global lock allows certain unlock-type primitives to exclude mmap/munmap operations which could change the identity of virtual addresses while references to them still exist. the original design mistakenly assumed mmap/munmap would conversely need to exclude the same operations which exclude mmap/munmap, so the vmlock was implemented as a sort of 'symmetric recursive rwlock'. this turned out to be unnecessary. commit 25d12fc0fc51f1fae0f85b4649a6463eb805aa8f already shortened the interval during which mmap/munmap held their side of the lock, but left the inappropriate lock design and some inefficiency. the new design uses a separate function, __vm_wait, which does not hold any lock itself and only waits for lock users which were already present when it was called to release the lock. this is sufficient because of the way operations that need to be excluded are sequenced: the "unlock-type" operations using the vmlock need only block mmap/munmap operations that are precipitated by (and thus sequenced after) the atomic-unlock they perform while holding the vmlock. this allows for a spectacular lack of synchronization in the __vm_wait function itself.
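a condensed sketch of the resulting design (helper and variable names are hypothetical, and raw futex syscalls stand in for the internal wait/wake primitives): lockers only maintain a count, and the wait side takes no lock of its own:

    #define _GNU_SOURCE
    #include <limits.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    static volatile int vmcount;

    void vm_lock_sketch(void)
    {
        __sync_fetch_and_add(&vmcount, 1);
    }

    void vm_unlock_sketch(void)
    {
        if (__sync_fetch_and_add(&vmcount, -1) == 1)
            syscall(SYS_futex, &vmcount, FUTEX_WAKE, INT_MAX);
    }

    void vm_wait_sketch(void)   /* called on the mmap/munmap side */
    {
        int v;
        /* wait for the in-progress lock holders to finish */
        while ((v = vmcount))
            syscall(SYS_futex, &vmcount, FUTEX_WAIT, v, 0);
    }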
2015-04-10 | optimize out setting up robust list with kernel when not needed | Rich Felker | -6/+5
as a result of commit 12e1e324683a1d381b7f15dd36c99b37dd44d940, kernel processing of the robust list is only needed for process-shared mutexes. previously the first attempt to lock any owner-tracked mutex resulted in robust list initialization and a set_robust_list syscall. this is no longer necessary, and since the kernel's record of the robust list must now be cleared at thread exit time for detached threads, optimizing it out is more worthwhile than before too.
2015-04-10 | process robust list in pthread_exit to fix detached thread use-after-unmap | Rich Felker | -26/+27
the robust list head lies in the thread structure, which is unmapped before exit for detached threads. this leaves the kernel unable to process the exiting thread's robust list, and with a dangling pointer which may happen to point to new unrelated data at the time the kernel processes it. userspace processing of the robust list was already needed for non-pshared robust mutexes in order to perform private futex wakes rather than the shared ones the kernel would do, but it was conditional on linking pthread_mutexattr_setrobust and did not bother processing the pshared mutexes in the list, which requires additional logic for the robust list pending slot in case pthread_exit is interrupted by asynchronous process termination. the new robust list processing code is linked unconditionally (inlined in pthread_exit), handles both private and shared mutexes, and also removes the kernel's reference to the robust list before unmapping and exit if the exiting thread is detached.
2015-03-16 | block all signals (even internal ones) in cancellation signal handler | Rich Felker | -1/+2
previously the implementation-internal signal used for multithreaded set*id operations was left unblocked during handling of the cancellation signal. however, on some archs, signal contexts are huge (up to 5k) and the possibility of nested signal handlers drastically increases the minimum stack requirement. since the cancellation signal handler will do its job and return in bounded time before possibly passing execution to application code, there is no need to allow other signals to interrupt it.
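the relevant part of the handler registration, sketched (function names, the signal number parameter, and the empty handler body are placeholders):

    #define _GNU_SOURCE
    #include <signal.h>
    #include <string.h>

    static void cancel_handler_sketch(int sig, siginfo_t *si, void *ctx)
    {
        /* ... verify the request and act on cancellation ... */
    }

    void install_cancel_handler_sketch(int signo)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_flags = SA_SIGINFO | SA_RESTART;
        sa.sa_sigaction = cancel_handler_sketch;
        /* block everything, including implementation-internal signals,
           for the duration of the handler */
        sigfillset(&sa.sa_mask);
        sigaction(signo, &sa, 0);
    }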
2015-03-11 | add aarch64 port | Szabolcs Nagy | -0/+69
This adds complete aarch64 target support, including the big-endian subarch. Some of the long double math functions are known to be broken; otherwise the interfaces should be fully functional, but at this point consider this port experimental. Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
2015-03-07 | fix regression in pthread_cond_wait with cancellation disabled | Rich Felker | -0/+1
due to a logic error in the use of masked cancellation mode, pthread_cond_wait did not honor PTHREAD_CANCEL_DISABLE but instead failed with ECANCELED when cancellation was pending.
2015-03-04 | fix signed left-shift overflow in pthread_condattr_setpshared | Rich Felker | -1/+1
2015-03-03 | make all objects used with atomic operations volatile | Rich Felker | -16/+18
the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,*p,f(*p)); where the latter is intended to show multiple loads of *p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
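the hazard can be made concrete with a small read-modify-write loop (a GCC builtin stands in for the libc a_cas): the single plain load into tmp is exactly what the volatile qualification keeps the compiler from duplicating:

    void increment_sketch(volatile int *p)
    {
        int tmp;
        do {
            tmp = *p;             /* one plain load; both cas arguments reuse it */
        } while (__sync_val_compare_and_swap(p, tmp, tmp + 1) != tmp);
    }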
2015-03-02 | suppress masked cancellation in pthread_join | Rich Felker | -1/+5
like close, pthread_join is a resource-deallocation function which is also a cancellation point. the intent of masked cancellation mode is to exempt such functions from failure with ECANCELED.
2015-03-02 | fix namespace issue in pthread_join affecting thrd_join | Rich Felker | -1/+2
pthread_testcancel is not in the ISO C reserved namespace and thus cannot be used here. use the namespace-protected version of the function instead.
2015-03-02 | factor cancellation cleanup push/pop out of futex __timedwait function | Rich Felker | -24/+21
previously, the __timedwait function was optionally a cancellation point depending on whether it was passed a pointer to a cleanup function and context to register. as of now, only one caller actually used such a cleanup function (and it may face removal soon); most callers either passed a null pointer to disable cancellation or a dummy cleanup function. now, __timedwait is never a cancellation point, and __timedwait_cp is the cancellable version. this makes the intent of the calling code more obvious and avoids ugly dummy functions and long argument lists.
2015-02-27 | fix failure of internal futex __timedwait to report ECANCELED | Rich Felker | -1/+1
as part of abstracting the futex wait, this function uses a whitelist approach to suppress all futex error values which callers should not see. when the masked cancellation mode was added, the new ECANCELED error was not whitelisted. this omission caused the new pthread_cond_wait code using masked cancellation to exhibit a spurious wake (rather than acting on cancellation) when the request arrived after blocking on the cond var.
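the whitelist amounts to something like the following filter on the wait result (a sketch; the exact error set follows the description above, with ECANCELED being the newly added member):

    #include <errno.h>

    int filter_wait_error_sketch(int r)
    {
        /* pass through only the errors callers are prepared to handle */
        if (r != EINTR && r != ETIMEDOUT && r != ECANCELED) r = 0;
        return r;
    }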
2015-02-23 | fix breakage in pthread_cond_wait due to typo | Rich Felker | -1/+1
due to accidental use of = instead of ==, the error code was always set to zero in the signaled wake case for non-shared cv waits. suppressing ETIMEDOUT (the only possible wait error) is harmless and actually permitted in this case, but suppressing mutex errors could give the caller false information about the state of the mutex. commit 8741ffe625363a553e8f509dc3ca7b071bdbab47 introduced this regression and commit d9da1fb8c592469431c764732d09f7756340190e preserved it when reorganizing the code.
2015-02-22 | simplify cond var code now that cleanup handler is not needed | Rich Felker | -86/+63
2015-02-22 | fix pthread_cond_wait cancellation race | Rich Felker | -5/+38
it's possible that signaling a waiter races with cancellation of that same waiter. previously, cancellation was acted upon, causing the signal to be consumed with no waiter returning. by using the new masked cancellation state, it's possible to refuse to act on the cancellation request and instead leave it pending. to ease review and understanding of the changes made, this commit leaves the unwait function, which was previously the cancellation cleanup handler, in place. additional simplifications could be made by removing it.
2015-02-21 | add new masked cancellation mode | Rich Felker | -10/+16
this is a new extension which is presently intended only for experimental and internal libc use. interface and behavior details may change subject to feedback and experience from using it internally. the basic concept for the new PTHREAD_CANCEL_MASKED state is that the first cancellation point to observe the cancellation request fails with an errno value of ECANCELED rather than acting on cancellation, allowing the caller to process the status and choose whether/how to act upon it.
2015-02-20 | prepare cancellation syscall asm for possibility of __cancel returning | Rich Felker | -11/+32
2015-02-16 | make pthread_exit responsible for disabling cancellation | Rich Felker | -3/+2
this requirement is tucked away in XSH 2.9.5 Thread Cancellation under the heading Thread Cancellation Cleanup Handlers.
2015-02-09 | use the internal macro name FUTEX_PRIVATE in __wait | Szabolcs Nagy | -1/+1
the name was recently added for the setxid/synccall rework, so use the name now that we have it.
2015-02-03 | fix missing memory barrier in cancellation signal handler | Rich Felker | -0/+1
in practice this was probably a non-issue, because the necessary barrier almost certainly exists in kernel space -- implementing signal delivery without such a barrier seems impossible -- but for the sake of correctness, it should be done here too. in principle, without a barrier, it is possible that the thread to be cancelled does not see the store of its cancellation flag performed by another thread. this affects both the case where the signal arrives before entering the critical program counter range from __cp_begin to __cp_end (in which case both the signal handler and the inline check fail to see the value which was already stored) and the case where the signal arrives during the critical range (in which case the signal handler should be responsible for cancellation, but when it does not see the cancellation flag, it assumes the signal is spurious and refuses to act on it). in the fix, the barrier is placed only in the signal handler, not in the inline check at the beginning of the critical program counter range. if the signal handler runs before the critical range is entered, it will of course take no action, but its barrier will ensure that the inline check subsequently sees the store. if on the other hand the inline check runs first, it may miss seeing the store, but the subsequent signal handler in the critical range will act upon the cancellation request. this strategy avoids adding a memory barrier in the common, non-cancellation code path.
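schematically, the fix places the barrier at the top of the handler, before the flag is examined (the flag variable and the GCC builtin stand in for the real per-thread field and a_barrier):

    static volatile int cancel_flag_sketch;   /* stands in for the per-thread flag */

    void cancel_signal_handler_sketch(int sig)
    {
        __sync_synchronize();          /* make the other thread's store visible */
        if (!cancel_flag_sketch)
            return;                    /* treat as spurious and refuse to act */
        /* ... cancellation is acted upon here ... */
    }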
2015-01-15 | overhaul __synccall and fix AS-safety and other issues in set*id | Rich Felker | -45/+138
multi-threaded set*id and setrlimit use the internal __synccall function to work around the kernel's wrongful treatment of these process properties as thread-local. the old implementation of __synccall failed to be AS-safe, despite POSIX requiring setuid and setgid to be AS-safe, and was not rigorous in assuring that all threads were caught. in a worst case, threads late in the process of exiting could retain permissions after setuid reported success, in which case attacks to regain dropped permissions may have been possible under the right conditions. the new implementation of __synccall depends on the presence of /proc/self/task and will fail if it can't be opened, but is able to determine that it has caught all threads, and does not use any locks except its own. it thereby achieves AS-safety simply by blocking signals to preclude re-entry in the same thread. with this commit, all known conformance and safety issues in set*id functions should be fixed.
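the thread-enumeration step can be pictured roughly as follows (heavily simplified: the real __synccall also blocks signals, handshakes with every caught thread, and re-scans until no new threads appear):

    #define _GNU_SOURCE
    #include <dirent.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int signal_all_tasks_sketch(int signo)
    {
        DIR *d = opendir("/proc/self/task");
        struct dirent *de;
        if (!d) return -1;             /* per the description, failure is reported */
        while ((de = readdir(d)))
            if (de->d_name[0] != '.')  /* skip "." and ".." */
                syscall(SYS_tgkill, getpid(), atoi(de->d_name), signo);
        closedir(d);
        return 0;
    }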
2015-01-15 | suppress EINTR in sem_wait and sem_timedwait | Rich Felker | -1/+1
per POSIX, the EINTR condition is an optional error for these functions, not a mandatory one. since old kernels (pre-2.6.22) failed to honor SA_RESTART for the futex syscall, it's dangerous to trust EINTR from the kernel. thankfully POSIX offers an easy way out.
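the "easy way out" is simply to restart the wait, roughly as in this illustrative wrapper (musl applies the same idea inside its own wait loop rather than via a wrapper):

    #include <errno.h>
    #include <semaphore.h>

    int sem_wait_noeintr_sketch(sem_t *sem)
    {
        int r;
        /* EINTR is optional for sem_wait, so retrying is conforming and
           sidesteps old kernels that ignored SA_RESTART for the futex syscall */
        while ((r = sem_wait(sem)) == -1 && errno == EINTR)
            ;
        return r;
    }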
2014-11-22 | fix __aeabi_read_tp oversight in arm atomics/tls overhaul | Rich Felker | -4/+0
calls to __aeabi_read_tp may be generated by the compiler to access TLS on pre-v6 targets. previously, this function was hard-coded to call the kuser helper, which would crash on kernels with kuser helper removed. to fix the problem most efficiently, the definition of __aeabi_read_tp is moved so that it's an alias for the new __a_gettp. however, on v7+ targets, code to initialize the runtime choice of thread-pointer loading code is not even compiled, meaning that defining __aeabi_read_tp would have caused an immediate crash due to using the default implementation of __a_gettp with a HCF instruction. fortunately there is an elegant solution which reduces overall code size: putting the native thread-pointer loading instruction in the default code path for __a_gettp, so that separate default/native code paths are not needed. this function should never be called before __set_thread_area anyway, and if it is called early on pre-v6 hardware, the old behavior (crashing) is maintained. ideally __aeabi_read_tp would not be called at all on v7+ targets anyway -- in fact, prior to the overhaul, the same problem existed, but it was never caught by users building for v7+ with kuser disabled. however, it's possible for calls to __aeabi_read_tp to end up in a v7+ binary if some of the object files were built for pre-v7 targets, e.g. in the case of static libraries that were built separately, so this case needs to be handled.
2014-11-19 | overhaul ARM atomics/tls for performance and compatibility | Rich Felker | -12/+1
previously, builds for pre-armv6 targets hard-coded use of the "kuser helper" system for atomics and thread-pointer access, resulting in binaries that fail to run (crash) on systems where this functionality has been disabled (as a security/hardening measure) in the kernel. additionally, builds for armv6 hard-coded an outdated/deprecated memory barrier instruction which may require emulation (extremely slow) on future models. this overhaul replaces the behavior for all pre-armv7 builds (both of the above cases) to perform runtime detection of the appropriate mechanisms for barrier, atomic compare-and-swap, and thread pointer access. detection is based on information provided by the kernel in auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture version encoded in AT_PLATFORM. direct use of the instructions is preferred when possible, since probing for the existence of the kuser helper page would be difficult and would incur runtime cost. for builds targeting armv7 or later, the runtime detection code is not compiled at all, and much more efficient versions of the non-cas atomic operations are provided by using ldrex/strex directly rather than wrapping cas.
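the thread-pointer half of the runtime selection might be pictured like this (ARM-only sketch with hypothetical names; detection is reduced to the HWCAP_TLS bit, whereas the real code also consults AT_PLATFORM and drives the atomics the same way):

    #include <sys/auxv.h>

    #ifndef HWCAP_TLS
    #define HWCAP_TLS (1 << 15)        /* ARM hwcap bit for the TLS register */
    #endif

    static void *gettp_native_sketch(void)
    {
        void *tp;
        __asm__ ("mrc p15,0,%0,c13,c0,3" : "=r"(tp));   /* read TPIDRURO */
        return tp;
    }

    static void *gettp_kuser_sketch(void)
    {
        /* kernel-provided helper page entry point (pre-v6 fallback) */
        return ((void *(*)(void))0xffff0fe0)();
    }

    void *(*gettp_sketch)(void) = gettp_native_sketch;

    void init_gettp_sketch(void)
    {
        if (!(getauxval(AT_HWCAP) & HWCAP_TLS))
            gettp_sketch = gettp_kuser_sketch;
    }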
2014-10-20 | manually "shrink wrap" fast path in pthread_once | Rich Felker | -8/+12
this change is a workaround for the inability of current compilers to perform "shrink wrapping" optimizations. in casual testing, it roughly doubled the performance of pthread_once when called on an already-finished once control object.
2014-10-13 | eliminate global waiters count in pthread_once | Rich Felker | -9/+13
2014-10-10 | fix missing barrier in pthread_once/call_once shortcut path | Rich Felker | -2/+6
these functions need to be fast when the init routine has already run, since they may be called very often from code which depends on global initialization having taken place. as such, a fast path bypassing atomic cas on the once control object was used to avoid heavy memory contention. however, on archs with weakly ordered memory, the fast path failed to ensure that the caller actually observes the side effects of the init routine. preliminary performance testing showed that simply removing the fast path was not practical; a performance drop of roughly 85x was observed with 20 threads hammering the same once control on a 24-core machine. so the new explicit barrier operation from atomic.h is used to retain the fast path while ensuring memory visibility. performance may be reduced on some archs where the barrier actually makes a difference, but the previous behavior was unsafe and incorrect on these archs. future improvements to the implementation of a_barrier should reduce the impact.
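a rough sketch of the resulting shortcut path (state encoding and names are illustrative, and the full once logic lives in a separate slow-path function, not shown, which is also what makes the manual "shrink wrap" split above work):

    /* slow path: performs the actual once logic; not shown here */
    int once_full_sketch(volatile int *control, void (*init)(void));

    int once_sketch(volatile int *control, void (*init)(void))
    {
        if (*control == 2) {           /* init routine already completed */
            __sync_synchronize();      /* order later loads after init's side effects */
            return 0;
        }
        return once_full_sketch(control, init);
    }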
2014-09-07 | add C11 thread creation and related thread functions | Rich Felker | -7/+82
based on patch by Jens Gustedt. the main difficulty here is handling the difference between start function signatures and thread return types for C11 threads versus POSIX threads. pointers to void are assumed to be able to represent faithfully all values of int. the function pointer for the thread start function is cast to an incorrect type for passing through pthread_create, but is cast back to its correct type before calling so that the behavior of the call is well-defined. changes to the existing threads implementation were kept minimal to reduce the risk of regressions, and duplication of code that carries implementation-specific assumptions was avoided for ease and safety of future maintenance.
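the pointer round-trip can be illustrated in isolation (names are illustrative): the start function travels through pthread_create's void *(*)(void *) parameter type, is cast back to its true type before the call, and its int result rides in a void *:

    #include <stdint.h>

    typedef int (*thrd_start_sketch)(void *);

    /* 'stored' is the start function as it arrived through pthread_create's
       void *(*)(void *) parameter; cast it back to its real type to call it */
    void *run_c11_start_sketch(void *(*stored)(void *), void *arg)
    {
        int res = ((thrd_start_sketch)stored)(arg);
        return (void *)(intptr_t)res;   /* int return value carried as void * */
    }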
2014-09-06 | add C11 condition variable functions | Jens Gustedt | -0/+57
Because of the clear separation for private pthread_cond_t, these interfaces are quite simple and direct.