|Age||Commit message (Collapse)||Author||Lines|
dtv_copy, canary2, and canary_at_end existed solely to match multiple
ABI and asm-accessed layouts simultaneously. now that pthread_arch.h
can be included before struct __pthread is defined, the struct layout
can depend on macros defined by pthread_arch.h.
the adjustment made is entirely a function of TLS_ABOVE_TP and
TP_OFFSET. aside from avoiding repetition of the TP_OFFSET value and
arithmetic, this change makes pthread_arch.h independent of the
definition of struct __pthread from pthread_impl.h. this in turn will
allow inclusion of pthread_arch.h to be moved to the top of
pthread_impl.h so that it can influence the definition of the
previously, arch files were very inconsistent about the type used for
the thread pointer. this change unifies the new __get_tp interface to
always use uintptr_t, which is the most correct when performing
arithmetic that may involve addresses outside the actual pointed-to
object (due to TP_OFFSET).
the only part of TP_ADJ that was not uniquely determined by
TLS_ABOVE_TP was the 0x7000 adjustment used mainly on mips and powerpc
this will allow the compiler to cache and reuse the result, meaning we
no longer have to take care not to load it more than once for the sake
of archs where the load may be expensive.
depends on commit 1c84c99913bf1cd47b866ed31e665848a0da84a2 for
correctness, since otherwise the compiler could hoist loads during
stage 3 of dynamic linking before the initial thread-pointer setup.
versions of clang all the way back to 3.1 lack the bug this was
purportedly working around.
In TLS variant I the TLS is above TP (or above a fixed offset from TP)
but on some targets there is a reserved gap above TP before TLS starts.
This matters for the local-exec tls access model when the offsets of
TLS variables from the TP are hard coded by the linker into the
executable, so the libc must compute these offsets the same way as the
linker. The tls offset of the main module has to be
If there is no TLS in the main module then the gap can be ignored
since musl does not use it and the tls access models of shared
libraries are not affected.
The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0
(i.e. TLS did not require large alignment) because the gap was
treated as a fixed offset from TP. Now the TP points at the end
of the pthread struct (which is aligned) and there is a gap above
it (which may also need alignment).
The fix required changing TP_ADJ and __pthread_self on affected
targets (aarch64, arm and sh) and in the tlsdesc asm the offset to
access the dtv changed too.
using the actual mcontext_t definition rather than an overlaid pointer
array both improves correctness/readability and eliminates some ugly
hacks for archs with 64-bit registers bit 32-bit program counter.
also fix UB due to comparison of pointers not in a common array
other archs use asm for the thread pointer load, so making that asm
volatile is sufficient to inform the compiler that it has a "side
effect" (crashing or giving the wrong result if the thread pointer was
not yet initialized) that prevents reordering. however, powerpc and
or1k have dedicated general purpose registers for the thread pointer
and did not need to use any asm to access it; instead, "local register
variables with a specified register" were used. however, there is no
specification for ordering constraints on this type of usage, and
presumably use of the thread pointer could be reordered across its
to impose an ordering, I have added empty volatile asm blocks that
produce the "local register variable with a specified register" as
an output constraint.
the TLS ABI spec for mips, powerpc, and some other (presently
unsupported) RISC archs has the return value of __tls_get_addr offset
by +0x8000 and the result of DTPOFF relocations offset by -0x8000. I
had previously assumed this part of the ABI was actually just an
implementation detail, since the adjustments cancel out. however, when
the local dynamic model is used for accessing TLS that's known to be
in the same DSO, either of the following may happen:
1. the -0x8000 offset may already be applied to the argument structure
passed to __tls_get_addr at ld time, without any opportunity for
2. __tls_get_addr may be used with a zero offset argument to obtain a
base address for the module's TLS, to which the caller then applies
immediate offsets for individual objects accessed using the local
dynamic model. since the immediate offsets have the -0x8000 adjustment
applied to them, the base address they use needs to include the
it would be possible, but more complex, to store the pointers in the
dtv array with the +0x8000 offset pre-applied, to avoid the runtime
cost of adding 0x8000 on each call to __tls_get_addr. this change
could be made later if measurements show that it would help.
i386, x86_64, x32, and powerpc all use TLS for stack protector canary
values in the default stack protector ABI, but the location only
matched the ABI on i386 and x86_64. on x32, the expected location for
the canary contained the tid, thus producing spurious mismatches
(resulting in process termination) upon fork. on powerpc, the expected
location contained the stdio_locks list head, so returning from a
function after calling flockfile produced spurious mismatches. in both
cases, the random canary was not present, and a predictable value was
used instead, making the stack protector hardening much less effective
than it should be.
in the current fix, the thread structure has been expanded to have
canary fields at all three possible locations, and archs that use a
non-default location must define a macro in pthread_arch.h to choose
which location is used. for most archs (which lack TLS canary ABI) the
choice does not matter.
based on patch by Richard Pennington, who initially reported the