summaryrefslogtreecommitdiff
path: root/arch/mips/pthread_arch.h
AgeCommit message (Collapse)AuthorLines
2021-08-12fix excessively slow TLS performance on some mips modelsRich Felker-2/+1
commit 6d99ad91e869aab35a4d76d34c3c9eaf29482bad introduced this regression as part of a larger change, based on an incorrect assumption that rdhwr being part of the mips r2 ISA level meant that the TLS register, known in the mips documentation as UserLocal, was unconditionally present on chips providing this ISA level and would not need trap-and-emulate. this turns out to be false. based on research by Stanislav Kljuhhin and Abilio Marques, who reported the problem as a performance regression on certain routers using OpenWRT vs older uclibc-based versions, it turns out the mips manuals document the UserLocal register as a feature that might or might not be implemented or enabled, reflected by a cpu capability bit in the CONFIG3 register, and that Linux checks for this and has to explicitly enable it on models that have it. thus, it's indeed possible that r2+ chips can lack the feature, bringing us back to the situation where Linux only has a fast trap-and-emulate path for the case where the destination register is $3. so, always read the thread pointer through $3. this may incur a gratuitous move to the desired final register on chips where it's not needed, but it really doesn't matter.
2020-08-27deduplicate __pthread_self thread pointer adjustment out of each archRich Felker-4/+4
the adjustment made is entirely a function of TLS_ABOVE_TP and TP_OFFSET. aside from avoiding repetition of the TP_OFFSET value and arithmetic, this change makes pthread_arch.h independent of the definition of struct __pthread from pthread_impl.h. this in turn will allow inclusion of pthread_arch.h to be moved to the top of pthread_impl.h so that it can influence the definition of the structure. previously, arch files were very inconsistent about the type used for the thread pointer. this change unifies the new __get_tp interface to always use uintptr_t, which is the most correct when performing arithmetic that may involve addresses outside the actual pointed-to object (due to TP_OFFSET).
2020-08-24deduplicate TP_ADJ logic out of each arch, replace with TP_OFFSETRich Felker-1/+1
the only part of TP_ADJ that was not uniquely determined by TLS_ABOVE_TP was the 0x7000 adjustment used mainly on mips and powerpc variants.
2018-10-16make thread-pointer-loading asm non-volatileRich Felker-2/+2
this will allow the compiler to cache and reuse the result, meaning we no longer have to take care not to load it more than once for the sake of archs where the load may be expensive. depends on commit 1c84c99913bf1cd47b866ed31e665848a0da84a2 for correctness, since otherwise the compiler could hoist loads during stage 3 of dynamic linking before the initial thread-pointer setup.
2018-06-02fix TLS layout of TLS variant I when there is a gap above TPSzabolcs Nagy-0/+1
In TLS variant I the TLS is above TP (or above a fixed offset from TP) but on some targets there is a reserved gap above TP before TLS starts. This matters for the local-exec tls access model when the offsets of TLS variables from the TP are hard coded by the linker into the executable, so the libc must compute these offsets the same way as the linker. The tls offset of the main module has to be alignup(GAP_ABOVE_TP, main_tls_align). If there is no TLS in the main module then the gap can be ignored since musl does not use it and the tls access models of shared libraries are not affected. The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0 (i.e. TLS did not require large alignment) because the gap was treated as a fixed offset from TP. Now the TP points at the end of the pthread struct (which is aligned) and there is a gap above it (which may also need alignment). The fix required changing TP_ADJ and __pthread_self on affected targets (aarch64, arm and sh) and in the tlsdesc asm the offset to access the dtv changed too.
2016-04-03add support for mips and mips64 r6 isaRich Felker-5/+4
mips32r6 and mips64r6 are actually new isas at both the asm source and opcode levels (pre-r6 code cannot run on r6) and thus need to be treated as a new subarch. the following changes are made, some of which yield code generation improvements for non-r6 targets too: - add subarch logic in configure script and reloc.h files for dynamic linker name. - suppress use of .set mips2 asm directives (used to allow mips2 atomic instructions on baseline mips1 builds; the kernel has to emulate them on mips1) except when actually needed. they cause wrong instruction encodings on r6, and pessimize inlining on at least some compilers. - only hard-code sync instruction encoding on mips1. - use "ZC" constraint instead of "m" constraint for llsc memory operands on r6, where the ll/sc instructions no longer accept full 16-bit offsets. - only hard-code rdhwr instruction encoding with .word on targets (pre-r2) where it may need trap-and-emulate by the kernel. otherwise, just use the instruction mnemonic, and allow an arbitrary destination register to be used.
2015-11-02properly access mcontext_t program counter in cancellation handlerRich Felker-1/+1
using the actual mcontext_t definition rather than an overlaid pointer array both improves correctness/readability and eliminates some ugly hacks for archs with 64-bit registers bit 32-bit program counter. also fix UB due to comparison of pointers not in a common array object.
2015-10-15add comment documenting hard-coded opcode for reading mips thread pointerRich Felker-0/+1
2015-06-25fix local-dynamic model TLS on mips and powerpcRich Felker-0/+2
the TLS ABI spec for mips, powerpc, and some other (presently unsupported) RISC archs has the return value of __tls_get_addr offset by +0x8000 and the result of DTPOFF relocations offset by -0x8000. I had previously assumed this part of the ABI was actually just an implementation detail, since the adjustments cancel out. however, when the local dynamic model is used for accessing TLS that's known to be in the same DSO, either of the following may happen: 1. the -0x8000 offset may already be applied to the argument structure passed to __tls_get_addr at ld time, without any opportunity for runtime relocations. 2. __tls_get_addr may be used with a zero offset argument to obtain a base address for the module's TLS, to which the caller then applies immediate offsets for individual objects accessed using the local dynamic model. since the immediate offsets have the -0x8000 adjustment applied to them, the base address they use needs to include the +0x8000 offset. it would be possible, but more complex, to store the pointers in the dtv[] array with the +0x8000 offset pre-applied, to avoid the runtime cost of adding 0x8000 on each call to __tls_get_addr. this change could be made later if measurements show that it would help.
2012-10-15add support for TLS variant I, presently needed for arm and mipsRich Felker-4/+8
despite documentation that makes it sound a lot different, the only ABI-constraint difference between TLS variants II and I seems to be that variant II stores the initial TLS segment immediately below the thread pointer (i.e. the thread pointer points to the end of it) and variant I stores the initial TLS segment above the thread pointer, requiring the thread descriptor to be stored below. the actual value stored in the thread pointer register also tends to have per-arch random offsets applied to it for silly micro-optimization purposes. with these changes applied, TLS should be basically working on all supported archs except microblaze. I'm still working on getting the necessary information and a working toolchain that can build TLS binaries for microblaze, but in theory, static-linked programs with TLS and dynamic-linked programs where only the main executable uses TLS should already work on microblaze. alignment constraints have not yet been heavily tested, so it's possible that this code does not always align TLS segments correctly on archs that need TLS variant I.
2012-09-07add clang-compatible thread-pointer code for mipsRich Felker-0/+4
clang does not presently support the "v" constraint we want to use to get the result from $3, and trying to use register...__asm__("$3") to do the same invokes serious compiler bugs. so for now, i'm working around the issue with an extra temp register and putting $3 in the clobber list instead of using it as output. when the bugs in clang are fixed, this issue should be revisited to generate smaller/faster code like what gcc gets.
2012-07-12mipsel (little endian) supportRich Felker-1/+1
the fields in the mcontext_t are long long (for no good reason) even on 32-bit mips, so the offset of the instruction pointer (as a word) varies depending on endianness.
2012-07-11initial version of mips (o32) port, based on work by Richard Pennington (rdp)Rich Felker-0/+8
basically, this version of the code was obtained by starting with rdp's work from his ellcc source tree, adapting it to musl's build system and coding style, auditing the bits headers for discrepencies with kernel definitions or glibc/LSB ABI or large file issues, fixing up incompatibility with the old binutils from aboriginal linux, and adding some new special cases to deal with the oddities of sigaction and pipe syscall interfaces on mips. at present, minimal test programs work, but some interfaces are broken or missing. threaded programs probably will not link.