|Age||Commit message (Collapse)||Author||Lines|
linux commit a49f4f81cb48925e8d7cbd9e59068f516e984144
arch: Wire up Landlock syscalls
Merge tag 'landlock_v34' of ... jmorris/linux-security
Landlock provides for unprivileged application sandboxing. The goal of
Landlock is to enable to restrict ambient rights (e.g. global filesystem
access) for a set of processes. Landlock is inspired by seccomp-bpf but
instead of filtering syscalls and their raw arguments, a Landlock rule
can restrict the use of kernel objects like file hierarchies, according
to the kernel semantic.
new syscall to change the properties of a mount or a mount tree using
file descriptors which the new mount api is based on, see
linux commit 2a1867219c7b27f928e2545782b86daaf9ad50bd
fs: add mount_setattr()
linux commit b0a0c2615f6f199a656ed8549d7dce625d77aa77
epoll: wire up syscall epoll_pwait2
linux commit 58169a52ebc9a733aeb5bea857bc5daa71a301bb
epoll: add syscall epoll_pwait2
epoll_wait with struct timespec timeout instead of int. no time32 variant.
mainly added to linux to allow a central process management service in
android to give MADV_COLD|PAGEOUT hints for other processes, see
linux commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
mm/madvise: introduce process_madvise() syscall: an external memory
linux commit 9b4feb630e8e9801603f3cab3a36369e3c1cf88d
arch: wire-up close_range()
linux commit 278a5fbaed89dacd04e9d052f4594ffd0e0585de
open: add close_range()
the linux faccessat syscall lacks a flag argument that is necessary
to implement the posix api, see
linux commit c8ffd8bcdd28296a198f237cc595148a8d4adfbe
vfs: add faccessat2 syscall
also added clone3 on sh and m68k, on sh it's still missing (not
yet wired up), but reserved so safe to add.
linux commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179
open: introduce openat2(2) syscall
linux commit 9a2cef09c801de54feecd912303ace5c27237f12
arch: wire up pidfd_getfd syscall
linux commit 8649c322f75c96e7ced2fec201e123b2b073bf09
pid: Implement pidfd_getfd syscall
linux commit e8bb2a2a1d51511e6b3f7e08125d52ec73c11139
m68k: Wire up clone3() syscall
prior to commit 685e40bb09f5f24a2af54ea09c97328808f76990, x86_64 was
correctly passing O_LARGEFILE to SYS_open; it was removed (defined to
0 in the public header, and changed to use the public definition) as
part of that change, probably out of a mistaken belief that it's not
however, on a mixed system with 32-bit and 64-bit binaries, it's
important that all files be opened with O_LARGEFILE, even if the
opening process is 64-bit, in case a descriptor is passed to a 32-bit
process. otherwise, attempts to access past 2GB in the 32-bit process
could produce EOVERFLOW.
most 64-bit archs added later got this right alread, except for
mips64. x32 was also affected. there are now fixed.
the adjustment made is entirely a function of TLS_ABOVE_TP and
TP_OFFSET. aside from avoiding repetition of the TP_OFFSET value and
arithmetic, this change makes pthread_arch.h independent of the
definition of struct __pthread from pthread_impl.h. this in turn will
allow inclusion of pthread_arch.h to be moved to the top of
pthread_impl.h so that it can influence the definition of the
previously, arch files were very inconsistent about the type used for
the thread pointer. this change unifies the new __get_tp interface to
always use uintptr_t, which is the most correct when performing
arithmetic that may involve addresses outside the actual pointed-to
object (due to TP_OFFSET).
the only part of TP_ADJ that was not uniquely determined by
TLS_ABOVE_TP was the 0x7000 adjustment used mainly on mips and powerpc
signal 7 is SIGEMT on Linux mips* ABI according to the man pages and
kernel. it's not clear where the wrong name came from but it dates
back to original mips commit.
on all mips variants, Linux did (and maybe still does) have some
syscall return paths that wrongly return both the error flag in r7 and
a negated error code in r2. in particular this happened for at least
some causes of ENOSYS.
add an extra check to only negate the error code if it's positive to
bug report and concept for patch by Andreas Dröscher.
effectivly revert commit ddc7c4f936c7a90781072f10dbaa122007e939d0
which was wrong; it caused a major regression on Linux versions prior
to 2.6.36. old kernels did not properly preserve r2 across syscall
restart, and instead restarted with the instruction right before
syscall, imposing a contract that the previous instruction must load
r2 from an immediate or a register (or memory) not clobbered by the
since other changes were made since, including removal of the struct
stat conversion that was replaced by separate struct kstat, this is
not a direct revert, only a functional one.
the "0"(r2) input constraint added back seems useless/erroneous, but
without it most gcc versions (seems to be all prior to 9.x) fail to
honor the output register binding for r2. this seems to be a variant
of gcc bug #87733. further changes should be made later if a better
workaround is found, but this one has been working since 2012. it
seems this issue was encountered but misidentified then, when it
inspired commit 4221f154ff29ab0d6be1e7beaa5ea2d1731bc58e.
the syscall numbers were reserved in v5.3 but not wired up on mips, see
linux commit 0671c5b84e9e0a6d42d22da9b5d093787ac1c5f3
MIPS: Wire up clone3 syscall
linux commit 7615d9e1780e26e0178c93c55b73309a5dc093d7
arch: wire-up pidfd_open()
linux commit 32fcb426ec001cb6d5a4a195091a8486ea77e2df
pid: add pidfd_open()
commit 4d3a162d001a93edd285fb6603a883c30ae553ba overlooked that the
mips64 reloc.h dependent on endian.h not only for setting the ABI ldso
name to match the byte order, but also for use of the byte swapping
macros. they are needed to override R_TYPE, R_SYM, and R_INFO, to
compensate for a mips "quirk" of always using big endian order for
symbol references in relocations.
part of that commit canot be reverted because the original code was
wrong: it's invalid to define _GNU_SOURCE or any feature test macro
in reloc.h, or anywhere except at the top of a source file. however,
thanks to commit 316730cdc7a330cddf288b4e5c1de5daa64e19f4, the feature
test macro is no longer needed to access the endian-swapping macros,
so simply bringing back the #include directive suffices.
now that all 32-bit archs have 64-bit time_t (and suseconds_t), the
arch-provided _Int64 macro (long or long long, as appropriate) can be
used to define them, and arch-specific definitions are no longer
these structures can now be defined generically in terms of endianness
and long size. previously, the 32-bit archs all shared a common
definition from the generic bits header, and each 64-bit arch had to
repeat the 64-bit version, with endian conditionals if the arch had
variants of each endianness.
I would prefer getting rid of the preprocessor conditionals for
padding and instead using unnamed bitfield members, like commit
9b2921bea1d5017832e1b45d1fd64220047a9802 did for struct timespec.
however, at present sendmsg, recvmsg, and recvmmsg need access to the
padding members by name to zero them. this could perhaps be cleaned up
in the future.
policy has long been that these definitions are purely a function of
whether long/pointer is 32- or 64-bit, and that they are not allowed
to vary per-arch. move the definition to the shared alltypes.h.in
fragment, using integer constant expressions in terms of sizeof to
vary the array dimensions appropriately. I'm not sure whether this is
more or less ugly than using preprocessor conditionals and two sets of
definitions here, but either way is a lot less ugly than repeating the
same thing for every arch.
LLONG_MAX is uniform for all archs we support and plenty of header and
code level logic assumes it is, so it does not make sense for limits.h
bits mechanism to pretend it's variable.
LONG_BIT can be defined in terms of LONG_MAX; there's no reason to put
it in bits.
by moving LONG_MAX definition to __LONG_MAX in alltypes.h and moving
LLONG_MAX out of bits, there are now no plain-C limits that are
defined in the bits header, so the bits header only needs to be
included in the POSIX or extended profiles. this allows the feature
test macro logic to be removed from the bits header, facilitating a
long-term goal of getting such logic out of bits.
having __LONG_MAX in alltypes.h will allow further generalization of
archs without a constant PAGESIZE no longer need bits/limits.h at all.
building on commit 97d35a552ec5b6ddf7923dd2f9a8eb973526acea,
__BYTE_ORDER is now available wherever alltypes.h is included. since
reloc.h is only used from src/internal/dynlink.h, it can be assumed
that __BYTE_ORDER is exposed. reloc.h is not permitted to be included
in other contexts, and generally, like most arch headers, lacks
inclusion guards that would allow such usage. the mips64 version
mistakenly included such guards; they are removed for consistency.
building on commit 97d35a552ec5b6ddf7923dd2f9a8eb973526acea,
__BYTE_ORDER is now available wherever alltypes.h is included.
endian.h should not be used since, in the future, it will expose
identifiers that are not in the reserved namespace for the headers
which were previously using it.
this change is motivated by the intersection of several factors.
presently, despite being a nonstandard header, endian.h is exposing
the unprefixed byte order macros and functions only if _BSD_SOURCE or
_GNU_SOURCE is defined. this is to accommodate use of endian.h from
other headers, including bits headers, which need to define structure
layout in terms of endianness. with time64 switch-over, even more
headers will need to do this.
at the same time, the resolution of Austin Group issue 162 makes
endian.h a standard header for POSIX-future, requiring that it expose
the unprefixed macros and the functions even in standards-conforming
profiles. changes to meet this new requirement would break existing
internal usage of endian.h by causing it to violate namespace where
instead, have the arch's alltypes.h define __BYTE_ORDER, either as a
fixed constant or depending on the right arch-specific predefined
macros for determining endianness. explicit literals 1234 and 4321 are
used instead of __LITTLE_ENDIAN and __BIG_ENDIAN so that there's no
danger of getting the wrong result if a macro is undefined and
implicitly evaluates to 0 at the preprocessor level.
the powerpc (32-bit) bits/endian.h being removed had logic for varying
endianness, but our powerpc arch has never supported that and has
always been big-endian-only. this logic is not carried over to the new
__BYTE_ORDER definition in alltypes.h.
now that commit f7f1079796abc6f97c69521d2334e9c7d3945dd8 removed the
legacy i386 conditional definition, va_list is in no way
arch-specific, and has no reason to be in the future. move it to the
shared part of alltypes.h.in
mips r6 (an incompatible isa from traditional mips) removes the hi and
lo registers used for mul/div results. older gcc versions accepted
them in the clobber list for asm, but their presence is incorrect and
breaks on later versions.
in the process of fixing this, the clobber list for 32-bit mips
syscalls has been deduplicated via a macro like on mips64 and n32.
new mount api syscalls were added, same numers on all targets, see
linux commit a07b20004793d8926f78d63eb5980559f7813404
vfs: syscall: Add open_tree(2) to reference or clone a mount
linux commit 2db154b3ea8e14b04fee23e3fdfd5e9d17fbc6ae
vfs: syscall: Add move_mount(2) to move mounts around
linux commit 24dcb3d90a1f67fe08c68a004af37df059d74005
vfs: syscall: Add fsopen() to prepare for superblock creation
linux commit ecdab150fddb42fe6a739335257949220033b782
vfs: syscall: Add fsconfig() for configuring and managing a context
linux commit 93766fbd2696c2c4453dd8e1070977e9cd4e6b6d
vfs: syscall: Add fsmount() to create a mount for a superblock
linux commit cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb
vfs: syscall: Add fspick() to select a superblock for reconfiguration
linux commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
linux commit d8076bdb56af5e5918376cd1573a6b0007fc1a89
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
without this, the SO_RCVTIMEO and SO_SNDTIMEO socket options would
stop working on pre-5.1 kernels after time_t is switched to 64-bit and
their values are changed to the new time64 versions.
new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.
the definition of the IPC_64 macro controls the interface between libc
and the kernel through syscalls; it's not a public API. the meaning is
rather obscure. long ago, Linux's sysvipc *id_ds structures used
16-bit uids/gids and wrong types for a few other fields. this was in
the libc5 era, before glibc. the IPC_64 flag (64 is a misnomer; it's
more like 32) tells the kernel to use the modern[-ish] versions of the
the definition of IPC_64 has nothing to do with whether the arch is
32- or 64-bit. rather, due to either historical accident or
intentional obnoxiousness, the kernel only accepts and masks off the
0x100 IPC_64 flag conditional on CONFIG_ARCH_WANT_IPC_PARSE_VERSION,
i.e. for archs that want to provide, or that accidentally provided,
both. for archs which don't define this option, no masking is
performed and commands with the 0x100 bit set will fail as invalid. so
ultimately, the definition is just a matter of matching an arbitrary
switch defined per-arch in the kernel.
some of these were not exact duplicates, but had gratuitously
different naming for padding, or omitted the endian checks because the
arch is fixed-endian.
various padding fields in the generic bits/sem.h were defined in terms
of time_t as a cheap hack standing in for "kernel long", to allow x32
to use the generic version of the file. this was a really bad idea, as
it ended up getting copied into lots of arch-specific versions of the
bits file, and is a blocker to changing time_t to 64-bit on 32-bit
this commit adds an x32-specific version of the header, and changes
padding type back from time_t to long (currently the same type on all
archs but x32) in the generic header and all the others the hack got
now that we have a kstat structure decoupled from the public struct
stat, we can just use the broken kernel structures directly and let
the code in fstatat do the translation.
presently, all archs/ABIs have struct stat matching the kernel
stat type, except mips/mipsn32/mips64 which do conversion hacks in
syscall_arch.h to work around bugs in the kernel type. this patch
completely decouples them and adds a translation step to the success
path of fstatat. at present, this is just a gratuitous copying, but it
opens up multiple possibilities for future support for 64-bit time_t
on 32-bit archs and for cleaned-up/unified ABIs.
for clarity, the mips hacks are not yet removed in this commit, so the
mips kstat structs still correspond to the output of the hacks in
their syscall_arch.h files, not the raw kernel type. a subsequent
commit will fix this.
these were overlooked during review. bits headers are not allowed to
pull in additional headers (note: that rule is currently broken in
other places but just for endian.h). string.h has no place here
anyway, and including bits/alltypes.h without defining macros to
request types from it is a nop.
ever since inline syscalls were added for (o32) mips in commit
328810d32524e4928fec50b57e37e1bf330b2e40, the asm has nonsensically
loaded the syscall number, rather than taking $2 as an input
constraint to let the compiler load it. commit
cfc09b1ecf0c6981494fd73dffe234416f66af10 improved on this somewhat by
allowing a constant syscall number to propagate into an immediate, but
missed that the whole operation made no sense.
now, only $4, $5, $6, $8, and $9 are potential input-only registers.
$2 is always input and output, and $7 is both when it's an argument,
otherwise output-only. previously, $7 was treated as an input (with a
"1" constraint matching its output position) even when it was not an
input, which was arguably undefined behavior (asm input from
indeterminate value). this is corrected.
this patch is not purely non-functional changes, since before, $8 and
$9 were wrongly in the clobberlist for syscalls with fewer than 5 or 6
arguments. of course it's impossible for syscalls to have different
clobbers depending on their number of arguments. the clobberlist for
the recently-added 5- and 6-argument forms was correct, and for the 0-
to 4-argument forms was erroneously copied from the mips o32 ABI where
the additional arguments had to be passed on the stack.
in making this change, I reviewed the kernel sources, and $8 and $9
are always saved for 64-bit kernels since they're part of the syscall
argument list for n32 and n64 ABIs.
Commit 3517d74a5e04a377192d1f4882ad6c8dc22ce69a changed the token in
sys/ioctl.h from 0x01 to 1, so bits/termios.h no longer matches. Revert
the bits/termios.h change to keep the headers in sync.
This reverts commit 9eda4dc69c33852c97c6f69176bf45ffc80b522f.
syscall numbers are now synced up across targets (starting from 403 the
numbers are the same on all targets other than an arch specific offset)
IPC syscalls sem*, shm*, msg* got added where they were missing (except
for semop: only semtimedop got added), the new semctl, shmctl, msgctl
imply IPC_64, see
linux commit 0d6040d4681735dfc47565de288525de405a5c99
arch: add split IPC system calls where needed
new 64bit time_t syscall variants got added on 32bit targets, see
linux commit 48166e6ea47d23984f0b481ca199250e1ce0730a
y2038: add 64-bit time_t syscalls to all 32-bit architectures
new async io syscalls got added, see
linux commit 2b188cc1bb857a9d4701ae59aa7768b5124e262e
Add io_uring IO interface
linux commit edafccee56ff31678a091ddb7219aba9b28bc3cb
io_uring: add support for pre-mapped user IO buffers
a new syscall got added that uses the fd of /proc/<pid> as a stable
handle for processes: allows sending signals without pid reuse issues,
intended to eventually replace rt_sigqueueinfo, kill, tgkill and
linux commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad
signal: add pidfd_send_signal() syscall
on some targets (arm, m68k, s390x, sh) some previously missing syscall
numbers got added as well.
the numbers added in
add io_pgetevents and rseq syscall numbers from linux v4.18
n32 and n64 ABIs add new argument registers vs o32, so that passing on
the stack is not necessary, so it's not clear why the 5- and
6-argument versions were special-cased to begin with; it seems to have
been pattern-copying from arch/mips (o32).
i've treated the new argument registers like the first 4 in terms of
clobber status (non-clobbered). hopefully this is correct.
io_pgetevents is new in linux commit
rseq is new in linux commit
this will allow the compiler to cache and reuse the result, meaning we
no longer have to take care not to load it more than once for the sake
of archs where the load may be expensive.
depends on commit 1c84c99913bf1cd47b866ed31e665848a0da84a2 for
correctness, since otherwise the compiler could hoist loads during
stage 3 of dynamic linking before the initial thread-pointer setup.
these were overlooked in the declarations overhaul work because they
are not properly declared, and the current framework even allows their
declared types to vary by arch. at some point this should be cleaned
up, but I'm not sure what the right way would be.
sys/ptrace.h is target specific, use bits/ptrace.h to add target
specific macro definitions.
these macros are kept in the generic sys/ptrace.h even though some
targets don't support them:
so no macro definition got removed in this patch on any target. only
s390x has a numerically conflicting macro definition (PTRACE_SINGLEBLOCK).
the PT_ aliases follow glibc headers, otherwise the definitions come
from linux uapi headers except ones that are skipped in glibc and
there is no real kernel support (s390x PTRACE_*_AREA) or need special
type definitions (mips PTRACE_*_WATCH_*) or only relevant for linux
2.4 compatibility (PTRACE_OLDSETOPTIONS).
adapted from patch by Matthias Schiffer.
new in linux commit 256211f2b0b251e532d1899b115e374feb16fa7a
In TLS variant I the TLS is above TP (or above a fixed offset from TP)
but on some targets there is a reserved gap above TP before TLS starts.
This matters for the local-exec tls access model when the offsets of
TLS variables from the TP are hard coded by the linker into the
executable, so the libc must compute these offsets the same way as the
linker. The tls offset of the main module has to be
If there is no TLS in the main module then the gap can be ignored
since musl does not use it and the tls access models of shared
libraries are not affected.
The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0
(i.e. TLS did not require large alignment) because the gap was
treated as a fixed offset from TP. Now the TP points at the end
of the pthread struct (which is aligned) and there is a gap above
it (which may also need alignment).
The fix required changing TP_ADJ and __pthread_self on affected
targets (aarch64, arm and sh) and in the tlsdesc asm the offset to
access the dtv changed too.