Age | Commit message | Author | Lines |
|
this commit should make no codegen change for existing archs, but is a
prerequisite for new archs including riscv32. the wait4 emulation
backend provides both cancellable and non-cancellable variants because
waitpid is required to be a cancellation point, but all of our other
uses are not, and most of them cannot be.
based on patch by Stefan O'Rear.
|
|
commit 0a05eace163cee9b08571d2ff9d90f5e82d9c228 implemented AT_EACCESS
for faccessat with a horrible hack, creating a child process to
switch uid/gid and perform the access probe without making potentially
irreversible changes to the caller's credentials. this was due to the
syscall lacking a flags argument.
linux 5.8 introduced a new syscall, SYS_faccessat2, fixing this
deficiency. use it if any flags are passed, and fall back to the old
strategy on ENOSYS. continue using the old syscall when there are no
flags.
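
the dispatch described above might look roughly like the following. this is a minimal sketch, not the committed code: faccessat_sketch and slow_path_probe are hypothetical names, the slow path is only a stub standing in for the child-process strategy, and the fallback syscall number 439 is an assumption that holds on most architectures but not all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_faccessat2
#define SYS_faccessat2 439 /* assumption: number used by most archs since linux 5.8 */
#endif

/* hypothetical stand-in for the old child-process strategy; a real
 * version would switch ids in a child and probe from there. */
static int slow_path_probe(int fd, const char *path, int amode, int flag)
{
	(void)fd; (void)path; (void)amode; (void)flag;
	errno = ENOSYS;
	return -1;
}

/* hypothetical wrapper showing only the choice of syscall */
int faccessat_sketch(int fd, const char *path, int amode, int flag)
{
	if (flag) {
		/* flags present: try the new syscall first */
		long r = syscall(SYS_faccessat2, fd, path, amode, flag);
		if (r == 0 || errno != ENOSYS)
			return (int)r;
		/* kernel lacks faccessat2: use the old strategy */
		return slow_path_probe(fd, path, amode, flag);
	}
	/* no flags: the old syscall is sufficient */
	return (int)syscall(SYS_faccessat, fd, path, amode);
}
```

with no flags the sketch never touches the new syscall, so behavior on old kernels is unchanged for the common case.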
|
|
commit f9fb20b42da0e755d93de229a5a737d79a0e8f60 switched from using a
pipe for the result to conveying it via the child process exit status.
Alexander Monakov pointed out that the latter could fail if the
application is not expecting faccessat to produce a child and performs
a wait operation with __WCLONE or __WALL, and that it is not clear
whether it's guaranteed to work when SIGCHLD's disposition has been
set to SIG_IGN.
in addition, that commit introduced a bug that caused EACCES to be
produced instead of EBUSY due to an exit path that was overlooked when
the error channel was changed, and introduced a spurious retry loop
around the wait operation.
|
|
now that we're waiting for the exit status of the child process, the
result can be conveyed in the exit status rather than via a pipe.
since the error value might not fit in 7 bits, a table is used to
translate possible meaningful error values to small integers.
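
a table of this kind could be sketched as follows. the entries are an illustrative subset chosen for the example, not the table from the commit, and the function names and the EBUSY fallback for unknown codes are assumptions.

```c
#include <errno.h>
#include <stddef.h>

/* illustrative subset of errors an access probe can produce;
 * index 0 means success. */
static const int err_table[] = {
	0, EACCES, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR, EROFS, ETXTBSY, EINVAL
};
#define NERR (sizeof err_table / sizeof err_table[0])

/* child side: turn an errno value into a small exit code.
 * values not in the table map to NERR, an out-of-table marker. */
int errno_to_status(int e)
{
	for (size_t i = 0; i < NERR; i++)
		if (err_table[i] == e)
			return (int)i;
	return (int)NERR;
}

/* parent side: turn the exit code back into an errno value.
 * anything outside the table is reported as EBUSY (an assumption). */
int status_to_errno(int code)
{
	if (code >= 0 && (size_t)code < NERR)
		return err_table[code];
	return EBUSY;
}
```

the indices stay far below 128 regardless of how large the arch's errno values are, which is the point of going through a table.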
|
|
I mistakenly assumed that clone without a signal produced processes
that would not become zombies; however, waitpid with __WCLONE is
required to release their pids.
|
|
as usual, this is needed to avoid fd leaks. as a better solution, the
use of fds could possibly be replaced with mmap and a futex.
|
|
this fixes an issue reported by Daniel Thau whereby faccessat with the
AT_EACCESS flag did not work in cases where the process is running
suid or sgid but without root privileges. per POSIX, when the process
does not have "appropriate privileges", setuid changes the euid, not
the real uid, and the target uid must be equal to the current real or
saved uid; if this condition is not met, EPERM results. this caused
the faccessat child process to fail.
using the setreuid syscall rather than setuid works. POSIX leaves it
unspecified whether setreuid can set the real user id to the effective
user id on processes without "appropriate privileges", but Linux
allows this; if it's not allowed, there would be no way for this
function to work.
|
|
clone will pass the return value of the start function to SYS_exit
anyway; there's no need to call the syscall directly.
|
|
the child process's stack may be of insufficient size to support a signal
frame, and there is no reason these signal handlers should run in the
child anyway.
|
|
this is another case of the kernel syscall failing to support flags
where it needs to, leading to horrible workarounds in userspace. this
time the workaround requires changing uid/gid, and that's not safe to
do in the current process. in the worst case, kernel resource limits
might prevent recovering the original values, and then there would be
no way to safely return. so, use the safe but horribly inefficient
alternative: forking. clone is used instead of fork to suppress
signals from the child.
fortunately this worst-case code is only needed when effective and
real ids mismatch, which mainly happens in suid programs.
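
the condition for taking the expensive path can be sketched as a small predicate. needs_child_probe is a hypothetical name, and passing the ids as arguments rather than querying them is a choice made only to keep the example self-contained.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* returns 1 when the probe would have to run in a child with
 * switched ids, 0 when the plain syscall already gives the right
 * answer. */
int needs_child_probe(int flag, uid_t uid, uid_t euid, gid_t gid, gid_t egid)
{
	/* without AT_EACCESS the kernel's real-id check is what was asked for */
	if (!(flag & AT_EACCESS))
		return 0;
	/* with matching ids, real-id and effective-id checks agree */
	return uid != euid || gid != egid;
}
```

a caller would pass getuid(), geteuid(), getgid() and getegid(); in a process that is not suid or sgid the predicate is always 0.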
|
|