Age | Commit message (Collapse) | Author | Lines |
|
these are presently extensions, thus named with _np to match glibc and
other implementations that provide them; however they are likely to be
standardized in the future without the _np suffix as a result of
Austin Group issue 1208. if so, both names will be kept as aliases.
|
|
as reported by Tavian Barnes, a dup2 file action for the internal pipe
fd used by posix_spawn could cause it to remain open after execve and
allow the child to write an artificial error into it, confusing the
parent. POSIX allows internal use of file descriptors by the
implementation, with undefined behavior for poking at them, so this is
not a conformance problem, but it seems preferable to diagnose and
prevent the error when we can do so easily.
catch attempts to apply a dup2 action to the internal pipe fd and
emulate EBADF for it instead.
|
|
libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for hidden. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.
remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.
in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.
declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.
|
|
commits leading up to this one have moved the vast majority of
libc-internal interface declarations to appropriate internal headers,
allowing them to be type-checked and setting the stage to limit their
visibility. the ones that have not yet been moved are mostly
namespace-protected aliases for standard/public interfaces, which
exist to facilitate implementing plain C functions in terms of POSIX
functionality, or C or POSIX functionality in terms of extensions that
are not standardized. some don't quite fit this description, but are
"internally public" interfacs between subsystems of libc.
rather than create a number of newly-named headers to declare these
functions, and having to add explicit include directives for them to
every source file where they're needed, I have introduced a method of
wrapping the corresponding public headers.
parallel to the public headers in $(srcdir)/include, we now have
wrappers in $(srcdir)/src/include that come earlier in the include
path order. they include the public header they're wrapping, then add
declarations for namespace-protected versions of the same interfaces
and any "internally public" interfaces for the subsystem they
correspond to.
along these lines, the wrapper for features.h is now responsible for
the definition of the hidden, weak, and weak_alias macros. this means
source files will no longer need to include any special headers to
access these features.
over time, it is my expectation that the scope of what is "internally
public" will expand, reducing the number of source files which need to
include *_impl.h and related headers down to those which are actually
implementing the corresponding subsystems, not just using them.
|
|
previously, a common __posix_spawnx backend was used that accepted an
additional argument for the execve variant to call in the child. this
moderately bloated up the posix_spawn function, shuffling arguments
between stack and/or registers to call a 7-argument function from a
6-argument one.
instead, tuck the exec function pointer in an unused part of the
(large) pthread_spawnattr_t structure, and have posix_spawnp duplicate
the attributes and fill in a pointer to __execvpe. the net code size
change is minimal, but the weight is shifted to the "heavier" function
which already pulls in more dependencies.
as a bonus, we get rid of an external symbol (__posix_spawnx) that had
no really good place for a declaration because it shouldn't have
existed to begin with.
|
|
the resolution to Austin Group issue #411 defined new semantics for
the posix_spawn dup2 file action in the (previously useless) case
where src and dest fd are equal. future issues will require the dup2
file action to remove the close-on-exec flag. without this change,
passing fds to a child with posix_spawn while avoiding fd-leak races
in a multithreaded parent required a complex dance with temporary fds.
based on patch by Petr Skocik. changes were made to preserve the
80-column formatting of the function and to remove code that became
unreachable as a result of the new functionality.
|
|
execvpe stack-allocates a buffer used to hold the full path
(combination of a PATH entry and the program name)
while searching through $PATH, so at least
NAME_MAX+PATH_MAX is needed.
The stack size can be made conditionally smaller
(the current 1024 appears appropriate)
should this larger size be burdensome in those situations.
|
|
this functionality has been adopted for inclusion in the next issue of
POSIX as the result of Austin Group issue #1044.
based on patch by Daurnimator.
|
|
the write function is a cancellation point and accesses thread-local
state belonging to the calling thread in the parent process. since
cancellation is blocked for the duration of posix_spawn, this is
probably safe, but it's fragile and unnecessary. making the syscall
directly is just as easy and clearly safe.
|
|
the resolution of austin group issue #370 removes the requirement that
posix_spawn fail when the close file action is performed on an
already-closed fd. since there are no other meaningful errors for
close, just ignoring the return value completely is the simplest fix.
|
|
|
|
such archs are expected to omit definitions of the SYS_* macros for
syscalls their kernels lack from arch/$ARCH/bits/syscall.h. the
preprocessor is then able to select the an appropriate implementation
for affected functions. two basic strategies are used on a
case-by-case basis:
where the old syscalls correspond to deprecated library-level
functions, the deprecated functions have been converted to wrappers
for the modern function, and the modern function has fallback code
(omitted at the preprocessor level on new archs) to make use of the
old syscalls if the new syscall fails with ENOSYS. this also improves
functionality on older kernels and eliminates the incentive to program
with deprecated library-level functions for the sake of compatibility
with older kernels.
in other situations where the old syscalls correspond to library-level
functions which are not deprecated but merely lack some new features,
such as the *at functions, the old syscalls are still used on archs
which support them. this may change at some point in the future if or
when fallback code is added to the new functions to make them usable
(possibly with reduced functionality) on old kernels.
|
|
open is handled specially because it is used from so many places, in
so many variants (2 or 3 arguments, setting errno or not, and
cancellable or not). trying to do it as a function would not only
increase bloat, but would also risk subtle breakage.
this is the first step towards supporting "new" archs where linux
lacks "old" syscalls.
|
|
this is a requirement in the specification that was overlooked.
|
|
the trick here is that sigaction can track for us which signals have
ever had a signal handler set for them, and only those signals need to
be considered for reset. this tracking mask may have false positives,
since it is impossible to remove bits from it without race conditions.
false negatives are not possible since the mask is updated with atomic
operations prior to making the sigaction syscall.
implementation-internal signals are set to SIG_IGN rather than SIG_DFL
so that a signal raised in the parent (e.g. calling pthread_cancel on
the thread executing pthread_spawn) does not have any chance make it
to the child, where it would cause spurious termination by signal.
this change reduces the minimum/typical number of syscalls in the
child from around 70 to 4 (including execve). this should greatly
improve the performance of posix_spawn and other interfaces which use
it (popen and system).
to facilitate these changes, sigismember is also changed to return 0
rather than -1 for invalid signals, and to return the actual status of
implementation-internal signals. POSIX allows but does not require an
error on invalid signal numbers, and in fact returning an error tends
to confuse applications which wrongly assume the return value of
sigismember is boolean.
|
|
failures prior to the exec attempt were reported correctly, but on
exec failure, the return value contained junk.
|
|
this is both a minor scheduling optimization and a workaround for a
difficult-to-fix bug in qemu app-level emulation.
from the scheduling standpoint, it makes no sense to schedule the
parent thread again until the child has exec'd or exited, since the
parent will immediately block again waiting for it.
on the qemu side, as regular application code running on an underlying
libc, qemu cannot make arbitrary clone syscalls itself without
confusing the underlying implementation. instead, it breaks them down
into either fork-like or pthread_create-like cases. it was treating
the code in posix_spawn as pthread_create-like, due to CLONE_VM, which
caused horribly wrong behavior: CLONE_FILES broke the synchronization
mechanism, CLONE_SIGHAND broke the parent's signals, and CLONE_THREAD
caused the child's exec to end the parent -- if it hadn't already
crashed. however, qemu special-cases CLONE_VFORK and emulates that
with fork, even when CLONE_VM is also specified. this also gives
incorrect semantics for code that really needs the memory sharing, but
posix_spawn does not make use of the vm sharing except to avoid
momentary double commit charge.
programs using posix_spawn (including via popen) should now work
correctly under qemu app-level emulation.
|
|
for the duration of the vm-sharing clone used by posix_spawn, all
signals are blocked in the parent process, including
implementation-internal signals. since __synccall cannot do anything
until successfully signaling all threads, the fact that signals are
blocked automatically yields the necessary safety.
aside from debloating and general simplification, part of the
motivation for removing the explicit lock is to simplify the
synchronization logic of __synccall in hopes that it can be made
async-signal-safe, which is needed to make setuid and setgid, which
depend on __synccall, conform to the standard. whether this will be
possible remains to be seen.
|
|
read should never return anything but 0 or sizeof ec here, but if it
does, we want to treat any other return as "success". then the caller
will get back the pid and is responsible for waiting on it when it
immediately exits.
|
|
the proposed change was described in detail in detail previously on
the mailing list. in short, vfork is unsafe because:
1. the compiler could make optimizations that cause the child to
clobber the parent's local vars.
2. strace is buggy and allows the vforking parent to run before the
child execs when run under strace.
the new design uses a close-on-exec pipe instead of vfork semantics to
synchronize the parent and child so that the parent does not return
before the child has finished using its arguments (and now, also its
stack). this also allows reporting exec failures to the caller instead
of giving the caller a child that mysteriously exits with status 127
on exec error.
basic testing has been performed on both the success and failure code
paths. further testing should be done.
|
|
__release_ptc() is only valid in the parent; if it's performed in the
child, the lock will be unlocked early then double-unlocked later,
corrupting the lock state.
|
|
|
|
since we target systems without overcommit, special care should be
taken that system() and popen(), like posix_spawn(), do not fail in
processes whose commit charges are too high to allow ordinary forking.
this in turn requires special precautions to ensure that the parent
process's signal handlers do not end up running in the shared-memory
child, where they could corrupt the state of the parent process.
popen has also been updated to use pipe2, so it does not have a
fd-leak race in multi-threaded programs. since pipe2 is missing on
older kernels, (non-atomic) emulation has been added.
some silly bugs in the old code should be gone too.
|
|
usage of vfork creates a situation where a process of lower privilege
may momentarily have write access to the memory of a process of higher
privilege.
consider the case of a multi-threaded suid program which is calling
posix_spawn in one thread while another thread drops the elevated
privileges then runs untrusted (relative to the elevated privilege)
code as the original invoking user. this untrusted code can then
potentially modify the data the child process will use before calling
exec, for example changing the pathname or arguments that will be
passed to exec.
note that if vfork is implemented as fork, the lock will not be held
until the child execs, but since memory is not shared it does not
matter.
|
|
vfork is implemented as the fork syscall (with no atfork handlers run)
on archs where it is not available, so this change does not introduce
any change in behavior or regression for such archs.
|
|
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
|
|
|
|
|
|
|
|
|
|
file actions are not yet implemented, but everything else should be
mostly complete and roughly correct.
|