musl - an implementation of the standard library for Linux-based systems

Age	Commit message	Author	Lines
2023-02-28	getservbyport_r: fix wrong result if getnameinfo fails with EAI_OVERFLOW	Alexey Izbyshev	-0/+2
	EAI_OVERFLOW should be propagated as ERANGE to inform the caller about the need to expand the buffer.
2023-02-28	getservbyport_r: fix out-of-bounds buffer read	Alexey Izbyshev	-1/+1
	If the buffer passed to getservbyport_r is just enough to store two pointers after aligning it, getnameinfo is called with buflen == 0 (which means that service name is not needed) and trivially succeeds. Then, strtol is called on the address just past the buffer end, and if it doesn't happen to find the port number there, getservbyport_r spuriously succeeds and returns the same bad address to the caller. Fix this by ensuring that buflen is at least 1 when passed to getnameinfo.
2023-02-28	getifaddrs: fix UB via taking address of null pointer union dereference	Alexey Izbyshev	-7/+7
	getifaddrs computes &ctx->first->ifa even if ctx->first is NULL. While this shouldn't be possible on the success path because the loopback interface is hardcoded into the kernel, this is still possible on the error path (for example, if __rtnetlink_enumerate couldn't create a socket due to exceeding the fd limit).
2023-02-28	accept4: don't fall back to accept if we got unknown flags	Alexey Izbyshev	-0/+4
	accept4 emulation via accept ignores unknown flags, so it can spuriously succeed instead of failing (or succeed without doing the action implied by an unknown flag if it's added in a future kernel). Worse, unknown flags trigger the fallback code even on modern kernels if the real accept4 syscall returns EINVAL, because this is indistinguishable from socketcall returning EINVAL due to lack of accept4 support. Fix this by always failing with EINVAL if unknown flags are present and the syscall is missing or failed with EINVAL.
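The flag check described above can be sketched as a standalone validator. This is a minimal illustration, not musl's code: `check_accept4_flags` is a hypothetical name, and it assumes the only flags the emulation understands are `SOCK_CLOEXEC` and `SOCK_NONBLOCK`.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper, not musl's code: reject flag bits that an
 * accept()-based emulation of accept4 would otherwise silently ignore. */
static int check_accept4_flags(int flags)
{
	const int known = SOCK_CLOEXEC | SOCK_NONBLOCK;
	if (flags & ~known) {
		errno = EINVAL;  /* unknown flag: fail rather than fall back */
		return -1;
	}
	return 0;
}
```

A fallback path would call such a check before emulating accept4 with accept plus fcntl, so that an unknown flag never results in a spurious success.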
2023-02-27	fix potential read past end of buffer in getnameinfo host name lookup	Alexey Izbyshev	-0/+1
	This is completely analogous to commit 633183b5d1c2. Similar code called from __lookup_name is not affected because it checks that the line contains the host name surrounded by blanks.
2023-02-27	dns: fix workaround for systems defaulting to ipv6-only sockets	Alexey Izbyshev	-15/+16
	When IPv6 nameservers are present, __res_msend_rc attempts to disable IPV6_V6ONLY socket option to ensure that it can communicate with IPv4 nameservers (if they are present too) via IPv4-mapped IPv6 addresses. However, this option can't be disabled on bound sockets, so setsockopt always fails.
2023-02-27	dns: handle early eof in tcp fallback	Alexey Izbyshev	-1/+1
	A zero returned from recvmsg is currently treated as if some data were received, so if a DNS server closes its TCP socket before sending the full answer, __res_msend_rc will spin until the timeout elapses because POLLIN event will be reported on each poll. Fix this by treating an early EOF as an error.
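The principle of treating an early EOF as an error can be sketched with a generic read loop. This is an assumption-laden illustration: `read_exact` is a hypothetical helper and uses `read` on a stream descriptor for brevity, whereas the commit concerns `recvmsg` inside __res_msend_rc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from a stream fd.
 * Returns 0 on success, -1 on error or if the peer closes early. */
static int read_exact(int fd, unsigned char *buf, size_t len)
{
	size_t got = 0;
	assert(buf != NULL);
	while (got < len) {
		ssize_t r = read(fd, buf + got, len - got);
		if (r < 0) {
			if (errno == EINTR) continue;  /* interrupted: retry */
			return -1;
		}
		if (r == 0) return -1;  /* early EOF is a failure, not progress */
		got += (size_t)r;
	}
	return 0;
}
```

Without the `r == 0` branch, a loop driven by poll would keep seeing the descriptor as readable and never advance, which is the spinning behavior the commit message describes.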
2023-02-27	prevent CNAME/PTR parsing from reading data past the response end	Alexey Izbyshev	-7/+7
	DNS parsing callbacks pass the response buffer end instead of the actual response end to dn_expand, so a malformed DNS response can use message compression to make dn_expand jump past the response end and attempt to parse uninitialized parts of that buffer, which might succeed and return garbage.
2023-02-27	fix out-of-bounds reads in __dns_parse	Alexey Izbyshev	-3/+3
	There are several issues with range checks in this function: * The question section parsing loop can read up to two out-of-bounds bytes before doing the range check and bailing out. * The answer section parsing loop, in addition to the same issue as above, uses the wrong length in the range check that doesn't prevent OOB reads when computing len later. * The len range check before calling the callback is off by 10. Also, p+len can overflow in a (probably theoretical) case when p is within 2^16 from UINTPTR_MAX. Because __dns_parse is used only with stack-allocated buffers, such small overreads can't result in a segfault. The first two also don't affect the function result, but the last one may result in getaddrinfo incorrectly succeeding and returning up to 10 bytes past the response buffer as a part of the IP address, and in (canon) name returned by getaddrinfo/getnameinfo being affected by memory past the response buffer (because dn_expand might interpret it as a pointer).
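The kind of range check discussed above can be sketched as follows. This is not the code of __dns_parse: `rr_rdata_len` is a hypothetical helper that assumes `p` points at the fixed 10-byte part of a resource record (TYPE, CLASS, TTL, RDLENGTH) and `end` points one past the last valid response byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not musl's __dns_parse: return the RDATA length
 * of the record whose fixed part starts at p, or -1 if the record does
 * not fit inside [p, end). */
static int rr_rdata_len(const unsigned char *p, const unsigned char *end)
{
	assert(p != NULL && end != NULL && p <= end);
	/* TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2) = 10 bytes must be
	 * present before any of them is read. */
	if (end - p < 10) return -1;
	size_t len = (size_t)p[8] << 8 | p[9];
	/* Compare against the space that is left instead of forming
	 * p + 10 + len, which could overflow or point past the buffer. */
	if (len > (size_t)(end - p) - 10) return -1;
	return (int)len;
}
```

Checking the remaining length by subtraction keeps every intermediate value inside the buffer, so neither an off-by-10 mistake nor pointer overflow can arise.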
2023-02-12	dns: prefer monotonic clock for timeouts	A. Wilcox	-1/+2
	Before this commit, DNS timeouts always used CLOCK_REALTIME, which could produce spurious timeouts or delays if wall time changed for whatever reason. Now we try CLOCK_MONOTONIC and only fall back to CLOCK_REALTIME when it is unavailable.
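The clock selection described above can be sketched as a small timestamp helper. `now_ms` is a hypothetical name, and the sketch falls back on any failure of the monotonic clock rather than reproducing the exact condition used in musl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sketch of a millisecond timestamp for timeout arithmetic: prefer
 * CLOCK_MONOTONIC and use CLOCK_REALTIME only if the monotonic clock
 * cannot be read. now_ms is a hypothetical name. */
static unsigned long now_ms(void)
{
	struct timespec ts;
	if (clock_gettime(CLOCK_MONOTONIC, &ts) < 0) {
		int rc = clock_gettime(CLOCK_REALTIME, &ts);
		assert(rc == 0);
		(void)rc;
	}
	return (unsigned long)ts.tv_sec * 1000
		+ (unsigned long)(ts.tv_nsec / 1000000);
}
```

Differences between two such timestamps are unaffected by wall-clock adjustments as long as the monotonic clock is available.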
2023-02-12	fix return value of wcs{,n}cmp for extreme wchar_t values	Gabriel Ravier	-2/+2
	As a result of using simple subtraction to implement the return values for wcscmp and wcsncmp, integer overflow can occur (producing undefined behavior, and in practice, a wrong comparison result). This does not occur for meaningful character values (21-bit range) but the functions are specified to work on arbitrary wchar_t arrays. This patch replaces the subtraction with a little bit of code that orders the characters correctly, returning -1 if the character from the first string is smaller than the one from the second, 0 if they are equal and 1 if the character from the first string is larger than the one from the second.
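A comparison of this shape can be sketched as below. `wcs_order` is a hypothetical name used for illustration; the point is that the result comes from relational operators, so no subtraction of wchar_t values takes place.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Sketch of an overflow-free wide string comparison; wcs_order is a
 * hypothetical name, not the musl function itself. */
static int wcs_order(const wchar_t *l, const wchar_t *r)
{
	assert(l != NULL && r != NULL);
	/* Advance while the characters match and the strings have not ended. */
	for (; *l == *r && *l; l++, r++);
	/* (*l > *r) evaluates to 1 or 0, so the result is -1, 0 or 1. */
	return *l < *r ? -1 : *l > *r;
}
```

With a subtraction-based implementation on a platform where wchar_t is a signed 32-bit type, comparing WCHAR_MAX against WCHAR_MIN would overflow; the relational form orders them correctly.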
2023-02-12	math: fix undefined shift in logf	Szabolcs Nagy	-1/+1
	A signed int shift overflowed when computing a constant mask; use a hex literal instead. This is unlikely to cause actual issues unless the code was compiled with ubsan or similar instrumentation specifically to catch this. The stripped libc.so is unchanged on x86_64. Reported by q66 on irc.
2023-02-12	inet_pton: fix uninitialized memory use for IPv4-mapped IPv6 addresses	Alexey Izbyshev	-0/+1
	When a dot is encountered, the loop counter is incremented before exiting the loop, but the corresponding ip array element is left uninitialized, so the subsequent memmove (if "::" was seen) and the loop copying ip to the output buffer will operate on an uninitialized uint16_t. The uninitialized data never directly influences the control flow and is overwritten on successful return by the second half of the parsed IPv4 address. But it's better to fix this to avoid unexpected transformations by a sufficiently smart compiler and reports from UB-detection tools.
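For reference, the input class affected here is an IPv6 address whose tail is written in dotted-quad form. The sketch below only exercises the standard `inet_pton` interface on such an address; `parse_in6` is a hypothetical wrapper and says nothing about musl's internal parsing loop.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a textual IPv6 address into 16 network-order bytes.
 * Returns 1 on success, 0 if the text is not a valid address.
 * parse_in6 is a hypothetical wrapper around the standard inet_pton. */
static int parse_in6(const char *text, unsigned char out[16])
{
	struct in6_addr a;
	assert(text != NULL && out != NULL);
	if (inet_pton(AF_INET6, text, &a) != 1) return 0;
	memcpy(out, a.s6_addr, 16);
	return 1;
}
```

Parsing "::ffff:1.2.3.4" this way yields the ff:ff prefix in bytes 10 and 11 followed by the four IPv4 octets in bytes 12 through 15.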
2023-02-12	hsearch: fix null pointer arithmetic UB	Szabolcs Nagy	-2/+2
	the htab->__tab->entries pointer may be 0, so delay using it in arithmetic. this did not cause any known issue other than with ubsan instrumentation.
2023-02-12	increase sendmsg internal buffer to support SCM_MAX_FD	Colin Cross	-2/+5
	The kernel defines a limit on the number of fds that can be passed through an SCM_RIGHTS ancillary message as SCM_MAX_FD. The value was 255 before kernel 2.6.38 (after that it is 253), and an SCM_RIGHTS ancillary message with 255 fds requires 1040 bytes, slightly more than the current 1024 byte internal buffer in sendmsg. 1024 is an arbitrary size, so increase it to match the arbitrary size limit in the kernel. This fixes tests that are verifying they support up to SCM_MAX_FD fds.
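The size arithmetic above can be reproduced with the standard `CMSG_SPACE` macro. `scm_rights_space` is a hypothetical helper; the exact result depends on the ABI's cmsghdr size and alignment, and on common 64-bit Linux targets 255 descriptors (1020 bytes of payload, padded to 1024, plus a 16-byte header) come to the 1040 bytes quoted in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Bytes of ancillary buffer needed for one SCM_RIGHTS message carrying
 * nfds descriptors. scm_rights_space is a hypothetical helper. */
static size_t scm_rights_space(size_t nfds)
{
	return CMSG_SPACE(nfds * sizeof(int));
}
```

Calling `scm_rights_space(255)` gives a value larger than 1024, which is why a 1024-byte internal buffer is not enough for the largest message the older kernels accept.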
2023-02-12	mq_notify: block all (application) signals in the worker thread	Rich Felker	-0/+5
	until the mq notification event arrives, it is mandatory that signals be blocked. otherwise, a signal can be received, and its handler executed, in a thread which does not yet exist on the abstract machine. after the point of the event arriving, having signals blocked is not a conformance requirement but a QoI requirement. while the application can unblock any signals it wants unblocked in the event handler thread, if they did not start out blocked, it could not block them without a race window where they are momentarily unblocked, and this would preclude controlled delivery or other forms of acceptance (sigwait, etc.) anywhere in the application.
2023-02-12	mq_notify: join worker thread before returning in error path	Rich Felker	-2/+5
	this avoids leaving behind transient resource consumption whose cleanup is subject to scheduling behavior.
2023-02-12	mq_notify: rework to fix use-after-close/double-close bugs	Rich Felker	-8/+15
	in the error path where the mq_notify syscall fails, the initiating thread may have closed the socket before the worker thread calls recv on it. even in the absence of such a race, if the recv call failed, e.g. due to seccomp policy blocking it, the worker thread could proceed to close, producing a double-close condition. this can all be simplified by moving the mq_notify syscall into the new thread, so that the error case does not require pthread_cancel. now, the initiating thread only needs to read back the error status after waiting for the worker thread to consume its arguments.
2023-02-11	mq_notify: use semaphore instead of barrier to sync args consumption	Rich Felker	-5/+9
	semaphores are a much lighter primitive, and more idiomatic with current usage in the code base.
2023-02-11	fix pthread_detach inadvertently acting as cancellation point in race case	Rich Felker	-2/+6
	disabling cancellation around the pthread_join call seems to be the safest and logically simplest fix. i believe it would also be possible to just perform the unmap directly here after __tl_sync, removing the dependency on pthread_join, but such an approach duplicately encodes a lot more implementation assumptions.
2023-02-11	powerpc-sf longjmp clobbering of val argument	Rich Felker	-4/+4
	the logic to check hwcap for SPE register file inadvertently clobbered the val argument before use. switch to a different work register so this doesn't happen.
2023-02-09	riscv64: add vfork	Pedro Falcato	-0/+12
	Implement vfork() using clone(CLONE_VM | CLONE_VFORK | ...).
2023-02-09	fix wrong sigaction syscall ABI on mips*, or1k, microblaze, riscv64	Rich Felker	-38/+9
	we wrongly defined a dummy SA_RESTORER flag on these archs, despite the kernel interface not actually having such a feature. on archs which lack SA_RESTORER, the kernel sigaction structure also lacks the restorer function pointer member, which means the signal mask appears at a different offset. the kernel was thereby interpreting the bits of the code address as part of the signal set to be masked while handling the signal. this patch removes the erroneous SA_RESTORER definitions from archs which do not have it, makes access to the member conditional on whether SA_RESTORER is defined for the arch, and removes the now-unused asm for the affected archs. because there are reportedly versions of qemu-user which also use the wrong ABI here, the old ksigaction struct size is preserved with an unused member at the end. this is harmless and mitigates the risk of such a bug turning into a buffer overflow onto the sigaction function's stack.
2023-01-18	fix debugger tracking of shared libraries on mips with PIE main program	Rich Felker	-0/+4
	mips has its own mechanisms for DT_DEBUG because it makes _DYNAMIC read-only, and the original mechanism, DT_MIPS_RLD_MAP, was PIE-incompatible. DT_MIPS_RLD_MAP_REL was added to remedy this, but we never implemented support for it. add it now using the same idioms for mips-specific ldso logic.
2022-12-17	use libc-internal malloc for pthread_atfork	Rich Felker	-0/+5
	while no lock is held here making it a lock-order issue, replacement malloc is likely to want to use pthread_atfork, possibly making the call to malloc infinitely recursive. even if not, there is no reason to prefer an application-provided malloc here.
2022-12-14	prevent invalid reads of nl_arg in printf_core	Markus Wichmann	-6/+8
	printf_core() runs twice, and during its first run, nl_arg is uninitialized and must not be read. It gets initialized at the end of the first run. Conversely, nl_type does not need to be set during the second run, as its useful life has ended at that point, since the only time it is read is during that exact same initialization. Therefore we can simply alternate the assignments. p and w do still need to get values assigned to them, since at least one line in the same if-statement depends on that, but they can be dummy values. arg does not need to be assigned, since in the first run, we encounter a continue statement before using the argument.
2022-12-13	semaphores: fix missed wakes from ABA bug in waiter count logic	Rich Felker	-12/+19
	because the has-waiters state in the semaphore value futex word is only representable when the value is zero (the special value -1 represents "0 with potential new waiters"), it's lost if intervening operations make the semaphore value positive again. this creates an ABA issue in sem_post, whereby the post uses a stale waiters count rather than re-evaluating it, skipping the futex wake if the stale count was zero. the fix here is based on a proposal by Alexey Izbyshev, with minor changes to eliminate costly new spurious wake syscalls. the basic idea is to replace the special value -1 with a sticky waiters bit (repurposing the sign bit) preserved under both wait and post. any post that takes place with the waiters bit set will perform a futex wake. to be useful, the waiters bit needs to be removable, and to remove it safely, we perform a broadcast wake instead of a normal single-task wake whenever removing the bit. this lets any un-accounted-for waiters wake and re-add the waiters bit if they still need it. there are multiple possible choices for when to perform this broadcast, but the optimal choice seems to be doing it whenever the observed waiters count is less than two (semantically, this means exactly one, but we might see a stale count of zero). in this case, the expected number of threads to be woken is one, with exactly the same cost as a non-broadcast wake.
2022-11-12	pthread_atfork: fix return value on malloc failure	Alexey Izbyshev	-1/+2
	POSIX requires pthread_atfork to report errors via its return value, not via errno. The only specified error is ENOMEM.
2022-11-07	fix strverscmp comparison of digit sequence with non-digits	Rich Felker	-3/+3
	the rule that longest digit sequence not beginning with a zero is greater only applies when both sequences being compared are non-degenerate. this is spelled out explicitly in the man page, which may be deemed authoritative for this nonstandard function: "If one or both of these is empty, then return what strcmp(3) would have returned..." we were wrongly treating any sequence of digits not beginning with a zero as greater than a non-digit in the other string.
2022-11-05	fix async thread cancellation stack alignment	Rich Felker	-1/+6
	if async cancellation is enabled and acted upon, the stack pointer is not necessarily pointing to a __syscall_cp_asm stack frame. the contents of the stack being wrong don't really matter, but if the stack pointer is not suitably aligned, the procedure call ABI is violated when calling back into C code via __cancel, and pthread_exit, cancellation cleanup handlers, TSD destructors, etc. may malfunction or crash. for the async cancel case, just call __cancel directly like we did prior to commit 102f6a01e249ce4495f1119ae6d963a2a4a53ce5. restore the signal mask prior to doing this since the cancellation handler runs with all signals blocked.
2022-10-20	fix return value of gethostby{name[2],addr} with no result but no error	Rich Felker	-2/+2
	commit f081d5336a80b68d3e1bed789cc373c5c3d6699b fixed gethostbyname[2]_r to treat negative results as a non-error, leaving gethostbyname[2] wrongly returning a pointer to the unfilled result buffer rather than a null pointer. since, as documented with commit fe82bb9b921be34370e6b71a1c6f062c20999ae0, the caller of gethostby{name[2],addr}_r can always rely on the result pointer being set, use that consistently rather than trying to duplicate logic about whether we have a result or not in gethostby{name[2],addr}.
2022-10-19	clean up dns_parse_callback	Rich Felker	-13/+13
	the only functional change here should be that MAXADDRS is only checked for RRs that provide address results, so that a CNAME which appears after an excessive number of address RRs does not get ignored. I'm not aware of any servers that order the RRs this way, and it may even be forbidden to do so, but I prefer having the callback logic not be order dependent. other than that, the motivation for this change is that the A and AAAA cases were mostly duplicate code that could be combined as a single code path.
2022-10-19	dns response handling: don't treat too many addresses as an error	Rich Felker	-1/+1
	returning -1 rather than 0 from the parse function causes __dns_parse to bail out and return an error. presently, name_from_dns does not check the return value anyway, so this does not matter, but if it ever started treating this as an error, lookups with large numbers of addresses would break. this is a consequence of adding TCP support and extending the buffer size used in name_from_dns.
2022-10-19	dns response handling: ignore presence of wrong-type RRs	Rich Felker	-2/+8
	reportedly there is nameserver software with question-rewriting "functionality" which gives A answers when AAAA is queried. since we made no effort to validate that the answer RR type actually corresponds to the question asked, it was possible (depending on flags, etc.) for these answers to leak through, which the caller might not be prepared for. indeed, our implementation of gethostbyname2_r makes an assumption that the resulting addresses are in the family requested, and will misinterpret the results if they don't. commit 45ca5d3fcb6f874bf5ba55d0e9651cef68515395 already noted in fixing CVE-2017-15650 that this could happen, but did nothing to validate that the RR type of the answer matches the question; it just enforced the limit on number of results to preclude overflow. presently, name_from_dns ignores the return value of __dns_parse, so it doesn't really matter whether we return 0 (ignoring the RR) or -1 (parse-ending error) upon encountering the mismatched RR. if that ever changes, though, ignoring irrelevant answer RRs sounds like the semantically correct thing to do, so for now let's return 0 from the callback when this happens.
2022-10-19	fix missing synchronization of pthread TSD keys with MT-fork	Rich Felker	-0/+12
	commit 167390f05564e0a4d3fcb4329377fd7743267560 seems to have overlooked the presence of a lock here, probably because it was one of the exceptions not using LOCK() but a rwlock. as such, it can't be added to the generic table of locks to take, so add an explicit atfork function for the pthread keys table. the order it is called does not particularly matter since nothing else in libc but pthread_exit interacts with keys.
2022-10-19	fgets: avoid arithmetic overflow when n==INT_MIN is passed	Rich Felker	-2/+3
	performing n-- is not a safe operation for arbitrary signed input n. only perform the decrement in the code path where the initial n is greater than 1, and adjust the condition in the n<=1 code path to compensate for it not having been decremented.
2022-10-19	fix AS-safety of close when aio is in use and fd map is expanded	Rich Felker	-0/+6
	the aio operations that lead to calling __aio_get_queue with the possibility to expand the fd map are not AS-safe, but if they are interrupted by a signal handler, the signal handler may call close, which is required to be AS-safe. due to __aio_get_queue taking the write lock without blocking signals, such a call to close from a signal handler could deadlock. change __aio_get_queue to block signals if it needs to obtain a write lock, and restore when finished.
2022-10-19	fix use of uninitialized dummy_fut in aio_suspend	Alexey Izbyshev	-1/+1
	aio_suspend waits on a dummy futex in the corner case when the array of requests contains NULL pointers only. But the value of this futex was left uninitialized, so if it happens to be non-zero, aio_suspend degrades to spinning instead of blocking.
2022-10-19	fix potential deadlock between multithreaded fork and aio	Rich Felker	-4/+21
	as reported by Alexey Izbyshev, there is a lock order inversion deadlock between the malloc lock and aio maplock at MT-fork time: _Fork attempts to take the aio maplock while fork already has the malloc lock, but a concurrent aio operation holding the maplock may attempt to allocate memory. move the __aio_atfork calls in the parent from _Fork to fork, and reorder the lock before most other locks, since nothing else depends on aio(*). this leaves us with the possibility that the child will not be able to obtain the read lock, if _Fork is used directly and happens concurrently with an aio operation. however, in that case, the child context is an async signal context that cannot call any further aio functions, so all we need is to ensure that close does not attempt to perform any aio cancellation. this can be achieved just by nulling out the map pointer. (*) even if other functions call close, they will only need a read lock, not a write lock, and read locks being recursive ensures they can obtain it. moreover, the number of read references held is bounded by something like twice the number of live threads, meaning that the read lock count cannot saturate.
2022-10-19	fix potential unsynchronized access to killlock state at thread exit	Rich Felker	-6/+10
	as reported by Alexey Izbyshev, when the second-to-last thread exits causing a return to single-threaded (no locks needed) state, it creates a situation where the last remaining thread may obtain the killlock that's already held by the exiting thread. this means it may erroneously use the tid of the exiting thread, and may corrupt the lock state due to double-unlock. commit 8d81ba8c0bc6fe31136cb15c9c82ef4c24965040, which (re)introduced the switch back to single-threaded state, documents the intent that the first lock after switching back should provide the necessary synchronization. this is correct, but only works if the switch back is made after there is no further need for synchronization with locks (other than the thread list lock, which can't be bypassed) held by the exiting thread. in order to hit the bug, the remaining thread must first take a different lock, causing it to perform an actual lock one last time, consume the need_locks==-1 state, and transition to need_locks==0. after that, the next attempt to lock the exiting thread's killlock will bypass locking. fix this by reordering the unlocking of killlock at thread exit time, along with changes to the state protected by it, to occur earlier, before the switch to single-threaded state. there are really no constraints on where it's done, except that it occur after there is no longer any possibility of application code executing in the exiting thread, so do it as early as possible.
2022-10-19	fix potential deadlock in dlerror buffer handling at thread exit	Rich Felker	-19/+18
	ever since commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af introduced the thread list lock, this has been wrong. initially, it was wrong via calling free from the context with the thread list lock held. commit aa5a9d15e09851f7b4a1668e9dbde0f6234abada deferred the unsafe free but added a lock, which was also unsafe. in particular, it could deadlock if code holding freebuf_queue_lock was interrupted by a signal handler that takes the thread list lock. commit 4d5aa20a94a2d3fae3e69289dc23ecafbd0c16c4 observed that there was a lock here but failed to notice that it's invalid. there is no easy solution to this problem with locks; any attempt at solving it while still using locks would require the lock to be an AS-safe one (blocking signals on each access to the dlerror buffer list to check if there's deferred free work to be done) which would be excessively costly, and there are also lock order considerations with respect to how the lock would be handled at fork. instead, just use an atomic list.
2022-10-19	disable MADV_FREE usage in mallocng	Rich Felker	-1/+3
	the entire intent of using madvise/MADV_FREE on freed slots is to improve system performance by avoiding evicting cache of useful data, or swapping useless data to disk, by marking any whole pages in the freed slot as discardable by the kernel. in particular, unlike unmapping the memory or replacing it with a PROT_NONE region, use of MADV_FREE does not make any difference to memory accounting for commit charge purposes, and so does not increase the memory available to other processes in a non-overcommitted environment. however, various measurements have shown that inordinate amounts of time are spent performing madvise syscalls in processes which frequently allocate and free medium sized objects in the size range roughly between PAGESIZE and MMAP_THRESHOLD, to the point that the net effect is almost surely significant performance degradation. so, turn it off. the code, which has some nontrivial logic for efficiently determining whether there is a whole-page range to apply madvise to, is left in place so that it can easily be re-enabled if desired, or later tuned to only apply to certain sizes or to use additional heuristics.
2022-10-19	remove LFS64 symbol aliases; replace with dynamic linker remapping	Rich Felker	-121/+0
	originally the namespace-infringing "large file support" interfaces were included as part of glibc-ABI-compat, with the intent that they not be used for linking, since our off_t is and always has been unconditionally 64-bit and since we usually do not aim to support nonstandard interfaces when there is an equivalent standard interface. unfortunately, having the symbols present and available for linking caused configure scripts to detect them and attempt to use them without declarations, producing all the expected ill effects that entails. as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made to prevent this, using macros to redirect the LFS64 names to the standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE. however, this has turned out to be a source of further problems, especially since g++ defines _GNU_SOURCE by default. in particular, the presence of these names as macros breaks a lot of valid code. this commit removes all the LFS64 symbols and replaces them with a mechanism in the dynamic linker symbol lookup failure path to retry with the spurious "64" removed from the symbol name. in the future, if/when the rest of glibc-ABI-compat is moved out of libc, this can be removed.
2022-10-19	dns query core: detect udp truncation at recv time	Rich Felker	-4/+13
	we already attempt to preclude this case by having res_send use a sufficiently large temporary buffer even if the caller did not provide one as large as or larger than the udp dns max of 512 bytes. however, it's possible that the caller passed a custom-crafted query packet using EDNS0, e.g. to get detailed DNSSEC results, with a larger udp size allowance. I have also seen claims that there are some broken nameservers in the wild that do not honor the dns udp limit of 512 and send large answers without the TC bit set, when the query was not using EDNS. we generally don't aim to support broken nameservers, but in this case both problems, if the latter is even real, have a common solution: using recvmsg instead of recvfrom so we can examine the MSG_TRUNC flag.
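The recvmsg-based detection can be sketched as below. This is an illustration under assumptions, not the code in the DNS query core: `recv_dgram` is a hypothetical helper that receives one datagram and reports whether the kernel set MSG_TRUNC in `msg_flags`.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one datagram and report whether it was
 * cut short because buf was too small. Returns the number of bytes
 * stored in buf, or -1 on error. */
static ssize_t recv_dgram(int fd, void *buf, size_t len, int *truncated)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr mh;
	assert(buf != NULL && truncated != NULL);
	memset(&mh, 0, sizeof mh);
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	ssize_t r = recvmsg(fd, &mh, 0);
	if (r < 0) return -1;
	/* recvfrom cannot report this flag; recvmsg returns it in msg_flags. */
	*truncated = (mh.msg_flags & MSG_TRUNC) != 0;
	return r;
}
```

A caller that sees `*truncated` set knows the datagram did not fit, independently of whether the sender set the TC bit in the DNS header.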
2022-10-19	getaddrinfo dns lookup: use larger answer buffer to handle long CNAMEs	Rich Felker	-3/+5
	the size of 512 is not sufficient to get at least one address in the worst case where the name is at or near max length and resolves to a CNAME at or near max length. prior to tcp fallback, there was nothing we could do about this case anyway, but now it's fixable. the new limit 768 is chosen so as to admit roughly the number of addresses with a worst-case CNAME as could fit for a worst-case name that's not a CNAME in the old 512-byte limit. outside of this worst-case, the number of addresses that might be obtained is increased. MAXADDRS (48) was originally chosen as an upper bound on the combined number of A and AAAA records that could fit in 512-byte packets (31 and 17, respectively). it is not increased at this time. so as to prevent a situation where the A records consume almost all of these slots (at 768 bytes, a "best-case" name can fit almost 47 A records), the order of parsing is swapped to process AAAA first. this ensures roughly half of the slots are available to each address family.
2022-09-22	dns: implement tcp fallback in __res_msend query core	Rich Felker	-2/+117
	tcp fallback was originally deemed unwanted and unnecessary, since we aim to return a bounded-size result from getaddrinfo anyway and normally plenty of address records fit in the 512-byte udp dns limit. however, this turned out to have several problems: - some recursive nameservers truncate by omitting all the answers, rather than sending as many as can fit. - a pathological worst-case CNAME for a worst-case name can fill the entire 512-byte space with just the two names, leaving no room for any addresses. - the res_* family of interfaces allow querying of non-address records such as TLSA (DANE), TXT, etc. which can be very large. for many of these, it's critical that the caller see the whole RRset. also, res_send/res_query are specified to return the complete, untruncated length so that the caller can retry with an appropriately-sized buffer. determining this is not possible without tcp. so, it's time to add tcp fallback. the fallback strategy implemented here uses one tcp socket per question (1 or 2 questions), initiated via tcp fastopen when possible. the connection is made to the nameserver that issued the truncated answer. right now, fallback happens unconditionally when truncation is seen. this can, and may later be, relaxed for queries made by the getaddrinfo system, since it will only use a bounded number of results anyway. retry is not attempted again after failure over tcp. the logic could easily be adapted to do that, but it's of questionable value, since the tcp stack automatically handles retransmission and the successful answer with TC=1 over udp strongly suggests that the nameserver has the full answer ready to give. further retry is likely just "take longer to fail".
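The trigger for fallback is the TC flag in the DNS header of the udp answer. A minimal sketch of that test follows; `dns_truncated` is a hypothetical helper, and the bit position is taken from the standard DNS header layout rather than from the musl source.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a DNS message has the TC (truncated) flag set, 0 if not,
 * and -1 if the buffer is too short to hold the flags field.
 * dns_truncated is a hypothetical helper, not part of musl. */
static int dns_truncated(const unsigned char *msg, size_t len)
{
	assert(msg != NULL);
	if (len < 4) return -1;
	/* Byte 2 of the header holds QR, Opcode, AA, TC, RD; TC is bit 0x02. */
	return (msg[2] & 0x02) != 0;
}
```

A query core following the strategy above would open a tcp connection to the answering nameserver whenever this check returns 1 for a udp response.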
2022-09-22	res_send: use a temp buffer if caller's buffer is under 512 bytes	Rich Felker	-1/+9
	for extremely small buffer sizes, the DNS query core in __res_msend may malfunction completely, being unable to get even the headers to determine the response code. but there is also a problem for reasonable sizes under 512 bytes: __res_msend is unable to determine if the udp answer was truncated at the recv layer, in which case it may be incomplete, and res_send is then unable to honor its contract to return the length of the full, non-truncated answer. at present, res_send does not honor that contract anyway when the full answer would exceed 512 bytes, since there is no tcp fallback, but this change at least makes it consistent in a context where this is the only "full answer" to be had.
2022-09-21	adapt res_msend DNS query core for working with multiple sockets	Rich Felker	-6/+11
	this is groundwork for TCP fallback support, but does not itself change behavior in any way.
2022-09-20	getaddrinfo: add EAI_NODATA error code to distinguish NODATA vs NxDomain	Rich Felker	-6/+12
	this was apparently omitted long ago out of a lack of understanding of its importance and the fact that POSIX doesn't specify it. despite not being officially standardized, however, it turns out that at least AIX, glibc, NetBSD, OpenBSD, QNX, and Solaris document and support it. in certain usage cases, such as implementing a DNS gateway on top of the stub resolver interfaces, it's necessary to distinguish the case where a name does not exist (NxDomain) from one where it exists but has no addresses (or other records) of the requested type (NODATA). in fact, even the legacy gethostbyname API had this distinction, which we were previously unable to support correctly because the backend lacked it. apart from fixing an important functionality gap, adding this distinction helps clarify to users how search domain fallback works (falling back in cases corresponding to EAI_NONAME, not in ones corresponding to EAI_NODATA), a topic that has been a source of ongoing confusion and frustration. as a result of this change, EAI_NONAME is no longer a valid universal error code for getaddrinfo in the case where AI_ADDRCONFIG has suppressed use of all address families. in order to return an accurate result in this case, getaddrinfo is modified to still perform at least one lookup. this will almost surely fail (with a network error, since there is no v4 or v6 network to query DNS over) unless a result comes from the hosts file or from ip literal parsing, but in case it does succeed, the result is replaced by EAI_NODATA. glibc has a related error code, EAI_ADDRFAMILY, that could be used for the AI_ADDRCONFIG case and certain NODATA cases, but distinguishing them properly in full generality seems to require additional DNS queries that are otherwise not useful. on glibc, it is only used for ip literals with mismatching family, not for DNS or hosts file results where the name has addresses only in the opposite family. since this seems misleading and inconsistent, and since EAI_NODATA already covers the semantic case where the "name" exists but doesn't have any addresses in the requested family, we do not adopt EAI_ADDRFAMILY at this time. this could be changed at some point if desired, but the logic for getting all the corner cases with AI_ADDRCONFIG right is slightly nontrivial.
2022-09-19	fix error cases in gethostbyaddr_r	Rich Felker	-2/+3
	EAI_MEMORY is not possible (but would not provide errno if it were) and EAI_FAIL does not provide errno. treat the latter as EBADMSG to match how it's handled in gethostbyname2_r (it indicates erroneous or failure response from the nameserver).