|Age||Commit message (Collapse)||Author||Lines|
in order to produce FILE objects to pass to the intscan/floatscan
backends without any (prohibitively costly) extra buffering layer, the
strto* functions set the FILE's rend (read end) buffer pointer to an
invalid value at the end of the address space, or SIZE_MAX/2 past the
beginning of the string. this led to undefined behavior comparing and
subtracting the end pointer with the buffer position pointer (rpos).
the comparison issue is easily eliminated by using != instead of <.
however the subtractions require nontrivial changes:
previously, f->shcnt stored the count that would have been read if
consuming the whole buffer, which required an end pointer for the
buffer. the purpose for this was that it allowed reading it and adding
rpos-rend at any time to get the actual count so far, and required no
adjustment at the time of __shgetc (actual function call) since the
call would only happen when reaching the end of the buffer.
to get rid of the dependency on rend, instead offset shcnt by buf-rpos
(start of buffer) at the time of last __shlim/__shgetc call. this
makes for slightly more work in __shgetc the function, but for the
inline macro it's still just as easy to compute the current count.
since the scan helper interfaces used here are a big hack, comments
are added to document their contracts and what's going on with their
libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for hidden. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.
remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.
in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.
declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.
maintainer's note: the key observation here is that the compared
element is the first slot of the second ceil(half) of the array, and
thus can be removed for further comparison when it does not match, so
that we descend into the second ceil(half)-1 rather than ceil(half)
elements. this change ensures that nel strictly decreases with each
iteration, so that the case of != but nel==1 does not need to be
general policy is that all source files defining a public API or an
ABI mechanism referenced by a public header should include the public
header that declares the interface, so that the compiler or analysis
tools can check the consistency of the declarations. Alexander Monakov
pointed out a number of violations of this principle a few years back.
fix them now.
the (obsolete) standard allows either 0 or 1 for the decimal point
location in this case, but since the number of zero digits returned in
the output string (in this implementation) is one more than the number
of digits the caller requested, it makes sense for the decimal point
to be logically "after" the first digit. in a sense, this change goes
with the previous commit which fixed the value of the decimal point
location for non-zero inputs.
these functions are obsolete and have no modern standard. the text in
SUSv2 is highly ambiguous, specifying that "negative means to the left
of the returned digits", which suggested to me that 0 would mean to
the right of the first digit. however, this does not agree with
historic practice, and the Linux man pages are more clear, specifying
that a negative value means "that the decimal point is to the left of
the start of the string" (in which case, 0 would mean the start of the
string, in accordance with historic practice).
these odd names are actually generated by mess in glibc's stdlib.h, so
any glibc-linked program using strtol needs them to run against musl.
this is a cheat since the _l versions take an extra argument, but
since these functions are only here for ABI purposes, it doesn't
really matter as long as the ABI matches. if the non-__-prefixed
versions are eventually made public, they should proabably be real
functions rather than hacks like this.
this header evolved to facilitate the extremely lazy practice of
omitting explicit includes of the necessary headers in individual
stdio source files; not only was this sloppy, but it also increased
now, stdio_impl.h is only including the headers it needs for its own
use; any further headers needed by source files are included directly
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
not heavily tested but these functions appear to work correctly
shunget cannot unget eof status, causing wcstol to leave endptr
pointing to the wrong place when scanning, for example, L"0x". cheap
fix is to make the read function provide an infinite stream of bogus
characters rather than eof. really this is something of a design flaw
in how the shgetc system is used for strto* and wcsto*; in the long
term, I believe multi-character unget should be scrapped and replaced
with a function that can subtract from the f->shcnt counter.
the immediate benefit is a significant debloating of the float parsing
code by moving the responsibility for keeping track of the number of
characters read to a different module.
by linking shgetc with the stdio buffer logic, counting logic is
defered to buffer refill time, keeping the calls to shgetc fast and
in the future, shgetc will also be useful for integrating the new
float code with scanf, which needs to not only count the characters
consumed, but also limit the number of characters read based on field
shgetc may also become a useful tool for simplifying the integer
this version is intended to be fully conformant to the ISO C, POSIX,
and IEEE standards for conversion of decimal/hex floating point
strings to float, double, and long double (ld64 or ld80 only at
present) values. in particular, all results are intended to be rounded
correctly according to the current rounding mode. further, this
implementation aims to set the floating point underflow, overflow, and
inexact flags to reflect the conversion performed.
a moderate amount of testing has been performed (by nsz and myself)
prior to integration of the code in musl, but it still may have bugs.
so far, only strto(d|ld|f) use the new code. scanf integration will be
done as a separate commit, and i will add implementations of the wide
character functions later.
thanks to the hard work of Szabolcs Nagy (nsz), identifying the best
(from correctness and license standpoint) implementations from freebsd
and openbsd and cleaning them up! musl should now fully support c99
float and long double math functions, and has near-complete complex
math support. tgmath should also work (fully on gcc-compatible
compilers, and mostly on any c99 compiler).
based largely on commit 0376d44a890fea261506f1fc63833e7a686dca19 from
nsz's libm git repo, with some additions (dummy versions of a few
missing long double complex functions, etc.) by me.
various cleanups still need to be made, including re-adding (if
they're correct) some asm functions that were dropped.
these have not been heavily tested, but they should work as described
in the old standards. probably broken for non-finite values...
patch by Pascal Cuoq (with minor tweaks to comments)
this was the cause of crashes in printf when attempting to print
floating point values.
1. my interpretation of subject sequence definition was wrong. adjust
parser to conform to the standard.
2. some code for handling tail overflow case was missing (forgot to
finish writing it).
3. typo (= instead of ==) caused ERANGE to wrongly behave like EINVAL
stopping without letting the parser see a stop character prevented
getting a result. so treat all high chars as the null character and
pass them into the parser.
also eliminated ugly tmp var using compound literals.
this fixes a number of bugs in integer parsing due to lazy haphazard
wrapping, as well as some misinterpretations of the standard. the new
parser is able to work character-at-a-time or on whole strings, making
it easy to support the wide functions without unbounded space for
conversion. it will also be possible to update scanf to use the new
Smoothsort is an adaptive variant of heapsort. This version was
written by Valentin Ochs (apo) specifically for inclusion in musl. I
worked with him to get it working in O(1) memory usage even with giant
array element widths, and to optimize it heavily for size and speed.
It's still roughly 4 times as large as the old heap sort
implementation, but roughly 20 times faster given an almost-sorted
array of 1M elements (20 being the base-2 log of 1M), i.e. it really
does reduce O(n log n) to O(n) in the mostly-sorted case. It's still
somewhat slower than glibc's Introsort for random input, but now
considerably faster than glibc when the input is already sorted, or
0e10000000000000000000000000000000 was setting ERANGE
exponent char e/p was considered part of the match even if not
followed by a valid decimal value
"1e +10" was parsed as "1e+10"
hex digits were misinterpreted as 0..5 instead of 10..15
sadly the C language does not specify any such implicit conversion, so
this is not a matter of just fixing warnings (as gcc treats it) but
actual errors. i would like to revisit a number of these changes and
possibly revise the types used to reduce the number of casts required.
this is actually a workaround for a bug in gcc, whereby it asserts
inequality of the keys being compared...