summaryrefslogtreecommitdiff
path: root/src/stdio/fgetwc.c
AgeCommit message (Collapse)AuthorLines
2017-11-20make fgetwc handling of encoding errors consistent with/without bufferRich Felker-14/+14
previously, fgetwc left all but the first byte of an illegal sequence unread (available for subsequent calls) when reading out of the FILE buffer, but dropped all bytes contibuting to the error when falling back to reading a byte at a time. neither behavior was ideal. in the buffered case, each malformed character produced one error per byte, rather than one per character. in the unbuffered case, consuming the last byte that caused the transition from "incomplete" to "invalid" state potentially dropped (and produced additional spurious encoding errors for) the next valid character. to handle both cases uniformly without duplicate code, revise the buffered case to only cover situations where a complete and valid character is present in the buffer, and fall back to byte-at-a-time for all other cases. this allows using mbtowc (stateless) instead of mbrtowc, which may slightly improve performance too. when an encoding error has been hit in the byte-at-a-time case, leave the final byte that produced the error unread (via ungetc) except in the case of single-byte errors (for UTF-8, bytes c0, c1, f5-ff, and continuation bytes with no lead byte). single-byte errors are fully consumed so as not to leave the caller in an infinite loop repeating the same error. none of these changes are distinguished from a conformance standpoint, since the file position is unspecified after encoding errors. they are intended merely as QoI/consistency improvements.
2017-11-18fix fgetwc when decoding a character that crosses buffer boundarySzabolcs Nagy-0/+1
Update the buffer position according to the bytes consumed into st when decoding an incomplete character at the end of the buffer.
2015-06-16byte-based C locale, phase 2: stdio and iconv (multibyte callers)Rich Felker-3/+12
this patch adjusts libc components which use the multibyte functions internally, and which depend on them operating in a particular encoding, to make the appropriate locale changes before calling them and restore the calling thread's locale afterwards. activating the byte-based C locale without these changes would cause regressions in stdio and iconv. in the case of iconv, the current implementation was simply using the multibyte functions as UTF-8 conversions. setting a multibyte UTF-8 locale for the duration of the iconv operation allows the code to continue working. in the case of stdio, POSIX requires that FILE streams have an encoding rule bound at the time of setting wide orientation. as long as all locales, including the C locale, used the same encoding, treating high bytes as UTF-8, there was no need to store an encoding rule as part of the stream's state. a new locale field in the FILE structure points to the locale that should be made active during fgetwc/fputwc/ungetwc on the stream. it cannot point to the locale active at the time the stream becomes oriented, because this locale could be mutable (the global locale) or could be destroyed (locale_t objects produced by newlocale) before the stream is closed. instead, a pointer to the static C or C.UTF-8 locale object added in commit commit aeeac9ca5490d7d90fe061ab72da446c01ddf746 is used. this is valid since categories other than LC_CTYPE will not affect these functions.
2015-06-13fix idiom for setting stdio stream orientation to wideRich Felker-1/+1
the old idiom, f->mode |= f->mode+1, was adapted from the idiom for setting byte orientation, f->mode |= f->mode-1, but the adaptation was incorrect. unless the stream was alreasdy set byte-oriented, this code incremented f->mode each time it was executed, which would eventually lead to overflow. it could be fixed by changing it to f->mode |= 1, but upcoming changes will require slightly more work at the time of wide orientation, so it makes sense to just call fwide. as an optimization in the single-character functions, fwide is only called if the stream is not already wide-oriented.
2012-11-08clean up stdio_impl.hRich Felker-0/+2
this header evolved to facilitate the extremely lazy practice of omitting explicit includes of the necessary headers in individual stdio source files; not only was this sloppy, but it also increased build time. now, stdio_impl.h is only including the headers it needs for its own use; any further headers needed by source files are included directly where needed.
2011-07-30add proper fuxed-based locking for stdioRich Felker-1/+0
previously, stdio used spinlocks, which would be unacceptable if we ever add support for thread priorities, and which yielded pathologically bad performance if an application attempted to use flockfile on a key file as a major/primary locking mechanism. i had held off on making this change for fear that it would hurt performance in the non-threaded case, but actually support for recursive locking had already inflicted that cost. by having the internal locking functions store a flag indicating whether they need to perform unlocking, rather than using the actual recursive lock counter, i was able to combine the conditionals at unlock time, eliminating any additional cost, and also avoid a nasty corner case where a huge number of calls to ftrylockfile could cause deadlock later at the point of internal locking. this commit also fixes some issues with usage of pthread_self conflicting with __attribute__((const)) which resulted in crashes with some compiler versions/optimizations, mainly in flockfile prior to pthread_create.
2011-03-28major stdio overhaul, using readv/writev, plus other changesRich Felker-2/+2
the biggest change in this commit is that stdio now uses readv to fill the caller's buffer and the FILE buffer with a single syscall, and likewise writev to flush the FILE buffer and write out the caller's buffer in a single syscall. making this change required fundamental architectural changes to stdio, so i also made a number of other improvements in the process: - the implementation no longer assumes that further io will fail following errors, and no longer blocks io when the error flag is set (though the latter could easily be changed back if desired) - unbuffered mode is no longer implemented as a one-byte buffer. as a consequence, scanf unreading has to use ungetc, to the unget buffer has been enlarged to hold at least 2 wide characters. - the FILE structure has been rearranged to maintain the locations of the fields that might be used in glibc getc/putc type macros, while shrinking the structure to save some space. - error cases for fflush, fseek, etc. should be more correct. - library-internal macros are used for getc_unlocked and putc_unlocked now, eliminating some ugly code duplication. __uflow and __overflow are no longer used anywhere but these macros. switch to read or write mode is also separated so the code can be better shared, e.g. with ungetc. - lots of other small things.
2011-02-14fix some pointer signedness issues (this was invalid C)Rich Felker-2/+2
2011-02-12initial check-in, version 0.5.0v0.5.0Rich Felker-0/+51