summaryrefslogtreecommitdiff
path: root/src/stdio/getc.h
AgeCommit message (Collapse)AuthorLines
2018-10-18further optimize getc/putc when locking is neededRich Felker-5/+5
check whether the lock is free before loading the calling thread's tid. if so, just use a dummy tid value that cannot compare equal to any actual thread id (because it's one bit wider). this also avoids the need to save the tid and pass it to locking_getc or locking_putc, reducing register pressure. this change might slightly hurt the case where the caller already holds the lock, but it does not affect the single-threaded case, and may significantly improve the multi-threaded case, especially on archs where loading the thread pointer is disproportionately expensive like early mips and arm ISA levels. but even on i386 it helps, at least on some machines; I measured roughly a 10-15% improvement.
2018-10-17optimize hot paths of getc with manual shrink-wrappingRich Felker-0/+22
with these changes, in a program that has not created any threads besides the main thread and that has not called f[try]lockfile, getc performs indistinguishably from getc_unlocked. this was measured on several i386 and x86_64 models, and should hold on other archs too simply by the properties of the code generation. the case where the caller already holds the lock (via flockfile) is improved significantly as well (40-60% reduction in time on machines tested) and the case where locking is needed is improved somewhat (roughly 10%). the key technique used here is forcing the non-hot path out-of-line and enabling it to be a tail call. a static noinline function (conditional on __GNUC__) is used rather than the extern hiddens used elsewhere for this purpose, so that the compiler can choose non-default calling conventions, making it possible to tail-call to a callee that takes more arguments than the caller on archs where arguments are passed on the stack or must have space reserved on the stack for spilling the. the tid could just be reloaded via the thread pointer in locking_getc, but that would be ridiculously expensive on some archs where thread pointer load requires a trap or syscall.