summaryrefslogtreecommitdiff
path: root/src/malloc
AgeCommit message (Collapse)AuthorLines
2020-06-03remove stale document from malloc src directoryRich Felker-22/+0
this was an unfinished draft document present since the initial check-in, that was never intended to ship in its current form. remove it as part of reorganizing for replacement of the allocator.
2020-06-03rewrite bump allocator to fix corner cases, decouple from expand_heapRich Felker-17/+72
this affects the bump allocator used when static linking in programs that don't need allocation metadata due to not using realloc, free, etc. commit e3bc22f1eff87b8f029a6ab31f1a269d69e4b053 refactored the bump allocator to share code with __expand_heap, used by malloc, for the purpose of fixing the case (mainly nommu) where brk doesn't work. however, the geometric growth behavior of __expand_heap is not actually well-suited to the bump allocator, and can produce significant excessive memory usage. in particular, by repeatedly requesting just over the remaining free space in the current mmap-allocated area, the total mapped memory will be roughly double the nominal usage. and since the main user of the no-brk mmap fallback in the bump allocator is nommu, this excessive usage is not just virtual address space but physical memory. in addition, even on systems with brk, having a unified size request to __expand_heap without knowing whether the brk or mmap backend would get used made it so the brk could be expanded twice as far as needed. for example, with malloc(n) and n-1 bytes available before the current brk, the brk would be expanded by n bytes rounded up to page size, when expansion by just one page would have sufficed. the new implementation computes request size separately for the cases where brk expansion is being attempted vs using mmap, and also performs individual mmap of large allocations without moving to a new bump area and throwing away the rest of the old one. this greatly reduces the need for geometric area size growth and limits the extent to which free space at the end of one bump area might be unusable for future allocations. as a bonus, the resulting code size is somewhat smaller than the combined old version plus __expand_heap.
2020-06-02move malloc_impl.h from src/internal to src/mallocRich Felker-0/+43
this reflects that it is no longer intended for consumption outside of the malloc implementation.
2020-06-02fix unbounded heap expansion race in mallocRich Felker-152/+87
this has been a longstanding issue reported many times over the years, with it becoming increasingly clear that it could be hit in practice. under concurrent malloc and free from multiple threads, it's possible to hit usage patterns where unbounded amounts of new memory are obtained via brk/mmap despite the total nominal usage being small and bounded. the underlying cause is that, as a fundamental consequence of keeping locking as fine-grained as possible, the state where free has unbinned an already-free chunk to merge it with a newly-freed one, but has not yet re-binned the combined chunk, is exposed to other threads. this is bad even with small chunks, and leads to suboptimal use of memory, but where it really blows up is where the already-freed chunk in question is the large free region "at the top of the heap". in this situation, other threads momentarily see a state of having almost no free memory, and conclude that they need to obtain more. as far as I can tell there is no fix for this that does not harm performance. the fix made here forces all split/merge of free chunks to take place under a single lock, which also takes the place of the old free_lock, being held at least momentarily at the time of free to determine whether there are neighboring free chunks that need merging. as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no longer make sense and are deleted. simplified merging now takes place inline in free (__bin_chunk) and realloc. as commented in the source, holding the split_merge_lock precludes any chunk transition from in-use to free state. for the most part, it also precludes change to chunk header sizes. however, __memalign may still modify the sizes of an in-use chunk to split it into two in-use chunks. arguably this should require holding the split_merge_lock, but that would necessitate refactoring to expose it externally, which is a mess. and it turns out not to be necessary, at least assuming the existing sloppy memory model malloc has been using, because if free (__bin_chunk) or realloc sees any unsynchronized change to the size, it will also see the in-use bit being set, and thereby can't do anything with the neighboring chunk that changed size.
2020-05-22restore lock-skipping for processes that return to single-threaded stateRich Felker-1/+4
the design used here relies on the barrier provided by the first lock operation after the process returns to single-threaded state to synchronize with actions by the last thread that exited. by storing the intent to change modes in the same object used to detect whether locking is needed, it's possible to avoid an extra (possibly costly) memory load after the lock is taken.
2020-05-22don't use libc.threads_minus_1 as relaxed atomic for skipping locksRich Felker-1/+1
after all but the last thread exits, the next thread to observe libc.threads_minus_1==0 and conclude that it can skip locking fails to synchronize with any changes to memory that were made by the last-exiting thread. this can produce data races. on some archs, at least x86, memory synchronization is unlikely to be a problem; however, with the inline locks in malloc, skipping the lock also eliminated the compiler barrier, and caused code that needed to re-check chunk in-use bits after obtaining the lock to reuse a stale value, possibly from before the process became single-threaded. this in turn produced corruption of the heap state. some uses of libc.threads_minus_1 remain, especially for allocation of new TLS in the dynamic linker; otherwise, it could be removed entirely. it's made non-volatile to reflect that the remaining accesses are only made under lock on the thread list. instead of libc.threads_minus_1, libc.threaded is now used for skipping locks. the difference is that libc.threaded is permanently true once an additional thread has been created. this will produce some performance regression in processes that are mostly single-threaded but occasionally creating threads. in the future it may be possible to bring back the full lock-skipping, but more care needs to be taken to produce a safe design.
2018-09-12split internal lock API out of libc.h, creating lock.hRich Felker-1/+1
this further reduces the number of source files which need to include libc.h and thereby be potentially exposed to libc global state and internals. this will also facilitate further improvements like adding an inline fast-path, if we want to do so later.
2018-09-12reduce spurious inclusion of libc.hRich Felker-1/+0
libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
2018-09-12hide dependency-triggering pointer object in malloc_usable_size.cRich Felker-2/+2
2018-09-12rework malloc_usable_size to use malloc_impl.hRich Felker-9/+1
2018-09-12move __memalign declaration to malloc_impl.hRich Felker-4/+2
the malloc-implementation-private header is the only right place for this, because, being in the reserved namespace, __memalign is not interposable and thus not valid to use anywhere else. anything outside of the malloc implementation must call an appropriate-namespace public function (aligned_alloc or posix_memalign).
2018-09-12move declarations for malloc internals to malloc_impl.hRich Felker-6/+2
2018-04-19reintroduce hardening against partially-replaced allocatorRich Felker-5/+10
commit 618b18c78e33acfe54a4434e91aa57b8e171df89 removed the previous detection and hardening since it was incorrect. commit 72141795d4edd17f88da192447395a48444afa10 already handled all that remained for hardening the static-linked case. in the dynamic-linked case, have the dynamic linker check whether malloc was replaced and make that information available. with these changes, the properties documented in commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 are restored: if calloc is not provided, it will behave as malloc+memset, and any of the memalign-family functions not provided will fail with ENOMEM.
2018-04-19return chunks split off by memalign using __bin_chunk instead of freeRich Felker-7/+5
this change serves multiple purposes: 1. it ensures that static linking of memalign-family functions will pull in the system malloc implementation, thereby causing link errors if an attempt is made to link the system memalign functions with a replacement malloc (incomplete allocator replacement). 2. it eliminates calls to free that are unpaired with allocations, which are confusing when setting breakpoints or tracing execution. as a bonus, making __bin_chunk external may discourage aggressive and unnecessary inlining of it.
2018-04-19using malloc implementation types/macros/idioms for memalignRich Felker-20/+22
the generated code should be mostly unchanged, except for explicit use of C_INUSE in place of copying the low bits from existing chunk headers/footers. these changes also remove mild UB due to dubious arithmetic on pointers into imaginary size_t[] arrays.
2018-04-19move malloc implementation types and macros to an internal headerRich Felker-37/+1
2018-04-19revert detection of partially-replaced allocatorRich Felker-15/+6
commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 included checks to make calloc fallback to memset if used with a replaced malloc that didn't also replace calloc, and the memalign family fail if free has been replaced. however, the checks gave false positives for replacement whenever malloc or free resolved to a PLT entry in the main program. for now, disable the checks so as not to leave libc in a broken state. this means that the properties documented in the above commit are no longer satisfied; failure to replace calloc and the memalign family along with malloc is unsafe if they are ever called. the calloc checks were correct but useless for static linking. in both cases (simple or full malloc), calloc and malloc are in a source file together, so replacement of one but not the other would give linking errors. the memalign-family check was useful for static linking, but broken for dynamic as described above, and can be replaced with a better link-time check.
2018-04-18allow interposition/replacement of allocator (malloc)Rich Felker-23/+30
replacement is subject to conditions on the replacement functions. they may only call functions which are async-signal-safe, as specified either by POSIX or as an implementation-defined extension. if any allocator functions are replaced, at least malloc, realloc, and free must be provided. if calloc is not provided, it will behave as malloc+memset. any of the memalign-family functions not provided will fail with ENOMEM. in order to implement the above properties, calloc and __memalign check that they are using their own malloc or free, respectively. choice to check malloc or free is based on considerations of supporting __simple_malloc. in order to make this work, calloc is split into separate versions for __simple_malloc and full malloc; commit ba819787ee93ceae94efd274f7849e317c1bff58 already did most of the split anyway, and completing it saves an extra call frame. previously, use of -Bsymbolic-functions made dynamic interposition impossible. now, we are using an explicit dynamic-list, so add allocator functions to the list. most are not referenced anyway, but all are added for completeness.
2018-04-17remove unused __brk function/source fileRich Felker-7/+0
commit e3bc22f1eff87b8f029a6ab31f1a269d69e4b053 removed all references to __brk.
2018-04-17comment __malloc_donate overflow logicRich Felker-0/+3
2018-04-17ldso, malloc: implement reclaim_gaps via __malloc_donateAlexander Monakov-18/+43
Split 'free' into unmap_chunk and bin_chunk, use the latter to introduce __malloc_donate and use it in reclaim_gaps instead of calling 'free'.
2018-04-17malloc: fix an over-allocation bugAlexander Monakov-4/+4
Fix an instance where realloc code would overallocate by OVERHEAD bytes amount. Manually arrange for reuse of memcpy-free-return exit sequence.
2018-04-11optimize malloc0Alexander Monakov-6/+23
Implementation of __malloc0 in malloc.c takes care to preserve zero pages by overwriting only non-zero data. However, malloc must have already modified auxiliary heap data just before and beyond the allocated region, so we know that edge pages need not be preserved. For allocations smaller than one page, pass them immediately to memset. Otherwise, use memset to handle partial pages at the head and tail of the allocation, and scan complete pages in the interior. Optimize the scanning loop by processing 16 bytes per iteration and handling rest of page via memset as soon as a non-zero byte is found.
2018-01-09revise the definition of multiple basic locks in the codeJens Gustedt-1/+1
In all cases this is just a change from two volatile int to one.
2017-07-04fix undefined behavior in freeAlexander Monakov-2/+3
2017-06-15handle mremap failure in realloc of mmap-serviced allocationsRich Felker-1/+2
mremap seems to always fail on nommu, and on some non-Linux implementations of the Linux syscall API, it at least fails to increase allocation size, and may fail to move (i.e. defragment) the existing mapping when shrinking it too. instead of failing realloc or leaving an over-sized allocation that may waste a large amount of memory, fallback to malloc-memcpy-free if mremap fails.
2016-12-17use lookup table for malloc bin index instead of float conversionSzabolcs Nagy-2/+12
float conversion is slow and big on soft-float targets. The lookup table increases code size a bit on most hard float targets (and adds 60byte rodata), performance can be a bit slower because of position independent data access and cpu internal state dependence (cache, extra branches), but the overall effect should be minimal (common, small size allocations should be unaffected).
2016-01-31fix malloc_usable_size for NULL inputSzabolcs Nagy-1/+1
the linux man page specifies malloc_usable_size(0) to return 0 and this is the semantics other implementations follow (jemalloc). reported by Alexander Monakov.
2015-11-04remove external linkage from __simple_malloc definitionRich Felker-1/+1
this function is used only as a weak definition for malloc, for static linking in programs which do not call realloc or free. since it had external linkage and was thereby exported in libc.so's dynamic symbol table, --gc-sections was unable to drop it. this was merely an oversight; there's no reason for it to be external, so make it static.
2015-08-07mitigate blow-up of heap size under malloc/free contentionRich Felker-14/+14
during calls to free, any free chunks adjacent to the chunk being freed are momentarily held in allocated state for the purpose of merging, possibly leaving little or no available free memory for other threads to allocate. under this condition, other threads will attempt to expand the heap rather than waiting to use memory that will soon be available. the race window where this happens is normally very small, but became huge when free chooses to use madvise to release unused physical memory, causing unbounded heap size growth. this patch drastically shrinks the race window for unwanted heap expansion by performing madvise with the bin lock held and marking the bin non-empty in the binmask before making the expensive madvise syscall. testing by Timo Teräs has shown this approach to be a suitable mitigation. more invasive changes to the synchronization between malloc and free would be needed to completely eliminate the problem. it's not clear whether such changes would improve or worsen typical-case performance, or whether this would be a worthwhile direction to take malloc development.
2015-06-22fix regression/typo that disabled __simple_malloc when calloc is usedRich Felker-1/+1
commit ba819787ee93ceae94efd274f7849e317c1bff58 introduced this regression. since the __malloc0 weak alias was not properly provided by __simple_malloc, use of calloc forced the full malloc to be linked.
2015-06-22fix calloc when __simple_malloc implementation is usedRich Felker-12/+15
previously, calloc's implementation encoded assumptions about the implementation of malloc, accessing a size_t word just prior to the allocated memory to determine if it was obtained by mmap to optimize out the zero-filling. when __simple_malloc is used (static linking a program with no realloc/free), it doesn't matter if the result of this check is wrong, since all allocations are zero-initialized anyway. but the access could be invalid if it crosses a page boundary or if the pointer is not sufficiently aligned, which can happen for very small allocations. this patch fixes the issue by moving the zero-fill logic into malloc.c with the full malloc, as a new function named __malloc0, which is provided by a weak alias to __simple_malloc (which always gives zero-filled memory) when the full malloc is not in use.
2015-06-14refactor malloc's expand_heap to share with __simple_mallocRich Felker-81/+126
this extends the brk/stack collision protection added to full malloc in commit 276904c2f6bde3a31a24ebfa201482601d18b4f9 to also protect the __simple_malloc function used in static-linked programs that don't reference the free function. it also extends support for using mmap when brk fails, which full malloc got in commit 5446303328adf4b4e36d9fba21848e6feb55fab4, to __simple_malloc. since __simple_malloc may expand the heap by arbitrarily large increments, the stack collision detection is enhanced to detect interval overlap rather than just proximity of a single address to the stack. code size is increased a bit, but this is partly offset by the sharing of code between the two malloc implementations, which due to linking semantics, both get linked in a program that needs the full malloc with realloc/free support.
2015-06-09in malloc, refuse to use brk if it grows into stackRich Felker-1/+9
the linux/nommu fdpic ELF loader sets up the brk range to overlap entirely with the main thread's stack (but growing from opposite ends), so that the resulting failure mode for malloc is not to return a null pointer but to start returning pointers to memory that overlaps with the caller's stack. needless to say this extremely dangerous and makes brk unusable. since it's non-trivial to detect execution environments that might be affected by this kernel bug, and since the severity of the bug makes any sort of detection that might yield false-negatives unsafe, we instead check the proximity of the brk to the stack pointer each time the brk is to be expanded. both the main thread's stack (where the real known risk lies) and the calling thread's stack are checked. an arbitrary gap distance of 8 MB is imposed, chosen to be larger than linux default main-thread stack reservation sizes and larger than any reasonable stack configuration on nommu. the effeciveness of this patch relies on an assumption that the amount by which the brk is being grown is smaller than the gap limit, which is always true for malloc's use of brk. reliance on this assumption is why the check is being done in malloc-specific code and not in __brk.
2015-03-04remove useless check of bin match in mallocRich Felker-1/+1
this re-check idiom seems to have been copied from the alloc_fwd and alloc_rev functions, which guess a bin based on non-synchronized memory access to adjacent chunk headers then need to confirm, after locking the bin, that the chunk is actually in the bin they locked. the check being removed, however, was being performed on a chunk obtained from the already-locked bin. there is no race to account for here; the check could only fail in the event of corrupt free lists, and even then it would not catch them but simply continue running. since the bin_index function is mildly expensive, it seems preferable to remove the check rather than trying to convert it into a useful consistency check. casual testing shows a 1-5% reduction in run time.
2015-03-04fix init race that could lead to deadlock in malloc init codeRich Felker-39/+14
the malloc init code provided its own version of pthread_once type logic, including the exact same bug that was fixed in pthread_once in commit 0d0c2f40344640a2a6942dda156509593f51db5d. since this code is called adjacent to expand_heap, which takes a lock, there is no reason to have pthread_once-type initialization. simply moving the init code into the interval where expand_heap already holds its lock on the brk achieves the same result with much less synchronization logic, and allows the buggy code to be eliminated rather than just fixed.
2015-03-03make all objects used with atomic operations volatileRich Felker-6/+6
the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,*p,f(*p)); where the latter is intended to show multiple loads of *p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
2014-08-25add malloc_usable_size function and non-stub malloc.hRich Felker-0/+17
this function is needed for some important practical applications of ABI compatibility, and may be useful for supporting some non-portable software at the source level too. I was hesitant to add a function which imposes any constraints on malloc internals; however, it turns out that any malloc implementation which has realloc must already have an efficient way to determine the size of existing allocations, so no additional constraint is imposed. for now, some internal malloc definitions are duplicated in the new source file. if/when malloc is refactored to put them in a shared internal header file, these could be removed. since malloc_usable_size is conventionally declared in malloc.h, the empty stub version of this file was no longer suitable. it's updated to provide the standard allocator functions, nonstandard ones (even if stdlib.h would not expose them based on the feature test macros in effect), and any malloc-extension functions provided (currently, only malloc_usable_size).
2014-04-02avoid malloc failure for small requests when brk can't be extendedRich Felker-1/+23
this issue mainly affects PIE binaries and execution of programs via direct invocation of the dynamic linker binary: depending on kernel behavior, in these cases the initial brk may be placed at at location where it cannot be extended, due to conflicting adjacent maps. when brk fails, mmap is used instead to expand the heap. in order to avoid expensive bookkeeping for managing fragmentation by merging these new heap regions, the minimum size for new heap regions increases exponentially in the number of regions. this limits the number of regions, and thereby the number of fixed fragmentation points, to a quantity which is logarithmic with respect to the size of virtual address space and thus negligible. the exponential growth is tuned so as to avoid expanding the heap by more than approximately 50% of its current total size.
2013-12-12include cleanups: remove unused headers and add feature test macrosSzabolcs Nagy-1/+0
2013-10-05slightly optimize __brk for sizeRich Felker-1/+1
there is no reason to check the return value for setting errno, since brk never returns errors, only the new value of the brk (which may be the same as the old, or otherwise differ from the requested brk, on failure). it may be beneficial to eventually just eliminate this file and make the syscalls inline in malloc.c.
2013-10-05fix failure of malloc to set errno on heap (brk) exhaustionRich Felker-0/+1
I wrongly assumed the brk syscall would set errno, but on failure it returns the old value of the brk rather than an error code.
2013-09-20fix potential deadlock bug in libc-internal locking logicRich Felker-8/+7
if a multithreaded program became non-multithreaded (i.e. all other threads exited) while one thread held an internal lock, the remaining thread would fail to release the lock. the the program then became multithreaded again at a later time, any further attempts to obtain the lock would deadlock permanently. the underlying cause is that the value of libc.threads_minus_1 at unlock time might not match the value at lock time. one solution would be returning a flag to the caller indicating whether the lock was taken and needs to be unlocked, but there is a simpler solution: using the lock itself as such a flag. note that this flag is not needed anyway for correctness; if the lock is not held, the unlock code is harmless. however, the memory synchronization properties associated with a_store are costly on some archs, so it's best to avoid executing the unlock code when it is unnecessary.
2013-07-23remove redundant check in memalignRich Felker-1/+1
the case where mem was already aligned is handled earlier in the function now.
2013-07-23fix heap corruption bug in memalignRich Felker-1/+3
this bug was caught by the new footer-corruption check in realloc and free. if the block returned by malloc was already aligned to the desired alignment, memalign's logic to split off the misaligned head was incorrect; rather than writing to a point inside the allocated block, it was overwriting the footer of the previous block on the heap with the value 1 (length 0 plus an in-use flag). fortunately, the impact of this bug was fairly low. (this is probably why it was not caught sooner.) due to the way the heap works, malloc will never return a block whose previous block is free. (doing so would be harmful because it would increase fragmentation with no benefit.) the footer is actually not needed for in-use blocks, except that its in-use bit needs to remain set so that it does not get merged with free blocks, so there was no harm in it being set to 1 instead of the correct value. however, there is one case where this bug could have had an impact: in multi-threaded programs, if another thread freed the previous block after memalign's call to malloc returned, but before memalign overwrote the previous block's footer, the resulting block in the free list could be left in a corrupt state. I have not analyzed the impact of this bad state and whether it could lead to more serious malfunction.
2013-07-19harden realloc/free to detect simple overflowsRich Felker-0/+6
the sizes in the header and footer for a chunk should always match. if they don't, the program has definitely invoked undefined behavior, and the most likely cause is a simple overflow, either of a buffer in the block being freed or the one just below it. crashing here should not only improve security of buggy programs, but also aid in debugging, since the crash happens in a context where you have a pointer to the likely-overflowed buffer.
2013-07-04move core memalign code from aligned_alloc to __memalignRich Felker-49/+55
there are two motivations for this change. one is to avoid gratuitously depending on a C11 symbol for implementing a POSIX function. the other pertains to the documented semantics. C11 does not define any behavior for aligned_alloc when the length argument is not a multiple of the alignment argument. posix_memalign on the other hand places no requirements on the length argument. using __memalign as the implementation of both, rather than trying to implement one in terms of the other when their documented contracts differ, eliminates this confusion.
2013-07-04move alignment check from aligned_alloc to posix_memalignRich Felker-1/+2
C11 has no requirement that the alignment be a multiple of sizeof(void*), and in fact seems to require any "valid alignment supported by the implementation" to work. since the alignment of char is 1 and thus a valid alignment, an alignment argument of 1 should be accepted.
2012-12-07page-align initial brk value used by malloc in shared libcRich Felker-1/+5
this change fixes an obscure issue with some nonstandard kernels, where the initial brk syscall returns a pointer just past the end of bss rather than the beginning of a new page. in that case, the dynamic linker has already reclaimed the space between the end of bss and the page end for use by malloc, and memory corruption (allocating the same memory twice) will occur when malloc again claims it on the first call to brk.
2012-12-06fix invalid read in aligned_allocRich Felker-2/+3
in case of mmap-obtained chunks, end points past the end of the mapping and reading it may fault. since the value is not needed until after the conditional, move the access to prevent invalid reads.