summaryrefslogtreecommitdiff
path: root/kernel/rcu/tree_plugin.h
AgeCommit message (Collapse)AuthorLines
2016-03-31rcu: Consolidate expedited GP code into exp_funnel_lock()Paul E. McKenney-3/+0
This commit pulls the grace-period-start counter adjustment and tracing from synchronize_rcu_expedited() and synchronize_sched_expedited() into exp_funnel_lock(), thus eliminating some code duplication. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31rcu: Consolidate expedited GP tracing into rcu_exp_gp_seq_snap()Paul E. McKenney-2/+0
This commit moves some duplicate code from synchronize_rcu_expedited() and synchronize_sched_expedited() into rcu_exp_gp_seq_snap(). This doesn't save lines of code, but does eliminate a "tell me twice" issue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31rcu: Consolidate expedited GP code into rcu_exp_wait_wake()Paul E. McKenney-8/+2
Currently, synchronize_rcu_expedited() and rcu_sched_expedited() have significant duplicate code. This commit therefore consolidates some of this code into rcu_exp_wake(), which is now renamed to rcu_exp_wait_wake() in recognition of its added responsibilities. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31rcu: Enforce expedited-GP fairness via funnel wait queuePaul E. McKenney-11/+5
The current mutex-based funnel-locking approach used by expedited grace periods is subject to severe unfairness. The problem arises when a few tasks, making a path from leaves to root, all wake up before other tasks do. A new task can then follow this path all the way to the root, which needlessly delays tasks whose grace period is done, but who do not happen to acquire the lock quickly enough. This commit avoids this problem by maintaining per-rcu_node wait queues, along with a per-rcu_node counter that tracks the latest grace period sought by an earlier task to visit this node. If that grace period would satisfy the current task, instead of proceeding up the tree, it waits on the current rcu_node structure using a pair of wait queues provided for that purpose. This decouples awakening of old tasks from the arrival of new tasks. If the wakeups prove to be a bottleneck, additional kthreads can be brought to bear for that purpose. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31rcu: Add expedited-grace-period event tracingPaul E. McKenney-0/+3
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31rcu: Add funnel-locking tracing for expedited grace periodsPaul E. McKenney-0/+3
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31rcu: Fix synchronize_rcu_expedited() header commentPaul E. McKenney-7/+13
This commit brings the synchronize_rcu_expedited() function's header comment into line with the new implementation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-15Merge commit 'fixes.2015.02.23a' into core/rcuIngo Molnar-14/+13
Conflicts: kernel/rcu/tree.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-25rcu: Use simple wait queues where possible in rcutreePaul Gortmaker-13/+13
As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads GP waits") the RCU subsystem started making use of wait queues. Here we convert all additions of RCU wait queues to use simple wait queues, since they don't need the extra overhead of the full wait queue features. Originally this was done for RT kernels[1], since we would get things like... BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt Pid: 8, comm: rcu_preempt Not tainted Call Trace: [<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0 [<ffffffff817d77b4>] rt_spin_lock+0x24/0x50 [<ffffffff8106fcf6>] __wake_up+0x36/0x70 [<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680 [<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80 [<ffffffff8105eabb>] kthread+0xdb/0xe0 [<ffffffff8106b912>] ? finish_task_switch+0x52/0x100 [<ffffffff817e0754>] kernel_thread_helper+0x4/0x10 [<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60 [<ffffffff817e0750>] ? gs_change+0xb/0xb ...and hence simple wait queues were deployed on RT out of necessity (as simple wait uses a raw lock), but mainline might as well take advantage of the more streamline support as well. [1] This is a carry forward of work from v3.10-rt; the original conversion was by Thomas on an earlier -rt version, and Sebastian extended it to additional post-3.10 added RCU waiters; here I've added a commit log and unified the RCU changes into one, and uprev'd it to match mainline RCU. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-25rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lockDaniel Wagner-3/+13
rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently, this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will not enable the IRQs. lockdep is happy. By switching over using swait this is not true anymore. swake_up_all() enables the IRQs while processing the waiters. __do_softirq() can now run and will eventually call rcu_process_callbacks() which wants to grap nrp->lock. Let's move the rcu_nocb_gp_cleanup() call outside the lock before we switch over to swait. If we would hold the rnp->lock and use swait, lockdep reports following: ================================= [ INFO: inconsistent lock state ] 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 {IN-SOFTIRQ-W} state was registered at: [<ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0 [<ffffffff8110be0f>] lock_acquire+0xdf/0x2b0 [<ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0 [<ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0 [<ffffffff810b1a9d>] __do_softirq+0x14d/0x670 [<ffffffff810b2214>] irq_exit+0x104/0x110 [<ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60 [<ffffffff81842e70>] apic_timer_interrupt+0x70/0x80 [<ffffffff810dba66>] rq_attach_root+0xa6/0x100 [<ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650 [<ffffffff810e4b42>] build_sched_domains+0x942/0xb00 [<ffffffff821777c2>] sched_init_smp+0x509/0x5c1 [<ffffffff821551e3>] kernel_init_freeable+0x172/0x28f [<ffffffff8182cdce>] kernel_init+0xe/0xe0 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 irq event stamp: 76 hardirqs last enabled at (75): [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 hardirqs last disabled at (76): [<ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90 softirqs last enabled at (0): [<ffffffff810a8df2>] copy_process.part.26+0x602/0x1cf0 softirqs last disabled at (0): [< (null)>] (null) other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rcu_node_1); <Interrupt> lock(rcu_node_1); *** DEADLOCK *** 1 lock held by rcu_preempt/8: #0: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 stack backtrace: CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014 0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0 0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b 0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f Call Trace: [<ffffffff818379e0>] dump_stack+0x4f/0x7b [<ffffffff8110813b>] print_usage_bug+0x1db/0x1e0 [<ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50 [<ffffffff811087ad>] mark_lock+0x66d/0x6e0 [<ffffffff81107790>] ? check_usage_forwards+0x150/0x150 [<ffffffff81108898>] mark_held_locks+0x78/0xa0 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220 [<ffffffff81108aed>] trace_hardirqs_on+0xd/0x10 [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810fd1c7>] swake_up_all+0xb7/0xe0 [<ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0 [<ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220 [<ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60 [<ffffffff81137c30>] ? rcu_barrier+0x20/0x20 [<ffffffff810d2014>] kthread+0x104/0x120 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260 Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-23RCU: Privatize rcu_node::lockBoqun Feng-13/+13
In patch: "rcu: Add transitivity to remaining rcu_node ->lock acquisitions" All locking operations on rcu_node::lock are replaced with the wrappers because of the need of transitivity, which indicates we should never write code using LOCK primitives alone(i.e. without a proper barrier following) on rcu_node::lock outside those wrappers. We could detect this kind of misuses on rcu_node::lock in the future by adding __private modifier on rcu_node::lock. To privatize rcu_node::lock, unlock wrappers are also needed. Replacing spinlock unlocks with these wrappers not only privatizes rcu_node::lock but also makes it easier to figure out critical sections of rcu_node. This patch adds __private modifier to rcu_node::lock and makes every access to it wrapped by ACCESS_PRIVATE(). Besides, unlock wrappers are added and raw_spin_unlock(&rnp->lock) and its friends are replaced with those wrappers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23rcu: Remove useless rcu_data_p when !PREEMPT_RCUChen Gang-1/+0
The related warning from gcc 6.0: In file included from kernel/rcu/tree.c:4630:0: kernel/rcu/tree_plugin.h:810:40: warning: ‘rcu_data_p’ defined but not used [-Wunused-const-variable] static struct rcu_data __percpu *const rcu_data_p = &rcu_sched_data; ^~~~~~~~~~ Also remove always redundant rcu_data_p in tree.c. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', ↵Paul E. McKenney-28/+22
'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD doc.2015.12.05a: Documentation updates exp.2015.12.07a: Expedited grace-period updates fixes.2015.12.07a: Miscellaneous fixes list.2015.12.04b: Linked-list updates torture.2015.12.05a: Torture-test updates
2015-12-07rcu: Eliminate unused rcu_init_one() argumentPaul E. McKenney-1/+1
Now that the rcu_state structure's ->rda field is compile-time initialized, there is no need to pass the per-CPU rcu_data structure into rcu_init_one(). This commit therefore eliminates this now-unused parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04rcu: Stop disabling interrupts in scheduler fastpathsPaul E. McKenney-8/+6
We need the scheduler's fastpaths to be, well, fast, and unnecessarily disabling and re-enabling interrupts is not necessarily consistent with this goal. Especially given that there are regions of the scheduler that already have interrupts disabled. This commit therefore moves the call to rcu_note_context_switch() to one of the interrupts-disabled regions of the scheduler, and removes the now-redundant disabling and re-enabling of interrupts from rcu_note_context_switch() and the functions it calls. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested by Peter Zijlstra. ]
2015-12-04rcu: Avoid tick_nohz_active checks on NOCBs CPUsPaul E. McKenney-5/+2
Currently, rcu_prepare_for_idle() checks for tick_nohz_active, even on individual NOCBs CPUs, unless all CPUs are marked as NOCBs CPUs at build time. This check is pointless on NOCBs CPUs because they never have any callbacks posted, given that all of their callbacks are handed off to the corresponding rcuo kthread. There is a check for individually designated NOCBs CPUs, but it pointelessly follows the check for tick_nohz_active. This commit therefore moves the check for individually designated NOCBs CPUs up with the check for CONFIG_RCU_NOCB_CPU_ALL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04rcu: Fix obsolete rcu_bootup_announce_oddness() commentPaul E. McKenney-2/+1
This function no longer has #ifdefs, so this commit removes the header comment calling them out. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04rcu: Remove lock-acquisition loop from rcu_read_unlock_special()Paul E. McKenney-12/+6
Several releases have come and gone without the warning triggering, so remove the lock-acquisition loop. Retain the WARN_ON_ONCE() out of sheer paranoia. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04rcu: Add rcu_normal kernel parameter to suppress expeditingPaul E. McKenney-0/+6
Although expedited grace periods can be quite useful, and although their OS jitter has been greatly reduced, they can still pose problems for extreme real-time workloads. This commit therefore adds a rcu_normal kernel boot parameter (which can also be manipulated via sysfs) to suppress expedited grace periods, that is, to treat requests for expedited grace periods as if they were requests for normal grace periods. If both rcu_expedited and rcu_normal are specified, rcu_normal wins. This means that if you are relying on expedited grace periods to speed up boot, you will want to specify rcu_expedited on the kernel command line, and then specify rcu_normal via sysfs once boot completes. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-11-23rcu: Add transitivity to remaining rcu_node ->lock acquisitionsPaul E. McKenney-1/+1
The rule is that all acquisitions of the rcu_node structure's ->lock must provide transitivity: The lock is not acquired that frequently, and sorting out exactly which required it and which did not would be a maintenance nightmare. This commit therefore supplies the needed transitivity to the remaining ->lock acquisitions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-11-23rcu: Create transitive rnp->lock acquisition functionsPeter Zijlstra-12/+6
Providing RCU's memory-ordering guarantees requires that the rcu_node tree's locking provide transitive memory ordering, which the Linux kernel's spinlocks currently do not provide unless smp_mb__after_unlock_lock() is used. Having a separate smp_mb__after_unlock_lock() after each and every lock acquisition is error-prone, hard to read, and a bit annoying, so this commit provides wrapper functions that pull in the smp_mb__after_unlock_lock() invocations. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEADPaul E. McKenney-193/+234
exp.2015.10.07a: Reduce OS jitter of RCU-sched expedited grace periods. fixes.2015.10.06a: Miscellaneous fixes.
2015-10-07rcu: Enable stall warnings for synchronize_rcu_expedited()Paul E. McKenney-2/+1
This commit redirects synchronize_rcu_expedited()'s wait to synchronize_sched_expedited_wait(), thus enabling RCU CPU stall warnings. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Add online/offline info to expedited stall warning messagePaul E. McKenney-0/+31
This commit makes the RCU CPU stall warning message print online/offline indications immediately after the CPU number. A "O" indicates global offline, a "." global online, and a "o" indicates RCU believes that the CPU is offline for the current grace period and "." otherwise, and an "N" indicates that RCU believes that the CPU will be offline for the next grace period, and "." otherwise, all right after the CPU number. So for CPU 10, you would normally see "10-...:" indicating that everything believes that the CPU is online. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Consolidate expedited CPU selectionPaul E. McKenney-60/+1
Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus() are identical aside from the the argument to smp_call_function_single(), this commit consolidates them with a functional argument. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06rcu: Add online/offline info to stall warning messagePaul E. McKenney-2/+6
This commit makes the RCU CPU stall warning message print online/offline indications immediately after a hyphen following the CPU number. A "O" indicates that the global CPU-hotplug system believes that the CPU is online, a "o" that RCU perceived the CPU to be online at the beginning of the current expedited grace period, and an "N" that RCU currently believes that it will perceive the CPU as being online at the beginning of the next expedited grace period, with "." otherwise for all three indications. So for CPU 10, you would normally see "10-OoN:" indicating that everything believes that the CPU is online. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06rcu: Use rcu_callback_t in call_rcu*() and friendsBoqun Feng-1/+1
As we now have rcu_callback_t typedefs as the type of rcu callbacks, we should use it in call_rcu*() and friends as the type of parameters. This could save us a few lines of code and make it clear which function requires an rcu callbacks rather than other callbacks as its argument. Besides, this can also help cscope to generate a better database for code reading. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-09-20rcu: Make ->cpu_no_qs be a union for aggregate ORPaul E. McKenney-3/+3
This commit converts the rcu_data structure's ->cpu_no_qs field to a union. The bytewise side of this union allows individual access to indications as to whether this CPU needs to find a quiescent state for a normal (.norm) and/or expedited (.exp) grace period. The setwise side of the union allows testing whether or not a quiescent state is needed at all, for either type of grace period. For now, only .norm is used. A later commit will introduce the expedited usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Invert passed_quiesce and rename to cpu_no_qsPaul E. McKenney-3/+3
This commit inverts the sense of the rcu_data structure's ->passed_quiesce field and renames it to ->cpu_no_qs. This will allow a later commit to use an "aggregate OR" operation to test expedited as well as normal grace periods without added overhead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Rename qs_pending to core_needs_qsPaul E. McKenney-1/+1
An upcoming commit needs to invert the sense of the ->passed_quiesce rcu_data structure field, so this commit is taking this opportunity to clarify things a bit by renaming ->qs_pending to ->core_needs_qs. So if !rdp->core_needs_qs, then this CPU need not concern itself with quiescent states, in particular, it need not acquire its leaf rcu_node structure's ->lock to check. Otherwise, it needs to report the next quiescent state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Use single-stage IPI algorithm for RCU expedited grace periodPaul E. McKenney-46/+250
The current preemptible-RCU expedited grace-period algorithm invokes synchronize_sched_expedited() to enqueue all tasks currently running in a preemptible-RCU read-side critical section, then waits for all the ->blkd_tasks lists to drain. This works, but results in both an IPI and a double context switch even on CPUs that do not happen to be running in a preemptible RCU read-side critical section. This commit implements a new algorithm that causes less OS jitter. This new algorithm IPIs all online CPUs that are not idle (from an RCU perspective), but refrains from self-IPIs. If a CPU receiving this IPI is not in a preemptible RCU read-side critical section (or is just now exiting one), it pushes quiescence up the rcu_node tree, otherwise, it sets a flag that will be handled by the upcoming outermost rcu_read_unlock(), which will then push quiescence up the tree. The expedited grace period must of course wait on any pre-existing blocked readers, and newly blocked readers must be queued carefully based on the state of both the normal and the expedited grace periods. This new queueing approach also avoids the need to update boost state, courtesy of the fact that blocked tasks are no longer ever migrated to the root rcu_node structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Consolidate tree setup for synchronize_rcu_expedited()Paul E. McKenney-84/+18
This commit replaces sync_rcu_preempt_exp_init1(() and sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug() and sync_exp_reset_tree(), which will also be used by synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which contains code specific to synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Move rcu_report_exp_rnp() to allow consolidationPaul E. McKenney-66/+0
This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(), sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so that later commits can make synchronize_sched_expedited() use them. The non-code-movement portion of this commit tags rcu_report_exp_rnp() as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wqPaul E. McKenney-4/+2
Now that there is an ->expedited_wq waitqueue in each rcu_state structure, there is no need for the sync_rcu_preempt_exp_wq global variable. This commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq. It also initializes ->expedited_wq only once at boot instead of at the start of each expedited grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-08-04Merge branches 'fixes.2015.07.22a' and 'initexp.2015.08.04a' into HEADPaul E. McKenney-102/+15
fixes.2015.07.22a: Miscellaneous fixes. initexp.2015.08.04a: Initialization and expedited updates. (Single branch due to conflicts.)
2015-07-22rcu: Don't disable CPU hotplug during OOM notifiersPaul E. McKenney-2/+0
RCU's rcu_oom_notify() disables CPU hotplug in order to stabilize the list of online CPUs, which it traverses. However, this is completely pointless because smp_call_function_single() will quietly fail if invoked on an offline CPU. Because the count of requests is incremented in the rcu_oom_notify_cpu() function that is remotely invoked, everything works nicely even in the face of concurrent CPU-hotplug operations. Furthermore, in recent kernels, invoking get_online_cpus() from an OOM notifier can result in deadlock. This commit therefore removes the call to get_online_cpus() and put_online_cpus() from rcu_oom_notify(). Reported-by: Marcin Ślusarz <marcin.slusarz@gmail.com> Reported-by: David Rientjes <rientjes@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-07-22rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()Paul E. McKenney-4/+4
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for consistency with the WARN() series of macros. This also requires inverting the sense of the conditional, which this commit also does. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2015-07-22rcu: Fix obsolete priority-boosting commentPaul E. McKenney-2/+1
Tasks are no longer migrated to the root rcu_node, so there is no longer any need for a boost kthread for the root rcu_node, and there no longer is such a kthread. This commit therefore fixes the comment in rcu_boost_kthread()'s header to reflect this new reality. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Consolidate last open-coded expedited memory barrierPaul E. McKenney-1/+0
One of the requirements on RCU grace periods is that if there is a causal chain of operations that starts after one grace period and ends before another grace period, then the two grace periods must be serialized. There has been (and might still be) code that relies on this, for example, certain types of reference-counting code that does a call_rcu() within an RCU callback function. This requirement is why there is an smp_mb() at the end of both synchronize_sched_expedited() and synchronize_rcu_expedited(). However, this is the only smp_mb() in these functions, so it would be nicer to consolidate it into rcu_exp_gp_seq_end(). This commit does just that. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loopPaul E. McKenney-26/+10
This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock() polling loop in favor of the funnel-locking scheme that was abstracted from synchronize_sched_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Make synchronize_rcu_expedited() use sequence-counter schemePaul E. McKenney-10/+6
Although synchronize_rcu_expedited() uses a sequence-counter scheme, it is based on a single increment per grace period, which means that tasks piggybacking off of concurrent grace periods may be forced to wait longer than necessary. This commit therefore applies the new sequence-count functions developed for synchronize_sched_expedited() to speed things up a bit and to consolidate the sequence-counter implementation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Remove CONFIG_RCU_CPU_STALL_INFOPaul E. McKenney-45/+0
The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of releases with no complaints, so it is time to eliminate this Kconfig option entirely, so that the long-form RCU CPU stall warnings cannot be disabled. This commit does just that. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Stop disabling CPU hotplug in synchronize_rcu_expedited()Paul E. McKenney-23/+2
The fact that tasks could be migrated from leaf to root rcu_node structures meant that synchronize_rcu_expedited() had to disable CPU hotplug. However, tasks now stay put, so this commit removes the CPU-hotplug disabling from synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Simplify arithmetic to calculate number of RCU nodesAlexander Gordeev-2/+2
This update makes arithmetic to calculate number of RCU nodes more straight and easy to read. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-06-22Merge branch 'timers-core-for-linus' of ↵Linus Torvalds-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather largish update for everything time and timer related: - Cache footprint optimizations for both hrtimers and timer wheel - Lower the NOHZ impact on systems which have NOHZ or timer migration disabled at runtime. - Optimize run time overhead of hrtimer interrupt by making the clock offset updates smarter - hrtimer cleanups and removal of restrictions to tackle some problems in sched/perf - Some more leap second tweaks - Another round of changes addressing the 2038 problem - First step to change the internals of clock event devices by introducing the necessary infrastructure - Allow constant folding for usecs/msecs_to_jiffies() - The usual pile of clockevent/clocksource driver updates The hrtimer changes contain updates to sched, perf and x86 as they depend on them plus changes all over the tree to cleanup API changes and redundant code, which got copied all over the place. The y2038 changes touch s390 to remove the last non 2038 safe code related to boot/persistant clock" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) clocksource: Increase dependencies of timer-stm32 to limit build wreckage timer: Minimize nohz off overhead timer: Reduce timer migration overhead if disabled timer: Stats: Simplify the flags handling timer: Replace timer base by a cpu index timer: Use hlist for the timer wheel hash buckets timer: Remove FIFO "guarantee" timers: Sanitize catchup_timer_jiffies() usage hrtimer: Allow hrtimer::function() to free the timer seqcount: Introduce raw_write_seqcount_barrier() seqcount: Rename write_seqcount_barrier() hrtimer: Fix hrtimer_is_queued() hole hrtimer: Remove HRTIMER_STATE_MIGRATE selftest: Timers: Avoid signal deadlock in leap-a-day timekeeping: Copy the shadow-timekeeper over the real timekeeper last clockevents: Check state instead of mode in suspend/resume path selftests: timers: Add leap-second timer edge testing to leap-a-day.c ntp: Do leapsecond adjustment in adjtimex read path time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge ntp: Introduce and use SECS_PER_DAY macro instead of 86400 ...
2015-06-19timer: Reduce timer migration overhead if disabledThomas Gleixner-2/+0
Eric reported that the timer_migration sysctl is not really nice performance wise as it needs to check at every timer insertion whether the feature is enabled or not. Further the check does not live in the timer code, so we have an extra function call which checks an extra cache line to figure out that it is disabled. We can do better and store that information in the per cpu (hr)timer bases. I pondered to use a static key, but that's a nightmare to update from the nohz code and the timer base cache line is hot anyway when we select a timer base. The old logic enabled the timer migration unconditionally if CONFIG_NO_HZ was set even if nohz was disabled on the kernel command line. With this modification, we start off with migration disabled. The user visible sysctl is still set to enabled. If the kernel switches to NOHZ migration is enabled, if the user did not disable it via the sysctl prior to the switch. If nohz=off is on the kernel command line, migration stays disabled no matter what. Before: 47.76% hog [.] main 14.84% [kernel] [k] _raw_spin_lock_irqsave 9.55% [kernel] [k] _raw_spin_unlock_irqrestore 6.71% [kernel] [k] mod_timer 6.24% [kernel] [k] lock_timer_base.isra.38 3.76% [kernel] [k] detach_if_pending 3.71% [kernel] [k] del_timer 2.50% [kernel] [k] internal_add_timer 1.51% [kernel] [k] get_nohz_timer_target 1.28% [kernel] [k] __internal_add_timer 0.78% [kernel] [k] timerfn 0.48% [kernel] [k] wake_up_nohz_cpu After: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-27Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', ↵Paul E. McKenney-56/+67
'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEAD array.2015.05.27a: Remove all uses of RCU-protected array indexes. doc.2015.05.27a: Docuemntation updates. fixes.2015.05.27a: Miscellaneous fixes. hotplug.2015.05.27a: CPU-hotplug updates. init.2015.05.27a: Initialization/Kconfig updates. tiny.2015.05.27a: Updates to Tiny RCU. torture.2015.05.27a: Torture-testing updates.
2015-05-27rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAFPaul E. McKenney-3/+3
This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined. The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT_LEAF depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT_LEAF unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUTPaul E. McKenney-3/+3
This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameterPaul E. McKenney-1/+1
The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and perhaps only) by rcutorture to verify that RCU works correctly in specific rcu_node combining-tree configurations. It therefore does not make much sense have this as a question to people attempting to configure their kernels. So this commit creates an rcutree.rcu_fanout_exact= boot parameter that rcutorture can use, and eliminates the original CONFIG_RCU_FANOUT_EXACT Kconfig parameter. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>