The prio_tree.c code indexes vmas using three different indexes:
  * heap_index  = vm_pgoff + vm_size_in_pages, i.e., end_vm_pgoff
  * radix_index = vm_pgoff, i.e., start_vm_pgoff
  * size_index  = vm_size_in_pages
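
As a minimal user-space sketch (not the kernel's code), the three indexes can be derived from a vma's starting page offset and its size in pages. The struct vma_indexes type and the vma_to_indexes() helper are hypothetical names used only for illustration; the sketch follows the definitions above, so heap_index = radix_index + size_index.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical container for the three indexes of one vma. */
struct vma_indexes {
	unsigned long radix_index;	/* start_vm_pgoff */
	unsigned long size_index;	/* vm_size_in_pages */
	unsigned long heap_index;	/* end_vm_pgoff */
};

/* Derive the indexes as defined above: heap_index = vm_pgoff + size. */
static struct vma_indexes vma_to_indexes(unsigned long vm_pgoff,
					 unsigned long vm_size_in_pages)
{
	struct vma_indexes idx;

	/* The end offset must not wrap around. */
	assert(vm_size_in_pages <= ULONG_MAX - vm_pgoff);
	idx.radix_index = vm_pgoff;
	idx.size_index = vm_size_in_pages;
	idx.heap_index = vm_pgoff + vm_size_in_pages;
	return idx;
}
```

For instance, vma_to_indexes(1, 6) yields the triple [1,6,7], which appears at level 1 of the example prio_tree.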

A regular radix-priority-search-tree indexes vmas using only heap_index and
radix_index. The conditions for indexing are:
  * ->heap_index >= ->left->heap_index &&
      ->heap_index >= ->right->heap_index
  * if (->heap_index == ->left->heap_index)
      then ->radix_index < ->left->radix_index;
  * if (->heap_index == ->right->heap_index)
      then ->radix_index < ->right->radix_index;
  * nodes are hashed to the left or right subtree using radix_index,
    similar to a pure binary radix tree.
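
The indexing conditions above can be checked mechanically. The following is a hedged sketch using a hypothetical struct pst_node (not the kernel's node type); it checks the heap and tie-break conditions plus the radix hashing rule, assuming the root is at depth 0 and that the directions taken on the way down spell out the top bits of the index_bits-wide radix_index. Nodes below depth index_bits (overflow-sub-trees) are not visited.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type: only the indexes the conditions need. */
struct pst_node {
	unsigned long radix_index;
	unsigned long heap_index;
	struct pst_node *left, *right;
};

/* Heap condition, with the radix_index tie-break, for one child. */
static bool heap_ok(const struct pst_node *p, const struct pst_node *c)
{
	if (c == NULL)
		return true;
	if (p->heap_index != c->heap_index)
		return p->heap_index > c->heap_index;
	return p->radix_index < c->radix_index;
}

/*
 * Check the heap-and-radix indexed levels rooted at n. "path" holds the
 * directions taken so far (0 = left, 1 = right), so at depth d the top d
 * of the index_bits radix bits must equal it. Nodes deeper than
 * index_bits are not visited; only their link to the last radix-indexed
 * level is checked with heap_ok().
 */
static bool pst_check(const struct pst_node *n, unsigned int index_bits,
		      unsigned int depth, unsigned long path)
{
	if (n == NULL)
		return true;
	assert(index_bits < sizeof(unsigned long) * CHAR_BIT);
	assert(depth <= index_bits);
	if ((n->radix_index >> (index_bits - depth)) != path)
		return false;
	if (!heap_ok(n, n->left) || !heap_ok(n, n->right))
		return false;
	if (depth == index_bits)	/* children, if any, are overflow nodes */
		return true;
	return pst_check(n->left, index_bits, depth + 1, path << 1) &&
	       pst_check(n->right, index_bits, depth + 1, (path << 1) | 1UL);
}
```

Applied to levels 0 through 3 of the example prio_tree with index_bits = 3, starting as pst_check(root, 3, 0, 0), the check succeeds.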

A regular radix-priority-search-tree helps to store and query
intervals (vmas). However, a regular radix-priority-search-tree is only
suitable for storing vmas with different radix indices (vm_pgoff).

Therefore, prio_tree.c extends the regular radix-priority-search-tree
to handle many vmas with the same vm_pgoff. Such vmas are handled in
two different ways: 1) all vmas with the same radix _and_ heap indices are
linked using vm_set.list; 2) if there are many vmas with the same radix
index but different heap indices, and the regular radix-priority-search-tree
cannot index them all, we build an overflow-sub-tree that indexes such
vmas using heap and size indices instead of heap and radix indices. For
example, in the figure below some vmas with vm_pgoff = 0 (zero) are
indexed by the regular radix-priority-search-tree, whereas others are pushed
into an overflow-sub-tree. Note that all vmas in an overflow-sub-tree have
the same vm_pgoff (radix_index), and if necessary we build different
overflow-sub-trees to handle each possible radix_index. For example,
the figure has 3 overflow-sub-trees, corresponding to radix indices
0, 2, and 4.
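
A heavily simplified sketch of these two cases, assuming an incoming vma has reached an existing node at a given depth while descending. The enum placement values and the classify() function are hypothetical, and the heap-priority swapping that a real insertion also performs is deliberately ignored.

```c
#include <assert.h>

/* Hypothetical outcomes used only in this sketch. */
enum placement {
	PLACE_DESCEND,		/* radix bits remain: keep descending by radix_index */
	PLACE_VM_SET_LIST,	/* same radix and heap indices: link via vm_set.list */
	PLACE_OVERFLOW,		/* radix bits used up: go into an overflow-sub-tree */
};

/*
 * Simplified: an incoming vma (radix, heap) meets an existing node
 * (node_radix, node_heap) at the given depth (root = 0). Heap-priority
 * swapping is ignored here.
 */
static enum placement classify(unsigned long radix, unsigned long heap,
			       unsigned long node_radix, unsigned long node_heap,
			       unsigned int depth, unsigned int index_bits)
{
	assert(depth <= index_bits);
	/* At the last radix-indexed level, all radix bits already matched. */
	assert(depth < index_bits || radix == node_radix);
	if (radix == node_radix && heap == node_heap)
		return PLACE_VM_SET_LIST;
	if (depth == index_bits)
		return PLACE_OVERFLOW;
	return PLACE_DESCEND;
}
```

For example, [0,4,4] sits below [0,5,5], which is at depth 3 in the figure with index_bits = 3; classify(0, 4, 0, 5, 3, 3) returns PLACE_OVERFLOW, consistent with [0,4,4] heading the leftmost overflow-sub-tree.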

In the final tree, the first prio_tree_root->index_bits + 1 levels (levels 0
through prio_tree_root->index_bits) are indexed using heap and radix indices,
whereas the overflow-sub-trees below those levels (i.e., levels
prio_tree_root->index_bits + 1 and higher) are indexed using heap and size
indices. In overflow-sub-trees, the size_index is used for hashing the nodes
to appropriate places.
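
The hashing rule can be sketched as a single direction function. This is an illustration under stated assumptions, not kernel code: the go_right() name, the root-at-depth-0 convention and the user-space BITS_PER_LONG stand-in are ours. In the radix-indexed part, the bit of radix_index consulted at depth d is bit index_bits - 1 - d; once those bits run out, size_index is consulted from its most significant bit downward.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's BITS_PER_LONG (assumption). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Which child a vma with these indexes descends into from a node at the
 * given depth (root = 0): radix_index bits from bit index_bits - 1 down,
 * then size_index bits from bit BITS_PER_LONG - 1 down.
 */
static bool go_right(unsigned long radix_index, unsigned long size_index,
		     unsigned int depth, unsigned int index_bits)
{
	unsigned int k;

	assert(index_bits >= 1 && index_bits <= BITS_PER_LONG);
	if (depth < index_bits)	/* heap-and-radix indexed levels */
		return (radix_index >> (index_bits - 1 - depth)) & 1UL;
	k = depth - index_bits;	/* levels into the overflow-sub-tree */
	assert(k < BITS_PER_LONG);
	return (size_index >> (BITS_PER_LONG - 1 - k)) & 1UL;
}
```

With index_bits = 3, go_right(4, 3, 0, 3) is true, matching [4,3,7] as the root's right child in the example, while go_right(2, 3, 3, 3) is false because the top bit of size_index 3 is 0, which is why [2,3,5] hangs to the left of [2,4,6].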

Now, an example prio_tree, in which each vma is represented as
[radix_index, size_index, heap_index], i.e.,
[start_vm_pgoff, vm_size_in_pages, end_vm_pgoff]:

level  prio_tree_root->index_bits = 3
-----
  0                                       [0,7,7]
                                          /     \
                              ------------       ------------
                             /                               \
  1                       [1,6,7]                         [4,3,7]
                          /     \                         /     \
                      ----       ----                 ----       ----
                     /               \               /               \
  2               [0,6,6]         [2,5,7]         [5,2,7]         [6,1,7]
                  /     \         /     \         /     \         /     \
  3           [0,5,5] [1,5,6] [2,4,6] [3,4,7] [4,2,6] [5,1,6] [6,0,6] [7,0,7]
             /               /               /
  4         [0,4,4]         [2,3,5]         [4,1,5]
           /               /               /
  5       [0,3,3]         [2,2,4]         [4,0,4]
         /               /
  6     [0,2,2]         [2,1,3]
       /               /
  7   [0,1,1]         [2,0,2]
     /
  8 [0,0,0]

Levels 0 to 3: regular radix priority search tree (heap-and-radix indexed)
Levels 4 to 8: overflow-sub-trees (heap-and-size indexed)

Note that we use prio_tree_root->index_bits to optimize the height
of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is
set according to the maximum end_vm_pgoff mapped, we are sure that all
bits of vm_pgoff above the low prio_tree_root->index_bits bits are 0 (zero).
Therefore, we only use the low prio_tree_root->index_bits bits of vm_pgoff
as the radix_index. Whenever index_bits is increased in prio_tree_expand,
we shuffle the tree to make sure that the heap-and-radix indexed levels at
the top of the tree are still indexed properly.
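
How index_bits relates to the largest end_vm_pgoff can be sketched with a hypothetical helper, index_bits_for(). It illustrates the invariant stated above (all vm_pgoff bits above index_bits are zero) under our own assumptions, including a minimum of 1 bit; it does not reproduce the logic of prio_tree_expand.

```c
#include <assert.h>
#include <limits.h>

/* User-space stand-in for the kernel's BITS_PER_LONG (assumption). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: the smallest index_bits (at least 1) such that
 * every vm_pgoff <= max_end_vm_pgoff has all bits above index_bits zero.
 */
static unsigned int index_bits_for(unsigned long max_end_vm_pgoff)
{
	unsigned int bits = 1;

	while (bits < BITS_PER_LONG && (max_end_vm_pgoff >> bits) != 0)
		bits++;
	/* Invariant from the text: nothing above the low index_bits bits. */
	assert(bits == BITS_PER_LONG || (max_end_vm_pgoff >> bits) == 0);
	return bits;
}
```

In the example prio_tree the largest end_vm_pgoff is 7, and index_bits_for(7) returns 3, the index_bits used in the figure.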

We do not optimize the height of overflow-sub-trees using index_bits.
The reason is that there can be many such overflow-sub-trees, and all of
them would have to be shuffled whenever index_bits increases. This may
involve walking the whole prio_tree in the prio_tree_insert->prio_tree_expand
code path, which is not desirable. Hence, we do not optimize the height of
the heap-and-size indexed overflow-sub-trees using prio_tree_root->index_bits.
Instead, the overflow-sub-trees are indexed using the full BITS_PER_LONG bits
of size_index. This may lead to skewed sub-trees, because most of the
high-order bits of the size_index are likely to be 0 (zero). In the example
above, all 3 overflow-sub-trees are skewed. This may marginally affect
performance. However, processes rarely map many vmas with the same
start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally do not
require overflow-sub-trees to index all vmas.
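
To see why the overflow-sub-trees skew, the following hedged sketch counts how many consecutive left descents a size_index takes when its bits are consulted from bit BITS_PER_LONG - 1 downward. The leading_left_turns() name and the BITS_PER_LONG stand-in are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* User-space stand-in for the kernel's BITS_PER_LONG (assumption). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Number of left descents a size_index takes before its first right
 * descent, when its bits are consulted from BITS_PER_LONG - 1 downward.
 * Returns BITS_PER_LONG for size_index == 0 (it never goes right).
 */
static unsigned int leading_left_turns(unsigned long size_index)
{
	unsigned int n = 0;

	while (n < BITS_PER_LONG &&
	       !((size_index >> (BITS_PER_LONG - 1 - n)) & 1UL))
		n++;
	/* A nonzero size_index stops exactly at its highest set bit. */
	assert(size_index == 0 ||
	       ((size_index >> (BITS_PER_LONG - 1 - n)) & 1UL));
	return n;
}
```

On a 64-bit machine, leading_left_turns(4) is 61: every small size_index is routed left for dozens of levels, which is the skew visible in all three overflow-sub-trees of the example.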

From the above discussion, it is clear that the maximum height of
a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG.
However, in most common cases we do not need overflow-sub-trees,
so the tree height in the common cases will be prio_tree_root->index_bits.

It is fair to mention here that prio_tree_root->index_bits is increased
on demand; however, index_bits is not decreased when vmas are removed from
the prio_tree. That is tricky to do, so it is left as a homework problem.
