This assumes you already have a basic knowledge of segment trees.
For this discussion, I want to show everyone that persistence can actually be made simple just by changing our way of thinking.
Introduction
Usually, persistence is treated as a “really hard” kind of data structure. After all, it’s like a dream data structure with robust version control. It’s the crowd favorite for long challenges because of its fame to as the “hardest to implement”, especially when appended with truly hard concepts in other topics. In reality, persistence can actually be made simple, as it is literally only one step away from the non-persistent version.
Background
Why does the idea of a persistent data structure seem so hard for most programmers? I think this is because most of us started with imperative programming languages like C++, Java, and Python, where changing values of variables is the norm. We are familiar with having the power to mutate a value anytime we want. In contrast, most of us cringe whenever we deal with immutables. Hail C++ mutable std::string
, screw Java’s immutable String
. Nothing beats plain simple assignment operator (s[i] = c
) compared to the read-only (s.charAt(i)
). Most, if not all of us, grew with the mutable mindset. We live and breathe mutability in our code. It is our intuitive art.
Immutable Programming
But when you think about it, persistence is all about staying immutable. Being persistent means we discard our powers of mutability. Just like how Java’s String
class is immutable, we avoid changing individual values directly. Though at first this seems dumb and inefficient, when you think about it it’s the straightforward way to do version control. Since original content stays the same, having a new version of the content won’t damage the version of the original. For immutables, the way to “change” is to “create”. The truth is persistence == immutability
, hence we can summarize the act of staying persistent with one golden rule.
GOLDEN RULE OF PERSISTENCE
Create new nodes instead of changing them.
That’s it? Yes, that’s it! The way to be persistent is to stop using the assignment operator, and instead focus on recreation of nodes during a change. We return new nodes instead of mutating. That’s the only additional step we need to transform from mutable to persistent. In REMOVING our capability to be mutable, we force ourselves to CREATE a new way to do version control. In effect, original content can stay the same.
The secret is that there is no secret
Persistence is actually straightforward. There’s no need to fancy things around. Just replace all assignment operators with a new node()
expression or similar. It is no exaggeration to say that it is a mere few lines of code away from the mutable version. The “god data structure” everyone thinks is so complicated is in truth something with removed powers than the original. But with this irony of a removed power, we instead gain a new power called efficient version control.
IMPLEMENTATION
Let’s demonstrate an example of adding persistence to a segment tree. Suppose we want to solve the range sum query problem, but with a persistent data structure. We could implement it the usual segment tree way as follows:
Build (usual segment tree)
#define M ((L+R)/2)
int st[4*n];
void build(int arr[], int p=1, int L=0, int R=n-1) {
if (L == R) st[p] = arr[L]; // construct as leaf
else { // construct as parent
build(arr, 2*p, L, M);
build(arr, 2*p+1, M+1, R);
st[p] = st[2*p] + st[2*p+1];
}
}
Now how do we make this persistent? What we do is that we replace every assignment operator st[p] = ...
with a new()
operator. See my example below for the details.
Build (persistent segment tree)
#define M ((L+R)/2)
int build(int arr[], int L=0, int R=n-1) {
if (L == R) return newleaf(arr[L]); // construct as leaf
else return newparent(build(arr, L, M), build(arr, M+1, R)); // construct as parent
}
// Usage: int root = build(arr, 0, n - 1);
We abstract the idea of “change” via “creation”, as per the golden rule of persistence. Isn’t it really simple? In fact, in some cases, we can achieve shorter code than the mutable version if we do it this way! Of course, this depends on the problem, but it is not wrong to say that it can be achievable.
Creating New Nodes
If you think about it, the difference between the non-persistent and the persistent version is that the former mutates an array called st[]
while the latter returns a new node at every build. You can implement this a couple of ways. A famous way is to use a node class. But for me, the way I usually implement it is by allocating a pooled array:
int l[SIZE], r[SIZE], st[SIZE], NODES = 0;
int newleaf(int value) {
int p = ++NODES;
l[p] = r[p] = 0; // null
st[p] = value;
return p;
}
int newparent(int lef, int rig) {
int p = ++NODES;
l[p] = lef;
r[p] = rig;
st[p] = st[lef] + st[rig]; // immediately update value from children
return p;
}
There are two operations, newleaf()
and newparent()
. We either construct nodes as a leaf or as a parent, where being a parent already pulls values from the children. In the same way as the mutable version, we have an array called st[]
that stores the segment tree values depending on our queries. Each node also has pointer to the left and the right children stored at l[]
and r[]
respectively. For the size, I usually allocate around SIZE \approx 8N \lceil log N \rceil. In practice I do this because it’s faster than using classes. But of course, it all depends on your groove
What about updating nodes? When there’s no persistence, we usually directly manipulate our segment tree array st[]
like so:
Point Update (usual segment tree)
void update(int i, int x, int p=1, int L=0, int R=n-1) {
if (L == R) {st[p] = x; return;}
if (i <= M) update(i, x, 2*p, L, M);
else update(i, x, 2*p+1, M+1, R);
st[p] = st[2*p] + st[2*p+1];
}
If we want to add persistent update, we simply need to create new nodes via our newleaf()
or newparent()
functions instead of the usual assignment operators:
Point Update (persistent segment tree)
// return an int, a pointer to the new root of the tree
int update(int i, int x, int p, int L=0, int R=n-1) {
if (L == R) return newleaf(st[p] + x);
if (i <= M) return newparent(update(i, x, l[p], L, M), r[p]);
else return newparent(l[p], update(i, x, r[p], M + 1, R));
}
// Usage:
// int new_version_root = update(i, x, root);
// Both roots are valid, you can query from both of them!
The only change you need is wrapping the change in a new node at every update. If we merely consider persistence as a wrapper, we can make the implementation clean and concise.
Range Copy (persistent segment tree)
(Bonus) My favorite operation on a persistent tree is the range copy. This is the ultimate version control technique one can pull off with regards to reverting a range back to a certain version. Imagine doing a range reset back to initial values - what a breakthrough! Although I find it rare to see problems that need range copy (e.g. OAK), it’s one of those operations that can come in handy. Fortunately, for a persistent tree, this can easily be achieved a few lines of code:
// revert range [a:b] of p
int rangecopy(int a, int b, int p, int revert, int L=0, int R=n-1) {
if (b < L || R < a) return p; // keep version
if (a <= L && R <= b) return revert; // reverted version
return newparent(rangecopy(a, b, l[p], l[revert], L, M),
rangecopy(a, b, r[p], r[revert], M+1, R));
}
// Usage: (revert a range [a:b] back to an old version)
// int reverted_root = rangecopy(a, b, root, old_version_root);
We pass in two roots: the current tree root and the old version root. We traverse both side-by-side, and replace only the relevant nodes during our traversal. This meek five-lined function is the foundation of efficient version-control, which is but one of the many operations you can do with a persistent segment tree.
Complexity
In a usual tree, we change at most O(log N) nodes during an update/query. Therefore, for a persistent tree, we create at most O(log N) nodes as well.
Summary
Persistence is a powerful tool, but can also be made simple. After reading this, I hope that you have come to understand persistent data structures in new light. Trust me, the conversion process is really simple since we just convert all assignment operators to new
leaves or parents.
Persistence is not only for segment trees - you can in fact use this technique for other data structures, such as BBSTs. By understanding persistence more generally, I purposefully omitted range query since I want you to try and code that on your own. By now, I hope you can find it easier to come up with your own code for various kinds of queries, and even for lazy propagation! Feel free to comment on this thread to open discussion of such techniques.
As for other resources, you can check out Anudeep’s blog or the Wiki page for a more in-depth explanation of persistent trees, explained with problems to solve. With lots of practice, you can be ready hike your way to conquer next month’s long challenge. Remember to stay persistent, and happy coding
UPD: added usage
UPD2: added list of problems in the comments below!