PROBLEM LINK:
Author: Misha Chorniy
Tester: Hiroto Sekido
Editorialist: Kevin Atienza
DIFFICULTY:
Super Hard
PREREQUISITES:
Segment trees, Range trees, Selfbalancing binary trees, 2D Fenwick trees, sqrt decomposition
PROBLEM:
Given an array A consisting of N positive integers, you have to process Q queries. There are five types of queries:

1 l r
: Let S be the sorted set of elements with indices between l and r. You need to find:
where S_i is the $i$th smallest element of S.

2 x y
: Assign the value y to element at position x. 
3 x
: Delete the element at position x from the array. 
4 x y
: Insert y after element at position x. If x = 0, then y is inserted before the first element. 
5 l r
: Count the number of distinct numbers in array between indices l and r.
QUICK EXPLANATION:
 Instead of actually deleting elements, replace deleted elements with dummy “empty” or “null” values.
 Preprocess the queries so that we can preallocate the whole array, and newly inserted values are inserted in the correct positions already, anticipating future insertions and deletions.
 Preprocess the array so that range “power sum” queries for the exponents 0 to 3 can be computed quickly. For a given l, r and e, the power sum S(l,r,e) is the sum of the $e$th powers of distinct elements from l to r. This should also take into account the dummy “empty” values. This part is somewhat wellknown and is described in more detail below. Each such query can be answered in O(\sqrt{N}) or O(\log^2 N) time.
 The answer to a type 1 query
1 l r
is the following:
where S_e = S(l,r,e). 1/6 can be computed using modular arithmetic.
 The answer to a type 5 query
5 l r
is simply S_0.
EXPLANATION:
There are multiple parts to the solution to this problem. We first describe a solution for a simplified version of the problem, and then gradually support additional types of queries.
No updates
When answering “range query” problems like this, it’s sometimes helpful to ignore the updates first, i.e. try to answer the problem assuming there are no update queries. In our case, let’s assume that there are only type 1 and 5 queries. It’s clear that type 5 queries are easier, so let’s handle them first.
Given a subrange [l,r] of an array A, how many distinct elements are there? The answer could be anywhere from 1 to rl+1, depending on the duplicated elements in the subrange. We wish to count each distinct value only once, i.e. any value that appears again in the subrange must not be counted again.
When is a particular value A_i counted? Obviously, we must have l \le i \le r. But in order not to doublecount, we must ensure that this value is a new one, i.e. it hasn’t appeared before. In other words, let p_i be the largest index such that p_i < i and A_{p_i} = A_i, or p_i = 0 if such a p_i doesn’t exist. Then it must be the case that p_i < l. Extending this to every element in the subrange, we see that the number of distinct elements in A[l\ldots r] is the number of indices i such that:
 l \le i \le r
 0 \le p_i < l
If for every 1 \le i \le N, we plot a point (i,p_i) in the 2D plane, then this is simply the number of points in the rectangle [l,r] imes [0,l1]. But this is the standard problem of 2D range queries, which are solvable with sqrtdecomposition or range trees. (we’ll discuss these techniques in more detail later) This is great, because we can now answer type 5 queries!
Now, what about type 1 queries? At first glance, the formula seems convoluted, and there seems no way to simplify the sum. However, upon closer inspection, we can actually understand that this is simply the sum of all products of triplets of distinct elements in the range. Such a sum can be simplified into the following:
where S_i is the sum of the $i$th powers of all distinct elements in the range.
To understand this, note that S_1^3 is the sum of products of all triples, but it includes every triple six times (once for each permutation), and includes some bad triples with duplicate values. The remaining terms are simply adjustments to accommodate for these bad triples. For example, the sum S_1S_2 is the sum of products of all triples containing at least one duplicated value, and S_3 is the sum of products of all triples containing a triplicate value.
Therefore, to answer a type 1 query, we must be able to answer range queries of the type: “what is the sum of the $e$th powers of all distinct elements in the range?” where e can be anything from 1 to 3. But we can modify the above approach to answer this! (in fact, the previous approach solves this in the special case e = 0) For a fixed e, we assign the value A_i^e in each point (i,p_i), and now the query becomes: compute the sum of values in the rectangle [l,r] imes [0,l1]. The techniques mentioned above will still work with simple modifications
Point updates
Now let’s tackle point updates, i.e. queries of type 2. Suppose we want to set A_i to the value v, and that its previous value is v_{ ext{old}}. To do so, we need to look deeper into the structure of the points (i,p_i) in the 2D plane.
Fix a distinct value v, and let i_1 < i_2 < \ldots < i_k be the distinct indices i such that A_i = v (let’s call it v's index sequence). Then we have the following points: (i_1,0), (i_2,i_1), (i_3,i_2), \ldots, (i_k,i_{k1}).
When setting the value of A_i from v_{ ext{old}} to v, we need to change some points. First, we need to find the index of i in the index sequence of v_{ ext{old}}, say i_r. Then when the value of A_{i_r} is replaced, the points (i_r,i_{r1}) and (i_{r+1},i_r) will have to be removed. Furthermore, we also need to add the point (i_{r+1},i_{r1}).
Now, when setting the value of A_i to v, we need to find out where the index i will be placed in the index sequence of v. Let’s say i_{r1} < i < i_r for some r. Then we need to remove the point (i_r, i_{r1}) and add the points (i, i_{r1}) and (i_r, i).
This means that there are only at most 6 points that need to be updated: 3 removed, 3 added (maybe even less when the index i is the first and last in the corresponding index sequence). Our sqrtdecomposition or 2D range tree approaches can be updated to accommodate these kinds of updates (which we’ll also discuss later below).
There’s also the question of how to maintain the index sequences of all distinct values. But this can be done by maintaining a binary search tree for every distinct value.
Deletions
Now, let’s try handling deletions. Suppose we want to remove A_i from the sequence. If we simply remove this value from the sequence, then the indices of all later elements of the sequence will need to be adjusted (for example A_{i+1} becomes A_i, etc.). But this means updating up to O(N) points, which would be too slow to pass the time limit. Thus, we need to find another way.
The key idea is to simply not remove A_i from the sequence, and instead replace it with a dummy “null” value. We consider this “null” value to be just like any other value, but the difference is that it does not have its own index sequence (and thus no points). Using this, we don’t have to update the indices of the later elements of the sequence!
Unfortunately, this means that the indices of range queries [l,r] will need to be updated (because some of the indices are now null values). Specifically, each index i needs to be replaced with a new index, which we denote by f(i), which is the index of the $i$th nonnull value of the sequence. Fortunately, such an index can be computed by maintaining a segment tree structure for the null values. Specifically, we construct a tree with N leaves, where the $i$th leaf contains a 1 if A_i is not null, and 0 otherwise. Each internal node contains the sum of the values in the leaves in the subtree rooted at the node. Replacing the $i$th index of the sequence with a null (or nonnull) value corresponds to replacing the value of the $i$th leaf in this segment tree, and updating the corresponding internal nodes. To compute f(i) given i, one only needs a single traversal down the tree to the appropriate leaf (utilizing the partial sums in the internal nodes).
Both operations run in O(\log N) time, which means this structure is not the bottleneck of our solution.
Insertions
Insertions have the same problems as deletions when done naïvely: if we insert a value at a certain position, we need to update the indices of the later elements, and potentially update up to O(N) points. The key is to implement the insertions so that:
 the whole array is allocated beforehand
 each inserted value is inserted in the correct position, anticipating future insertions and deletions.
For simplicity, instead of starting with an array of size O(N), we can simply start with an empty array, and perform O(N) insertions before the Q queries (so now we have N+Q queries, where the first N are insertions). We preallocate an array of size N+Q, initially all null, and the key is to determine for each insertion the correct index where it should be placed so that future insertions can be accommodated without requiring a reallocation of the array. These correct indices can be precomputed by first simulating all insertions and deletions with a selfbalancing segment tree (for example, an AVL tree modified to become a segment tree), with the following guidelines:
 All deletions are performed by marking the corresponding leaf that it has been “deleted”. (this is needed so we can compute the function f(i) quickly)
 During each insertion, we store the index of the insertion in the newly added leaf. (i.e. its index in the sequence of operations)
 After finishing all operations, we perform an inorder traversal of the tree (though we only care about the order of the leaves). Let’s say the $i$th leaf was created during the $j$th insertion. Then when the $j$th insertion is actually performed, we simply replace the $i$th value with the correct nonnull value.
This precomputation runs in O((N+Q)\log(N+Q)). After that, we simply allocate an array of size N+Q, initially all null, and no more reallocation is required during the processing of the actual queries!
Rectangle queries
Finally, all that remains is to actually perform the rectangle queries efficiently. We’ll clarify our task here: starting with a set of points (initially empty), we need to perform the following operations:
 Insert a new point (x,y) to the set, assigning a value v to it.
 Delete an existing point (x,y) from the set.
 Given a rectangle [a,b] imes [c,d] and an exponent 0\le e\le 3, find the sum of the $e$th powers of values of all points in the rectangle.
There are multiple (standard) approaches for this problem.
2D range tree
Note that this can be done with an O(N' \log N') preprocessing and O(\log^2 N') query time (where N' := O(N + Q) is the number of operations) by using a 2D range tree fitted for updates: Since a 2D range tree is just a tree of segment trees, we use segment trees that support point insertions/deletions and range sums. In each leaf of a segment tree, we store four values, (1,v,v^2,v^3), and in each internal node, we store four values: the sums of the values of each node for every exponent.
sqrt decomposition
You can also use the all mighty sqrt decomposition by noticing that each inserted point (x,y) satisfies 0 \le x, y \le N. Therefore, we can instead think of the space [0,N] imes [0,N] as a 2D matrix. Now, we try to decompose this matrix into blocks of dimensions K imes K, so that there are N/K imes N/K blocks (for some parameter K which we’ll determine later). We also store, for each exponent 0\le e\le 3, an (N/K+1) imes (N/K+1) table of 2D partial sums: S_e(i,j) is the sum of all values from the first i rows and first j columns of blocks. Such a table can be computed in O(N + (N/K)^2) time.
To implement the operations:
 To implement a rectangle query (a,b,c,d,e), first we decompose our rectangle [a,b] imes [c,d] into four “quadrants”:
Thus, we replace every rectangle query with four “quadrant queries”, so we can assume from here on that a = c = 0.
Next, we let $b'$ and $d'$ be the largest multiple of $K$ that are $\le b$ and $\le d$, respectively. Then the answer would be $S_e(b'/K,d'/K)$ plus the values in the points in the thin strips $[b',b] imes [0,d]$ and $[0,b'] imes [d',d]$. But the later points can be enumerated efficiently by taking note of the fact that **there is at most one point in every row and column at any moment in time** (why?). Therefore, there are at most $K$ points that need to be enumerated that way.
The running time of each query is $O(K)$.
 To implement an insertion or deletion, we do not update anything; instead we simply lazily store the insertion and deletion operations we did so far in a list L (so that an insertion/deletion runs in O(1) time). When doing this, we need to update our “query” operation to look into this array and update the sum in O(L) time, so the new running time of a query is O(K + L).
Unfortunately, L can be up to N+Q, so an update can be very slow. The trick is to rebuild the table S_e every T operations (for some parameter T). The table can be rebuilt in O(N + (N/K)^2) time, but the query operation optimizes to O(K + T) time.
Now all that remains is to choose the parameters K and T. Note that there are N' = N+Q queries, and after every T insertions, we perform an O(N+(N/K)^2) time “rebuild” operation. Therefore, the running time after N' operations is:
It can be shown that the optimal choice for the values K and T is K = T = O(\sqrt{N}) (hence “sqrt decomposition”) (why?), so that the running time is O(N'\sqrt{N}) = O((N+Q)\sqrt{N}).
2D Fenwick tree
Finally you can also use a [2D Fenwick tree](https://www.topcoder.com/community/datascience/datasciencetutorials/binaryindexed trees/#2d) to perform the queries in O(\log^2 N) time. The problem is that the memory requirement becomes O(N^2). There are ways to reduce the memory requirement, for example using a hash map for every 1D Fenwick tree (but this has a rather large constant behind the O notation).
We leave it to the reader to discover the details on how to optimize this. If you need more hints, feel free to ask. Also, the author’s solution uses this approach so you can check that solution too.
Notes
If you have some questions, or would like some parts of this editorial in more detail, please feel free to ask in the comments below, and I’ll try to accommodate your request
Time Complexity:
O((N + Q) \sqrt{N}) or O((N + Q) \log^2 N)