PROBLEM LINK:
Author: Misha Chorniy
Tester: Karan Aggarwal
Editorialist: Pushkar Mishra
DIFFICULTY:
Medium
PREREQUISITES:
Segment Trees
PROBLEM:
Given is an array A of length N. There are two types of operations that we need to handle on this array:
- 1 x y: change the value of A[x] to y, i.e., A[x] = y.
- 2 l r: answer “Yes” or “No” according to whether the subarray A[l..r] is dominating.
Here, a subarray is dominating if some number appears in it strictly more than half of its length times, i.e., at least \lfloor \frac{len}{2} \rfloor + 1 times, where len is the length of the subarray.
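To make the threshold concrete, here is a minimal brute-force sketch of the definition. The function name `isDominatingNaive` and the 1-indexed interface are assumptions made for illustration; it is only meant as a reference checker, not as the intended solution.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Reference check straight from the definition: A[l..r] (1-indexed, inclusive)
// is dominating if some value occurs more than (r - l + 1) / 2 times.
bool isDominatingNaive(const std::vector<int>& a, int l, int r) {
    assert(1 <= l && l <= r && r <= static_cast<int>(a.size()));
    const int length = r - l + 1;
    std::unordered_map<int, int> count;
    for (int i = l; i <= r; ++i) {
        // a is stored 0-indexed, so position i of the statement lives at a[i - 1].
        if (++count[a[i - 1]] > length / 2) return true;
    }
    return false;
}
```

For example, a subarray of length 4 needs a number that appears at least 3 times, so {1, 1, 2, 2} is not dominating while {1, 2, 2, 2} is.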
EXPLANATION:
The nature of the problem hints at segment trees. We need to efficiently perform two kinds of operations: update the value at an index, and answer for a subarray whether there exists a number that appears in it more than half of its length times.
Since every update changes a single element, there is no need for lazy propagation here.
Let us start by making some observations about dominating arrays and subarrays. Consider the entire array A[1..N]. The first important observation is that it can be dominating only if subarray A[1..\frac{N}{2}] is dominating or subarray A[\frac{N}{2}+1..N] is dominating. This is simple to prove by contradiction. Assume that A is dominating, with dominating number x, but neither of the two subarrays is dominating. Let the subarrays have lengths N_1 and N_2, so N_1 + N_2 = N. Since neither is dominating, x appears at most \frac{N_1}{2} times in the first and at most \frac{N_2}{2} times in the second (roughly \frac{N}{4} times in each). Combining the two, x appears at most \frac{N_1}{2} + \frac{N_2}{2} = \frac{N}{2} times in A, which contradicts the assumption that x appears more than \frac{N}{2} times. Therefore, at least one of the two subarrays must be dominating. Note that the argument does not depend on where the array is split.
We can further extend the observation to say that if A[1..N] is dominating, then the dominating number must either be the dominating number of A[1..\frac{N}{2}] or the dominating number of A[\frac{N}{2}+1..N]. This follows from the same arguments that we made above.
This leads us to our method of building our segment tree. Each node of the segment tree stores 2 fields: an integer ‘Dominant’ which stores the dominant number in the subarray covered by the node (or -1 if there is none), and a map (it can be a hash map for better bounds) ‘CountOccurrences’ which counts the number of times each number appears in the subarray covered by the node. The memory used is \mathcal{O}(N\log N) because each array element is counted in only \mathcal{O}(\log N) nodes, one per level of the tree. So memory isn’t an issue in our structure at all.
The pseudocode for the build operation is pretty straightforward now:
void buildNode (node r) {
    if (r == leaf) {
        // Let us say that this leaf corresponds to A[index] of the given array.
        SegTree[r].CountOccurrences.clear(); // clear map of any garbage
        // Increment the occurrence of the number at this leaf
        SegTree[r].CountOccurrences[A[index]] += 1;
        SegTree[r].Dominant = A[index]; // a number dominates its own cell
        return;
    }
    // r is an internal node from here on.
    buildNode(r->left);
    buildNode(r->right);
    length = r->length; // length of the subarray at this node.
    // We merge the maps of children. Merging simply means adding the
    // number of occurrences of each number in either of the children maps
    // to the parent map.
    SegTree[r].CountOccurrences = merge(SegTree[r->left].CountOccurrences,
                                        SegTree[r->right].CountOccurrences);
    // A child whose Dominant is -1 offers no candidate: -1 is never a key
    // of the map, so its count is treated as 0 (do not insert it).
    if (SegTree[r].CountOccurrences[SegTree[r->left].Dominant] > length/2) {
        SegTree[r].Dominant = SegTree[r->left].Dominant;
    } else if (SegTree[r].CountOccurrences[SegTree[r->right].Dominant] > length/2) {
        SegTree[r].Dominant = SegTree[r->right].Dominant;
    } else {
        // indicates that subarray at this node is non-dominating.
        SegTree[r].Dominant = -1;
    }
}
The update routine is very similar. We just subtract 1 from the number of occurrences of the old value and add 1 to the number of occurrences of the new value at the corresponding leaf and at all its ancestors. The ‘Dominant’ variable of each node on this path is then recalculated from the leaf upwards, in the same way as in the build function.
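The build and update routines can be sketched in C++ roughly as follows. This is a minimal illustration and not the setter's code: the names `DominantTree`, `pull`, `occurrences` and `rootDominant` are hypothetical, positions are 0-indexed, and, as in the pseudocode, -1 is assumed never to occur as an array value.

```cpp
#include <cassert>
#include <initializer_list>
#include <unordered_map>
#include <vector>

struct DominantTree {
    struct Node {
        int dominant = -1;                   // -1 means "no dominating value"
        std::unordered_map<int, int> count;  // value -> occurrences in this range
    };

    int n;
    std::vector<int> a;      // current array, 0-indexed
    std::vector<Node> tree;  // node 1 is the root; children of v are 2v and 2v + 1

    explicit DominantTree(const std::vector<int>& values)
        : n(static_cast<int>(values.size())), a(values), tree(4 * values.size()) {
        assert(n > 0);
        build(1, 0, n - 1);
    }

    int rootDominant() const { return tree[1].dominant; }

    // Occurrences of x in node v, without inserting a zero entry into the map.
    int occurrences(int v, int x) const {
        auto it = tree[v].count.find(x);
        return it == tree[v].count.end() ? 0 : it->second;
    }

    // Recompute the dominant of v: only the children's dominants are candidates.
    void pull(int v, int length) {
        tree[v].dominant = -1;
        for (int child : {2 * v, 2 * v + 1}) {
            const int candidate = tree[child].dominant;
            if (candidate != -1 && occurrences(v, candidate) > length / 2) {
                tree[v].dominant = candidate;
                return;
            }
        }
    }

    void build(int v, int lo, int hi) {
        if (lo == hi) {
            tree[v].count[a[lo]] = 1;
            tree[v].dominant = a[lo];  // a single element dominates its own cell
            return;
        }
        const int mid = (lo + hi) / 2;
        build(2 * v, lo, mid);
        build(2 * v + 1, mid + 1, hi);
        // Merge: add up the occurrence counts of both children.
        tree[v].count = tree[2 * v].count;
        for (const auto& kv : tree[2 * v + 1].count) tree[v].count[kv.first] += kv.second;
        pull(v, hi - lo + 1);
    }

    // Set a[pos] = value and repair every node on the path to the root.
    void update(int pos, int value) {
        assert(0 <= pos && pos < n);
        update(1, 0, n - 1, pos, a[pos], value);
        a[pos] = value;
    }

    void update(int v, int lo, int hi, int pos, int oldValue, int newValue) {
        // The old value is present in every node on the path, so find() succeeds.
        auto it = tree[v].count.find(oldValue);
        if (--(it->second) == 0) tree[v].count.erase(it);
        ++tree[v].count[newValue];
        if (lo == hi) {
            tree[v].dominant = newValue;
            return;
        }
        const int mid = (lo + hi) / 2;
        if (pos <= mid) update(2 * v, lo, mid, pos, oldValue, newValue);
        else update(2 * v + 1, mid + 1, hi, pos, oldValue, newValue);
        pull(v, hi - lo + 1);
    }
};
```

For the array {1, 2, 2, 2, 3}, `rootDominant()` is 2; after `update(1, 5)` the array becomes {1, 5, 2, 2, 3} and the root reports -1. Erasing entries whose count drops to zero keeps each map no larger than the number of distinct values in its range.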
With this data structure, how can we query whether a subarray is dominating or not? We use the same logic as before. A query decomposes the subarray into \mathcal{O}(\log N) disjoint nodes of the segment tree. Each of these nodes has its dominant number (or -1 in case it is not dominating). If the subarray we are querying is dominating, then its dominating number must be the dominant number of at least one of these nodes; otherwise it would appear at most half the length of each node times, and hence at most half the length of the subarray times. Therefore, the first thing to do is to collect the nodes that the query function visits in the segment tree. Then, for the dominant number of each of them, we count how many times it appears over all the collected nodes, using their CountOccurrences maps. If any of these numbers appears more than half the length of the subarray times, then the subarray is dominating; otherwise it is not.
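The decision step of the query can be sketched on its own, independent of how the nodes are collected. In the sketch below, `Part` and `dominantOfUnion` are hypothetical names; each `Part` stands for one segment tree node returned by the range decomposition, and a real implementation would more likely pass node indices or pointers than copies of the maps.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// One node of the decomposition; fields mirror 'Dominant' and 'CountOccurrences'.
struct Part {
    int dominant;                        // -1 if the node has no dominating value
    std::unordered_map<int, int> count;  // value -> occurrences inside the node
};

// Given disjoint parts covering a subarray of total size `length`, return the
// dominating value of the subarray, or -1 if it is not dominating.
int dominantOfUnion(const std::vector<Part>& parts, int length) {
    assert(length > 0);
    for (const Part& source : parts) {
        const int candidate = source.dominant;
        if (candidate == -1) continue;  // this node offers no candidate
        int total = 0;
        for (const Part& p : parts) {
            auto it = p.count.find(candidate);
            if (it != p.count.end()) total += it->second;
        }
        if (total > length / 2) return candidate;
    }
    return -1;
}
```

With O(log N) parts, the two nested loops perform O(log^2 N) map lookups, which is where the per-query bound comes from.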
What is the complexity of this approach? Build follows the standard segment tree recursion and takes \mathcal{O}(N\log N) when we use hash maps in the segment tree nodes (expected time, as usual for hashing). If we use normal maps, then the complexity is \mathcal{O}(N\log^2 N). Similarly, update takes \mathcal{O}(\log N) with hash maps and \mathcal{O}(\log^2 N) with normal maps. The query function on the segment tree itself is \mathcal{O}(\log N), but we need to count the occurrences of each of the \mathcal{O}(\log N) dominant numbers in the \mathcal{O}(\log N) nodes queried. Therefore, the total complexity per query is \mathcal{O}(\log^2 N) with hash maps and \mathcal{O}(\log^3 N) with normal maps.
ALITER
There is one more way to solve this problem that uses randomisation. For a subarray A[l..r], choose a position p in [l, r] uniformly at random and let x = A[p]; if the subarray is dominating, the probability that x is the dominating number is more than \frac{1}{2}. We can check x by counting the number of its occurrences in A[l..r] using the CountOccurrences structure we built before. If it is not the dominating number, we choose another random element and try again. So, after i such independent iterations on a dominating subarray, the probability that we have not found the dominating number is less than \frac{1}{2^i}. As we can see, this decreases exponentially. Thus, if we find a dominating number within 20 or so iterations, we answer “Yes”; otherwise, we can be pretty sure that there isn’t one, i.e., the subarray isn’t dominating, and we answer “No”.
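A minimal sketch of the randomised variant follows. It is an illustration under assumptions, not the setter's program: the class name `RandomizedDominance`, the fixed seed, the 0-indexed interface, and the choice of a bottom-up segment tree that stores only occurrence counts are all made up for this example.

```cpp
#include <cassert>
#include <random>
#include <unordered_map>
#include <vector>

class RandomizedDominance {
public:
    explicit RandomizedDominance(const std::vector<int>& values, unsigned seed = 12345)
        : n_(static_cast<int>(values.size())), a_(values),
          count_(2 * values.size()), rng_(seed) {
        assert(n_ > 0);
        for (int i = 0; i < n_; ++i) add(i, a_[i], +1);
    }

    // Set a[pos] = value (0-indexed).
    void update(int pos, int value) {
        assert(0 <= pos && pos < n_);
        add(pos, a_[pos], -1);
        a_[pos] = value;
        add(pos, value, +1);
    }

    // Is a[l..r] (0-indexed, inclusive) dominating? A "true" answer is always
    // correct; a "false" answer is wrong with probability below 2^-trials.
    bool isDominating(int l, int r, int trials = 20) {
        assert(0 <= l && l <= r && r < n_);
        const int length = r - l + 1;
        std::uniform_int_distribution<int> pick(l, r);
        for (int t = 0; t < trials; ++t) {
            const int x = a_[pick(rng_)];  // value at a random position
            if (occurrences(l, r, x) > length / 2) return true;
        }
        return false;
    }

private:
    // Add delta to the count of `value` in the leaf of pos and all its ancestors.
    void add(int pos, int value, int delta) {
        for (int v = pos + n_; v >= 1; v /= 2) {
            int& c = count_[v][value];
            c += delta;
            if (c == 0) count_[v].erase(value);
        }
    }

    // Occurrences of x in a[l..r], summed over O(log N) tree nodes.
    int occurrences(int l, int r, int x) const {
        int total = 0;
        for (int lo = l + n_, hi = r + n_ + 1; lo < hi; lo /= 2, hi /= 2) {
            if (lo & 1) total += countAt(lo++, x);
            if (hi & 1) total += countAt(--hi, x);
        }
        return total;
    }

    int countAt(int v, int x) const {
        auto it = count_[v].find(x);
        return it == count_[v].end() ? 0 : it->second;
    }

    int n_;
    std::vector<int> a_;
    std::vector<std::unordered_map<int, int>> count_;  // node -> (value -> count)
    std::mt19937 rng_;
};
```

Note that the error is one-sided: the sketch can never report a non-dominating subarray as dominating, because every candidate is verified by an exact count.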
Please see editorialist’s/setter’s program for implementation details.
COMPLEXITY:
\mathcal{O}(N\log^2 N)