CHEFLKJ - Editorial

Author: Misha Chorniy
Tester: Karan Aggarwal
Editorialist: Pushkar Mishra

Medium

Segment Trees

PROBLEM:

Given is an array A of length N. There are two types of operations that we need to handle on this array:

• 1 x y: change the value of A[x] to y, i.e., A[x] = y.
• 2 l r: Answer in a “Yes” or a “No” whether the subarray A[l..r] is dominating or not.

Here dominating means that there is a number in the subarray that appears in the subarray at least as many times as one more than half of the length of the subarray.

EXPLANATION:

The nature of the problem clearly hints towards segment trees. We need to efficiently perform two kinds of operations: update a value at an index, and answer for a subarray whether there exists a number in it which appears at least (one more than half its length) times in it.

Clearly, There isn’t a notion of an elegant lazy propagation here since the updates we perform are to single elements.

Let us start by making some observations about dominant arrays and subarrays. Let us consider the entire array A[1..N]. The first important observation is that it will only be dominating if either subarray A[1..\frac{N}{2}] is dominating or subarray A[\frac{N}{2}+1..N] is dominating. This is simple to prove. We use proof by contradiction. Assume that neither of the subarrays are dominatinng. That means all the numbers in either of the subarrays appear at maximum \frac{N}{4} times only. That means, if we combine the two arrays, there is no number which appears more than \frac{N}{2} times. This is a contradiction. Therefore, A is not dominating.

We can further extend the observation to say that if A[1..N] is dominating, then the dominating number must either be the dominating number of A[1..\frac{N}{2}] or the dominating number of A[\frac{N}{2}+1..N]. This follows from the same arguments that we made above.

This leads us to our method of building our segment tree. Each node of the segment tree stores 2 fields: an integer ‘Dominant’ which stores the dominant number in the subarray covered by the node, and a map (it can be a hash map for better bounds) ‘CountOccurrences’ which counts the number of times each number appears in the subarray covered by the node. The memory used in this is \mathcal{O}(N\log N) because each number is stored in \log N nodes only. So memory isn’t an issue in our structure at all.

The pseudocode for the build operation is pretty straightforward now:

void buildNode (node r):

if(r == leaf) {
//let us say that this leaf corresponds to A[index] of the given array.

SegTree[r].CountOccurrences.clear(); // clear map of any garbage

// Increment the occurrence of the number at this leaf
SegTree[r].CountOccurrences[a[x]] += 1;
SegTree[r].Dominant = a[x]; // a number dominates its own cell

return;
}

if(r != leaf) {
buildNode(r->left);
buildNode(r->right);

length = r->length; // length of the subarray at this node.

// We merge the maps of children. Merging simply means adding the
// number of occurrences of each number in either of the children maps
// to the parent map.
SegTree[r].CountOccurrences = merge(SegTree[r->left].CountOccurrences,
SegTree[r->right].CountOccurrences);

if(SegTree[r].CountOccurrences[SegTree[r->left].Dominant] > length/2) {
SegTree[r].Dominant = SegTree[r->left].Dominant;

} else if (SegTree[r].CountOccurrences[SegTree[r->right].Dominant] > length/2) {
SegTree[r].Dominant = SegTree[r->right].Dominant;

} else {
// indicates that subarray at this node is non-dominating.
SegTree[r].Dominant = -1;
}
}
}


The update routine is very similar. We just subtract 1 from the number of occurrences of the old value and add 1 to the number of occurrences of the new value at the correspoing leaf plus all its ancestors. Recalculating the ‘Dominant’ variable is done in the same way as the build function.

With this data structure, how can we query whether a subarray is dominant or not. We use the same logic as before. To query a subarray, we will be visiting at most \mathcal{O}(\log N) nodes in the segment tree. Each of these nodes will have its dominant number (or -1 in case it is not dominating). If the subarray we are querying about is dominating then the dominating number must be one of the dominating numbers of the nodes we visit in the query operation. Therefore, the first thing to find out is the nodes that the query function is going to visit in the segment tree. Then for the dominating number of each of them, we just count how many times they appear over all the nodes that the query function visited. If any of the dominating numbers appeared (more than half the length of the subarray) times, then the subarray is dominating; otherwise no.

What is the complexity of this approach? Build is same as the standard segment tree function which takes \mathcal{O}(N\log N). This is when we are using hash maps in our segment tree nodes. If we use normal maps, then the complexity will be \mathcal{O}(N\log^2 N). Similarly, update takes \mathcal{O}(\log N) with hash maps and \mathcal{O}(\log^2 N) with normal maps. The query function on the segment tree itself is \mathcal{O}(\log N) but we need to count the occurrences of each of \mathcal{O}(\log N) dominating numbers in the \mathcal{O}(\log N) nodes queried. Therefore, the total complexity per query is \mathcal{O}(\log^2 N) with hash maps and \mathcal{O}(\log^3 N) with normal maps.

ALITER

There is one more way to solve this problem that uses randomisation. For a subarray A[l..r], choose any number x randomly; the probability that this number is the dominating number is \frac{1}{2}. We can check by counting the number of its occurrences using the CountOccurrences structure we build before. If it is not the dominating number, we can choose some other number and try. So, in i such iterations, the probability that we don’t find the dominant number is \frac{1}{2^i}. As we can see, this decreases exponentially. Thus, after 20 or so iterations, if we find that there is a dominant number, then we output that number, otherwise, we can be pretty sure that there isn’t a dominant number, i.e., the subarray isn’t dominating.

Please see editorialist’s/setter’s program for implementation details.

COMPLEXITY:

\mathcal{O}(N\log^2 N)

SAMPLE SOLUTIONS:

3 Likes

Was (O(log n)^3) supposed to pass with normal maps? I tried this by maintaing a sorted subarray in every subsegment and then counting kth element in given subsegment which runs in ((logn^2 * log(10^9)) per query, but it gave tle

it can be solved in O((N+Q)logN) as well

Suppose we have an array with N elements , if you pick up any 2 different elements with unequal value and remove them from the array and keep doing this until only 1 distinct value is left , that value is the only possible candidate for answer

Now To make this faster , we store this information in segtree , segtree node denotes that if we do this stuff on segment [l , r] , what will be the value left and its frequency.
Now we have to check the frequency of an element from l to r , we can either do it online with a bst or do it offline with a simple bit in nlogn time.


[1]

[1]: https://www.codechef.com/viewsolution/10897065
11 Likes

My solution uses square root decomposition. For segments of length < 2000, just loop over and calculate frequency of elements. Report answer.

For segments of length > 2000, the dominant element should have frequency > 1000. There will be at most 200 such elements. So I maintain one BIT (binary index tree, or seg tree) for each of those elements, loop over all 200, find the number of elements in range l…r using BIT.

Update is 2 fold, update the base array. Also update BIT.

Update is O(log N), you update at most one BIT.
Query is O(2000) or O(200 * log N).

6 Likes

Even Solution using maps (Complexity O(log^3n) per dominant query) pass within time limit. Here is


[1].

[1]: https://www.codechef.com/viewsolution/10901084
2 Likes

I have also passed with (somehow) “not good” complexity.

Worst case [H - means complexity of hash-map]:

Update query: O(H^2log(10^9))

Get query: O(sqrt(N)*H+sqrt(N)*Hlog(10^9)*log(N))


[1] (well just to see, it is barely readable :/ )

Thought: I had (for each distinct numbers) a Fenwick-Tree [BIT Tree] in which I held positions. I also held a "priority queue" in which I stored numbers sorted by total frequency.

Get query was then to choose between two:

A) If the range was lesser than sqrt(N), I brute-forced it.

B) If the range was higher than sqrt(N), I looked upon all distinct numbers which had frequency greater than "RANGE/2" and checked (by Fenwick) how many are there on my range. (PS: there could be at most sqrt(N) of these numbers)

Anyway I'm wondering how it managed to pass TLE :)

Have nice day ^_^

[1]: https://www.codechef.com/viewsolution/10897137
4 Likes

I feel the explanation for dominant sub-arrays in the editorial is insufficient since it deals with only sub-arrays of size N/2. This is not always the case - especially in case of queries where the query range can span multiple segments(sub-arrays). I think this would be better way to put it,

Over a range(i,j) of length N, let there be k sub-arrays {1,2,3…,k} each of length {n1,n2,…nk} respectively. If a majority element ‘e’ exists in range(i,j) then one of the k sub-arrays must have e as the majority element.
Proof is by contradiction (along the same lines as in the editorial),

n1 + n2 + … + nk = N

Assume e exists which means freq(e) in (i,j) > N/2

Now lets say none of the k sub-arrays have e as the majority element. Then,

freq(e) in sub-array {1] <= n1/2

freq(e) in sub-array {2} <= n2/2

freq(e) in sub-array{k} <= nk/2

Therefore, freq(e) in (i,j) <= n1/2 + n2/2 + … + nk/2 <= N/2 which contradicts our assumption that e is a majority element.
So at least one of the k sub-arrays must have e as the majority element.

Thanks!

1 Like

Could anyone tell me where I could be going wrong in this solution?
It is very similar to the editorial and though the running time is O(Q * (log N)^3), I am getting a WA verdict and not TLE. So could anyone tell me if there’s something wrong with the logic?

Can someone please explain the meaning of the phrase ‘elegant lazy propagation’ in the line ‘Clearly, There isn’t a notion of an elegant lazy propagation here since the updates we perform are to single elements’

1 Like

I have doubt in complexity for each query.Anyone please explain how complexity is O(logn^3) by map?

getting WA for this link text code. Any help?

How can a segment be dominating if either of subarrays are dominating?? Consider a case [1,2,3,4][2,2,2,5],here 2nd subarray is dominating but when we merge them , the resulting array is not dominating!!

1 Like

No, it was not supposed to pass.

Seems, like, you can use Square Root Decomposition on each and every question

3 Likes

Won’t there be total 500 such elements with frequency > 1000?

Initially we have 10^5 elements. Even if you consider the 10^5 query operations as “append” operation instead of replace, at the end you would have 2 * 10^5 elements => at most 200 with >= 1000 frequency

@anudeep2011, yes, you are right.

@vsp4, I think your solution gives tle because of the ordered statistic set used in your solution. Though, the complexity of ordered statistic set is as desired, but constants are too large. It happened with me too when orderedd statistic set gave tle, but bit with binary search passed easily.

1 Like

@vsp4 Can you tell me what is the idea of your solution?