ANUSFR - editorial

elfus · October 20, 2014, 12:31am

PROBLEM LINK:

Practice
Contest

Author: Anudeep Nekkanti
Tester: Constantine Sokol
Editorialist: Florin Chirica

DIFFICULTY:

Hard

PREREQUISITES:

offline algorithm, greedy, segment tree, two pointers technique

QUICK EXPLANATION

We have 5 conditions. x, minL, maxL, L, R.
We eliminate x by processing each x separately.

We eliminate L by processing from Right to Left.

For each i, we calculate min_len[i] and max_len[i] with meaning array[i…i + min_len[i]-1] and array[i…i+max_len[i]-1] contain x characters. For each i, we update a segment tree for range [min_len[i], max_len[i]] with value i.

For eliminate minL and maxL constrains, we query segment tree on range [minL, maxL] and find minimal i to respect all conditions so far.

One final constraint is R.

I claim that if a solution does not respect R constraint for i, it won’t respect it either for i’ > i. If solution is respected for i, find minimal j such as [i…j] respects all conditions and print i and j. Otherwise, print -1 -1.

EXPLANATION

This problem looks very complicated at first sight. For a query, you’re given 5 numbers: x minL maxL L and R. You need to output (if they exist) two numbers i and j such as:

substring a[i…j] contains exactly x different characters
length of a[i…j] is between minL and maxL
L <= i <= j <= R
If there are multiple solutions, pick the one with minimal i. If there are still multiple solutions, pick the one with the minimum j.

If you try to solve all those conditions in the same time, you’ll probably have a hard time. Instead, when a problem with multiple restrictions is given, it’s really useful to simplify them, one by one, until you’re done. This editorial will do exactly this: we’ll simplify the problem step by step.

When you’re dealing with complicated queries, it can be useful to solve them offline. This means, you read the queries, remember index of query (its position in the list of queries), arrange queries in an order you want, solve them, then sort again by index and print the answer.

Here, a first problem is the x values differ between queries. But an offline algorithm can fix this. So let’s sort queries increasing by x value. Obviously, now all queries that have same x value will form a contiguous subsequence in queries list. Let’s solve now all queries that have equal x values. By now we’ll assume x is constant.

Let’s think at a brute force now. For each query, we iterate with i from L to R. Let’s iterate with j from i to R. We keep track of number of different characters of a[i…j]. This number has two possibilities:

j-th character is not fount yet in a[i…j-1]. This means number of different characters of a[i…j] is number of different characters of a[i…j-1] + 1
j-th character is already found in a[i…j-1]. This means number of different characters of a[i…j] is number of different characters of a[i…j-1].

Obviously, an a[i…j] will be good if it has x characters and minL <= j – i + 1 <= maxL. This is how you answer in O(N ^ 2) per query. But we can do much better…

For a fixed i, let’s define f(j) = number of different characters of a[i…j]. You can notice that function f is monotone (use 1 and 2 from above for a proof). Since f is monotone, all f(j) = x will form a contiguous subsequence. This means, we can find minimal value of j such as f(j) = x and maximal value of j such as f(j) = x. Let’s note them by jmin and jmax. Then, f(j) = x if and only if jmin <= j <= jmax. Let’s define min_len[i] = jmin – i + 1 and max_len[i] = jmax – i + 1 (this is length of a[i…jmin] and of a[i…jmax]).

Our next step is to calculate min_len and max_len for each i. This is a simplification of problem, because if for a given i, we pick a j from i + min_len[i] – 1 and i + max_len[i] – 1, we know for sure a[i…j] will have x different characters.

We can calculate min_len and max_len in O(N) time, using a technique called “two pointers”. I’ll describe how to get jmin and jmax values, for each i (calculating min_len and max_len is straight-forward then).
Suppose I calculated for an index i values jmin and jmax. For lack of confusion, jmin and jmax values for i + 1 will be noted by jmin’ and jmax’. Let’s prove that jmin’ >= jmin and jmax’ >= jmax.

We know for sure that a[i…jmin] contains x characters, but a[i…jmin-1] contains x – 1 characters. Let’s see what happens when we move from a[i…j] to a[i+1…j]:

character from position i also appears in a[i+1…j] => number of different characters remains the same
character from position i does not appear in a[i+1…j] => number of different characters is decreased by 1.

So a[i+1…jmin-1] will contain at most number of characters of a[i…jmin-1], so it will contain at most x – 1 characters. So jmin’ can’t be jmin – 1. It if can’t be jmin – 1, it can’t also be 1, 2, …, jmin – 2, because of monotony property. So we get that jmin’ > jmin – 1, so jmin’ >= jmin.

By a similar reason, array a[i+1…jmax] can contain at most x characters. It it contains x – 1 characters, then obviously jmax’ > jmax. If it contains x characters, then jmax’ >= jmax.

So we get jmin’ >= jmin and jmax’ >= jmax. Why is this helpful?

Complexity of this part is number of operations for i = 1, for i = 2 and so on. We can define an operation as an increase of jmin and jmax.

But we get that jmin and jmax <= N, so the numbers can increase at most N times. So we get O(N) at this step.

How to implement this? I’ll explain only for jmin, jmax is similar. Let’s keep a cnt[i] = number of times character i appears. Also, let’s keep distinct_cnt as number of different characters in range [i…jmin]. When we move from i to i + 1, we decrease cnt[a[i]] by one. If cnt[a[i]] becomes zero, we decrease distinct_cnt by one. Now, we advance jmin as long as distinct_cnt = x – 1. We increment cnt[a[jmin]] and when a cnt[a[jmin]] is incremented from 0 to 1, obviously a new character is found.

Let’s come back to the problem. We need to find 2 indices i and j such as:

Length of a[i…j] is between minL and maxL
Length of a[i…j] is between min_length[i] and max_length[i]
L <= i <= j <= R
If there are multiple solutions, pick the one with minimal i. If there are still multiple solutions, pick the one with the minimum j.

Sadly, it does not look friendlier now, except we replaced “distinct x characters” restriction by another restriction. We can try to use one more time an offline algorithm to get rid of a restriction.
The only restriction that seems easy to remove is “L <= i <= j”. Let’s sort queries in decreasing order by L. We’ll process them in this order. When we process a query for a given L, we’ll make sure all array a[L…N] is processed (more about what processed means here later).

Let’s simplify problem more now. Suppose somehow (more about that later, too) we find all indices i, j such as conditions 1 and 2 are respected. The only condition that need to be checked now is j <= R. We can use then 4 to pick from all pairs (i, j) which respect 1 and 2.

Here is a key observation. Suppose we fix an index i. Obviously, we need to check minimal index j such as array[i…j] respect conditions 1 and 2 (it’s good to pick minimal j because, as small j it is, as high the chances are that j <= R).

How to find j? We have 2 conditions to respect:
lenght of array[i…j] >= minL
length of array[i…j] >= min_length[i]
We get that j = i + max(minL, min_length[i]) – 1. So, for a fixed i, optimal j is also fixed. If optimal j for a given i is greater than R, then for sure optimal j of an i’ > i is also greater than R. Formally,
Let a[i…j] and a[i’…j’] two subarrays that respect 1 and 2 . If j > R, then for sure j’ > R. Before reading the proof, think for a while why this happens!

Proof: We need to proof that for every i’ > i, a valid substring starting at i’ ends no before a valid substring starting at i. Let’s do counter proof…

Suppose there are two subarrays [i…j] and [i’…j’] such as i < i’ and j’ < j. If we proof j’ = j, we get a contradiction (since we assumed j’ < j) and this proofs our theorem.

Given conditions mean that [i’, j’] is a subarray of [i, j]. Also, since subarray a[i’, j’] is valid, it contains exactly x characters. But a[i, j] also contains x characters. This means a[i…j’] contains x characters. Since j’ < j and a[i…j’] contains x characters, this means j’ would be minimal index such as a[i…j’] to contain x characters. This means j = j’ (since we defined j to be minimal index such as a[i…j] contains x characters), which ends the proof.

Why is this proof useful? It’s enough to take minimal i such as a[i…j] to respect all conditions (except j <= R). Then, we can check if j <= R. If it’s not, we can output -1 -1 (no other i’ > i will meet this restriction). Otherwise, we can output i, j. Since we pick minimal i and j is by definition the minimal index possible, condition 4 is also met.

We’ve reduced the problem to: find minimal i such as

there exist a j such as length of a[i…j] is between minL and maxL
there exist a j such as length of a[i…j] is between min_length[i] and max_length[i].

Let’s suppose current query has L_cur (value of L) and previous one has L_last. We need to add all indexes i from [L_cur, L_last) into something. Then, from that something, pick minimal i which respects 1 and 2 .
Let’s add a new i into something (=a segment tree, as you’ll see) as described above. We can update a segment tree from range [min_length[i], max_length[i]] with value i (since values of i are processed in decreasing order, after an update the range will always contain the minimal value).

Why we use the segment tree? You use it to find, for a range [A, B], minimal value of i such as there exist a j and length of a[i…j] is between [A, B] and [min_length[i], max_length[i]]. When we update [min_length[i], max_length[i]] with value i, it’s like we say that condition 2 allows us to have an array a[i…j] with length between min_length[i] and max_length[i] starting at i. Any length between min_length[i] and max_length[i] starting at i will be fine, it won’t break condition 2 .

Let’s see what a query is now. When we query a range [A, B], we ask for subarrays of lengths between A and B. Update always guarantees condition 2 is respected. For a query, we can simply query range [minL, maxL] and condition 1 will be magically respected, too

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution
Tester’s solution

Time complexity is O(alfa * N + Q * log(Q) + Q * log(N)), where alfa = 26. O(alfa * N) is necessarily for “2 pointers” part, Q * log(Q) operations are necessarily for solving offline and Q * log(N) is needed for querying the segment tree Q times.

lonecodertaken · November 30, 2014, 10:17am

You put so much effort into this editorial man! Amazing work. I lost track going through your reaoning multiple times but finally understood it. Great work. So many thanks