BINSTR - EDITORIAL

taran_1407 · November 28, 2018, 3:49pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2

Setter: Danya Smelskiy
Tester: Zhong Ziqian
Editorialist: Taranpreet Singh

DIFFICULTY:

Medium-Hard

PREREQUISITES:

XOR Operation, Binary Search, Persistent Data Structures, Segment Tree, Tries and a lot of Implementation.

PROBLEM:

Given N binary strings numbered from 1 to N, we are to answer queries of form (l,r,X) where we have to output Minimal index i such that max(A_l \oplus X, A_{l+1} \oplus X, \ldots ,A_R \oplus X) = A_i \oplus X.

SUPER QUICK EXPLANATION

In case length of string X is Greater than or equal to Maximum of length of all strings in range [l,r], we can just ignore the bits of X over and above the maximum length of binary strings in range [l,r] and find out the best string using remaining bits of X.
Otherwise we have at least one binary string having the length greater than X, say length D. So, to maximize XOR, It is optimal to choose the set of string having length D. If multiple strings, choosing the set of string having maximal prefix up to length D-|X|. If still multiple strings having the maximal prefix, We proceed with all strings in this set and try to choose the best string using bits of X.
To get the best string, i.e. to maximize XOR with X, we always try to choose the set of strings out of our current set of strings, which has the current bit opposite to current bit of X. In case we have no such string in this set, we are forced to use the set of strings having the same bit.
For doing all this, we can build an array B which stores the number assigned to each string had the strings been in sorted order of strings. By building a persistent tree with values inserted from right to left in order of B[i], we can check if a range in B contains at least one value belonging to the query range.
In case of the query of type 2, For finding rightmost string having prefix just less than or same as the length of the longest string in range by another segment tree giving the length of the common prefix in a range.

EXPLANATION

This problem probably has a variety of solutions, but I’m focusing on setter’s approach to this problem. Feel free to share your approach in answers.

Let us consider a basic solution, where we insert all strings into a trie having set at each node denoting indices of strings passing through the current node. For answering queries, we need to move over the trie steps equal to the maximum of length of all strings. For our problem, this length can go up to 10^6 in the worst case, which will lead to TLE for answering 10^5 queries. Clearly, we need a faster approach.

We can create an array B storing indices of strings, sorted in order of the value of the binary string. We have the index of smallest string at first position, second smallest string at the second position. Note that here, a string is smaller than other string, if it has the smaller length than other string, or it has the same length as other string and is lexicographically smaller than the other string.

Let us try to answer queries now. We have query (l,r, X) and suppose, we have a range (ll, rr) initially (1,n) denote the current range over the B array. We can classify queries into two types. One in which range (l,r) do not have any string larger in length than the length of the query string, and queries of the second type will have at least one string which has the length greater than the length of the query string.

Consider only queries of the first type now. See, In this case, all bits of X over the length of the maximal string in the range (l,r) are useless and can be ignored. So, we can just maintain a range (ll,rr) of B array denoting the range in which our answer lies in. While processing the ith bit, we maintain the invariant, that all bits higher than ith bit are same for all positions (ll,rr) in the range B. We also need a helper function check, which tells us, whether (l,r) of the A array contains any string from range (ll,rr) from the B array or not.

Now, How do we handle our query, assuming we have range (ll,rr) in the B array and query (l,r, X) while processing the i bit. Since we ensure that all strings in the range (ll,rr) have all bits significant than ith bit same, then we can find using binary search, the position K which is the leftmost position in the range (ll,rr) having the ith bit on. This way, if ith bit of X is off, we would want to have ith bit of answer string on, so we want to check whether in range (K, rr) of B array, do we have any string from range (l,r) of original order of strings, using our helper function. If yes, we update range (ll,rr) to (K, rr) and move toward lower bit. If helper function returns false, we know, all strings in the range (l,r) have ith bit off, and we update (ll,rr) to range (ll, K-1) and move to lower bit. After last bit, if we still have multiple strings in the range (ll,rr). We just find the smallest index out of those strings.

This way, if we can get the helper function to work, we can easily answer queries of the first type. One way is to make a segment tree with an ordered set of indices at each node, storing indices of strings which lie in the range (ll,rr) in the B array. Using this, we can query on this tree to get the answer to our function in O(log^2N) time. Since the Sum of query string length overall queries may lead up to 10^6, O(log^2N) per query is too slow. We need a bit of persistence.

Here, we shall build persistent segment tree with range min query where each version of the tree will represent a suffix (or prefix, your choice of implementation) of positions of original array inserted at their positions in the B array. This way, we can check, in the version of segment tree representing suffix (l,n) of the array A contain any element in the range (ll,rr) if the minimum index inserted in range (ll,rr) is \leq r or not.

This way, we have answered queries of the first type. But what happens when there’s a string in the range (l,r) having the length greater than X.

Consider following

000000XXXXX (query string ,0 added to make strings equal in length)
XXXXXXXXXXX (Maximum string in query range)

We know, to get Maximum XOR, we would want the left portion of answer string to be as large as possible, and if multiple strings having a same left portion, the best string shall be decided using right portion. Due to the ordering of B, we know, all strings of the same length, having same left prefix shall form a consecutive segment in the B array, which contains our answer string. So, We need to find this interval and then, if this interval contains more than one string, we can just solve it the same way as queries of the first type.

So, we need the endpoints of the segment in the B array, which have left portion same as the left portion of the maximal string in query range. We can find the right end by simply finding the position of the largest string in query range, because, all strings in query range shall always lie to the left of this position. This can be done using a query to our persistent segment tree.

To find the left end, we know, the string at left end position has the same prefix up to the length of left portion. We can build another Segment tree answering length of the common prefix in a given range, then we can easily find the leftmost position of string which has the same prefix as the maximum string. So, using this tree, we can easily find the leftmost index in the B array, which has the same prefix as the maximal string. This way, we know both ends of the segment of the B array, which contains our answer, depending upon the right portion of the X string. We can now proceed the same way as the first type of query now, within this range of the B array, thus solving the task.

Implementation

The implementation of this problem is a bit complex, so, feel free to refer solutions below.

About Finding smallest position K in the range (ll,rr) of the B array, which has the ith bit on, given all strings in the range (ll,rr) have bits larger than ith bit same.

Since we have all bits significant than ith bit same, due to the ordering of indices in B, a prefix of range (ll,rr) (possibly empty) will have ith bit off, while the remaining suffix (possibly empty) shall have the ith bit on. This allows us to use binary search to find out minimum position K in the range (ll,rr) having the ith bit on. we may also build additional bit vector array for same.
Maximum prefix tree is just a range minimum tree, where for the base level, we manually calculate the length of the common prefix of adjacent strings in B.
In alternate implementations, string can been reversed for ease of implementation, so, the bit at position 0 corresponds to the least significant bit.
Tester uses complicated square root type implementation, not recommended, but still, you may refer, if looking for a different solution and are not afraid.

Related Problems

Two recommended problems related to tries are GPD and XRQRS. Apart from that, It is nice to practice to solve only first subtask in O(|X|) time using persistent trie where |X| is the sum of the length of all strings in the input. Apart from that, These on tries, persistence, persistence form a good reading materials for this and related problems.

For Segment tree and Binary search, I guess this would be sufficient.

Time Complexity

Overall Time complexity is O((N+Q)*logN).

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution
Tester’s solution
Editorialist’s solution

Feel free to Share your approach, If it differs. Suggestions are always welcomed.

sdssudhu · December 3, 2018, 5:55pm

It can be solved by just compressing the trie : CodeChef: Practical coding for everyone

It is a bit faster than the setter’s solution also.

vijju123 · December 3, 2018, 6:51pm

*le me scrolls down the editorial*

*le me cut-copy-paste*

<copy>

Woah!! This long editorial, s̶c̶a̶r̶y̶ ̶e̶v̶e̶n̶ ̶a̶f̶t̶e̶r̶ ̶s̶o̶l̶v̶i̶n̶g̶ ̶t̶h̶i̶s̶ ̶p̶r̶o̶b̶l̶e̶m̶. 1000x more scary as I did not upsolve the problem yet.

</copy>

Also,

SUPER QUICK EXPLANATION

I am questioning life and universe and everything else after looking at this and the length of 4 points beneath it.

(Hopefully people know the context of this by now xD)

ryan312 · December 4, 2018, 4:58pm

Editorialist’s solution page is not working. Please resolve it.

sdssudhu · December 6, 2018, 8:02pm

So here is the full explanation of my solution.

The first thing we need to notice here is that searching in the trie is impossible if all input strings are going to be of different length.

So in order to solve this problem we pad 0’s to the beginning of all input strings to make to a length maxi, which denotes the length of the largest unmodified input strings.

Second thing to notice here is that if we have a query string whose length > maxi, we can trim to maxi. If we have a query string whose length < maxi , we can pad 0’s to it.

Now the only thing left here is that the zero padding. This can take a lot of space since there can be an input string of length 10^5 and all other strings of length 1. This would take the complexity to 10^10.
So instead what we do is that we just have a count of 0’s to be padded.

For the input strings we initially insert into the trie a string of maxi 0’s and store the memory address of all these zeroes in a map whose key is the count of 0’s before it.
What this does is that we can now identify the count of 0’s required for the input string and jump directly to that index and then start adding the input string to the trie. Thus, the complexity of input is just O(len + n*log(maxi)).

Also in the leaf node we add the index of the string to a vector.

Now we have to compress the trie. Initially while constructing the trie itself we have a pointer for the next node/nodes from this node. Now we run a dfs on the trie.
Also notice here that there are a lot of wasted padded 0’s which may be not required. For this what we do is that if a child of the node does not have any element in the index vector then we mark that child as NULL.

Now we have 3 cases :
1.If a node is a leaf node(has both children as NULL) we set next as that node itself.

2.If the node has exactly one child we set the next pointer as the next pointer of that child.

3.If the node has 2 children set next pointer as that node itself and merge the indices vectors of the 2 children into the index vector of this node.

Now on doing these steps from the leaves to nodes you end up with a compressed trie.

Now the last part is querying. For this as discussed earlier we make the length of the string as maxi either by removing characters or padding 0’s. Now we query similar to querying a normal trie.

If a node has exactly 1 child go to next pointer of that.
If a node has 2 children find the index with which you can maximise the XOR. Now check if that child has atleast 1 element from l to r. This can be done by binary searching on the index vector.

Now notice that when we jump in the trie we should also jump a specific number of characters in the input string. For this in the trie we store the depth of each node. This does not get modified during the compression.
So the element in the string can be found by refering to the particular index in the string. This is done as s[depth-pad_length], where pad_length is the number of 0’s that have to be padded to the input string. If depth<pad_length] then the index of the child to be preferred is given as 1(Since 0 is maximized with 1 in xor).

joffan · December 3, 2018, 7:21pm

Thanks so much for this detailed editorial. Worth the wait

l_returns · December 4, 2018, 10:50am

Hahahaha @vijju123

taran_1407 · December 4, 2018, 11:12am

Glad you liked

taran_1407 · December 4, 2018, 11:13am

Nice

Would be good for all if you explain your solution a bit.

taran_1407 · December 4, 2018, 11:14am

Thanks @vijju123 for this comment

dextrous · December 4, 2018, 1:18pm

editorialist’s solution page is not visible

vijju123 · December 4, 2018, 5:20pm

No need to thank @taran_1407 :). You have earned this by your generous comments on FCTR editorial. Also, you’d be pleased to know that these comments are earned in a pack of at least 50 per comment :D.

l_returns · December 4, 2018, 5:26pm

XDD @vijju123
pack of 50 reminds me that answer you wrote

taran_1407 · December 4, 2018, 5:28pm

You really deserve a thanks for commenting this here, and on problem ABGAME, and for coming editorials too.

taran_1407 · December 4, 2018, 5:53pm

Posted my solution at dyc1Pq - Online Java Compiler & Debugging Tool - Ideone.com

vijju123 · December 5, 2018, 6:10pm

No issues @taran_1407 , though I doubt you anticipate my comments so much as “someone else” on the forum

vijju123 · December 5, 2018, 8:53pm

@aryanc403 - True, perhaps he is anticipating that his list of people whose comments he anticipates grows ? :o

taran_1407 · December 5, 2018, 10:41pm

I am anticipating the end of this useless conversation

vijju123 · December 5, 2018, 11:16pm

I am anticipating the end of this useless conversation :P

Because from ashes of one’s end, rises another.

sdssudhu · December 6, 2018, 9:47am

Yeah sure. Will do it by today evening.