MBOARD - Editorial

pragrame · February 12, 2013, 6:20pm

Problem Link:

Practice

Contest

Difficulty:

Medium

Pre-requisites:

Data Structure (rank-queries on dictionary type D.S.)

Problem:

You are given a NxN board, with each cell initially 0. You are also given queries of the form

**RowSet i x:

ColSet i x:**

which set the i’th row/col to x (0 or 1).

Also, you are given queries of the form

**RowQuery i:

ColQuery i:**

which ask for how many 0’s are there in the i’th row/col.

Quick Explanation:

Imagine you are solving the problem for just RowQueries (the solution for ColQueries works by same logic when you take the transpose of the board). So you are asked for the number of 0’s in a particular row.

Now, lets see what happens: when you get a colset query, then ALL rows’ values get shifted to what it is set to. So by and large, your row is going to look like ???x??? (with the x in the corresponding column).

When you get a rowset query, that entire row gets overwritten. The point therefore is, you would need to consider only future colset queries for this particular row.

Hence, we need to ask; given a rowquery i:

(a) what was the last time that rowset i was called.

(b) what happened in terms of colset after that time.

Therefore, we store Time-Stamps.

For each [Row|Col]Set i [0|1] we store a timestamp. We also delete the previous stored timestamp of that row/col.

Now, given a RowQuery i, we first ask: was that row last set to 0 or to 1.

Then, if it was set to 0, at time ‘t’, then we ask, How many ColSet ‘1’ exist after time ‘t’. This can be done using Binary Indexed Trees in O(logN) time. The answer is then N - #1-cols.

Similarly, if it was last set to 1, then we query how many columns were set to 0, and answer #0-cols.

As mentioned earlier, combining both RowQuery and ColQuery can be done using the exact same logic when you consider the board as its transpose.

Explanation:

Let us first consider 2 simple versions of the problem.

The first simple version asks for the solution using only RowSet and RowQuery. Now, given a RowSet i x call, you would change the entire row i to x. Hence, it looks like xxxxxxxxx now. And you can easily tell by what the last call was whether your answer is 0 or N.

The second simple version case asks for the solution using only ColSet and RowQuery. Now, a ColSet i x is going to make the i’th column x and the row that you query for will look like ???x??? with the x at the i’th position. This is over all the rows, and hence irrespective of row, when you are asked for the count, you will know what the row looks like just by which all columns are set, and can hence answer the query easily here too.

The problem however, asks for given both RowSet and ColSet queries which are interleaved between each other, and then you are given the RowQuery i, what will the answer be?

Let us concentrate on row i and look at its history. Row i will be affected whenever: “Rowset i x” is called, or whenever “Colset j x” is called for ANY j. Further, a single “Rowset i x” overwrites ALL “Colset j x”. However, once we are given the last “Rowset i x” query, then this case pretty much boils down to the second simple version of the problem. Hence, the important thing to store, is the last point we made an update to row i.

Hence, if we store the timestamp when we make updates, then we will be able to tell what queries we are too consider, and what to discard. Thus, let us see what information we can glean by just storing all possible time stamps.

Consider the arrays RS0TS[], RS1TS[], CS0TS[], CS1TS[] (abbreviations for “RowSet0TimeStamp” etc.) which store the time points where things were last changed. Here, RS0TS[i] stores when was the last time that you got a “Rowset i 0”. Now, given RowQuery i, we look at row i and we know.

If RS0TS[i] > RS1TS[i], then the row was last completely set to 0, else it was set to 1.

If it was last completely set to 0, at say time point t (= RS0TS[i]), we then need to count, how many columns were set to 1 after this point. That is, we need to know, for how many ‘j’, do we have CS1TS[j] > CS0TS[j] and CS1TS[j] > t. Subtracting this count from N is the answer to our query here. Similarly, we need to take cases for RS1TS[i] > RS0TS[i], and for ColQuery i etc.

The main issue with this is : finding the count of how many ‘j’ satisfy the above property, if implemented naively, is O(N) time. We need to do this faster.

Using BIT as a rank-query dictionary-type DS:

Suppose on top of storing arrays of time-stamps, what if we used something like a STL-set as well. So we have now say sets RS0, RS1, CS0, CS1. When we get a RowQuery i, we can do the check for whether RS0TS[i] > RS1TS[i] and find whether it was 0 or 1 last, and then we can binary search in the set CS1 for the time-stamp value RS0TS[i]. We now just need how many values are there between this and the end of the set CS1.

This is called the Rank-Query, where we are required to ask how many elements are there in a “set” whose value is <= x. In a STL set though we can binary search for x’s position, taking the difference with the beginning (or end) of the set is an O(N) operation.

This can however be accomplished by using a Binary Indexed Tree (abbrv. BIT). To see this, let us look at our set as an underlying frequency array. That is, freq[i] = number of times element i occurs in our set. Now, rank-query (x) is just summation (i <= x) freq[i].

Thus, if we were to instead construct a BIT on this underlying frequency array, we would be able to do all that an STL-set offers (inserting i is simply freq[i]++, searching for i is simply freq[i] > freq[i-1], deleting i is simply freq[i]–), as well the rank-query.

Thus, we now have our complete solution that takes O(logN) time for each RowSet, ColSet, RowQuery, ColQuery operation. Hence, time complexity = O(QlogN).

Algorithm:

On RowSet i 0, at query T:


if RS0TS[i] > RS1TS[i]
	RS0.increment(RS0TS[i], -1); //delete the previously stored time-stamp of RS0TS[i]
else 
	RS1.increment(RS1TS[i], -1); //delete the previously stored time-stamp of RS1TS[i]
RS0.increment(T, 1); //insert time point T into RS0
RS0TS[i] = T;

Similar code holds for ColSet i 0, at query T, and for RowSet i 1 and ColSet i 1.

Now, for RowQuery i, at query T:


if RS0TS[i] > RS1TS[i]
	T0 = RS0TS[i]
	ones = CS1.query(INF) - CS1.query(T0)
	return N - ones
else 
	T0 = RS1TS[i]
	zeros = CS0.query(INF) - CS0.query(T0)
	return zeros

And similarly for ColQuery i, at query T.

Practice on Rank-Query:

The spoj problem ORDERSET is a very classic problem that uses just rank-query. You could play around using STL set, segment trees, BITs, maps/vectors etc. on this problem.

Setter’s Solution:

Can be found here

Tester’s Solution:

Can be found here

anudeep2011 · February 12, 2013, 6:48pm

But the most important part here is to avoid duplicates?

No note on that!

Example Test Case

RowSet 1 0
ColSet 1 1
RowQuery 1
RowSet 2 0
ColSet 1 1
ColSet 1 1
RowQuery 1
RowQuery 2

In this particular case
Query #1 needs to consider first ColSet 1 only

Even Query #2 needs to consider the same

Query #3 has to consider 2nd ColSet 1 only

(or am i missing sometng?)

Practice Problem - DQUERY @ SPOJ

deepai_dutta · February 12, 2013, 11:23pm

I have only recently learned about BIT and could not solve this problem.Its yet unclear to me what frequency is actually being stored in the BIT here. Please could anyone explain that once?

prakhs123 · February 13, 2013, 2:41pm

Could any one explain me the part after heading “Using BIT as a rank-query dictionary-type DS:”. Its very unclear.

madmax3 · February 13, 2013, 4:17pm

I cant figure out why in java I am getting RTE NZEC CodeChef: Practical coding for everyone

pragrame · February 13, 2013, 7:27pm

@prakhs123, here’s a Detailed Description.

Using “BIT” as a “rank-query” “dictionary-type” DS. Lets break up the terms one by one.

“Dictionary” type Data Structure.

Here, you are basically given the problem of having some universe of objects, and you wish to perform the operations of Insert(object)

Search(object)

Delete(object)

Lets say you solve this using an array. You can either maintain the array as sorted, or just insert new elements at the end.

If it is sorted, then Insertion takes O(N) time, search takes O(logN) time if you binary-search, deletion takes O(N) time.

If it is unsorted, then Insertion takes O(1) time, search takes O(N) time since you have to linear search, deletion takes O(N) time.

Hence, people designed the Binary Search Tree. Here, Insertion, Search and Deletion takes O(height) time. But height can go as bad as O(N).

So they had to modify it to balanced binary trees. Examples include the AVL tree, Red-Black trees etc. In these, the height was guaranteed to be O(logN), which gave O(logN) time bounds for Insertion, Search and Deletion.

Rank-Query

Imagine you have this Dictionary Data Structure: it gives you insertion, searching, deletion operations. Lets say they are all O(logN) time, since thats pretty fast. We now ask, given an object x, how many objects in this dictionary are <= x?

We could do a search for x in O(logN) time. That would give us something of a “position” in the Data Structure, and all elements before that position would be “<=” x. For example, the STL set implements a red-black tree for performing the dictionary operations. So given this x, we can binary search for the position in the STL set where to insert x. However, finding how many elements are <= x, would require us to traverse through the set from the beginning to element x, which is O(N) time.

We need to get this down to O(logN) too!

Binary Indexed Trees (BIT)

The topcoder tutorial linked in an earlier comment:

would be of use.

The key thing to note is that you can use the BIT to give you cumulative frequencies/prefix sums as well as quick updating. So, if we store our “dictionary” as a frequency array, where freq[i] tells you how many objects are there of value ‘i’, then querying for a prefix sum gives you the rank-query. BIT allows you to update and query prefix sums in O(logN) time.

Hope its clear now.

bodmas · February 13, 2013, 8:49pm

Boy! I guess it takes a lot of hardwork to become a competitive programmer. Because I can’t imagine, it looks like one has to know all the theories in computer science and mathematics before he can beat his chest to a programming contest. Kudos to you guys. I doff my hats.

karan173 · February 15, 2013, 2:16am

You have recommended the problem ORDERSET. But looking at the constraints of the problem x<=10^9, i dont think it’ll be possible to solve with BIT coz of memory overflow while creating such a large array? So how to do it?

prakash1529 · February 15, 2013, 6:45pm

Are there any corner cases for this problem. I generated a large test case where N, Q == 5*10^5, and diff’ed my answer with one of the accepted solutions. I find 0 diffs. I am definitely missing some corner case. Can somebody give some corner cases. Below is my code.

http://www.codechef.com/viewsolution/1841745

I’ve exactly implemented it as ALGORITHM given in editorial.

kavishrox · February 16, 2013, 2:36pm

Setter’s solution is not understandable. Probably some comments should have been used

deepai_dutta · February 17, 2013, 3:43pm

I finally Understood how its being done and have implemented it successfully. Apparently the index of the BIT are the Time values and we increment those indexes in the BIT which corresponds to the time when we made the Set operations and also decrement those indexes in the BIT when the previous set operation was done.

codetrash · February 17, 2013, 4:03pm

Could anyone explain the need to deleting elements from BIT?
Specifically this code -

if RS0TS[i] > RS1TS[i]

RS0.increment(RS0TS[i], -1); //delete the previously stored time-stamp of RS0TS[i]

else

RS1.increment(RS1TS[i], -1); //delete the previously stored time-stamp of RS1TS[i]

ujjwalprakash · February 21, 2013, 10:59pm

Really a nice editorial.It would have been better if the setter’s and tester’s solution were written with comments.
@admin : the time limit is very tight with this as well because i got TLE with fast Input and cin but when i replaced cin with printf its AC.It would be better if you look at these issues because even if ones algorithm is correct its not good to get TLE just due to “cin” and “cout” or sometimes even with “scanf”. These never happen at codeforces or any other coding site

vivekvelankar · February 21, 2013, 11:54pm

Can we have a array, data[2][N] where at each index in first row we store number of zeros in N rows, and in 2nd row, we store number of zeros in each column.
At each “Rowset/Colset i 1/0” operation we will change the counts accordingly.
While querying we just need to look up this array at correct indices.

Correct me if wrong.

vineetpaliwal · February 22, 2013, 12:14am

@vivekvelankar : How do you update the counts with each operation ??? Suppose i do Rowset i 0 . Now one entry in each column has become 0 . So you have update N entries in your column . For Q queries , it will take , N * Q updates , Since N , Q < 2 * 10^5 . This is not acceptable . Also how do you know in this case which columns will have their count incremented and which not , as some entries may already be zero , while some may not .

pragrame · February 12, 2013, 7:24pm

We’re storing pretty much only the values of CS1TS[] in our CS1 BIT. So when a value gets overwritten, we delete the previous time-stamp from the BIT.
I suppose it wasn’t specified clearly in the Editorial - I’ll update it now, thanks for asking.

rajneesh2k10 · February 12, 2013, 9:46pm

Correct! I had the same doubt while reading the solution.

rajneesh2k10 · February 12, 2013, 10:25pm

I think if the solution code is well commented, readers would definitely need less effort and time to understand it. Also it’ll give a better understanding of the approach.

karthsharma · February 13, 2013, 12:20pm

you can check out the editorial on topcoder: Here’s the link-

rajneesh2k10 · February 13, 2013, 1:13pm

I knew someone would come up with this link but he has not asked about BIT if you read it carefully.