PROBLEM LINK:
Setter-
Tester- Kevin Charles Atienza
Editorialist- Abhishek Pandey
DIFFICULTY:
MEDIUM-HARD
PRE-REQUISITES:
2-D Arrays and Strings, Probability and Expected Values, C.O.B.F. Strategy, Binary Search ,BFS on a Matrix (optional) , Pre-Processing
PROBLEM:
Given a 2-D character grid, representing color of painted houses (or if the house is not painted), we need to find the expected beauty of the city, beauty being defined as “The overall beauty of the town will be equal to the total number of beautiful square sub-grids (continuous) in the town.”
QUICK EXPLANATION:
This problem, although looks complex, is actually fairly simply. The chief idea of the solution is pre-processing to reduce complexity. What observations, or pre-processing can you make to optimize your brute force significantly better time complexity. One of the chief observations is that beauty depends on number of sub-grids and not on size of sub-grids. Hence, contribution of sub-grids with large unpainted houses is negligible! (A completely unpainted sub-grid with 100 houses contributes {3}^{-100} to the answer!!). The tester used a combination of above along with Binary Search and some formulas to bring down complexity to O(NMLogL) where L is the length of subgrid.
EXPLANATION:
This question is a deceptively simple question once you decide and plan on what to do. A Clever Optimization of Brute Force (C.O.B.F.) is needed here. The editorial is divided into several parts-
- Calculating contribution of a subgrid filled with houses of same color.
- Contribution of unpainted houses and when to discard it
- How to optimize it.
1. Calculating contribution of a subgrid filled with houses of same colors-
Lets say, we have a sub-grid like-
RRR
RRR
RRR
What is the contribution of this sub-grid to beauty? Now, there are two ways to approach this problem.
First is, we can go by the mathematical approach. We say that there are (3-0)*(3-0) squares of size 1, then (3-1)*(3-1) squares of size 2 and (3-2)*(3-2) squares of size 3. By some mathematics, we can see that the answer is \sum_{l=0}^{min(n,m)-1} (n-l)*(m-l). You can try deriving it (derivation in Chef Vijju’s Corner given for reference).
Theres, however, another method which I want to explain. The one which we will be using. This method is nothing but a simply brute force. What we will do is-
- For each house (i,j), do following-
- For each possible length L from L=1 to Length of subgrid, do 3.
- Check if a valid square of length L is possible starting from that cell (i,j). If yes, add 1 to answer, else break and start checking another cell.
Basically what the above will do, it will chose a cell. Then it will check if a square of length 1 is possible starting from it. Then it will check if a square of length 2 is possible from it. So on and so forth, it will check if a square of length, say l is possible or not. If possible, it will add 1 to answer and check for l+1, else it will break out and check the next cell. This will also give the same answer.
But of course, this brute force approach will time out. Also, we haven’t considered anything about unpainted houses yet! What about that? Any thoughts of what change the unpainted houses will bring to the picture?
2. Contribution of unpainted houses and when to discard it-
There can be three cases here.
- We get a sub grid where all houses are unpainted.
- We get a sub grid where some houses are unpainted, and its possible to convert this sub-grid into a sub-grid where “All houses are of same color” by appropriately painting unpainted houses.
- We cannot get all houses in sub-grid of same color no matter what (because of some conflicting colors of houses which are already painted).
By first section, it would be clear that we are leaning towards sub-grids of type 1 and 2 for optimization of brute force. The reason for this is simple, in a sub-grid of type 1 or 2, we dont have to check if the sub-grid is valid to be made beautiful or not, as its guaranteed by definition.
Also, notice that, we can decompose sub-grid of type 3 as “Its made up of various subgrids of type 1 and 2 (even if the size of these sub-grid is 1!!)”. An example is given in tab for those in confusion.
Click to view
Say we have a sub-grid
RRBBG
RRBGG
We can say that this sub-grid us made up of subgrids-
1.RR RR
2. B B
3. B
(at cell (4,1). 1-based indexing)
4. G G
5. G
at cell (4,2), 1-based indexing)
Note that all of these sub-grids are rectangular. We can similarly decompose any sub-grid of type 3 into those of type 1 and 2. Sub-grid of type 2 can have any number of unpainted houses, even 0.
Now, what about the contribution when houses are unpainted? Lets take an example of sub-grid-
RRR
R*R
RR*
Lets dry-run our brute force algorithm here.
We start at cell (1,1) (1-based indexing). We check if a square of size 1 is possible or not. It is, we add 1 to answer.
We now check if a square of size 2 is possible or not. It is, if the unpainted house at (2,2) is painted red. What is the probability that the house will be painted red? Of course, its P(Red)=\frac{1}{3}, as we have 3 colors, out of which any one can be chosen with equal probability. Recall that Expected value E(x)=P(X)*X. P(X)=\frac{1}{3} as we calculated above. Also, if we paint the house red, we get 1 more beautiful sub-grid which adds 1 to the beauty (strictly w.r.t. when starting cell is (1,1).) Hence, we add E(x)=\frac{1}{3}*1=\frac{1}{3} to the answer.
Then, we will check for a square of size 3 starting from (1,1) now. It is possible as well, if both the unpainted houses are painted red. The probability of this is P(Red)=\frac{1}{9}, and hence we add \frac{1}{9}*1=\frac{1}{9} to the answer.
No more squares are possible which start from (1,1), hence we now move on to cell (2,1) and repeat the above process again.
Notice how each unpainted house divided the contribution by 3. Say we are checking a square grid of L=100 which has 80 unpainted houses. It will add {3}^{-80} to the final answer. Clearly, such a small value wont affect our final result as an absolute difference of {10}^{-2} is allowed. Also, such calculations take up lot of time. Hence, we decide that if the number of unpainted houses becomes more than 50, we disregard any further contributions and move on to next cell.
Note that, by above example, we can clearly see that if there are k unpainted houses in the square grid under consideration, then we add {3}^{-k} to the contribution.
This was dealing with sub-grids of type 2. Dealing with sub-grids of type 1 is even easier!
Say we have got-
***
***
***
Lets apply our brute force here.
Starting from (1,1) again, we check if a beautiful sub-grid of size 1 is possible. It is, if the unpainted house is painted R , B or G. Each color has a probability of \frac{1}{3}. But this time, we have 3 valid choices instead of just one (as house can be painted either Red, or Blue, or Green instead of only 1 color which happened in case above!). Hence, we will add E(x)=3*\frac{1}{3}=1 to the answer.
Now we check if a subgrid of size 2 is possible, which starts from (1,1). It is, if all 4 houses are painted R,B or G. Hence, we add E(x)=3*\frac{1}{{3}^{4}} to the answer. (Observe that we have to paint ALL houses either Red, or Blue, or Green. Hence, only 3 valid options).
For size 3, I think you guys can handle yourself now.
So far, we saw how brute force works, and got the essence of the solution. But, how to optimize the brute force? We got one point, that we should stop calculation if number of unpainted houses becomes too large. Any other?
3. How to optimize it-
We are doing better than original brute force by stopping if number of unpainted houses is too large. But we need something more. Note that a lot of time is consumed in the step “Check if square grid of length 1 is possible, Check if square grid of length 2 is possible, ...”
Lets see the sub-grid example of-
RRRRRR
RRRRRR
RRRRRR
RRR*RR
RRRRRR
RRRRRR
Since there are no unpainted houses in between square grid of size 1 to size 3 which start from cell (1,1), we already know we will be getting a contribution of 3 in final answer. No need to check for square grid of any other length in between. For square grid of size 4, we know the contribution to expected value is \frac{1}{3}, and its same for square grids of size 5 and 6 as well, no need to check individually.
How to prevent the individual checking?
One thing we can do is, to first, find largest length of subgrid of type 1 and type 2 which start from the cell under consideration, (i,j). We can do so quickly by Binary Search and some pre-processed data. We can store a prefix sum in a 2-D prefix sum array (refer to counts[k][i][j]
editorialist’s solution) and use it to tell if the entire sub-grid starting from (i,j) consists of houses of only single color (and unpainted houses) only or not. With this, we know the valid sub-grid size.
With some similar pre-processing, we also store the location of next unpainted house from (i,j)(refer to nextLeft[i][j]
and nextUp[i][j]
in Editorialist’s solution). This helps because we can skip checking squares whose sizes “lie in-between” as explained in above example. This step’s validity can be proved. We know that, contribution of a grid depends on number of unpainted houses in it. If count of unpainted house of two square grids are same, then their contribution is same as well.
We combine the above with our very first observation, to discontinue counting contribution if number of unpainted house becomes too large. Lets say that we fixed this number to 50. Calculate the time complexity now-
Time Complexity- O(NM *LogL*50)\equiv O(NMLogL) in worst case, which can easily pass. Note that range of L is 0\le L \le min(n,m).
SOLUTION
I had purposefully not gone deep into implementation. The editorial would had else, become very long. If you got the logic and intuition, then go ahead and look at the editorialist’s solution. I have commented things so that its easier to understand implementation
CHEF VIJJU’S CORNER:
1. This question, uptil now, has no AC solution in practice…Editorialist’s of course! xD
2. Make sure you use long double
instead of double data type for better precision. I found double data type giving undue WA’s , so watch out!
3. Derivation of \sum_{l=0}^{min(n,m)-1} (n-l)*(m-l)
Click to view
Say you are considering a square of length L. The grid has N rows and M columns.
What is the maximum column number from where square can start? Of course, (M-L), else it will get out of grid. Similarly the row number to start is bounded by (N-L).
Now, each square which starts from a different cell is unique, by common sense.
\implies Number of unique Squares=Number of Starting Points available
How many starting positions are there? All the points which lie inside our “restricted” grid with (N-L) rows and (M-L) columns.
\implies For a particular L , number of unique squares are (N-L)*(M-L)
Summing up for all valid L, we get the required final form.
Bonus-What if N=M? Does this reduce to a known series??
Any request for explanations, examples, derivations etc. are welcome. I would be grateful if the community can suggest related problems, I had a hard time finding a good one :(.