CHEFZERO - Editorial

vijju123 · September 23, 2018, 6:14pm

PROBLEM LINK:

Div1
Div2
Practice

Setter- Misha Chornyi
Tester- Teja Vardhan Reddy
Editorialist- Abhishek Pandey

DIFFICULTY:

CHALLENGE

PRE-REQUISITES:

General Programming. Knowledge of Probability and Expectations

PROBLEM:

Given a N*M matrix, divide it into K connected components such that |Max-Min| is minimized, where Max is the maximum valued component and Min is minimum valued component.

QUICK-EXPLANATION:

Key to AC- Notice that numbers are uniformly, randomly generated. Trying out various orderings and simulations was expected to give good score.

Note that the values are uniformly, randomly generated. Hence, if we divide the array into \frac{N*M}{K} partitions, then it;d be a good start. Lets see what more we can do. Assign labels to array elements such that elements of same labels are in same connected component. Now, try to convert it into a 1-D array and see if we can do manipulations (eg- push another element into minimum district subarray). Trying out multiple such labellings/orderings/patterns and operating ideally on them was expected to give a good score.

EXPLANATION:

I will be sharing the expected solution of setter in the editorial. As usual, the editorial of challenge problem is never complete without contribution from community. So if you have any approach which is not yet covered, feel free to share it!

Setter’s solution is quite simple and relies on the fact that numbers are randomly generated, so its very unlikely that theres a huge concentration of high or low values accumulated around a small area.

I tried to visualise the process in pic below, before beginning to explain it.

alt text

The first step in the above algorithm would be, to try any random order. One property which must hold in that order is that, elements which are adjacent in the equivalent 1-D array must also be adjacent in the initial 2-D array as well. You can try multiple such orders, for example, in the picture above I followed the order/pattern of going from left to right, then 1 step down, then right to left … and so on.

Now, for every order you try, assign the matrix elements “labels” or indices where they’d appear in 1-D array. Once done, convert the 2-D array into 1-D array. Now our problem reduces to “Partition a 1-D array into atmost K partitions such that difference between highest and lowest valued partition is minimum.”. Note that, following just 1 or 2 patterns may not give optimal answer, we simulate the process for multiple patterns/orderings. (Meaning, steps till now are simulated for multiple patterns).

One way of initially partitioning the array could be, find the array sum and partition whenever the size of partition is \approx N*M/K, other would be to partition when the partition sum is close to \sum A_{ij}/K. Another very good improvement is, once we get an initial partition, try “popping an element out of maximum partition” and/or “pushing an element into minimum partition” (since we made sure that elements adjacent in 1-D array are adjacent in 2-D matrix as well, its well possible). Do this as long as it helps.

However, note that our algorithm has an expensive step, determine and assign partition. That step takes O(N*M) time, while manipulating the already partitioned array (pushing/popping as discussed before) takes significantly less time. So, its worthwhile to pay some extra attention in trying to improve manipulation techniques rather than blindly using thousands of orderings/patterns.

Thats it from setter’s side XD. I tried to decode participants solutions but most of the top solutions are complex . I’d hence like to invite everybody to share their approach and how it ended up for them. Hope you guys enjoyed the problem.

SOLUTION

Setter

Click to view

#include <bits/stdc++.h>
 
    using namespace std;
 
    typedef long long ll;
    typedef long double dbl;
    typedef vector <int> vi;
    typedef pair <int, int> pii;
 
    int const N = 1002;
    int const MD = 1000000007;
 
    #define FOR(i, a, b) for (int (i) = (a); (i) <= (b); ++(i))
    #define RFOR(i, a, b) for (int (i) = (a); (i) >= (b); --(i))
    #define REP(i, n) FOR(i, 0, n - 1)
    #define pb push_back
    #define mp make_pair
    #define X first
    #define Y second
    #define sz(v) int(v.size())
 
    int res[N][N];
 
    int main() {
    	int N, M, K;
    	scanf("%d%d%d", &N, &M, &K);
    	vector <pii> parts;
    	FOR (i, 1, N) {
    		FOR (j, 1, M) {
    			if (i & 1) {
    				parts.pb(mp(i, j));
    			} else {
    				parts.pb(mp(i, M - j + 1));
    			}
    		}
    	}
    	int X = (N * M) / K;
    	int ptr = 0;
    	FOR (it, 1, (N * M) % K) {
    		FOR (i, 1, X + 1) {
    			res[parts[ptr].X][parts[ptr].Y] = it;
    			++ptr;
    		}
    	}
    	FOR (it, (N * M) % K + 1, K) {
    		FOR (i, 1, X) {			
    			assert (ptr < sz(parts));
    			res[parts[ptr].X][parts[ptr].Y] = it;
    			++ptr;
    		}
    	}
    	assert (ptr == sz(parts));
    	FOR (i, 1, N) {
    		FOR (j, 1, M) {
    			printf("%d%c", res[i][j], j == M ? '\n' : ' ');
    		}
    	}
    	return 0;
    }

Tester

Editorialist

Time Complexity=O(N*M)
Space Complexity=O(N*M)

CHEF VIJJU’S CORNER

1. Setter’s Notes-

Click to view

I expected something like partition N*M in components with approximately equal size [N*M]/K
Several times. As distribution is uniform it’s OK.So, you can move some element from one district to another. When you find some partition Then let’s try to increase the population in the minimal populated district Or decrease the population in the maximal populated district.

2. Hall of fame for noteworthy Solutions

mcfx1 - This bad boy made us change the scoring function during contest :(. Shoutout at his crisp 852 line code!
whzzt - Best solution in terms of score and memory. His code is \approx 180 lines.
gorre_morre - GORRE MORRE STRIKES BACK!! XDD. This time with a crisp 629 line code in python.
algmyr - The guy in class who scores 99\% (Hint: Check his score).

3. A note on when questions have this random generation-

Click to view

There are multiple times when you’ll come across a question quoting “test data is randomly generated.” Usually, for such problems, one thing to keep in mind is, that usually the worst case of an algorithm is HIGHLY unprobable. Like, in DARTSEGM, a problem which I shared below, the expected size of convex hull when points are randomly generated is O(logN). This does not mean that for all arrangements it holds, we can make an arrangement when the size is O(N), i.e. all points are part of hull, but what it means is that, since expected size is O(LogN), then a huge deviation from it is NOT expected if test data is generated randomly. In other words, the worst case may (or usually will) not occur among the test data. With this in mind, you can try going forward to such questions from now on. Remember, “test data randomly generated” is the keyword, dont try this if this isnt guaranteed! XD

4. Related Problems-

DARTSEGM - Randomly Generated. Enough said. XD

tech_mint0013 · September 24, 2018, 12:20pm

Thanks for the editorial!!

algmyr · September 25, 2018, 7:05am

Here is a visual comparison of the hall of fame authors on a random case (n=536, m=958, k=879). The colors in the images are districts and the difference between the largest cluster and smallest cluster is in the title. The grayscale image gives a sense of the order in which order districts are placed (black to white corresponds to earliest to last).

Click images to see them in full resolution.

mcfx1: max - min = 1

No obvious pattern for the top part, the bottom part has alternating seams. The packing is done top-down, so I presume the transition at the bottom is to abort the previous packing cleanly. other than that I have no idea what happens here, especially on the top, but evidently it works really well.

whzzt: max - min = 1

Starts off with a clock-wise spiral and transitions into slicing the interior when it’s small enough. Nice choice of pattern to maximize the circumference which directly translates to a good chance for optimization. The interior is also easy to handle, it corresponds to something like phase 2 that I describe in my solution.

gorre_morre: max - min = 12

Divide into bands of as equal sum as possible, then split the bands into correct sized districts. Seems similar to my approach but much more aggressive in it’s choice of bands (wide) and in the optimization of districts. This seems to help a lot, in contrast to my solution. The choice of seams seems (no pun intended) to be a lot less strict, which makes the districts look quite fuzzy.

algmyr: max - min = 911

I’ll elaborate a bit more on this one, for obvious reasons. I didn’t quite reach the scores of the top 3, but I wasn’t too far off considering a naive solution typically has differences of order between 10^8 and 10^9.

I do two phases, creating seams to separate bands of equal(ish) size and then dividing the bands into districts. The reason for this structure is to simplify code and guarantee connectivity. Essentially the same code can be used for both phases.

Phase 1 - Bands

Say we want to create b bands and that the total sum is S. My goal is to create one band, so the goal sums are \frac{(b-1)S}{b} and \frac{S}{b}. Start naively picking elements from left and right and add them to a left and right partition and try to maintain the correct ratio. Proceed until we have a 1 wide seam to go:

L.......    L......R    LLL.RRRR
L.......    LL.....R    LLL.RRRR
........    LL....RR    LLLL.RRR
........    LL....RR    LLLL.RRR
.......R    LL....RR    LLLL.RRR

Now, the elements in this seam can belong to either partition, and we have a goal sum for both partitions. This is an instance of a partition problem! Sadly this problem is NP-hard, but fortunately it’s easy to approximate (this has been called the easiest NP-hard problem for a reason). I use the Karmakar-Karp differencing method (see the wiki page linked).

One band is now done, proceed to split the larger part into b-1 equal(ish) bands.

Phase 2 - Districts

I pick bands so that they are at least narrow enough for districts cover around 3 rows, which means I can safely do the same kind of seam process to produce correct sized districts.

Note on image creation

The color images were generated based off my validation+scoring script I used during the competition. In short: find all clusters and note which clusters are neighbors. Go through all clusters and assign to it the mex of all its neighbors. This is then translated using a color table. Typically this only takes 4-5 colors (in general any board will require at most 4 colors, and this method typically comes pretty close).

For this kind of image related stuff I can recommend NetPBM as a format. For the graymaps, the output data by itself is only a header

P2
958 536
879

away from being a valid graymap!

vijju123 · September 24, 2018, 1:26pm

And sorry for the delay. I wasted some time in trying to understand some top solutions, but it wasnt easy.

algmyr · September 25, 2018, 7:28am

vijju123 Rather than diving into the code directly, visualizing the solutions gives a rough insight of how each strategy works. It’ll likely also tell you what to look for if you want to try to understand the code. Have a look at my answer for images and my quick analysis of the hall of fame submissions!

algmyr · September 25, 2018, 7:36am

I quite liked the problem, even if I didn’t get as good a score as the others. mcfx1 and whzzt in particular achieve perfect or near perfect scores, which is just amazing. They typically get max-min = 1 which is optimal unless the sum of all elements is divisible by k).

vijju123 · September 25, 2018, 3:35pm

Wow, thats seriously awesome dude!! Thanks a tonnnn!!!

vijju123 · September 25, 2018, 3:37pm

After reading your answer, it seemed like a really nice strategy. I agree, reading code is easier if you know what to expect of it, or have hints on its behavior. Since setter’s solution was based on trying off multiple patterns, perhaps my view got biased that they all would be trying multiple patterns, which is a mistake. (Not that I would have thought of making those colored images like you, that idea wouldn’t have striked me, but its just too cool!! Thanks for this, will keep in mind for future )

algmyr · September 26, 2018, 2:57pm

I can highly recommend visualization in general. I’ve found it useful a lot of the time, whether it is typical programming competition problems, optimization problems like hashcode (or here on codechef) or something completely different.

I really could go on about visualization. So many things you can do for so many types data. Typical plotting for 1d data, images for 2d-data, generating vector graphics, graph visualization formats.

vijju123 · September 26, 2018, 5:17pm

That makes me feel a little better about the upcoming assignment where we have to do plotting using Python. XD. Someday perhaps when we both are willing to go on and on about visualization and its uses we can shot each other a pm then