GERALD09 - Editorial

challenge
editorial
gerald09
greedy
july14

#1

PROBLEM LINK:

Practice
Contest

Author: Gerald Agapov
Tester: Shang Jingbo and Gerald Agapov
Editorialist: Devendra Agarwal

DIFFICULTY:

Challenge

PREREQUISITES:

Greedy.

PROBLEM:

Given an empty matrix of NxM , fill the matrix with G,C,A and T.

The matrix will be stable if it contains K different submatrices. Your aim is to fill the matrix with A,C,G and T such that the difference between the number of different submatrices in matrix and number K should be as small as possible.

Explanation:

This problem is a challenge problem. For challenge problem , you should aim at finding some approximate/partial solution using algorithms like greedy , dp , etc…

For this problem , let us break the problem in cases :

Case 1 : For small n and m where n * m<=10


This case can be solved using brute force. Try all possible matrices and then find the matrix which gives the best possible estimate. All possible matrix are 4n*m , and each matrix can be checked for different submatrices in O(n2m2) time . You need to be little careful in this method because test cases may go to 100 , so use this approach only if 4nm*n2*m2 <= (4 * 108) / Total_Tests

Case 2 : A more generic case with any N,M and K.


First start with all ‘A’ in the matrix , then try to change each element of the matrix randomly and check if it gave a better estimate or not. But checking after changing each element may lead to TLE because time complexity for that will be T* n3m3 where T is the Total_Tests.
So, instead of checking after changing at each box , we can use any strategy to reduce number of checks. One simple idea is to check only log(N
M) times. Start with changing first (N * M)/2 elements and then keep on decreasing the number by 1/2 . So time required is T * n2*m2log(nm) <= 100 * 154 * 10 < 108

We will use greedy approach in our selection , we will pick up only those changes which will take us close to K.

Now you can still make some random iterations in which you pick any two elements of the matrix and swap them and then check for number of different sub-matrices. If we get close to K , we accept those changes otherwise we reject it.
You can try some more random strategies here : by fixing an element of the matrix and choosing the other element for swapping.

Be careful on the number of random iterations, so,let Iterations be the number of Iterations you will run your program, then Iterations * n2*m2 <= 100000000/Total_Tests must be satisfied to avoid TLE . Actually you can try by changing the constant and submitting the solution to get a better bound on number of Iterations.

I would like to thank Ju Ruo , i got this insight after reading his code :slight_smile: .
I would invite all the participants of this problem to share their awesome ideas with us :).

AUTHOR’S and TESTER’S SOLUTIONS:

Author’s solution to be uploaded soon.
Tester’s solution


#2

I’d like to highlight another approach, which was apparently enough for 0.921 points (at least on 20% of test data), and is great for large K.

You may have noticed that there just 225 possible matrix sizes (actually, just 120 when N <= M), but an incredible 50 kB of allowed source size. Why not actually use this space? Of course, not by writing 50 kB of code, but by pre-computing something for all possible N_x_M and hardwiring it into the code?

That something is (let’s just fix N,M) a map of matrix:score (score = number of different submatrices), with scores as evenly distributed over the range [1, maximum of scores found by 1000 times running “random generate genome, check score”] as possible. Of course, one does not simply fit many matrices into 50 kB, so we need to compress the data somehow. My idea was to just remember the first matrix (random, maximum score), and then keep on replacing all non-A nucleotides in row-major order and only remember the scores we’d obtain after each replacement. This implicit description is easy to replicate, except with much less time necessary because we don’t need to actually calculste the scores.

I ran a bruteforce solution that computed this initial genome and an array of scores (printing it already in copy-pastable array form) on my computer with O(N^2M^2 log N^2M^2) time per check, it runs in slightly less than a minute. Then, I just wrote a solution that stoarts with a shitload of numbers and then has a short main(): read input, simulate changing non-As to A from the initial genome for this N,M, pick the option which gives score closest to K, write output.

Effortless solution is effortless. To anyone that took more than several hours working on this problem to get a score of at least 0.9, I suggest playing Battle for Wesnoth intensively - you’ll find out how much can be optimized outside of algorithms :smiley:

I also have a question: Were the setter/tester aware of this solution?


#3

Where to know the score for my solution to this problem? And How to check July challenge rating of me(global/country) easily(not your boring next button clicking system).


#4

Sir what would be the best way to find the number of sub matrices? I used PnC, but things get complicated as the types of letters increase.


#5

Here is my editorial for that problem with a lot of details, it gave me 19th place overall:

http://chasethered.com/?p=221


#6

The author’s solution is not uploaded even now (after 3 years)!


#7

http://www.codechef.com/rankings/JULY14/


#8

“not your boring next button clicking system”
Nobody asked you to click next buttons and no one is obliged to provide your rating. CodeChef is amazing in what they do, if you don’t like it, then this site is not for you.


#9

My initial idea was to precompute as many matrices as possible (using various strategies) during the contest and then “compress” them somehow into the 50 KB of allowed source code. That didn’t work, of course (the “compression” into 50 KB, I mean), so I spent a lot of submissions identifying a good part of the matrices from the test cases. Thus, I could save in my source code only my answers for those matrices (I needed only 506 different matrices, which fit into 50 KB without trying too use any special compression techniques).


#10

What’s PnC?


#11
  • using Permutations

#12

I have no idea at all how finding the number of distinct submatrices could ever involve permutations. I just tried hashing all submatrices (there are just O(N^2M^2) of them) and storing their hashes in an STL set<>.


#13

you can go to “My submissions” to check your score to this problem. Click on your profile to see your rating/ranking which is overall. Its unnecessary to check your rank in July challenge unless you solved all the questions =)


#14

size of submatrix vary from [1…n]x[1…m] makes the number of sub-matrices n^2.m^2!!


#15

@xellos0 I used hashing too. I just used unordered_set instead.