PROBLEM LINK
Author: sigma_g, animesh1309, arpan_dg, nikhil_chandak, harrycoda, yoogottamk and myself
Tester: tanuj208, dk502, akshat_goyal, manish_1729, kunalserum123 and abhishek8899
Editorialist: sigma_g, animesh1309 and myself.
DIFFICULTY
Easy to hard (based on method).
PREREQUISITES
Based on methods, check individual problems below for details.
The problem can be interpreted in terms of graphs. There are solutions that do not require the graph interpretation, such as the greedy and gradient-descent techniques described below. However, the hand-crafted patterns in A, B, and C (as stated in "Contest About") can be found by interpreting the problems as graphs and running DFS to learn the structure of each graph. Observations for framing the problem as a graph:
 Stocks can be considered as nodes of the graph.
 Covariance matrix can be treated as adjacency matrix where covariance value between stock i and stock j can be considered as weight of edge connecting node i and node j.
 When the covariance value between stock i and j is 0 it can be treated as i and j nodes are not connected by any edge.
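As a minimal sketch of this interpretation (the names `cov`, `countComponents` and the sample matrix below are illustrative, not from the contest code), one can build an adjacency list from the covariance matrix and count connected components with DFS:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Count connected components of the graph induced by nonzero covariances.
// cov is assumed symmetric; nodes i and j share an edge iff cov[i][j] != 0.
int countComponents(const vector<vector<long long>>& cov) {
    int n = cov.size();
    vector<vector<int>> adj(n);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (cov[i][j] != 0) {
                adj[i].push_back(j);
                adj[j].push_back(i);
            }
    vector<bool> seen(n, false);
    int comps = 0;
    for (int s = 0; s < n; s++) {
        if (seen[s]) continue;
        comps++;
        vector<int> stack = {s};  // iterative DFS from s
        seen[s] = true;
        while (!stack.empty()) {
            int u = stack.back();
            stack.pop_back();
            for (int v : adj[u])
                if (!seen[v]) {
                    seen[v] = true;
                    stack.push_back(v);
                }
        }
    }
    return comps;
}
```

Printing component sizes inside the same DFS is what reveals the patterns described for A, B, and C below.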
Problem A
Step 0
 After doing DFS, you will notice that this graph is a tree.
Step 1
 So now we can get a deterministic answer to this problem by using DP on trees concept.
 We can also use some greedy techniques to solve this problem by exploiting the property that the graph is a tree. We tried these and achieved good scores, likely because the graph is very sparse (only n-1 positive edges). So, although tree DP gives the best deterministic answer, it is also possible to use greedy and achieve reasonably high scores.
Step 2
 For DP: we use a 2D dp state, where dp[a][i] represents the maximum score we can get from the subtree of node a (including a) by taking \beta_a = i.
 For the update step, we loop over all possible beta values of a child, take the one which gives the maximum score, and do the same for every child of node a.
 This gives the best answer; to recover the \beta_i's corresponding to it, we simply backtrack starting from the root.
 Dp solution code: Pastebin Link
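As an illustrative sketch (not the exact pastebin solution; the variable names and the exact score function are assumptions), the tree DP might look like this, taking the score to be \sum_i \beta_i v_i minus \sum over tree edges (u,v) of \beta_u\beta_v w_{uv}, with diagonal covariance terms ignored:

```cpp
#include <bits/stdc++.h>
using namespace std;

int n;
vector<long long> val;                     // value of each stock
vector<int> alpha_;                        // upper bound on beta_i
vector<vector<pair<int, long long>>> adj;  // tree edges: (neighbor, covariance weight)
vector<vector<long long>> dp;              // dp[a][b]: best score in subtree of a with beta_a = b

void dfs(int a, int parent) {
    for (int b = 0; b <= alpha_[a]; b++)
        dp[a][b] = (long long)b * val[a];
    for (auto [c, w] : adj[a]) {
        if (c == parent) continue;
        dfs(c, a);
        // for each beta of a, pick the child beta maximizing the child's
        // subtree score minus the cost of the connecting edge
        for (int b = 0; b <= alpha_[a]; b++) {
            long long best = LLONG_MIN;
            for (int bc = 0; bc <= alpha_[c]; bc++)
                best = max(best, dp[c][bc] - (long long)b * bc * w);
            dp[a][b] += best;
        }
    }
}

long long solveTree() {
    dp.assign(n, {});
    for (int i = 0; i < n; i++) dp[i].assign(alpha_[i] + 1, 0);
    dfs(0, -1);  // root the tree at node 0
    return *max_element(dp[0].begin(), dp[0].end());
}
```

The backtracking to recover the actual \beta_i's follows the same recurrence, remembering which child beta achieved each maximum.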
Problem B
Step 0
 After doing DFS, you will notice that this graph contains only two connected components, one of which contains only 16 nodes and the other connected component contains the remaining 784 nodes.
Step 1
 So in the first component we can use exponential search to get high scores, and the second component can be treated as a separate random file and techniques as given in solutions for D and E can be used.
 While creating this file, we gave more weight to the values of the first component, so that exploiting exponential search techniques on the first component gives a bigger increase in score.
 Example function to do exponential search (here all betas are assumed to be 1; other beta values can be handled in a similar way):
void exponential(vector<int> &nodes) {
    int no = nodes.size();
    long long ans = 0;
    // iterate over all 2^no subsets of the nodes
    for (int mask = 0; mask < (1 << no); mask++) {
        int temp_mask = mask, bit = 0;
        vector<int> store;
        while (temp_mask > 0) {
            if (temp_mask % 2 == 1) {
                store.push_back(nodes[bit]);
            }
            bit++;
            temp_mask /= 2;
        }
        long long total_val = 0, total_cov = 0;
        // calculate total value of the chosen nodes
        for (auto a : store) {
            total_val += valu[a];
        }
        // calculate total covariance (diagonal terms included)
        for (int i = 0; i < (int)store.size(); i++) {
            for (int j = i; j < (int)store.size(); j++) {
                total_cov += adj_mat[store[i]][store[j]];
            }
        }
        // final cost of this subset
        long long cost = total_val - total_cov;
        ans = max(ans, cost);
    }
    cout << ans << endl;
}
Step 2
 Exponential search technique is basically "iterating over all subsets of nodes". Since there are only 16 nodes, this will take 2^{16} \approx 10^5 iterations of the search loop. For evaluating the score of a subset of nodes, you can write a custom nested loop which should take n^2 time where n\approx16.
Problem C
Step 0
 It makes sense to run a DFS on the nodes first, especially considering how many of the values in the covariance matrix are zero.
Step 1
 After doing DFS, you will notice there are several independent components in this graph.
 Each component is a small set (of 10 nodes or less) of nodes which have high covariance with each other.
 Nodes in component A have zero covariance with nodes in another component B.
 Therefore, we can solve component A independently of component B.
Step 2
 To solve a particular component, we can randomly sample all subsets of nodes from that component.
 While sampling, we only take subsets which have size <= 3.
 We compute the score for each subset and take the largest scoring subset for that component.
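The per-component search can be sketched as follows (names are illustrative; the score of a choice is assumed to be \sum_i \beta_i v_i - \sum_{i&lt;j} \beta_i\beta_j \text{cov}_{ij}, with diagonal terms ignored, and \beta_i restricted to \{1, 2\}):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Best score obtainable from one component by trying every subset of
// size <= 3 and every beta assignment in {1, 2}^k.
long long bestForComponent(const vector<int>& nodes,
                           const vector<long long>& val,
                           const vector<vector<long long>>& cov) {
    int m = nodes.size();
    long long best = 0;  // the empty subset scores 0
    for (int mask = 1; mask < (1 << m); mask++) {
        if (__builtin_popcount(mask) > 3) continue;  // only subsets of size <= 3
        vector<int> chosen;
        for (int i = 0; i < m; i++)
            if (mask >> i & 1) chosen.push_back(nodes[i]);
        int k = chosen.size();
        for (int bm = 0; bm < (1 << k); bm++) {  // beta assignment in {1,2}^k
            vector<long long> beta(k);
            for (int i = 0; i < k; i++) beta[i] = 1 + (bm >> i & 1);
            long long score = 0;
            for (int i = 0; i < k; i++) score += beta[i] * val[chosen[i]];
            for (int i = 0; i < k; i++)
                for (int j = i + 1; j < k; j++)
                    score -= beta[i] * beta[j] * cov[chosen[i]][chosen[j]];
            best = max(best, score);
        }
    }
    return best;
}
```

Summing this over all components gives the final answer, since components are independent.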
Step 3
 We can argue that larger subsets (size > 3) will not help.
 We plot the distribution of values and covariances, and notice that all values are close to the mean. Let the mean value be V (roughly ~2800) and the mean covariance be C (~1000).
 If we choose k nodes from an island (with \beta_i=1), the expected score contribution of that island would be \text{es} = kV - \frac{k(k+1)}{2} C \approx 500\times(5k - k(k+1)). \text{es}\le0 for k\ge4. Hence, only k\le3 is useful.
 If all \beta_i=2, then \text{es} = 2kV - \frac{k(k+1)}{2}\times 4C \approx 1000\times(5k - 2k(k+1)). \text{es}\le0 for k\ge2.
 Notice similarly that \beta_i\ge3 will never help. So, even though \alpha_i is given up to 10, in practice you will not need more than 2.
Step 4

The complexity of this whole procedure is roughly x\times y\times w where
 x = number of components \approx 100
 y = number of subsets per island \approx \frac{z(z+1)}{2} \approx 50, where z is the size of the island (max 10)
 w = number of \beta_i assignments tried per subset (=4: namely (1,1), (1,2), (2,1), (2,2))
which comes out to be less than 1e5 operations.
Problem D
Observations
 This file had random data generated in a range
 Therefore you can try many random heuristics
 inputting all \beta values as one gives a pretty high score, but it's not the best score possible
 Obviously as this is a random file you should try a few greedy heuristics on this one.
 We shall explain gradient descent here.
Gradient descent
The score function can be written in terms of the \vec{\beta} vector as the following vector expression:
S = \max(0, \sum(\vec{\beta} .* \vec{V}) - \sum((\vec{\beta} * \vec{\beta}') .* C))
Here \vec{V} is the value vector, and C is the covariance matrix, .* is the elementwise product, whereas * is the matrix product.
 This is a downward parabola in terms of \vec{\beta} and hence can be optimized using gradient descent.
 Gradient descent will need to "clamp" \vec{\beta} in the range [0, \alpha], and finally round the final fractional values to integers.
 The final rounding is mostly suboptimal; however, we cannot do much about it. We can, however, apply further heuristics to the rounded output of gradient descent.
You can use matrixcalculus to find the derivative and perform gradient descent. A sample implementation of the gradient descent is given in this Colab notebook. If you got the gradient descent to run really well, let us know in the comments!
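For concreteness, here is a minimal projected-gradient sketch in C++ (the notebook linked above is in Python; this is an independent illustration, not its code). With S(\beta) = \beta^\top V - \beta^\top C\beta and symmetric C, the gradient is V - 2C\beta; we ascend on S and clamp \beta into [0, \alpha] after each step:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Projected gradient ascent on S(beta) = beta.V - beta^T C beta.
// C is assumed symmetric, so grad S = V - 2 C beta.
vector<double> gradientAscent(const vector<double>& V,
                              const vector<vector<double>>& C,
                              const vector<double>& alpha,
                              double lr = 0.01, int iters = 5000) {
    int n = V.size();
    vector<double> beta(n, 0.0), grad(n);
    for (int t = 0; t < iters; t++) {
        // compute the full gradient before updating any coordinate
        for (int i = 0; i < n; i++) {
            double Cb = 0;
            for (int j = 0; j < n; j++) Cb += C[i][j] * beta[j];
            grad[i] = V[i] - 2.0 * Cb;
        }
        for (int i = 0; i < n; i++) {
            beta[i] += lr * grad[i];
            beta[i] = min(max(beta[i], 0.0), alpha[i]);  // clamp to [0, alpha_i]
        }
    }
    return beta;  // still fractional; round before submitting
}
```

The step size and iteration count here are arbitrary choices that happen to converge for well-scaled inputs; real inputs may need tuning.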
Problem E
Again, the approach is similar to problem D. The only difference here is that high values of alpha are not feasible in this test case. In the previous problem, you can prove by hand that for alpha > 5, the score would start becoming negative. In this problem, that bound on alpha is even lower. This bound is proportional to the ratio of average value to average covariance.
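Using the same island approximation as in Problem C (average value V, average covariance C), a back-of-envelope bound (illustrative, not a formal proof) shows where this ratio comes from. A uniform \beta on k stocks contributes roughly

```latex
\text{es}(k, \beta) \approx \beta k V - \beta^2 \frac{k(k+1)}{2} C,
\qquad
\text{es} > 0 \iff \beta < \frac{2V}{(k+1)\,C},
```

so the largest useful \beta indeed scales with the ratio V/C.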
Note on LP solvers
We ran two linear programming solvers before the contest, however, they either produced average results or did not finish processing at all.
CVXPY
The formulation for constraints can be done like so:
import cvxpy as cp
import numpy as np

def solver(cov_mat, values, alphas):
    n = len(values)
    x = cp.Variable(n)
    part = cp.quad_form(x, cov_mat) - values @ x
    prob = cp.Problem(cp.Minimize(part), [x >= 0, x <= alphas])
    prob.solve()
    # x.value will be continuous, hence we need to round it
    opt = np.round(x.value)
    # submit opt vector
    return opt
Note that x = cp.Variable(n, integer=True) does not work (i.e., cvxpy is unable to solve the problem). The opt vector achieves good performance on problems A and D, average performance on problems C and E, and terrible performance on B (due to a very dense graph, with values and covariances of almost the same scale). Overall, it achieved around 300/500 points in our testing, and we therefore comfortably dismissed this method as it was not expected to place in the top 100 of the ranklist.
MATLAB
The following code can iteratively solve for the desired vector. Pastebin link.
We ran it on our institute research machine with the input files. However, it could not finish processing even after roughly two hours of compute time.
Starting loop at HH:MM:SS
Status: 99 0 1.170225e+06 1
Status: 199 0 5.797305e+05 1
Status: 299 0 2.213001e+05 1
Status: 399 0 1.040847e+05 1
Status: 499 0 5.940622e+04 1
Status: 599 0 3.240714e+04 1
Status: 699 0 1.510964e+04 1
Status: 799 0 1.055976e+04 1
After these experiments we concluded that solvers achieve only average performance on this cost function. Moreover, the first three input files were deliberately handcrafted, and the solvers got an average to poor score on them. If you got the solvers to run well, let us know in the comments!
Conclusion
Overall, as you can see, we have fitted many different concepts into one problem statement:
 dfs on graphs
 tree dp
 exponential search/subset sampling
 greedy heuristics
 gradient descent
 linear program solvers
Overall, we hope you enjoyed the contest! In case of any clarification, please let us know in the comments. Also don't forget to share your approaches/observations in the comments, we'd love to hear them!