PROBLEM LINK
Author: sigma_g, animesh1309, arpan_dg, nikhil_chandak, harrycoda, yoogottamk and myself
Tester: tanuj208, dk502, akshat_goyal, manish_1729, kunalserum123 and abhishek8899
Editorialist: sigma_g, animesh1309 and myself.
DIFFICULTY
Easy to hard (depending on the method used).
PREREQUISITES
Varies by method; see the individual problems below for details.
The problem can be interpreted in terms of graphs. There are also solutions that do not use the graph interpretation, such as the greedy and gradient descent techniques described below. However, the hand-crafted patterns in A, B, and C (as stated in “Contest About”) can be discovered by interpreting the problems as graphs and running DFS to reveal the structure of the graph. Observations used to frame the problem as a graph (a small sketch of this framing follows the list):
- Stocks can be considered as nodes of the graph.
- Covariance matrix can be treated as adjacency matrix where covariance value between stock i and stock j can be considered as weight of edge connecting node i and node j.
- When the covariance value between stocks i and j is 0, nodes i and j can be treated as not connected by any edge.
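For concreteness, here is a minimal sketch of this framing (our illustration under assumed setup, not contest-specific code): it builds the implicit graph directly from a global covariance matrix cov and labels connected components with DFS.

#include <bits/stdc++.h>
using namespace std;

int n;                          // number of stocks (nodes)
vector<vector<long long>> cov;  // n x n covariance matrix, assumed already read
vector<int> comp_id;            // component label per node, -1 = unvisited

void dfs(int u, int c) {
    comp_id[u] = c;
    for (int v = 0; v < n; v++)
        // an edge exists iff the covariance is nonzero
        if (v != u && cov[u][v] != 0 && comp_id[v] == -1)
            dfs(v, c);
}

int count_components() {
    comp_id.assign(n, -1);
    int c = 0;
    for (int u = 0; u < n; u++)
        if (comp_id[u] == -1)
            dfs(u, c++);
    return c;
}

Inspecting the number of components, their sizes, and their edge counts (e.g., a connected component with exactly size-1 edges is a tree) is what reveals the hand-crafted patterns described below.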
Problem A
Step 0
- After doing DFS, you will notice that this graph is a tree.
Step 1
- So we can get a deterministic answer to this problem using the DP-on-trees technique.
- We can also use greedy techniques, exploiting the property that the graph is a tree. We tried this and achieved good scores, likely because the graph is very sparse (only n-1 positive edges). So, although tree DP gives the best deterministic answer, greedy can also achieve reasonably high scores.
Step 2
- For DP: We use a 2D DP state where dp[a][i] represents the maximum score obtainable from the subtree of node a (including a) when \beta_a = i.
- For the update step, for each child of node a we loop over all possible beta values of that child and take the one which gives the maximum score.
- This yields the optimal answer; to recover the \beta_i's corresponding to it, we simply backtrack starting from the root (see the sketch after this list).
- Dp solution code: Pastebin Link
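A minimal sketch of this DP (our illustration, not the exact pastebin code). It assumes val[a] and mxb[a] (the cap \alpha_a on \beta_a) per node, an adjacency list adj[a] of {child, edge covariance} pairs, and the score model: sum of \beta_a \cdot val[a] over all nodes, minus \beta_a \beta_b \cdot cov(a,b) per tree edge (any diagonal covariance term is ignored for simplicity).

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005, MAXB = 10;      // assumed bounds
long long dp[MAXN][MAXB + 1], val[MAXN];
int mxb[MAXN];
vector<pair<int, long long>> adj[MAXN];  // {child, covariance of edge}

void dfs(int a, int par) {
    // base: beta_a = i contributes i * val[a]
    for (int i = 0; i <= mxb[a]; i++)
        dp[a][i] = (long long)i * val[a];
    for (auto [c, w] : adj[a]) {
        if (c == par) continue;
        dfs(c, a);
        // for each beta of a, pick the child's beta that maximizes its subtree
        // score minus the covariance penalty on the connecting edge
        for (int i = 0; i <= mxb[a]; i++) {
            long long best = LLONG_MIN;
            for (int j = 0; j <= mxb[c]; j++)
                best = max(best, dp[c][j] - (long long)i * j * w);
            dp[a][i] += best;
        }
    }
}
// answer: max over i of dp[root][i]; backtracking the argmax choices recovers the betas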
Problem B
Step 0
- After doing DFS, you will notice that this graph contains exactly two connected components: one contains only 16 nodes, and the other contains the remaining 784 nodes.
Step 1
- So, in the first component we can use exponential search to get high scores, while the second component can be treated as a separate random file, on which the techniques given in the solutions for D and E can be used.
- While creating this file, we gave more weight to the values of the first component, so that exploiting exponential search on it yields a larger increase in score.
- Example function to do exponential search (here all betas are assumed to be 1; general betas can be handled similarly):
// assumes globals valu[] (stock values) and adj_mat[][] (covariance matrix)
void exponential(vector<int> &nodes) {
    int no = nodes.size();
    long long ans = 0;  // the empty subset scores 0
    // iterate over all 2^no subsets of nodes
    for (int mask = 0; mask < (1 << no); mask++) {
        // collect the nodes present in this subset
        vector<int> store;
        for (int bit = 0; bit < no; bit++)
            if (mask & (1 << bit))
                store.push_back(nodes[bit]);
        // total value of the subset
        long long total_val = 0;
        for (auto a : store)
            total_val += valu[a];
        // total covariance of the subset
        // (j starts at i so the diagonal/variance term is included once)
        long long total_cov = 0;
        for (int i = 0; i < (int)store.size(); i++)
            for (int j = i; j < (int)store.size(); j++)
                total_cov += adj_mat[store[i]][store[j]];
        // final cost of this subset
        long long cost = total_val - total_cov;
        ans = max(ans, cost);
    }
    cout << ans << endl;
}
Step 2
- The exponential search technique is basically “iterating over all subsets of nodes”. Since there are only 16 nodes, this takes 2^{16} \approx 10^5 iterations of the search loop. Evaluating the score of a subset of nodes is a custom nested loop taking n^2 time, where n \approx 16.
Problem C
Step 0
- It makes sense to run a dfs on the nodes first, especially considering so many of the values in the covariance matrix are zero.
Step 1
- After doing DFS, you will notice there are several independent components in this graph.
- Each component is a small set of nodes (10 or fewer) which have high covariance with each other.
- Nodes in component A have zero covariance with nodes in another component B.
- Therefore, we can solve component A independently of component B.
Step 2
- To solve a particular component, we can enumerate the subsets of nodes from that component.
- While enumerating, we only consider subsets of size <= 3.
- We compute the score of each subset and keep the largest-scoring subset for that component.
Step 3
- We can prove that larger subsets (size > 3) will not work.
- We plot the distribution of values and covariances, and notice that all values are close to the mean. Let the mean value be V (roughly ~2800, approximated as 2500 = 500 \times 5 below) and the mean covariance be C (~1000).
- If we choose k nodes from an island (with all \beta_i = 1), the expected score contribution of that island is \text{es} = kV - \frac{k(k+1)}{2} C \approx 500\,(5k - k(k+1)). \text{es} \le 0 for k \ge 4. Hence, only k \le 3 is useful.
- If all \beta_i = 2, then \text{es} = 2kV - \frac{k(k+1)}{2} \cdot 4C \approx 1000\,(5k - 2k(k+1)). \text{es} \le 0 for k \ge 2.
- Notice similarly that \beta_i \ge 3 will never help. So, even though \alpha_i is given up to 10, in practice you will not need \beta_i more than 2.
Step 4
- The complexity of this whole procedure is roughly x \times y \times z, where:
- x = number of components \approx 100
- y = number of subsets per island \approx \frac{s(s+1)}{2} \approx 50, where s is the size of the island (at most 10)
- z = \beta_i attempts per subset (= 4: the assignments 11, 12, 21, 22)
- This comes out to fewer than 10^5 operations. A sketch of this enumeration follows the list.
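A minimal sketch of the per-component enumeration (our illustration, following the scoring convention and the assumed globals valu and adj_mat of the Problem B snippet):

long long solve_component(const vector<int> &comp) {
    int s = comp.size();
    long long best = 0;  // the empty subset scores 0
    for (int mask = 1; mask < (1 << s); mask++) {
        if (__builtin_popcount(mask) > 3) continue;  // size > 3 never helps (Step 3)
        // collect the chosen nodes of this subset
        vector<int> chosen;
        for (int b = 0; b < s; b++)
            if (mask & (1 << b)) chosen.push_back(comp[b]);
        int k = chosen.size();
        // try all beta assignments in {1, 2}^k (beta >= 3 never helps, Step 3)
        for (int bm = 0; bm < (1 << k); bm++) {
            vector<long long> beta(k);
            for (int t = 0; t < k; t++) beta[t] = ((bm >> t) & 1) + 1;
            long long score = 0;
            for (int i = 0; i < k; i++) {
                score += beta[i] * valu[chosen[i]];
                // j starts at i so the diagonal (variance) term is counted once
                for (int j = i; j < k; j++)
                    score -= beta[i] * beta[j] * adj_mat[chosen[i]][chosen[j]];
            }
            best = max(best, score);
        }
    }
    return best;  // sum this over all components for the total answer
}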
Problem D
Observations
- This file had random data generated within a fixed range.
- Therefore, you can try many random heuristics.
- Setting all \beta values to one gives a pretty high score, but it's not the best score possible.
- As this is a random file, you should try a few greedy heuristics on it.
- We shall explain gradient descent here.
Gradient descent
The score function can be written in terms of the \vec{\beta} vector as the following vector expression:
S = \max(0, \sum(\vec{\beta} .* \vec{V}) - \sum((\vec{\beta} * \vec{\beta}') .* C))
Here \vec{V} is the value vector, and C is the covariance matrix, .* is the element-wise product, whereas * is the matrix product.
- This is a downward-opening (concave) quadratic in terms of \vec{\beta} and hence can be optimized using gradient descent.
- Gradient descent will need to “clamp” \vec{\beta} in the range [0, \alpha], and finally round the final fractional values to integers.
- The final rounding is mostly suboptimal; however, we cannot do much about it. We can, however, apply further heuristics to the rounded output of gradient descent.
You can use matrixcalculus to find the derivative and perform gradient descent. A sample implementation of the gradient descent is given in this Colab notebook, and a rough sketch is given below. If you got the gradient descent to run really well, let us know in the comments!
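A minimal gradient-ascent sketch (equivalent to gradient descent on the negated score; our illustration, not the notebook's code). It assumes per-stock val and alpha arrays plus the covariance matrix cov; for S(\beta) = \beta \cdot V - \beta' C \beta, the gradient is \partial S / \partial \beta_i = V_i - \sum_j (C_{ij} + C_{ji}) \beta_j.

#include <bits/stdc++.h>
using namespace std;

vector<double> gradient_ascent(const vector<double> &val,
                               const vector<double> &alpha,
                               const vector<vector<double>> &cov,
                               int iters = 1000, double lr = 1e-4) {
    int n = val.size();
    vector<double> beta(n, 1.0);  // the all-ones start already scores well
    for (int it = 0; it < iters; it++) {
        // full gradient of the score at the current beta
        vector<double> grad(n);
        for (int i = 0; i < n; i++) {
            grad[i] = val[i];
            for (int j = 0; j < n; j++)
                grad[i] -= (cov[i][j] + cov[j][i]) * beta[j];
        }
        // step uphill, then clamp ("project") into the feasible box [0, alpha_i]
        for (int i = 0; i < n; i++)
            beta[i] = clamp(beta[i] + lr * grad[i], 0.0, alpha[i]);
    }
    return beta;  // round each entry to an integer before submitting
}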
Problem E
Again, the approach is similar to problem D. The only difference here is that high values of alpha are not feasible in this test case. In the previous problem, you can show by hand that for alpha > 5 the score would start becoming negative; in this problem, that bound on alpha is even lower. The bound is proportional to the ratio of the average value to the average covariance.
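As a rough back-of-the-envelope justification of that bound (our estimate, not an exact derivation): holding a single stock i with \beta_i = b contributes about bV_i - b^2 C_{ii} to the score, which becomes \le 0 once b \ge V_i / C_{ii}. So the useful range of \beta (and hence of \alpha) scales with the value-to-covariance ratio, and this file's lower ratio lowers the bound.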
Note on LP solvers
We ran two linear programming solvers before the contest; however, they either produced average results or did not finish processing at all.
CVXPY
The formulation for constraints can be done like so:
import cvxpy as cp
import numpy as np

def solver(cov_mat, values, alphas):
    n = len(values)
    x = cp.Variable(n)
    # objective: covariance penalty minus total value (minimizing the negative score)
    part = cp.quad_form(x, cov_mat) - values @ x
    print(part.shape)  # sanity check on the expression's shape
    prob = cp.Problem(cp.Minimize(part), [x >= 0, x <= alphas])
    prob.solve()
    # x.value will be continuous, hence we need to round it
    opt = np.round(x.value)
    return opt  # submit this opt vector
Note that x = cp.Variable(n, integer=True) does not work (i.e., cvxpy is unable to solve the problem). The opt vector achieves good performance on problems A and D, average performance on problems C and E, and terrible performance on B (due to a very dense graph, with values and covariances of almost the same scale). Overall, it achieved around 300/500 points in our testing, and we therefore comfortably dismissed this method, as it was not expected to reach the top 100 of the ranklist.
MATLAB
The following code can iteratively solve for the desired vector. Pastebin link.
We ran it on our institute research machine with the input files. However, it could not finish processing even after roughly two hours of compute time.
Starting loop at HH:MM:SS
Status: 99 0 1.170225e+06 1
Status: 199 0 5.797305e+05 1
Status: 299 0 2.213001e+05 1
Status: 399 0 1.040847e+05 1
Status: 499 0 5.940622e+04 1
Status: 599 0 3.240714e+04 1
Status: 699 0 1.510964e+04 1
Status: 799 0 1.055976e+04 1
After these experiments, we concluded that solvers achieve only average performance on this cost function. Moreover, the first three input files were deliberately hand-crafted, and the solvers got average-to-poor scores on them. If you got the solvers to run well, let us know in the comments!
Conclusion
Overall, as you can see, we fitted many different concepts into a single problem statement:
- dfs on graphs
- tree dp
- exponential search/subset sampling
- greedy heuristics
- gradient descent
- linear program solvers
We hope you enjoyed the contest! In case anything needs clarification, please let us know in the comments. Also, don't forget to share your approaches/observations in the comments; we'd love to hear them!