PROBLEM LINK:Practice Setter: Teja Vardhan Reddy DIFFICULTY:EasyMedium PREREQUISITES:Combinatorics and Factorials. Modular Arithmetic. PROBLEM:Given an array $A$ of size $N$, Count the number of ways we can select a subset from $A$ such that the median of selected subset is also present as an element in the subset. SUPER QUICK EXPLANATION
EXPLANATIONThis problem is probably the best example of how we can use combinatorics to optimize our solution from $O(N^3)$ to $O(N)$ (except sorting). First of all, let us formulate the definition of median. If the size of the selected subset is odd, the median is just the middle element of subset after sorting. Since the middle element is present in the subset, all subsets of odd size are valid. It can be easily proven that there are $2^{N1}$ such subsets. So, we can directly count these and move toward even size subsets. If the size of the selected subset is even, the median is defined as the average of two middlemost elements after sorting. Now, say we have two middle elements $x$ and $y$, with condition $x \leq y$. Let $z = (x+y)/2$ be the median of sequence. If we write $y = x+d, d \geq 0$, we can see, $z = x+d/2$ and also, $z = yd/2$. This way, we can see, that the median of a sequence can never be smaller than $x$ and greater than $y$. So, For $z$ to be present in subset, we need either $z = x$ or $z = y$. But, this would imply $d = 0$, Hence leading to the conclusion that For an even size subset to be valid, the two middlemost elements should be equal. This forms the crux of our solution, and now, we need to count the number of even sized subsets with equal middle elements. After all this, there are a number of approaches to solving this problem, all of which required us to sort the array $A$ first. Let us consider a $O(N^3)$ solution first. Iterate over every pair of equal elements $(i, j)$ such that $A_i = A_j$ and iterate over the size $2*X$ of subset from $X = 1$ to $N$. The number of ways to make the subset of size $X$ with two fixed middle elements is just the product of the number of ways we can select $X1$ elements from $[1, i1]$ and $X1$ elements from $[j+1, N]$. This solution requires to iterate over every pair $(i,j)$ which takes $O(N^2)$ time and $O(N)$ time per pair, leading to Overall time complexity $O(N^3)$ which shall pass only the first subtask. For a faster implementation, We will see two implementations, one from setter, and another interesting solution which we can further optimize to $O(N)$ except for sorting. Setter's Implementation First of all, see what we were doing in the previous solution. We were fixing two equal elements and tried to count the number of ways we can make subsets of all sizes. Now, We shall fix only the Left Middle Element (Or Right one, whichever implementation you prefer). Suppose we fixed the ith element as the left middle element. Now, We will iterate over all sizes $2*X$ and try to include $X1$ elements from $[1, i1]$ and $X$ elements from $[i+1, N]$. We need the right middle element to be same as the left middle element. So, When choosing $X$ elements from $[i+1, N]$, we need at least one occurrence of $A[i]$. This is same as subtracting all the ways to select $X$ elements in the range $[i+1, N]$ which do not have $A[i]$ at all. Suppose Number of occurrence of $A[i]$ in range $[i+1, N]$ is $f$, then we can count the number of ways to select $X$ elements from range $[i+1, N]$ such that it contains at least one occurrence of $A[i]$ as $T$ = Number of ways to select $X$ elements out of $Ni$ elements less Number of ways to select $X$ elements out of $Nif$ elements. We can select $X1$ elements from $[1, i1]$ in suppose $U$ ways. Then, Number of ways we can have good subsets with $A[i]$ as the left middle element is $U*T$. Summation of this product for all sizes for all elements gives us the number of good subsets of even size. We can add to it, the number of good odd sized subsets and print the answer. Alternative Implementation. For this solution, We will not count good subsets, but subtract bad subsets from total subsets. The total number of subsets is $2^N$ out of which one is empty. Empty subset doesn't contain its median, hence a bad subset. So left with $2^N1$ subsets. In this solution, we will count the number of subsets which have the different value of middle elements. For this solution, we iterate over all distinct elements and try to the number of ways to select subsets of all even sizes. In this solution too, we fix the left middle element and try to count the number of bad subsets. Suppose we have $cnt$ elements smaller than $A[i]$ and There are total $f$ occurrences of $A[i]$ and we have to make a subset of size $2*X$, Then, we need to select $X$ elements from $cnt+f$ such that it contains atleast 1 occurrence of $A[i]$ and selecting $X$ elements from Elements strictly greater than $A[i]$, THere are $Ncntf$ such elements. The required number of ways is the product of both. We can find the summation of this product over all even sizes of subsets for all distinct values. This solution also has time complexity $O(N^2)$, thus fits the time limit easily. Hint to O(N) Solution. Note: This optimization is not exactly necessary. We rely on the same idea as the alternative implementation. We still iterate over all distinct values, but now, we will find a closed form to count bad subsets of all sizes. But the thing is, we try to find a closed form for all sizes. We see, the summation is like $^nC_x*^mC_x$ for all sizes $x$. Can you find the closed form? Seems like many users actually did, and submitted a Linear time solution during the contest itself. For those who did not reduce, this is left as an exercise for them. Looking for something helpful? See Vandermonde's identity here. Computing Number of ways to select $X$ elements out of $N$ elements The thing is, we need to calculate N select X efficiently. We know from the definition of Combinations that we can represent them in form of factorials. We can precompute Factorials and their inverses for the division. Then, $^NC_R = N!*inv(R!)*inv((NR)!)$. For details, refer this excellent post here. Time ComplexityTime complexity is $O(N^2)$ per test case for the intended solution. AUTHOR'S AND TESTER'S SOLUTIONS:Setter's solution Feel free to Share your approach, If it differs. Suggestions are always welcomed. :)
This question is marked "community wiki".
asked 12 Nov '18, 16:14

Here's my $O(n)$ time complexity solution. Let $\epsilon = \{a  a \in A \}$ (i.e., the set of distinct elements of $A$), let $k = \epsilon$ and $n_i$ refer to the number of occurrences of $\epsilon_i$ in $A$, assume that $\epsilon$ is sorted. Obviously $\sum_{i=1}^{k} n_i = n$. Also, we use the notation $\binom{n}{r}$ to denote the number of ways of selecting $r$ elements out of $n$ objects. $$\binom{n}{r} = \binom{n}{n  r} = \begin{cases} 1 & \text{if } n = r = 0 \\ \frac{n!}{r!(nr)!} & \text{if } r \in \mathbb{N}_0 \wedge r \leq n\\ 0 & \text{in any other case} \end{cases}$$ Now, clearly a subsequence $B \subseteq A$ is good iff $B$ is odd or the middle two elements of $B$ are equal. We could add $2^{n1}$ to the answer because that's the number of subsequences of odd length, however this would cause us a lot of problems later on. Now, for each $\epsilon_i$, we count all the subsequences such that $\epsilon_i$ is the median of those subsequences. Clearly summing this up over all $\epsilon_i$'s gives us this answer. Now comes the matter of actually counting this. We do this in the following manner:
In general, if we choose $m$ $\epsilon_i$'s, we have $$\binom{n_i}{m}\sum_{r = 1  m}^{m  1} \sum_{p  q = r} \binom{\sum_{j=1}^{i1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q}$$ that is the number of elements preceding and succeeding $\epsilon_i$ differ by atmost $m  1$. Thus, our final answer is $$\sum_{i=1}^{k}\sum_{m=1}^{n_i}\binom{n_i}{m}\sum_{r = 1  m}^{m  1} \sum_{p  q = r} \binom{\sum_{j=1}^{i1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q}.$$ Now, you maybe wondering in how in the world I could claim this solution to have $O(n)$ time complexity with $6$ of those summations, not mentioning the time required to compute a binomial coefficient. Vandermonde's Identity to the rescue! (I can not stress enough as to how useful this identity has been in a lot of questions for me) Here's the identity: $$\sum_{x+y=k} \binom{m}{x}\binom{n}{y} = \binom{m+n}{k}.$$ It might not be directly obvious how to use this identity in our question, given that we we have $p  q = r$ and not $p + q = r$, however this is rather easy to work around using the identity $\binom{n}{r} = \binom{n}{nr}$. Thus, we have $$\begin{align*} \sum_{p  q = r} \binom{\sum_{j=1}^{i1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q} &=\sum_{p +(\sum_{j=i + 1}^{k} n_{j}  q) = \sum_{j=i + 1}^{k} n_{j} + r} \binom{\sum_{j=1}^{i1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{\sum_{j=i + 1}^{k} n_{j}q}\\ &= \binom{\sum_{j=1 \wedge j \neq i}^{k} n_{j}}{\sum_{j=i + 1}^{k} n_{j}+r} = \binom{\sum_{j=1 \wedge j \neq i}^{k} n_{j}}{\sum_{j=1}^{i1} n_{j}r}\\ &= \binom{nn_i}{p_ir} \end{align*}$$ It looks so much nicer already! Here $p_i = \sum_{j=1}^{i1} n_i$, clearly $p_{i+1} = p_i + n_i$. Here's our answer now, $$\sum_{i=1}^{k}\sum_{m=1}^{n_i}\binom{n_i}{m}\sum_{r = 1  m}^{m  1} \binom{nn_i}{p_ir}.$$ We are pretty much done here, it does not look like an $O(n)$ time algorithm but it indeed is, using a few recurrence relations. If we define $$t_{m, i} := \sum_{r = 1  m}^{m  1} \binom{nn_i}{p_ir}$$ then we have the recurrence, $t_{m + 1, i} = t_{m, i} + \binom{nn_i}{p_im} + \binom{nn_i}{p_i+m}$ and $t_{1, i} = \binom{nn_i}{p_i}$. Our final answer becomes $$\sum_{i=1}^{k}\sum_{m=1}^{n_i}\binom{n_i}{m}t_{m,i}.$$ The time complexity of this is $O(\sum_{i=1}^{k} n_i T) = O(nT)$ where $T$ is the time complexity for calculating $\binom{n}{r}$. We can easily have $T = O(1)$ by simple computing all $n!$ and $(n!)^{1} \mod 10^9 + 7$ beforehand since $n \leq 1000$. Also, I calculated the modular inverse by using this (the last method). Remarks: This post is pretty long already :P Here's some remarks which don't really add on much to the solution, read them if you feel like it. In my implementation, rather than actually precomputing all values of $n!$ and $(n!)^{1}$ beforehand I computed them till the maximum value they were required to whenever the factorial function was called and a static variable to keep track of the largest factorial computed up until to that point, same goes for modular inverse. Also, in my original code, the one I submitted during the competition I did not precompute factorials though I did use caching for modular inverses. The solution was still well within the time limit. Here's that code. For the set $\epsilon$, I used a map structure and incremented $map[\epsilon_i]$ by $1$ each time it encountered $\epsilon_i$ in the input. You can instead also use a bookeeping array of length $2n + 1$ and increment that instead whenever you encounter $\epsilon_i$ instead, just because $A_i \leq 2n$. Also, wow this post was pretty unecessarily long I could have made it shorter, but oh well. ¯_(ツ)_/¯ answered 18 Nov '18, 21:04
Umm, why is the $\LaTeX$ rendering so weird, it looks fine in the preview.
(18 Nov '18, 21:06)
Let me fix that.
(18 Nov '18, 21:09)
It works now, I edited it. For some reason using square brackets instead of doubble dollar signs cause problem on codechef.
(18 Nov '18, 21:11)
Oh no...I overwrote those changes because I didnt saw this message. Any chance you can do it again please?
(18 Nov '18, 21:13)
Well it renders fine now, so I don't think that's necessary. Thanks! :)
(18 Nov '18, 21:15)
Cool! :) You're welcome :D
(18 Nov '18, 21:17)
showing 5 of 6
show all

What's the point of the constraint $A_i \leq 2N$? answered 17 Nov '18, 12:33

This solution gives WA for one of the test cases :https://www.codechef.com/viewsolution/21514246 answered 18 Nov '18, 22:40
