Problem Link

Author: Stacy Hong
Tester: Misha Chorniy
Editorialist: Bhuvnesh Jain

Difficulty

HARD

Prerequisites

Branch and Bound, Dynamic Programming, Big Integers, Hashing, DFS/Recursion

Problem

You are given a string $S$ and an integer $N$. You need to place some '+' symbols in the string $S$ so that the value of the resulting expression equals $N$.

Explanation

Subtask 1: N < 1000000

A simple knapsack-like dynamic programming is sufficient to pass this subtask. The dp state is as follows:

$$dp[i][j] = \text{1 if the first 'i' characters of S can be partitioned such that the sum is 'j', else 0}$$

A recursive pseudocode for the above logic is given below:
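A minimal sketch of this DP in Python, written for illustration and not taken from the linked solutions. It stores each row $dp[i]$ as one Python integer used as a bitset (bit $j$ set means sum $j$ is reachable). The function name `plus_partition` is hypothetical, and the sketch assumes that terms with leading zeros are allowed.

```python
def plus_partition(s: str, target: int):
    """Return a list of terms of s (in order) summing to target, or None."""
    n = len(s)
    mask = (1 << (target + 1)) - 1   # keep only the sums 0..target
    reach = [0] * (n + 1)            # bit j of reach[i]: first i chars can sum to j
    reach[0] = 1                     # the empty prefix sums to 0
    for i in range(n):
        if reach[i] == 0:
            continue
        value = 0
        for k in range(i + 1, n + 1):
            value = value * 10 + int(s[k - 1])   # value of s[i:k]
            if value > target:
                break                            # longer terms only get larger
            reach[k] |= (reach[i] << value) & mask
    if not (reach[n] >> target) & 1:
        return None
    # Walk backwards to recover one valid partition.
    terms = []
    k, j = n, target
    while k > 0:
        for i in range(k - 1, -1, -1):
            v = int(s[i:k])
            if v > j:
                break
            if (reach[i] >> (j - v)) & 1:
                terms.append(s[i:k])
                k, j = i, j - v
                break
    terms.reverse()
    return terms
```

For example, `plus_partition("123", 6)` gives `['1', '2', '3']`, while `plus_partition("123", 7)` gives `None` because the only reachable sums are 6, 15, 24 and 123.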
The above code works in time complexity $O(|S| * N)$, where $|S|$ is the length of string $S$ and $N$ is the given integer. The space complexity is the same. Note that Codechef servers allow programs using such large memory (around 1G) to pass, but you can use bitset or other equivalent structures to reduce the memory of your program too. See the linked solution below for subtask 1 for more details. A solution for this subtask using hashing and dynamic programming also exists.

Subtask 2: N < S

Adding 2 numbers of sizes $x$ and $y$ leads to a final result having size at most $\max(x, y) + 1$. This is easy to prove. The size will be $\max(x, y)$ if there is no carryover in the end; otherwise it will increase by 1. Extending the above logic to the addition of $m$ numbers, we can conclude that if the numbers have lengths $x_1, x_2, \ldots, x_m$, then the length of the final result is bounded by $\max(x_1, x_2, \ldots, x_m) + m - 1$. You can easily prove it using induction.

Note that the number of ways to partition the string $S$ into different expressions is exponential. But using the above observation, you can then conclude the following fact:

Given a string $S$ and an integer $N$ of length $n$, there are very few ways to achieve the target $N$ if $n$ is comparable to $|S|$. This is because in most partitions the largest number is too small to reach $N$. Equivalently, if the number of partitions we put in $S$ is large, then achieving a larger target $N$ is not possible. This hints that for sufficiently large integers, the greedy technique of choosing a large size partition and checking if the remaining part can add up to the desired $N$ will run very fast in practice, as the number of iterations will not be large.

For example: $S = 114390211, N = 43915$

Considering greedily the large size partitions of $S$ such that their value is less than $N$, we have the following numbers: [11439, 14390, 43902, 39021].
(Note that the number of these selected numbers can be at most $(|S| - n + 1)$.) Let us greedily start with $43902$. The remaining parts of $S$ are $(11, 11)$ and the required sum now is $(43915 - 43902) = 13$. Note that there are 2 ways to achieve it: $(11 + 1 + 1) \text{ or } (1 + 1 + 11)$. As you can see, the number of iterations was small. The example is just a short one to help understand how greedy recursion might behave for larger test cases.

But in the other case, where the integer $N$ is small but $S$ is very large, there can be a large number of ways to achieve the desired result. For this, we have already seen that a dynamic programming solution exists (Subtask 1). Trying the above greedy approach can be very bad in this case, a simple example being $(S = 99999999999, N = 108)$.

With some of the above ideas, we design a simple branch and bound based algorithm for the problem:
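A minimal sketch of such a search in Python, under stated assumptions: it is an illustration of the idea (try larger terms first, subtract, backtrack on failure), not the setter's code. The names `branch_and_bound`, `ranges`, `used` and `remain` are hypothetical, leading zeros in terms are assumed to be allowed, and the search is exponential in the worst case.

```python
def branch_and_bound(s: str, target: int):
    """Return the terms of one partition of s summing to target, or None."""
    n = len(s)
    full = (1 << n) - 1  # bitmask with every position of s covered
    # Every substring as (value, start, end), largest value first.
    ranges = sorted(
        ((int(s[a:b]), a, b) for a in range(n) for b in range(a + 1, n + 1)),
        reverse=True,
    )
    chosen = []

    def search(first: int, used: int, remain: int) -> bool:
        if used == full:
            return remain == 0
        # Only look at ranges after the last chosen one, so terms are
        # picked in non-increasing order and no set is tried twice.
        for idx in range(first, len(ranges)):
            value, a, b = ranges[idx]
            if value > remain:
                continue  # would make the remaining target negative
            span = ((1 << (b - a)) - 1) << a
            if used & span:
                continue  # overlaps an already chosen term
            chosen.append((a, b))
            if search(idx + 1, used | span, remain - value):
                return True
            chosen.pop()
        return False

    if not search(0, 0, target):
        return None
    return [s[a:b] for a, b in sorted(chosen)]
```

On the example above, `branch_and_bound("114390211", 43915)` picks `43902` first and then fills the two leftover pieces `11` and `11` with terms summing to 13.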
Note that the above is a simple solution based on the initial observations. But do we really need to check all possible ranges? Can we decide at some point that a given choice of ranges can never result in $N$? That is, once we have selected some ranges, can we say that the remaining ones can never achieve the target, without checking all possible partitions or greedily checking the large-number ranges? Actually, given the initial choice of our ranges, we can bound the maximum number we can achieve with the remaining ones. A simple check which ignores the digits of the remaining parts of $S$, considers all of them to be $9$ and finds the maximum value possible is a good starting point. If this value is already less than the remaining target, then we can simply prune this branch. Even stronger checks based on the actual values of the digits in string $S$ can lead to better pruning. So, the loop in the above code modifies as follows:
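A minimal sketch of the all-nines bound in Python, assuming the covered positions are tracked as a bitmask `used` (bit `i` set means position `i` of $S$ is already inside a chosen term). All function names here are hypothetical. The bound is valid because the largest sum a free piece of length $L$ can contribute is the piece taken whole, which is at most $10^L - 1$.

```python
def free_segment_lengths(used: int, n: int):
    """Lengths of the maximal runs of uncovered positions among 0..n-1."""
    lengths, run = [], 0
    for pos in range(n):
        if (used >> pos) & 1:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths


def max_sum_if_all_nines(lengths):
    """Upper bound on the sum obtainable from free pieces of these lengths."""
    return sum(10 ** length - 1 for length in lengths)


def can_prune(used: int, n: int, remain: int) -> bool:
    """True if even all-nines leftovers cannot reach the remaining target."""
    return max_sum_if_all_nines(free_segment_lengths(used, n)) < remain
```

In the earlier example, after taking `43902` (positions 2 to 6 of a 9-character string) the free pieces have lengths 2 and 2, so the bound is $99 + 99 = 198$; a remaining target of 13 is not pruned, while anything above 198 would be.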
Another thing we can do is to remove the early exit of the recursion where the check is based on "remain < 0". This can be done by directly starting from the ranges whose value is not greater than "remain". This is helpful because, after greedily choosing a large size partition, in most cases the other large size partitions have to be ignored in further recursion, either due to conflicts in common ranges or because they would make "remain" drop below $0$. A simple binary search can find the first index in "RANGES" from where we should begin our search. This is possible because we initially stored our ranges in decreasing order of the value of the integers they represent.

With the above ideas, the recursive solution based on branch and bound works quite fast in practice. A small analysis of the time complexity is as follows:
A more detailed analysis of time complexity will be available soon.

Ashmelev solution (Fastest in the contest): Branch and Bound with strong bound checking

Tips for Big integer library (mostly for C++ users)

Languages like Python and Java already have a big integer library implemented in them, but C++ users need to implement one themselves for this problem. A small tip is to store the numbers as groups of digits instead of single digits. For example: $S = 123456789123456789$. Below are 2 possible ways to store $S$:

$$S = 1 \mid 2 \mid 3 \mid 4 \mid 5 \mid 6 \mid 7 \mid 8 \mid 9 \mid 1 \mid 2 \mid 3 \mid 4 \mid 5 \mid 6 \mid 7 \mid 8 \mid 9$$
$$S = 123456789 \mid 123456789$$

This helps to perform operations in a base different from 10 and reduces the constant factor of your program. Generally, the base is chosen as ${10}^{9}$ so that all possible operations like $(+, -, *, /)$ fit inside the datatypes provided by the language. You can see the setter's library for an example.

Time Complexity

To be described in detail later.

Space Complexity

To be described in detail later.

Solution Links

The solution for subtask 1 can be found here
Setter's solution can be found here
Ashmelev's solution can be found here
This question is marked "community wiki".
asked 14 Jun, 03:24

The standard solution seems not very promising, for example, check this data:
There should be a solution:
In fact, no AC solution from the contest (except the Java ones, which I could not test as I didn't have a Java compiler installed), nor even the standard solution, can produce an answer in a reasonably small time. For example, see this link on ideone.

answered 14 Jun, 07:16
I was first to solve PLUSEQ, but I realize my solution is quite bad. It can handle cases with either small targets or targets with some large terms. It will completely fail for large targets consisting of a lot of similar-sized small terms, like the case you gave. I honestly have no clue how to solve that case in a timely manner.
(14 Jun, 07:41)
Yeah, I think this problem itself is NP-hard-ish in nature, so I was quite impressed to see it appear in a formal contest with no restriction on the data :/
(14 Jun, 07:43)
This fairly simple solution in pypy takes about 1 minute to solve that case (compared with about 18 hours for Ashmelev's; I didn't time fjzzq's solution). It is more robust against self-similar instances like fjzzq's example, but it is quite a memory-hungry solution and can still get tripped up by some other self-similar examples.
(19 Jun, 06:11)
I tried fjzzq's program but it got stopped by a power cut(!) so all I know is that it takes at least 8 hours to find a solution.
(19 Jun, 19:02)
I used a different method to handle this kind of case (see my answer below). It now takes about 1 second in python to find a solution to fjzzq's awkward case.
(25 Jun, 02:09)

I don't know if anyone's still reading this editorial or if this comment is too late, but I have a solution for this problem that appears to avoid the extremely bad worst cases mentioned above.

Edited to add: Some notation:

$$ S = \textrm{the string considered as a decimal integer} $$
$$ s = \log_{10}(S) = \textrm{the length of }S $$
$$ N = \textrm{the target value} $$
$$ n = \log_{10}(N) = \textrm{the length of the target value} $$

Solution 1

There is a search tree whose root is $(S,N)$, and whose general node is (set of remaining substrings of $S$, target value). The children nodes are those you get by removing a substring from the current set of strings and subtracting its value from the current target to make the child's target.

This solution aims to strike a balance between breadth-first search and depth-first search, doing exploration (BFS) nearer the top of the tree and something like DFS nearer the bottom. Simple DFS is not as good because it refuses to spend a small amount of time at the top of the tree looking for a better situation to exhaustively search from.

Here there is a priority queue of unexpanded nodes, and the priority of node $x=((S_1,\ldots,S_k),N')$ is given by

$$ p(x)=\log(N')+\lambda\cdot k+\mu\cdot\textrm{depth}(x)-\nu\sum_i|S_i| $$

(smaller is better), where $|S_i|$ is the length of the substring $S_i$ (up to 120 here). $\lambda, \mu$ and $\nu$ are constants. I found $\lambda=7$, $\mu=1.75$, $\nu=0.1$ worked fairly well. This prioritises smaller target values, a less-fragmented set of substrings (smaller $k$), shallower depth in the tree and more total substring remaining to work with. (I'm sure there are better priority functions, but this one is reasonably good and simple to evaluate.)

At each step it expands the best node (lowest $p(x)$), then chases down the best child from that node, the best child from that child etc. until it reaches the bottom of the tree, which is either a solution ($N'=0$) or a node which it can prove with (precalculated) min/max bounds can never reach a solution. This chasing-down phase prevents it purely expanding high-priority nodes and indefinitely postponing actually trying to find a solution.

It also keeps a record of which nodes it has visited to avoid duplicate searches. If the same substrings occur in a different position then it treats these as the same, so it is doing more than avoiding duplicate bitmaps of where the substrings occur. This helps in the case where the original string is very self-similar (which is the difficult case). For example, if the original string is 1212126121212121212, then the two nodes (121212612,121212,xxxx) and (121212612,xxxx,121212) (where xxxx indicates a substring that has been removed) are regarded as the same, because the sets of remaining substrings are the same, even though they come from different positions in the original string.

This is an example implementation in python/pypy. It's not written in a super-optimised way from the point of view of low-level coding, e.g., the nodes are represented as tuples of tuples, rather than bitmaps or something, but hopefully the underlying algorithm is robust enough that it doesn't have hidden bad cases. For example, it solves fjzzq2002's difficult example

11111111111111111111111111111111121111111111111111111111111 [continued] 1111111311111111111111111111111111111111114111111111111111 122531659665

in about 0.25 seconds.

Solution 2 (two-phase algorithm)

This handles separately the case where there are many repetitions of a single digit in $S$. It's not meant to handle the general case of more random $S$, where picking off the biggest number works reasonably well, so it ought to be used in conjunction with a more general algorithm like solution 1.
My initial implementation of solution 1 was less efficient than it is now, so the two-phase algorithm was necessary to handle the cases of large numbers of repetitions of a single digit, but now solution 1 is reasonably good even in high repetition cases, so the two-phase algorithm is less necessary.

Anyway, let's say the majority digit is "1" to make the explanation simple; it works in a similar way for any other non-zero digit (zero is treated slightly differently). It works by first finding a "virtual" decomposition where (a) the non-1s are given positions within their substrings (effectively they are each given a power-of-10 weighting), and (b) the digits of $S$ are restricted into substrings of specified lengths, but at unspecified locations. The next phase tries to realise this set of constraints in an actual decomposition of $S$ into substrings.

For example, in fjzzq2002's example, there are three non-1 digits: 2, 3 and 4. The first phase might find the following formula: digit 2 is in place 5, digit 3 is in place 7, digit 4 is in place 3, and the set of substring lengths is 4, 6, 6, 6, 6, 8, 8, 8, 8, 8, 8, 9, 9, 11, 12. This works because

$$(2-1)\times10^5+(3-1)\times10^7+(4-1)\times10^3+1111+111111\times4+11111111\times6+$$
$$111111111\times2+11111111111+111111111111=122531659665.$$

Then the second phase might find a realisation of this like

111111111+11111111+111111+11111111+11211111+111111+111111+11111111111+ 111131111111+111111+11111111+111111111+11114111+1111+11111111

where the substring lengths are taken in the order 9, 8, 6, 8, 8, 6, 6, 11, 12, 6, 8, 9, 8, 4, 8. (Note that these substrings include the non-1 digits.)

This is an example implementation. It switches to the two-phase algorithm if the proportion of non-majority digits is less than 0.052, otherwise it uses solution 1 above.

Analysis part 1

First consider the case where $S$ is a uniform random integer of size $s$. Fix $S$ for the moment and consider different values of $n$.
For large $n$ the problem is overconstrained and morally there are no solutions, though there will always be the planted one. If you decrease $n$ (i.e., imagine different problem instances with smaller and smaller $n$) then eventually there will be a kind of phase change at some critical value, $n_c(s)$, of $n$. Below $n_c(s)$ the problem is underconstrained and unplanted solutions will spontaneously pop into existence. In other words there will be many solutions for $n<n_c(s)$.

In our case of $s=120$, I think (from experimentation) the critical value, $n_c(120)$, is something like 20 or 21. You can get a good experimental idea of this by creating a problem instance and then solving it. If $n>n_c(s)$ then you have to get back the same solution you started with, but if $n<n_c(s)$ you will probably get back some random alternative solution. (Really we only care about whether the large summands are unique. It's possible that some of the smaller ones can be varied, but this won't affect the search very much.)

In these kinds of situations, the most difficult value of $n$, from the point of view of finding a solution, is the critical value $n_c(s)$. Or in other words, if you are the setter and want to make the problem as hard as possible, you should choose something like $n=n_c(s)$. For $n<n_c(s)$ there are lots of solutions, which means we only need to search a fraction of the search space to find one and we can use the extra freedom to our advantage to search solution-rich areas of the search space first. For $n>n_c(s)$ there is only the planted solution, and in general we have to search the whole search space to find it, but the bigger $n$ is, the smaller the search space, so again the most difficult value of $n$ is $n_c(s)$.

What is $n_c(s)$ and how does it affect the amount of searching you have to do? Here is a very rough argument which is nonetheless hopefully good enough to get some idea of what is going on.
It relies on subtracting the biggest number from the current target value at each step. Since there are other options, this should underestimate the chance of finding a solution, and so underestimate the critical value $n_c(s)$. But hopefully subtracting the biggest number is a good enough approximation to make this calculation somewhat meaningful.

When you choose a summand from $S$ you can think of it as using up the resource $S$ in order to reduce $N$ as much as possible (keeping it non-negative of course). It's essentially not a problem to reduce $N$ too much, because it's easy to back off and make the sum smaller by breaking down the summands (e.g., replace 917 by 91+7). If you do this then there will be plenty of freedom to choose any small sum you want. So really the game is to keep reducing $N$ more-or-less as much as possible.

How much can you expect to do this on the first step? Assuming $n\ll s$, there will be about $s-n\approx s$ positions in the string $S$ to choose a substring of length $n$. The minimum of $s$ random integers less than $N$ is about $N/s$, so in one step we expect to reduce $N$ to about $N/s$ at the expense of decreasing $s$ by $n$. (You can see how rough and ready this argument is. It doesn't distinguish between taking a chunk from the end of $S$, which is less damaging, and taking a chunk out of the middle. And it doesn't take account of the fact that sometimes you don't want to subtract the maximum possible value from $N$. But let's cross our fingers and continue.)
If we make the operations $N\to N/s$ and $s\to s-n$ continuous, we get differential equations:

$$ \dot n = -\log_{10}(s) $$
$$ \dot s = -n $$

Dividing them to get rid of the time-dependence, we get

$$dn/ds = \log_{10}(s)/n$$

which has the solution

$$\frac12 n^2 = s\log_{10}(s)-s/\log(10)+C$$

Whether $n<0$ (meaning we managed to reduce the target to 0) or $n>0$ (we didn't manage to do so) at $s=0$ will determine whether there are spontaneous solutions or not, and if $n=0$ at $s=0$ then we have criticality: you expect just about 1 non-planted solution. At $s=0$ the right-hand side reduces to $C$, so criticality corresponds to $C=0$, which gives

$$n_c(s)=\sqrt{2(s\log_{10}(s)-s/\log(10))}=\sqrt{2s\log_{10}(s/e)}.$$

If you try $s=120$ in this formula you get $n_c(120)=19.9$ (since $\log_{10}(120/e)\approx1.645$ and $\sqrt{2\times120\times1.645}\approx19.9$). The true value is probably around 20 or 21, so, somewhat by fluke, the estimate has worked more accurately than we had a right to expect.

Analysis part 2

How is this related to the time taken to find a solution? (To be continued, maybe.)

answered 25 Jun, 02:16
I would really like to know it!! Please explain it :)
(25 Jun, 11:35)
@alexthelemon , @vivek_1998299 is very much interested. Please explain your solution now :p :) xD
(25 Jun, 20:50)
I'll write something soon - sorry, been a bit busy!
(29 Jun, 16:46)
I added something. Sorry, I was meaning to write more, but didn't have the time so I wrote what I had.
(05 Jul, 08:40)
simply amazing!!! Thank you so much..
(05 Jul, 12:43)
Very decent explanation!!
(05 Jul, 15:16)

The first thing - we just use the branch and bound approach with a recursive iteration function. So it looks like a simple DFS: let's select the first term (substring), fix it, then select the second term and so on. If we fail to find a correct solution in some branch, we go back and change the selected terms. Of course it can run for an extremely long time, and I think it is impossible to prove that some set of optimizations can solve any test in the given time limit.

Now, we have to improve the solution to make it significantly faster. In general there are two useful methods:
1. make operations asymptotically faster
2. detect states where we cannot find the solution and stop deeper recursion in such cases

The first type of optimization is just implementation-dependent; there are no useful ideas regarding the problem itself. For example, if we are selecting the next substring which will be used as a single term, we should check whether it intersects with already selected substrings (if we already took substring [1..5], we cannot use substring [4..11] for example). The naive implementation iterates through all the positions and checks whether they are empty. It requires about 120 (full string length) operations in the worst case. But we may replace the array of 120 boolean values by two long long (64 bit) variables, w1 and w2, and assume that position X is empty if the X-th bit of w1 is 0 (or the (X-60)-th bit of w2 is 0 for X >= 60). So we have only 2 bitmask operations instead of 120.

well this was one question that was easy to understand but too difficult to fit in TL :)

answered 25 Jun, 22:12
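The bitmask occupancy check described in the answer above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, positions are 0-based with an exclusive end (the answer uses 1-based inclusive ranges), and a single Python integer is used where C++ would need the two 64-bit words w1 and w2, since Python integers have arbitrary width.

```python
def span_mask(start: int, end: int) -> int:
    """Bitmask with bits start..end-1 set (0-based, end exclusive)."""
    return ((1 << (end - start)) - 1) << start


def is_free(occupied: int, start: int, end: int) -> bool:
    """True if no position in start..end-1 is already taken."""
    return (occupied & span_mask(start, end)) == 0


def take(occupied: int, start: int, end: int) -> int:
    """Return the occupancy mask after marking start..end-1 as taken."""
    return occupied | span_mask(start, end)
```

With `occ = take(0, 0, 5)` (the substring [1..5] of the answer), `is_free(occ, 3, 11)` is false because the range overlaps positions 4 and 5, while `is_free(occ, 5, 11)` is true.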

"A more detailed analysis of time complexity is will available soon.",waiting for this eagerly :),please post it soon.
"Soon"....for some reason this word brings up bad memories of past xD
I have mailed the author of the problem regarding his complexity analysis about 10 days back but got no reply yet. The editorial is completely based on my understanding and his insights of his solution. I couldn't solve the problem and can't figure out the complexity myself.
@likecs  Thats sad :( . Can even @mgch not help here? He was the tester and is quite reachable as far as my experience goes. Please do try that!