×

PLUSEQ - Editorial

Practice

Contest

Author: Stacy Hong

Tester: Misha Chorniy

Editorialist: Bhuvnesh Jain

HARD

Prerequisites

Branch and Bound, Dynamic Programming, Big Integers, Hashing, DFS/Recursion

Problem

You are given a string $S$ and integer $N$. So need to place some '+' symbols in the strings $S$ so that the value of the expression equals $N$.

Explanation

A simple knapsack like dynamic programming is sufficient to pass this subtask. The dp state is as follows:

$$dp[i][j] = \text{1 if first 'i' characters of S can be partitioned such that the sum is 'j' else 0}$$

A recursive pseudo-code for the above logic is given below:


def solve(int idx, int remain):
if idx == len(s):
return remain == 0
if dp[idx][remain] != -1:
return dp[idx][remain]
dp[idx][remain] = 0
num = 0
for i in [idx, len(s) - 1]:
num = num * 10 + s[i] - '0'
if (num > remain):
break
if (solve(i + 1, remain - num)):
PLUS[i] = 1;
return dp[idx][remain] = 1
return dp[idx][remain];

solve(0, N)     # N = integer value of string s in input
for i in [0, len(s) - 1]:
if i and PLUS[i]:
print '+'
print s[i]



The above code works in time complexity $O(|S| * N)$ where $|S|$ is the length of string $S$ and $N$ is the given integer. The space complexity is also same. Note that Codechef servers allow programs using such large memory (around 1G) to pass but you can you bitset or other equivalent structures to reduce the memory of your program too. See the linked solution below for subtask 1 for more details.

The solution for this subtask also exists using hashing and dynamic programming.

Adding 2 numbers of sizes $x$ and $y$ leads to final result having size at most $max(x, y) + 1$. This is easy to prove. The size will be $max(x, y)$ if there is no carry-over in the end otherwise it will increase by 1. Extending the above logic to addition of $m$ number, we can conclude that if the numbers have lengths $x_1, x_2, \cdots, x_m$, then the length of final result is bounded by $(max(x_1 + x_2 + \cdots + x_m) + m - 1)$. You can easily prove it using induction.

Note that number of ways to partition the string $S$ into different expression is exponential. But using the above observation, you can the conclude the following fact:

Given a string $S$ and integer $N$, having the length as $n$, there is very less number way to achieve the target $N$ if $n$ is comparable to $|S|$. This is because most of the partitions will either have the maximum number in them as too low. Equivalently, if the number of partitions we put in $S$ is large, then achieving a larger target $N$ is not possible. This hints that for sufficiently large integers, the greedy technique of choosing a larger size partition and checking if the remaining part can add up to desired $N$ will run very fast in practice as the number of iterations will not be large. For example: $S = 114390211, N = 43915$

Considering greedily large size partition of $S$ such that their value is less than $N$, we have the following numbers: [11439, 14390, 43902, 39021]. (Note that the size of this selected numbers can be at most (|S| - n + 1).) Let us greedily start with $43902$. The remaining parts of $S$ are $(11, 11)$ and the required sum now is $(43915 - 43902) = 13$. Note that there are 2 ways to achieve it $(11 + 1 + 1) \text{ or } (1 + 1 + 11)$. As you can see, the number of iterations were less. The example is just a short one to make to understand how greedy recursion might behave for larger test cases.

But the other case where the integer $N$ is small but $|S|$ is very large, there can be large number of ways to achieve the desired result. For this, we have already seen that a dynamic programming solution already exists (Subtask 1). Trying the above greedy approach can be very bad in this case, a simple example being $(S = 99999999999, N = 108)$.

With some of the above ideas, we design a simple branch and bound based algorithm for the problem:


range_tried = [0 for i in [1, len(S) - 1]]
# Find all range in S such that their value is less than N and sort
# them in decreasing order of their value. Let it be called "RANGES"

def recurse(range_idx, remain):
if remain < 0:
return false
if remain < LIMIT:      # LIMIT ~ 100000
else:
for i in [range_idx, len(RANGES) - 1]:
if conflict(current_range, range_tried):
# If current range we are trying conflicts with one
# already tried in recursion before.
continue

X = value of integer for RANGES[i]
# Update range_tried
if recurse(range_idx, remain - X):
return True
# Update range_tried to old value
return False



Note the above is a simple solution based on the initial observations. But do we need to really check for all possible ranges? Can we decide greedily at some point that given range can never result in an answer as $N$, i.e. Say we have selected some ranges, can we say that with the remaining ones we can never achieve the target $N$ without checking all possible partitions or greedily checking large number ranges.

Actually, given initial choice of our ranges, we can bound the maximum number we can achieve with the remaining ones. A simple check which ignores the digits of remaining parts of $S$ and considers all of them to be $9$ and finds the maximum value possible is a good starting point. If this value is already less than $N$, then we can simple prune our solution. Even stronger checks based on actual values of digits in string $S$ can lead to better pruning. So, the loop in the above code modifies as follows:


for i in [range_idx, len(RANGES) - 1]:
if conflict(current_range, range_tried):
# If current range we are trying conflicts with one
# already tried in recursion before.
continue
MAX_POSSIBLE = get_max(ranges_set)
if MAX_POSSIBLE < N:
# As we iterate in decreasing order of numbers, the
# MAX_POSSIBLE can only decrease as 'i' increases
break

X = value of integer for RANGES[i]
# Update range_tried
if recurse(range_idx, remain - X):
return True
# Update range_tried to old value



Another thing we can do is to remove the early exit of recursion where a check is based on "remain < 0". This can be easily done by directly starting from ranges such that value of considered numbers is always less than "remain". This is again helpful as after greedily choosing a large size partition, it is possible in most case the other large size partitions should be ignored in further recursion either due to conflicts in common ranges or "remain" decreasing too fast to become less than $0$. For this, a simple binary search can help us to find the first index in "RANGES" from where we should begin our search. This is possible as we had initially stored our ranges in decreasing order of the value of integers they represent.

With the above ideas, the recursive solution based on branch and bound works quite fast in practice. A small analysis of the time complexity is as follows:

1. $N < 100000$: Assuming we set LIMIT to this value in above pseudo-code. A dynamic programming based solution is run with complexity $O(|S| * N)$. Again to optimise this, we can use recursive version instead of iterative version. This is because will work in exactly the above complexity but in recursive solution, most of the states will be unvisited due to our initial choices of the ranges. This will significantly reduce the constant factor in your code.

2. For most of the other cases, as $N$ becomes larger i.e. size $n$ reaches closer to $|S|$, the possible solutions reduce significantly as explained before.

A more detailed analysis of time complexity is will available soon.

View Content

Tips for Big integer library (mostly for C++ users)

Languages like python and java already have big integer library implemented in them. But C++ users need to implement the same for their usage in this problem. A small tip is to store the numbers as groups instead of single digits. For example: $S = 123456789123456789$. Below are 2 possible ways to store $S$:

$$S = 1|2|3|4|5|6|7|8|9|1|2|3|4|5|6|7|8|9$$

$$S = 123456|789123|456789$$

This helps to perform operations on base different than 10 and reduces the constant factor of your program. Generally, the base is chosen as ${10}^{9}$ so that all possible operations like $(+, -, * , /)$ fit inside the datatypes provided by the language. You can see setter's library for example.

Time Complexity

To be described in detail later.

Space Complexity

To be described in detail later.

The solution for subtask 1 can be found here

Setter's solution can be found here

Ashmelev's solution can be found here

This question is marked "community wiki".

6★likecs
3.4k1356
accept rate: 9%

"A more detailed analysis of time complexity is will available soon.",waiting for this eagerly :),please post it soon.

(21 Jun, 12:22)

"Soon"....for some reason this word brings up bad memories of past xD

(21 Jun, 13:21)

 5 The standard solution seems not very promising, for example, check this data: 111111111111111111111111111111111211111111111111111111111111111111311111111111111111111111111111111114111111111111111 122531659665 There should be a solution: 11111111+11111111+111111+111111+11111211111+11111111+111111111+111111+111131111111+111111+11111111+11111111+111114111+11111111+1111 While in fact no AC solution in contest (except java ones, I didn't have a java compiler installed) or even std can produce a solution in reasonably small time. For example see this link on ideone. answered 14 Jun, 07:16 66●2 accept rate: 100% 13.6k●1●10●36 (14 Jun, 07:17) I was first to solve PLUSEQ, but realize my solution is quite bad. It can handle cases with either small targets or targets with some large terms. It will completely fail for large targets consisting of a lot of similar sized small terms, like the case you gave. I honestly have no clue how to solve that case in a timely manner. (14 Jun, 07:41) algmyr6★ 1 Yeah, I think this problem itself is in a NPH-ish manner, so I was quite impressed to see it appear on formal contests with no restriction on the data :/ (14 Jun, 07:43) 1 This fairly simple solution in pypy takes about 1 minute to solve that case (compared with about 18 hours for Ashmelev's; I didn't time fjzzq's solution). It is more robust against self-similar instances like fjzzq's example, but it is quite a memory-hungry solution and can still get tripped up by some other self-similar examples. (19 Jun, 06:11) I tried fjzzq's program but it got stopped by a power cut(!) so all I know is that it takes at least 8 hours to find a solution. (19 Jun, 19:02)
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×14,366
×1,754
×1,206
×160
×84
×19
×13
×7