PERMPART - EDITORIAL

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2

Setter: Hasan Jaddouh
Tester: Aleksa Plasvic
Editorialist: Taranpreet Singh

DIFFICULTY:

Medium

PREREQUISITES:

Observations, Dynamic-Programming. (Or Greedy with even more observations).

PROBLEM:

Given an array A of N integers, and we are allowed in a single operation to either add value or remove once occurrence of an existing value from the sequence. Find the minimum number of operations needed so that after applying all operations, we can divide the sequence into L parts such that for each part having length x shall have exactly one occurrence of each value from 1 to x.

For explanation, I am defining term N-straight as a sequence from 1 to N including all the elements exactly once. So the problem is to find the minimum number of operations so that the resulting sequence can be divided into a set of straights such that each element is present in exactly one straight.

QUICK EXPLANATION

  • There always exist a way with N moves where we delete all elements. So, the final answer cannot exceed N.
  • By adding N elements in worst case, we can have any permutation up to 2*N only. So, any integer value in A greater than 2*N shall be deleted.
  • Now, considering integers from 1 to 2*N in order, we try to calculate the cost of dividing all elements less than or equal current value into y straights for each valid y, using dynamic programming.
  • Minimum cost of forming y x-straights is given by |f(x) - y| plus Minimum of Cost for forming z \geq y (x-1)-straights, where f(x) denote frequency of x in A.
  • For current value x, The maximal number of straights with at most 2*N elements is \left\lfloor (2*N)/x \right\rfloor. This restricts the total number of states of DP to \sum_i \left\lfloor (2*N)/i \right\rfloor which is of the order of N*log_2N.

EXPLANATION

This solution proceed step by step, so don’t move to next paragraph before understanding previous ones. You have been warned.

First of all, let us find an upper bound on the answer. If we delete all elements, we can definitely have a valid sequence (though empty :P) which is valid. So we can always have a valid sequence in N operations.

Suppose we add N elements to sequence. Now, there are 2*N elements in sequence. So, it is not possible to have any x-straight for x > 2*N. So, to minimize the number of operations, we have to remove all integers in A which are greater than 2*N.

Now, we have only elements below or equal to 2*N. So, Let us use the dynamic programming here to find out minimum additions/deletions in the increasing order of value, starting from 1 to 2*N.

For each current number x from 1 to 2*N, we want to find out for each valid y, the minimum number of operations needed over all elements from 1 to x such that there are exactly y x-straights (may include other straights, which are not x-straights). The idea behind this is that once we have an (x-1)-straight, we can easily convert it into an x-straight by adding an occurrence of x in it. Further, we can also choose not to extend any straight from (x-1)-straight to x-straight.

So, for current element x, suppose we want to keep y x-straights, given by DP[x][y]. If f(x) denote frequency of x in given A, The minimum number of operations required is |f(x)-y| plus minimum number of operation to have at least y (x-1)-straights. The reason is, that for current element x, we have f(x) occurrences already present, but we want exactly y. So we add them (or remove them) and this is counted by the term |f(x)-y|. Now, to have y x-straights, we need at least y (x-1)-straights, which we have already calculated while considering x-1 as the current element. We choose to have z (x-1) straights, such that z \geq y and DP[x-1][z] is minimum. The minimum value of each suffix can now be calculated easily, to get minimum operations to have at least y (x-1)-straights.

Now let us compute the number of states of this seemingly N^2 dynamic programming. But the trick here lies in the number of valid values of y. For current element x, each x-straight needs x elements, and there are total 2*N elements. So, there can be at max \left\lfloor (2*N)/x \right\rfloor x-straights. So, the total number of states for this dynamic programming is \sum_{i = 1}^{2*N} 1+\left\lfloor \frac{2*N}{i} \right\rfloor which is of the order of N*log_2(N). Hence, we have solved this problem in O(N*logN) time and memory.

Bonus
There also exists a greedy solution to this problem. Can you find it?

Secondly, Can this problem be solved in O(N) time using any approach?

Time Complexity

Time complexity is O(N*log(N)) per test case.

SOLUTIONS:

Setter's Solution
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <assert.h>
using namespace std;
 
 
 
long long readInt(long long l,long long r,char endd){
	long long x=0;
	int cnt=0;
	int fi=-1;
	bool is_neg=false;
	while(true){
		char g=getchar();
		if(g=='-'){
			assert(fi==-1);
			is_neg=true;
			continue;
		}
		if('0'<=g && g<='9'){
			x*=10;
			x+=g-'0';
			if(cnt==0){
				fi=g-'0';
			}
			cnt++;
			assert(fi!=0 || cnt==1);
			assert(fi!=0 || is_neg==false);
		
			assert(!(cnt>19 || ( cnt==19 && fi>1) ));
		} else if(g==endd){
			assert(cnt>0);
			if(is_neg){
				x= -x;
			}
			assert(l<=x && x<=r);
			return x;
		} else {
			assert(false);
		}
	}
}
string readString(int l,int r,char endd){
	string ret="";
	int cnt=0;
	while(true){
		char g=getchar();
		assert(g!=-1);
		if(g==endd){
			break;
		}
		cnt++;
		ret+=g;
	}
	assert(l<=cnt && cnt<=r);
	return ret;
}
long long readIntSp(long long l,long long r){
	return readInt(l,r,' ');
}
long long readIntLn(long long l,long long r){
	return readInt(l,r,'\n');
}
string readStringLn(int l,int r){
	return readString(l,r,'\n');
}
string readStringSp(int l,int r){
	return readString(l,r,' ');
}
 
int n;
int T;
int sm_n=0;
int arr[2001001];
vector<int> dp[2002002];
 
 
 
int main(){
	//freopen("09.txt","r",stdin);
	//freopen("09o.txt","w",stdout);
	T=readIntLn(1,1000);
	while(T--){
		n=readIntLn(1,1000000);
		sm_n+=n;
		assert(sm_n<=1000000);
		for(int i=1;i<=2*n;i++){
			arr[i]=0;
			dp[i].clear();
			dp[i].resize(2*n/i + 1, 0);
		}
		int sl=0;
		for(int i=1;i<=n;i++){
			int x=0;
			if(i==n){
				x=readIntLn(1,1000000000);
			} else {
				x=readIntSp(1,1000000000);
			}
			if(x<=2*n){
				arr[x]++;
			} else {
				sl++;
			}
		}
		int s =dp[1].size();
		for(int i=s-1;i>=0;i--){
			dp[1][i] = abs(arr[1] - i);
			if(i!=s-1){
				dp[1][i]=min(dp[1][i],dp[1][i+1]);
			}
		}
		for(int i=2;i<=2*n;i++){
			int s=dp[i].size();
			for(int j=s-1;j>=0;j--){
				dp[i][j] = dp[i-1][j] + abs(arr[i]-j);
				if(j!=s-1){
					dp[i][j] = min(dp[i][j],dp[i][j+1]);
				}
			}
		}
		cout<<sl+min(dp[2*n][0],dp[2*n][1])<<endl;
	}
	assert(getchar()==-1);
}
Tester's Solution
#include<bits/stdc++.h>
 
using namespace std;
 
#define pb push_back
 
const int maxN = 2e6+10;
vector<int> dp[maxN];
int a[maxN], cnt[maxN];
int sum;
void solve()
{
	int n;
	cin>>n;
	sum+=n;
	assert(n>=1 && n<=1000000);
 
	for (int i=1;i<=2*n;i++){
	 dp[i].clear();
	 cnt[i]=0;
	}
 
	int ans = 0;
	for (int i=1;i<=n;i++){
	    scanf("%d",&a[i]);
	    assert(a[i]>=1 && a[i]<=1e9);
	    if (a[i]>2*n) ans++; else
	        cnt[a[i]]++;
	}
 
	for (int i=0;i<=2*n;i++)
	    dp[0].pb(0);
 
	for (int i=1;i<=2*n;i++){
	 for (int j=0;j<=(2*n)/i;j++)
	    dp[i].pb(dp[i-1][j]+abs(cnt[i]-j));
 
	 for (int j=(2*n)/i -1;j>=0;j--)
	    dp[i][j]=min(dp[i][j],dp[i][j+1]);
	}
 
	ans+=min(dp[2*n][0],dp[2*n][1]);
 
	printf("%d\n",ans);
  return;
}
int main()
{
	int t;
	cin>>t;
 
	assert(t>0 && t<=1000);
 
	while(t--)
	    solve();
 
	assert(sum<=1e6 && sum>0);
	return 0;
}
Editorialist's Solution
import java.util.*;
import java.io.*;
import java.text.*;
//Solution Credits: Taranpreet Singh
class PARTPERM{ 
	//SOLUTION BEGIN
	void pre() throws Exception{}
	void solve(int TC) throws Exception{
	    int n = ni(), ans = 0;
	    int[] f = new int[1+2*n];
	    for(int i = 0; i< n; i++){
	        //values above 2*n shall always be deleted.
	        int x = ni();if(x>2*n)ans++;
	        else f[x]++;
	    }
	    long[][] dp = new long[1+2*n][];
	    dp[0] = new long[1+2*n];
	    for(int cur = 1; cur<= 2*n; cur++){
	        //mx - Maximum number of straights possible using total 2*n elements
	        int mx = (2*n)/cur;
	        dp[cur] = new long[1+mx];
	        //Calculating cost of having st straights from 1 to cur
	        for(int st = 0; st<= mx; st++)dp[cur][st] = dp[cur-1][st] + Math.abs(f[cur]-st);
	        //dp[cur][st] - Min cost of having at least st straights of length cur
	        for(int st = mx-1; st>= 0; st--)dp[cur][st] = Math.min(dp[cur][st], dp[cur][st+1]);
	    }
	    //Taking minimum cost of adding elements.
	    ans+=Math.min(dp[2*n][0], dp[2*n][1]);
	    pn(ans);
	}
	//SOLUTION END
	void hold(boolean b)throws Exception{if(!b)throw new Exception("Hold right there, Sparky!");}
	long mod = (long)1e9+7, IINF = (long)1e18;
	final int INF = (int)1e9, MX = (int)5e5+1;
	DecimalFormat df = new DecimalFormat("0.00000000000");
	double PI = 3.1415926535897932384626433832792884197169399375105820974944, eps = 1e-8;
	static boolean multipleTC = true, memory = false;
	FastReader in;PrintWriter out;
	void run() throws Exception{
	    in = new FastReader();
	    out = new PrintWriter(System.out);
	    int T = (multipleTC)?ni():1;
	    //Solution Credits: Taranpreet Singh
	    pre();for(int t = 1; t<= T; t++)solve(t);
	    out.flush();
	    out.close();
	}
	public static void main(String[] args) throws Exception{
	    if(memory)new Thread(null, new Runnable() {public void run(){try{new PARTPERM().run();}catch(Exception e){e.printStackTrace();}}}, "1", 1 << 28).start();
	    else new PARTPERM().run();
	}
	long gcd(long a, long b){return (b==0)?a:gcd(b,a%b);}
	int gcd(int a, int b){return (b==0)?a:gcd(b,a%b);}
	int bit(long n){return (n==0)?0:(1+bit(n&(n-1)));}
	void p(Object o){out.print(o);}
	void pn(Object o){out.println(o);}
	void pni(Object o){out.println(o);out.flush();}
	String n()throws Exception{return in.next();}
	String nln()throws Exception{return in.nextLine();}
	int ni()throws Exception{return Integer.parseInt(in.next());}
	long nl()throws Exception{return Long.parseLong(in.next());}
	double nd()throws Exception{return Double.parseDouble(in.next());}
 
	class FastReader{
	    BufferedReader br;
	    StringTokenizer st;
	    public FastReader(){
	        br = new BufferedReader(new InputStreamReader(System.in));
	    }
 
	    public FastReader(String s) throws Exception{
	        br = new BufferedReader(new FileReader(s));
	    }
 
	    String next() throws Exception{
	        while (st == null || !st.hasMoreElements()){
	            try{
	                st = new StringTokenizer(br.readLine());
	            }catch (IOException  e){
	                throw new Exception(e.toString());
	            }
	        }
	        return st.nextToken();
	    }
 
	    String nextLine() throws Exception{
	        String str = "";
	        try{   
	            str = br.readLine();
	        }catch (IOException e){
	            throw new Exception(e.toString());
	        }  
	        return str;
	    }
	}
} 

Feel free to Share your approach, If it differs. Suggestions are always welcomed. :slight_smile:

6 Likes

This was an amazing question. Loved the editorial as well. But there was one thing that I could not understand. Why are we going in reverse as well.

After calculating
dp[x][y] = dp[x-1][y] + abs(freq(x)-y);
Why are we going in reverse and calculating
dp[x][y] = min(dp[x][y],dp[x][y+1]);

And one more thing. I am trying your method but unable to pass the last test case. Since I am passing other test cases easily, it is evident that implementation is correct and there is some conceptual problem.

Please have a look here and help me thanks.
Link to submission

DP[x][y] exactly needs freq(x) to be y,
but it can have more than y (x-1)-straights (atleast y).

dp[x][y] = min(dp[x][y],dp[x][y+1]);

Reverse and calculating is done ,so that for state x+1 we get optimal answer

1 Like

Yes, I get the second part that it is to optimize for next state but doesn’t that violate what our dp store?

dp[x][y] = dp[x-1][y] + abs(freq(x)-y)

We are confident about our equation because we know that dp[x][y] stores the minimum operation to be performed to get to y straights of 1-x but lets say
Our array was this
1 1 1 2 2
then dp[1][3] = 0 but dp[1][2] = 1 and dp[1][1] = 2
But that will consider dp[2][1], then answer which should be 3 will become 1

dp[x][y] stores cost for exactly y x-straights but that does not
mean that there are no other straights .
It can have any number of (x-1) ,(x-2) . . straights.

In your example dp[2][1] is 1 only because here there are 3 straights :
1-straight , 1-straight , 2 -straight .
So have to remove extra 2 hence cost is 1.

One approach that i thought is that you make a vector of N sets, then iterate over the sets from 0 when an element is provided and insert into the one which doesn’t have this element. Then start from set 0 until the size of current set is 0, then for each set, we iterate over all the elements from start. At every element x, we store the cost of building the array from 1 to x and deleting the elements after that. We find the optimal point among all such values for that set.
The complexity will be NlogN(cz of set insertions).
Can someone tell me the flaw in this logic?

But dp[2][1] means making 1 sequence of 1-2 (2 straights)?
Wouldn’t that be 3?

Why should answer be 3?

Considering elements up to 2, we have dp[2][1] = min(dp[1][i]) for i \geq 1 + abs(f[2]-1) which is 0 plus 1 = 1 which is correct, since we have to delete one occurrence of 2.

considering
dp[2][1] = min(dp[1][i]) + abs(freq(2)-1)
So in sequence
1 1 1 2 2
dp[1][1] = 2
dp[1][2] = 1
dp[1][3] = 0
so we are saying dp[2][1] = 0 + 1 = 1
But My Question is Can we convert the given sequence to 1 2’s straight in 1 operation?
it is a wrong answer for dp[2][1]. Minimum step we need to convert sequence to 1 2’s straight is 3. But it is storing answer for 2 2’s straight
There I am confused. This is making our dp[x][y] not standing for what we assumed it to be that is to be the minimum number of operations to convert our sequence y of x-straights

PS: I am not questioning the editorial, just clearing my doubt.

dp[2][1] = 1 which is correct because we can just delete one occurrence of 2 to get only one 2-straight. The point is, we may have any number of y-straights with y < x but when we consider x+1 in next step, only the number of x-straights matter, since only those ones can be extended.

Sorry for late reply, just saw your comment.

1 Like

Does anyone have the answer for bonus?
Please share.

Yes, I did it using a greedy approach. https://www.codechef.com/viewsolution/25046163 :: It should be fairly simple to understand. //Happy coding