Setter: Prakhar Goyal
Tester: Abhinav Sharma, Lavish Gupta
Editorialist: Devendra Singh

1877

# PROBLEM:

You are given an array A of N integers. You must perform some (possibly zero) operations to make the elements of A distinct.

In one operation, you can either:

• Remove one element from the beginning of the array A and append any positive integer to the end.
• Or remove one element from the end of the array A and prepend any positive integer to the beginning.

Find the minimum number of operations required to make all the elements of the array distinct.

# EXPLANATION:

It is trivial that after some operations the final array contains a prefix of distinct elements inserted into A followed by a subarray of A which contains distinct elements or a subarray of A which contains distinct elements is followed by a suffix of distinct elements inserted into A. For each index i where 1\leq i \leq N, we assume the subarray present in the final answer starts at this index and to decrease the number of operations we need the maximum length of such subarray for each index. For each index i where 1\leq i \leq N, we need to find maximum index j where i\leq j such that the subarray A[i,j] contains distinct elements. This can be done using two pointers approach in O(Nlog(N)).

Observation

Let the optimal answer contain the subarray A[i,j],Let R denote the number of elements in the suffix of the array A starting from j+1 i.e A[j+1,N] and L denote the number of elements in the prefix of array A ending at i-1. Let L<=R, then the answer cannot be less than 2\cdot L+R. This is because until all elements till index i-1 are deleted by performing operation of type 1, performing an operation of type 2 only increases the total number of operations. Thus it is optimal to have all similar operations together. Similarly the case when R<=L can be analyzed.

The minimum number of operations required to delete all numbers upto index i-1 and from j+1 to N is min(2\cdot (i-1)+N-j,i-1+2\cdot (N-j)). Minimum of this value over all indices i is the answer.

# TIME COMPLEXITY:

O(Nlog(N)) for each test case.

# SOLUTION:

Setter's Solution
#include <bits/stdc++.h>
using namespace std;
#define ll long long int
#define FOR(i, x, n) for (ll i = x; i < n; i++)
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
ll t=1;
cin>>t;
while (t--)
{
ll n , k;
cin>>n;
vector<ll> A(n);
FOR(i,0 ,n)
cin>>A[i];
unordered_set<ll> st;
ll ind = 0;
ll ans =LLONG_MAX;
FOR(i, 0 ,n)
{
if(st.count(A[i])==1)
{
ll r = n-i;
ll l =ind;
// cout<<i<<"\n";
ans  = min(ans , 2*min(l , r) + max(l , r));
FOR(j , ind , n)
{
if(A[i]==A[j])
{
ind = j+1;
break;
}
else
st.erase(A[j]);
}

st.insert(A[i]);
}
else
{
st.insert(A[i]);
}
}
ans = min(ind , ans);
cout<<ans<<endl;
}
}

Editorialist's solution
#include "bits/stdc++.h"
using namespace std;
#define ll long long
#define pb push_back
#define all(_obj) _obj.begin(), _obj.end()
#define F first
#define S second
#define pll pair<ll, ll>
#define vll vector<ll>
ll INF = 1e18;
const int N = 1e5 + 11, mod = 1e9 + 7;
ll max(ll a, ll b) { return ((a > b) ? a : b); }
ll min(ll a, ll b) { return ((a > b) ? b : a); }
void sol(void)
{
int n;
cin >> n;
vll v(n);
for (int i = 0; i < n; i++)
cin >> v[i];
int ans = 1e9;
map<int, int> mp;
int r = 0;
for (int l = 0; l < n; l++)
{
while (r < n && !mp[v[r]])
{
mp[v[r]]++;
r++;
}
ans = min(ans, l + 2 * (n - r));
ans = min(ans, 2 * l + n - r);
mp[v[l]]--;
}
cout << ans << '\n';
return;
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL), cout.tie(NULL);
int test = 1;
cin >> test;
while (test--)
sol();
}


5 Likes

somone please explain what is wrong with this solution… or at least an example where it is failing…
It’s failing on testcase 7, for other testcases it’s passing…

#include <bits/stdc++.h>
using namespace std;
#define ll long long
#define ld long double
#define MOD 1000000007

void solve()
{
ll n;
cin >> n;
ll a[n];
map<ll, ll> mp;
for(ll i=0; i<n; i++)
{
cin >> a[i];
}
ll cnt = 0;
ll i=0, j=0;
// ll ab =-1, bb=-1;
vector<pair<ll, ll>> vv;
while(j < n)
{
if(mp[a[j]] == 1)
{
if(j-i > cnt){
vv.clear();
vv.push_back({i, j});

cnt = j-i;
}
else if(j-i == cnt)
{
vv.push_back({i, j});
}
while(a[i] != a[j])
{
mp[a[i]]--;
i++;
}
// mp[a[i]]--; this is a[j]
i++;
}
else
mp[a[j]]++;

j++;
}

if(j-i > cnt){
vv.clear();
vv.push_back({i, j});

cnt = j-i;
}
else if(j-i == cnt)
{
vv.push_back({i, j});
}

// [a, b)
// ll ans = min(a + 2*(n-j), 2*a + (n-j));
ll ans = LLONG_MAX;

for(auto p: vv)
{
ans = min(p.first + 2*(n-p.second), 2*p.first + (n-p.second));
}
// if(ans == INT_MAX)
//     ans = 0;
cout << ans << endl;

}

int main() {
ios_base::sync_with_stdio(0);
cin.tie(0);
int t;
cin >> t;
while(t--)
{
solve();

}
return 0;
}


1 Like

1
4
1 4 1 1

My approach:
I have found the MaxLength Distinct Subarray and after finding that i am simply checking which side is having smaller number of elements and the ans would be 2*(min(left,right)) + max(left,right) → this approach is accepted for all the test cases except one…

Can you please tell me what this one test case could be…

3 Likes

Its mostly correct except the fact that you could have multiple subarray of the same length. It seems like only one of them is inspected

1 Like

**PLEASE anyone help me to find the why my code is giving wrong ans **

#include <bits/stdc++.h>

using namespace std;

#define ll long long

ll int get_num(ll int i ,ll int j,ll int n){

 ll int left_ele = i;

ll int right_ele = n-1-j;

ll int mn = min(left_ele,right_ele);

ll int mx = max(right_ele,left_ele);

return (2*mn)+mx;


}

int main(){

    #ifndef ONLINE_JUDGE

freopen("input.txt","r",stdin);

freopen("output.txt","w",stdout);

#endif


ios_base ::sync_with_stdio(false);

cin.tie(NULL);

ll int q;

cin>>q;

while(q–){

ll int n;

cin>>n;

vectorv;

for(ll int i=0;i<n;i++){

   ll int x;

cin>>x;

v.push_back(x);


}

map<ll int,bool>mp;

ll int i1=0,j1=0;

dequed;

ll int sz = INT_MIN;

vector<pair<ll int,ll int>>vp;

for(ll int i=0;i<n;i++){

   if(mp[v[i]]==false){

d.push_back(i);

mp[v[i]]=true;

}else{

d.push_back(i);

while(v[d.front()]!=v[i]){

mp[v[d.front()]]=false;

d.pop_front();

}

d.pop_front();

}

ll int num = d.size();

if(num>sz){

// cout<<"u"<<endl;

i1 = d.front();

j1 = d.back();

vp.push_back({i1,j1});

sz = num;

}


}

ll int ans = INT_MAX;

for(int i=0;i<vp.size();i++){

  if(vp[i].first==vp[i].second){

continue;

}

ans = min(ans,get_num(vp[i].first,vp[i].second,n));


}

cout<<ans<<endl;

}

return 0;

}

Can someone find an issue with this solution.
It’s passing all but one test case.

Used sliding window approach with map: 8 out of 9 testcase passing,
tc 7 → WA

link to code: Solution: 65921736 | CodeChef
code:

void solve() {
int n;
cin>>n;
vector arr(n);
if(n==0 || n==1) {cout<<0<<endl;return;}
for(int i=0;i<n;i++)
{
cin>>arr[i];
}
unordered_set set;
int l=0,r=0;
int i = 0, j = 0, ans = 0;
while( i<n && j<n)
{
if(set.find(arr[j]) == set.end())
{
set.insert(arr[j++]);
if(j-i>ans)
{
ans=j-i;
l=i;
r=j-1;
}
}
else
{
set.erase(arr[i++]);
}
}
r=n-r-1;
// cout<<l<<" "<<r<<endl;
if(l>=r)
{
cout<<l+2r<<endl;
}
else
{
cout<<r+2
l<<endl;
}
}

O(n) solution

void sol(int sl){
lli n;
cin>>n;
lli arr[n];
REP(i,n){
cin>>arr[i];
}
map<lli,vector<int>> m;
map<lli,vector<int>> m1;
map<lli,int> count;
map<lli,int> count1;

lli last_num,last_ind;
lli minn=n+1;
int f=0;

last_ind=n+1;
REP(i,n){
m[arr[i]].push_back(i);
count[arr[i]]++;
if(count[arr[i]]>1){
if(f==0){
last_ind=0;
f=1;
}
last_num=arr[i];
last_ind=(last_ind>m[arr[i]][m[arr[i]].size()-2])?last_ind:m[arr[i]][m[arr[i]].size()-2];
}
minn=min(minn,last_ind+1+2*(n-i-1));
// cout<<minn<<" "<<last_ind<<" "<<i<<endl;
}
last_ind=n+1;
for(int i=n-1;i>=0;i--){
m1[arr[i]].push_back(i);
count1[arr[i]]++;
if(count1[arr[i]]>1){
last_num=arr[i];
last_ind=(last_ind<m1[arr[i]][m1[arr[i]].size()-2])?last_ind:m1[arr[i]][m1[arr[i]].size()-2];
}
minn=min(minn,(n-last_ind+2*(i)));
// cout<<minn<<" "<<last_ind<<" "<<i<<endl;
}
if(f==0){
cout<<0<<endl;
return;
}
cout<<minn<<endl;


there can be more than one subarray having max size, you’re checking just one.

1 Like

For subaarays with same size, we will prefer to take the one which reduces the value of: 2*min(left,right) + max(left, right).
left = Elements in Left Sub-array
right = Elements in Right Sub-array
Sub-arrays which lies to the start/end have lesser cost/operations as compared to middles ones.

1 Like

I too got stuck here. Reason being :
For subaarays with same size, we will prefer to take the one which reduces the value of:
Cost = 2*min(left,right) + max(left, right).
left = Elements in Left Sub-array
right = Elements in Right Sub-array
Sub-arrays which lies to the start/end have lesser cost/operations as compared to middles ones.

I was storing all the indices that had max length with unique elements but…

during updating ans, I missed ans = min(ans, new_ans); instead I was like ans = min_ans;

such a silly mistake

@im_ashh can you give one example such

1 Like

First of all thanks for the reply, I have thought of this during the contest but i was unable to find any concrete testcase for it…If you know any please let me know…

Thankyou for the help.

Why is two pointers technique in prerequisites?

1 Like

Because it is very easy to implement the solution of this problem using two pointers technique. Have a look at editorialist’s solution

I have used the same approach as you, but instead of using one subarray with a distinct element, I compared all the subarrays and took their minimum.
here is the link of the accepted solution Solution starts from line 128

okaayy I got it, thank you for your help, I didn’t realize my code missed that case. Thanks again😅

oohhh…
Thank you for your reply🤗, I totally missed the case where there could same-sized subarrays