 # CNTNOPARS - Editorial

Author: Satyam
Testers: Nishank Suresh, Tejas Pandey
Editorialist: Nishank Suresh

3042

# PREREQUISITES:

Basic algebraic manipulation, frequency arrays

# PROBLEM:

Given a permutation P of \{1, 2, \ldots, N\}, count the number of pairs (i, j) such that 1 \leq i \leq j \leq N and i+j divides P_i + P_j

# EXPLANATION:

The condition "i+j divides P_i + P_j" can be rewritten as "there exists an integer k such that P_i + P_j = k\cdot (i+j)".

Notice that the above equation can be rewritten as P_i - k\cdot i = k\cdot j - P_j = - (P_j - k\cdot j).
In particular, if the value of k is fixed, it’s not hard to find, in linear time, all pairs (i, j) that achieve divisibility using this k.

How?

Iterate over j from 1 to N. Also maintain a frequency array freq, which is initially empty.
When at an index j,

• Add 1 to freq[P_j - k\cdot j]
• The number of i \leq j that satisfy the condition is then exactly freq[k\cdot j - P_j]

This can be done in \mathcal{O}(N) time.

However, repeating the above algorithm for each k is too slow, since k can be anything at all and each run takes \mathcal{O}(N) time.

Let’s make a few observations to reduce the work we do.

The first thing to notice is that P is a permutation, which means that P_i \leq N for every i.
In particular, P_i + P_j \leq 2N. Combine this with the fact that i+j \geq 2, and you can see that any valid k must satisfy 1 \leq k \leq N.
We now have a solution in \mathcal{O}(N^2), which is still too slow.

Now, suppose we fix the value of k. Do we really need to look at every element of the array?

No!

Again, let’s look at P_i + P_j = k\cdot (i+j). We know P_i + P_j \leq 2N, so k\cdot(i+j) \leq 2N.

This tells us that i+j \leq \frac{2N}{k}: in particular, j \lt \frac{2N}{k}.

So, we don’t need to iterate across the whole array: once k is fixed, only run the initial algorithm on the first \frac{2N}{k} elements.

In fact, this optimization immediately improves our complexity: the number of operations we perform is

\sum_{k=1}^N \frac{2N}{k}

which is \mathcal{O}(N\log N), being a harmonic summation.

Note that the constraints in this problem are somewhat high, and so using a map/unordered_map/dict to maintain your frequency array will most likely result in a TLE verdict because of the high constant factor and/or an extra \log factor.

Instead, you can maintain a global frequency array of size 4N and use an offset to account for negative numbers.
That is, if you want to add 1 to the frequency of x, you instead add 1 to freq[x + 2N].
This shifts the indices of the frequency table from [-2N, 2N] to [0, 4N].

Once you’ve used the frequency table for a specific k, make sure to reset it properly.
If working with an offset-shifted array like this is new to you, take a look at the code linked below.

# TIME COMPLEXITY:

\mathcal{O}(N \log N) per testcase.

# CODE:

Setter's code (C++)
#include <bits/stdc++.h>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
using namespace std;
#define ll long long
const ll INF_MUL=1e13;
#define pb push_back
#define mp make_pair
#define nline "\n"
#define f first
#define s second
#define pll pair<ll,ll>
#define all(x) x.begin(),x.end()
#define vl vector<ll>
#define vvl vector<vector<ll>>
#define vvvl vector<vector<vector<ll>>>
#ifndef ONLINE_JUDGE
#define debug(x) cerr<<#x<<" "; _print(x); cerr<<nline;
#else
#define debug(x);
#endif
void _print(ll x){cerr<<x;}
void _print(char x){cerr<<x;}
void _print(string x){cerr<<x;}
template<class T,class V> void _print(pair<T,V> p) {cerr<<"{"; _print(p.first);cerr<<","; _print(p.second);cerr<<"}";}
template<class T>void _print(vector<T> v) {cerr<<" [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T>void _print(set<T> v) {cerr<<" [ "; for (T i:v){_print(i); cerr<<" ";}cerr<<"]";}
template<class T>void _print(multiset<T> v) {cerr<< " [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T,class V>void _print(map<T, V> v) {cerr<<" [ "; for(auto i:v) {_print(i);cerr<<" ";} cerr<<"]";}
typedef tree<ll, null_type, less<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;
typedef tree<ll, null_type, less_equal<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_multiset;
typedef tree<pair<ll,ll>, null_type, less<pair<ll,ll>>, rb_tree_tag, tree_order_statistics_node_update> ordered_pset;
//--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
const ll MOD=998244353;
const ll MAX=5000500;
vector<ll> freqpos(MAX,0),freqneg(MAX,0);
void solve(){
ll n; cin>>n;
vector<ll> p(n+5,0);
ll ans=0;
for(ll i=1;i<=n;i++){
cin>>p[i];
if(p[i]%i){
;
}
else{
ans++;
}
}
for(ll k=1;k<=2*n;k++){
ll till=min(n,(2*n)/k);
ll zero=0;
for(ll i=1;i<=till;i++){
ll now=p[i]-i*k;
assert(abs(now)<MAX);
if(now==0){
ans+=zero++;
}
else if(now>0){
ans+=freqneg[now];
freqpos[now]++;
}
else{
ans+=freqpos[-now];
freqneg[-now]++;
}
}
for(ll i=1;i<=till;i++){
ll now=p[i]-i*k;
if(now==0){
;
}
else if(now>0){
freqpos[now]--;
}
else{
freqneg[-now]--;
}
}
}
cout<<ans<<nline;
return;
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
#ifndef ONLINE_JUDGE
freopen("input.txt", "r", stdin);
freopen("output.txt", "w", stdout);
freopen("error.txt", "w", stderr);
#endif
ll test_cases=1;
cin>>test_cases;
while(test_cases--){
solve();
}
cout<<fixed<<setprecision(10);
cerr<<"Time:"<<1000*((double)clock())/(double)CLOCKS_PER_SEC<<"ms\n";
}

Editorialist's code (Python)
import sys
for _ in range(int(input())):
n = int(input())
p = list(map(int, input().split()))
ans = 0
freq = [0 for _ in range(4*n + 10)]
offset = 2*n
for k in range(1, n+1):
lim = min(2*n // k, n) + 1
for j in range(1, lim):
freq[p[j-1] - k*j + offset] += 1
ans += freq[k*j - p[j-1] + offset]
for j in range(1, lim):
freq[p[j-1] - k*j + offset] -= 1
print(ans)


I think the constraint on this problem is unreasonable, if one wants to kill n log^2 n one does not need a size of 2e6.

As explained in this comment, the intent was actually to cut out \mathcal{O}(N\sqrt N) solutions, which were passing even for N = 10^6 (using the observation that k\cdot (i+j) \leq 2N so one of the two terms is \leq \sqrt{2N}): maps dying was an unfortunate consequence of it.

In fact, the constant factor of maps was high enough that those solutions were TLE-ing in testing even for 1e6, even when using faster hashmaps like __gnu_pbds::gp_hash_table in C++ (though funnily enough, dicts in Pypy were actually getting AC because of the 2x TL multiplier of the language)

The decision was thus between lowering constraints and allowing \mathcal{O}(N\sqrt N) to pass, or raising them to disallow it and also disallow maps in every language so as to make it language-agnostic, and we chose the latter because we thought that using an array of size 4N is a reasonable enough optimization to come up with once you have the rest of the solution.

2 Likes

I came up with solution in 10min but I spent another 10min trying to optimize maps. Only realized that we can use arrays after. But a nice problem. Thanks.