CNTNOPARS - Editorial


Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: Satyam
Testers: Nishank Suresh, Tejas Pandey
Editorialist: Nishank Suresh




Basic algebraic manipulation, frequency arrays


Given a permutation P of \{1, 2, \ldots, N\}, count the number of pairs (i, j) such that 1 \leq i \leq j \leq N and i+j divides P_i + P_j


The condition "i+j divides P_i + P_j" can be rewritten as "there exists an integer k such that P_i + P_j = k\cdot (i+j)".

Notice that the above equation can be rewritten as P_i - k\cdot i = k\cdot j - P_j = - (P_j - k\cdot j).
In particular, if the value of k is fixed, it’s not hard to find, in linear time, all pairs (i, j) that achieve divisibility using this k.


Iterate over j from 1 to N. Also maintain a frequency array freq, which is initially empty.
When at an index j,

  • Add 1 to freq[P_j - k\cdot j]
  • The number of i \leq j that satisfy the condition is then exactly freq[k\cdot j - P_j]

This can be done in \mathcal{O}(N) time.

However, repeating the above algorithm for each k is too slow, since k can be anything at all and each run takes \mathcal{O}(N) time.

Let’s make a few observations to reduce the work we do.

The first thing to notice is that P is a permutation, which means that P_i \leq N for every i.
In particular, P_i + P_j \leq 2N. Combine this with the fact that i+j \geq 2, and you can see that any valid k must satisfy 1 \leq k \leq N.
We now have a solution in \mathcal{O}(N^2), which is still too slow.

Now, suppose we fix the value of k. Do we really need to look at every element of the array?


Again, let’s look at P_i + P_j = k\cdot (i+j). We know P_i + P_j \leq 2N, so k\cdot(i+j) \leq 2N.

This tells us that i+j \leq \frac{2N}{k}: in particular, j \lt \frac{2N}{k}.

So, we don’t need to iterate across the whole array: once k is fixed, only run the initial algorithm on the first \frac{2N}{k} elements.

In fact, this optimization immediately improves our complexity: the number of operations we perform is

\sum_{k=1}^N \frac{2N}{k}

which is \mathcal{O}(N\log N), being a harmonic summation.

Note that the constraints in this problem are somewhat high, and so using a map/unordered_map/dict to maintain your frequency array will most likely result in a TLE verdict because of the high constant factor and/or an extra \log factor.

Instead, you can maintain a global frequency array of size 4N and use an offset to account for negative numbers.
That is, if you want to add 1 to the frequency of x, you instead add 1 to freq[x + 2N].
This shifts the indices of the frequency table from [-2N, 2N] to [0, 4N].

Once you’ve used the frequency table for a specific k, make sure to reset it properly.
If working with an offset-shifted array like this is new to you, take a look at the code linked below.


\mathcal{O}(N \log N) per testcase.


Setter's code (C++)
#include <bits/stdc++.h>   
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;   
using namespace std;  
#define ll long long  
const ll INF_MUL=1e13;
const ll INF_ADD=1e18;    
#define pb push_back                 
#define mp make_pair          
#define nline "\n"                           
#define f first                                          
#define s second                                             
#define pll pair<ll,ll> 
#define all(x) x.begin(),x.end()     
#define vl vector<ll>           
#define vvl vector<vector<ll>>    
#define vvvl vector<vector<vector<ll>>>          
#ifndef ONLINE_JUDGE    
#define debug(x) cerr<<#x<<" "; _print(x); cerr<<nline;
#define debug(x);  
void _print(ll x){cerr<<x;}  
void _print(char x){cerr<<x;}   
void _print(string x){cerr<<x;}    
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());   
template<class T,class V> void _print(pair<T,V> p) {cerr<<"{"; _print(p.first);cerr<<","; _print(p.second);cerr<<"}";}
template<class T>void _print(vector<T> v) {cerr<<" [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T>void _print(set<T> v) {cerr<<" [ "; for (T i:v){_print(i); cerr<<" ";}cerr<<"]";}
template<class T>void _print(multiset<T> v) {cerr<< " [ "; for (T i:v){_print(i);cerr<<" ";}cerr<<"]";}
template<class T,class V>void _print(map<T, V> v) {cerr<<" [ "; for(auto i:v) {_print(i);cerr<<" ";} cerr<<"]";} 
typedef tree<ll, null_type, less<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;
typedef tree<ll, null_type, less_equal<ll>, rb_tree_tag, tree_order_statistics_node_update> ordered_multiset;
typedef tree<pair<ll,ll>, null_type, less<pair<ll,ll>>, rb_tree_tag, tree_order_statistics_node_update> ordered_pset;
const ll MOD=998244353;     
const ll MAX=5000500;
vector<ll> freqpos(MAX,0),freqneg(MAX,0); 
void solve(){
    ll n; cin>>n;
    vector<ll> p(n+5,0);
    ll ans=0;
    for(ll i=1;i<=n;i++){
    for(ll k=1;k<=2*n;k++){
        ll till=min(n,(2*n)/k);
        ll zero=0;
        for(ll i=1;i<=till;i++){
            ll now=p[i]-i*k;
            else if(now>0){
        for(ll i=1;i<=till;i++){
            ll now=p[i]-i*k;
            else if(now>0){
int main()                                                                                           
    #ifndef ONLINE_JUDGE                   
    freopen("input.txt", "r", stdin);                                              
    freopen("output.txt", "w", stdout);  
    freopen("error.txt", "w", stderr);                        
    ll test_cases=1;               
Editorialist's code (Python)
import sys
input = sys.stdin.readline
for _ in range(int(input())):
    n = int(input())
    p = list(map(int, input().split()))
    ans = 0
    freq = [0 for _ in range(4*n + 10)]
    offset = 2*n
    for k in range(1, n+1):
        lim = min(2*n // k, n) + 1
        for j in range(1, lim):
            freq[p[j-1] - k*j + offset] += 1
            ans += freq[k*j - p[j-1] + offset]
        for j in range(1, lim):
            freq[p[j-1] - k*j + offset] -= 1

I think the constraint on this problem is unreasonable, if one wants to kill n log^2 n one does not need a size of 2e6.

As explained in this comment, the intent was actually to cut out \mathcal{O}(N\sqrt N) solutions, which were passing even for N = 10^6 (using the observation that k\cdot (i+j) \leq 2N so one of the two terms is \leq \sqrt{2N}): maps dying was an unfortunate consequence of it.

In fact, the constant factor of maps was high enough that those solutions were TLE-ing in testing even for 1e6, even when using faster hashmaps like __gnu_pbds::gp_hash_table in C++ (though funnily enough, dicts in Pypy were actually getting AC because of the 2x TL multiplier of the language)

The decision was thus between lowering constraints and allowing \mathcal{O}(N\sqrt N) to pass, or raising them to disallow it and also disallow maps in every language so as to make it language-agnostic, and we chose the latter because we thought that using an array of size 4N is a reasonable enough optimization to come up with once you have the rest of the solution.


I came up with solution in 10min but I spent another 10min trying to optimize maps. Only realized that we can use arrays after. But a nice problem. Thanks.