GOOD2 - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Segment trees

PROBLEM:

An array is called good if there exists an element appearing exactly once in the array.

Given an array A, answer Q queries on it.
Each query gives you L, R, and you must report the number of pairs (l, r) such that L \le l \le r \le R and A[l, r] is good.

EXPLANATION:

First, let’s solve a single query. Without loss of generality we solve for the entire array, i.e. we want to count the number of good subarrays of A.

A good subarray must have some element that’s unique within it.
Conversely, each element of the array will make several subarrays good.

Specifically, if we define l_i to be the largest index j such that j \lt i and A_i = A_j (i.e. l_i is the nearest occurrence of A_i to the left of index i), and r_i to be the same but to the right side, then all subarrays A[L, R] such that l_i \lt L \le i and i \le R \lt r_i are good, because the element A_i will be unique in them.

One way to reason about this better is to think in terms of geometry.
Suppose there’s an N\times N grid.
We map the subarray A[L, R] to the point (L, R) of this grid.

Then, observe that the above condition really just tells us that some rectangle of points is good.
(The rectangle in question is [l_i+1, i] \times [i, r_i-1].)

So, we obtain N rectangles in this [1, N] \times [1, N] grid, and then the answer is the number of points that are contained within at least one of the rectangles.
This is nothing but the area of the union of the rectangles (where area here is defined to be the number of points contained within.)

This is a fairly well-known problem (see Library Checker), with the classical solution in \mathcal{O}(N\log N) being to solve the problem offline: sweep over the x-axis and maintain a segment tree over the y-axis, storing for each point the number of rectangles covering it.
Rectangle starts/ends correspond to range adding +1/-1 over the segment tree; and then we want to count the number of non-zero elements at each x which is done by storing the minimum and count of minimums in each range; because uncovered points will have a value of 0.


Now, let’s think about arbitrary [L, R] queries.

Note that the rectangles themselves remain the same throughout, because the array is static.
Instead, what each query is asking for, is the total area covered by rectangles when restricted to [L, R] \times [L, R].
That is, we want to answer something along the lines of “2D range area queries”.

While this might seem to lead towards some sort of 2D data structure, those don’t actually help here.

Instead, let’s return to the initial sweep line algorithm, and see if we can modify it a bit to store the information we need.
It was already an offline algorithm, so we can try to answer queries offline as well.

As noted above, the algorithm we had was as follows:

  • Store an array C of length N.
    C_i denotes the number of rectangles covering (x, i) assuming our sweep line is currently at x.
  • We’ll sweep in descending order.
    When moving to x-1, we have the following changes:
    • For every rectangle starting at x, subtract 1 from all C_i for the corresponding range of i.
    • For every rectangle ending at x-1, add 1 to all C_i for the corresponding range of i.

Now, suppose we had another array D, also of length N, that we keep updated as follows:

  • Initially, D_i = 0 for all i.
  • For each x-coordinate, after processing all updates to the array C, add 1 to D_i for each i satisfying C_i = 0.

That is, at any stage, D_i denotes the number of uncovered points among (x, i), (x+1, i), \ldots, (N, i).
We’ll call this the historical data of the array C.

If we’re able to maintain such an array D, note that answering any query (L, R) becomes quite simple: when the sweepline is at L, we simply want the value

D_L + D_{L+1} + \ldots + D_R

(This will add some “empty space” consisting of the rectangle [L+1, N] \times [L, R], just subtract that part out.)

Note that these correspond to just range sums on the D array.
Since we’re already using a segment tree to store and update the C array, it makes sense to try and figure out how to keep it updated with D as well, since we’ll then get range sums for free.


Maintaining the historical sum can, in this case, be done by just storing a bit more data in the segment tree.

In a node of the segment tree corresponding to range [L, R], store the following:

  • The minimum value among C_L, C_{L+1}, \ldots, C_R.
  • The count of such minimums.
  • The sum D_L + \ldots + D_R
  • Lazy addition flags corresponding to both \min(C) and \text{sum}(D).

During the sweepline, do the following:

  • First, process all range addition updates to C.
    This is done using lazy propagation in standard fashion.
  • Next, check if \min(C_1, \ldots, C_N) = 0 or not.
  • If this minimum is not 0, then we don’t need to change any of the D values so just move on.
  • Otherwise, we want to add 1 to all D_i, but only for those i such that C_i = 0.
    • In particular, note that the sum D_L + \ldots + D_R for a range whose minimum element equals 0, will increase by exactly the count of minimums in this range (which we’re already storing anyway.)

For the last step, we would like to send an update of the form “increase \text{sum}(D) by the count of minimums” to every node whose min is 0.
This can be done lazily - apply it to the root, and when pushing, only push to children that have the same minimum as the current node.

See the code below for implementation details.
If you’d like to read more on maintaining historic data, there are these two blogs, though they’re rather more powerful than what is needed for this problem:

  1. [Tutorial] A Bit of Math, Historic Sum, and Range Add Range Sum Binary Indexed Tree (Fenwick Tree)
  2. Historic Information on Segment Trees

TIME COMPLEXITY:

\mathcal{O}((N+Q)\log N) per testcase.

CODE:

Editorialist's code (C++)
// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

struct Data {
    int lo, hi;
    ll mn, mnct, hsum;
    ll madd, hadd;
    Data(int l = -1, int r = -1) {
        lo = l, hi = r;
        mn = 0;
        mnct = r-l;
        hsum = 0;
        madd = 0;
        hadd = 0;
    }
};
Data unit;

struct Node {
    using T = Data;
    T f(T a, T b) {
        if (a.lo < 0) return b;
        if (b.lo < 0) return a;

        T res(a.lo, b.hi);
        res.hsum = a.hsum + b.hsum;

        res.mn = min(a.mn, b.mn);
        res.mnct = 0;
        if (a.mn == res.mn) res.mnct += a.mnct;
        if (b.mn == res.mn) res.mnct += b.mnct;
        return res;
    }
 
    Node *l = 0, *r = 0;
    int lo, hi;
    T val = unit;
    Node(int _lo,int _hi):lo(_lo),hi(_hi){ val = Data(lo, hi); }
    T query(int L, int R) {
        if (R <= lo || hi <= L) return unit;
        if (L <= lo && hi <= R) return val;
        push();
        return f(l->query(L, R), r->query(L, R));
    }
    void add(int L, int R, int x) {
        if (R <= lo || hi <= L) return;
        if (L <= lo && hi <= R) {
            val.mn += x;
            val.madd += x;
        }
        else {
            push(), l->add(L, R, x), r->add(L, R, x);
            val = f(l->val, r->val);
        }
    }
    void updh(int L, int R, int x) {
        if (R <= lo || hi <= L) return;
        if (L <= lo && hi <= R) {
            val.hsum += val.mnct * x;
            val.hadd += x;
        }
        else {
            push();
            if (l->val.mn == val.mn) l->updh(L, R, x);
            if (r->val.mn == val.mn)  r->updh(L, R, x);
            val = f(l->val, r->val);
        }
    }
    void push() {
        if (!l) {
            int mid = lo + (hi - lo)/2;
            l = new Node(lo, mid); r = new Node(mid, hi);
        }
        if (val.madd)
            l->add(lo,hi,val.madd), r->add(lo,hi,val.madd), val.madd = 0;
        if (val.hadd) {
            if (l->val.mn == val.mn) l->updh(lo, hi, val.hadd);
            if (r->val.mn == val.mn) r->updh(lo, hi, val.hadd);
            val.hadd = 0;
        }
    }
};

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, q; cin >> n >> q;
        vector a(n, 0);
        for (int &x : a) cin >> x;

        vector prv(n, -1), nxt(n, n);
        {
            vector last(n+1, -1);
            for (int i = 0; i < n; ++i) {
                prv[i] = last[a[i]];
                last[a[i]] = i;
            }
        }
        {
            vector last(n+1, n);
            for (int i = n-1; i >= 0; --i) {
                nxt[i] = last[a[i]];
                last[a[i]] = i;
            }
        }

        vector queries(n, vector<array<int, 2>>());
        for (int i = 0; i < q; ++i) {
            int l, r; cin >> l >> r;
            queries[--l].push_back({--r, i});
        }

        vector events(n, vector<array<int, 3>>());
        for (int i = 0; i < n; ++i) {
            int L = prv[i]+1, R = nxt[i]-1;
            
            // left border in [L, i], right in [i, R]
            events[i].push_back({i, R, 1});
            if (L > 0) events[L-1].push_back({i, R, -1});
        }

        Node *seg = new Node(0, n);
        vector ans(q, 0ll);
        for (int i = n-1; i >= 0; --i) {
            for (auto [l, r, x] : events[i]) {
                seg -> add(l, r+1, x);
            }

            if ((seg -> query(0, n)).mn == 0) seg -> updh(0, n, 1);

            for (auto [r, id] : queries[i]) {
                ans[id] = 1ll*(n-i)*(r+1) - (seg -> query(0, r+1)).hsum;
            }
        }
        
        for (auto x : ans) cout << x << ' ';
        cout << '\n';
    }
}
1 Like

Reading the comments on this blog will give you a better understanding.
A simple introduction to “Segment tree beats” , which is added as reference on link 1.

Hint