UNQMODE - Editorial

iceknight1093 · December 28, 2022, 9:16am

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: grayhathacker
Testers: iceknight1093, rivalq
Editorialist: iceknight1093

DIFFICULTY:

2804

PREREQUISITES:

Observation

PROBLEM:

You are given an array A, each of whose elements has frequency \leq 100.

Count the number of subarrays whose mode is unique.

EXPLANATION:

Each element has frequency \leq 100, which means the mode of any subarray also has a frequency of at most 100.

Let’s fix the right endpoint R of our subarray, and the frequency of the mode k. Let’s try to compute the number of L such that subarray [L, R] has a unique mode with frequency k.

The major observation to be made is that the set of valid L form a contiguous sequence, i.e, the set of valid L is \{x, x+1, \ldots, y-1, y\} for some x and y; so let’s instead try to find x and y.

Proof

Suppose we start L at R and keep decreasing it. As we move from subarray [L, R] to [L-1, R], the frequency of the mode can never decrease; and for a fixed frequency, the number of modes also can never decrease.

So, y is the first time the frequency of the mode is k, and x is the last time the frequency is k and there is only a single mode.

Finding x and y in \mathcal{O}(N) each is not too hard (but of course, too slow). When iterating in reverse down from R:

y is the first index at which some element occurs exactly K times.
x-1 is the higher of:
- The first index at which some element occurs exactly K+1 times (increasing the frequency of the mode); or
- The second index at which some element occurs exactly K times (increasing the number of modes)

To improve this, look at what information exactly we need.
We need to know the first time something occurs K times, the first time something occurs K+1 times, and the second time something occurs K times. We need this information for every 1 \leq K \leq 100.

So, let’s try to maintain this information. The array is treated as 1-indexed here.

Let \text{first}[K] denote the first time \leq R that an element occurs K times.
Let \text{second}[K] denote the second time \leq R that an element occurs K times.
For convenience, let’s say \text{first}[K] and \text{second}[K] are 0 if no valid position exists.

When we move from R to R+1, this information changes as follows:

For elements other than A_{R+1}, there is of course no change in their closest positions; so it suffices to look at the last 100 occurrences of A_{R+1} alone.
Consider the K-th previous occurrence of A_{R+1}. Suppose this is position u.
- If u \gt \text{first}[K], then set \text{second}[K] \gets \text{first}[K] and \text{first}[K] \gets u
- Otherwise, set \text{second}[K] \gets \max(\text{second}[K], u)
This allows us to update the \text{first} and \text{second} arrays in 100 operations.
Now, for a fixed K, we have y = \text{first}[K] and x = 1 + \max(\text{first}[K+1], \text{second}[K]), after which we simply add y-x+1 to the answer.

The above algorithm only needs us to find the previous 100 occurrences of A_{R+1} quickly, which is easy if we create a list of positions corresponding to each value.

This gives us a solution in \mathcal{O}(N\log N + 100\cdot N) or \mathcal{O}(100\cdot N \log N), depending on implementation.

TIME COMPLEXITY:

\mathcal{O}(N\cdot 100) per testcase.

CODE:

Setter's code (C++)

#include<bits/stdc++.h>
using namespace std;
 
#define mod 1000000007
typedef set<string> ss;
typedef vector<int> vs;
typedef map<int, char> msi;
typedef pair<int, int> pa;
typedef long long int ll;
 
ll n, i, last_pos[100005], lp, a[100005], j, ans, k=100;
map<ll, ll> pos;
pair<ll, ll> v[100005];
int main()
{
	ios_base::sync_with_stdio(false);
	cin.tie(0);
 
	int t;
	cin >> t;
	while (t--)
	{
		cin >> n;
		for (i = 0; i < n; i++)
		{
			cin >> a[i];
			pos[a[i]] = -1;
		}
		for (i = 2; i <= k + 1; i++)
		{
			v[i] = { -1, -1};
		}
		for ( i = 0; i < n; i++)
		{
			last_pos[i] = pos[a[i]];
			pos[a[i]] = i;
		}
		ans = n;
		for (i = 0; i < n; i++)
		{
			lp = last_pos[i];
			for (j = 2; j <= k; j++)
			{
				if (lp == -1)
					break;
				if (lp > v[j].second)
				{
					v[j].first = v[j].second;
					v[j].second = lp;
				}
				else if (lp > v[j].first)
				{
					v[j].first = lp;
				}
				lp = last_pos[lp];
			}
			for (j = 2; j <= k; j++)
			{
				ans += v[j].second - max(v[j].first, v[j + 1].second);
			}
		}
		cout << ans << "\n";
	}
 
	return 0;
}

Editorialist's code (Python)

for _ in range(int(input())):
	n = int(input())
	a = list(map(int, input().split()))

	ans = 0
	pos = {}
	nearest = [-1]*105
	nearest2 = [-1]*105
	for i in range(n):
		if a[i] not in pos: pos[a[i]] = [i]
		else: pos[a[i]].append(i)
		cur = pos[a[i]]
		for j in range(1, 101):
			if len(cur) >= j:
				u = cur[-j]
				if u >= nearest[j]:
					nearest2[j] = nearest[j]
					nearest[j] = u
				else: nearest2[j] = max(u, nearest2[j])
			if nearest[j] != -1:
				L = max(nearest2[j], nearest[j+1])+1
				ans += nearest[j] - L + 1
	print(ans)

preyam · December 28, 2022, 5:40pm

Was O(100NlogN) solution meant to pass? If so, the time constraints are extremely strict, because I could not get it to pass even after optimisation. In fact, I still cannot get it to pass.

Edit: Finally got it to pass after removing one instance of set :: find with a visited array. TCs should be have been less strict if this solution was meant to pass.

aryanc403 · December 28, 2022, 7:16pm

There exists O(N*logN+100*N) solutions which take 0.1s, so the 1s time limit is quite loose.

preyam · December 28, 2022, 7:44pm

Yeah but my solution had an extra log(n) factor.

puregarage · December 28, 2022, 11:39pm

There is nothing wrong with the equations above make sure to carry the x in the algebra equation

iceknight1093 · December 29, 2022, 5:37am

The intended solution is \mathcal{O}(N\log N + 100N). I think it should be clear that \mathcal{O}(100N\log N) is already more than 10^8 operations, so you should only be coding it if it has a low constant factor, for example if the extra log factor comes from something like a simple binary search (and I don’t mean set::lower_bound)

map and set operations are rather expensive, so your code is more than 10^8 operations with bad constant factor; I certainly wouldn’t expect that to pass within a second.

Lowering constraints or increasing TL were also unfortunately not options because there’s a bruteforce solution in \mathcal{O}(N^2) which would end up passing instead.