### PROBLEM LINK:

**Author:** Kevin Atienza

**Testers:** Sergey Kulik and Vasya Antoniuk

**Translators:** Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)

**Editorialist:** Kevin Atienza

### DIFFICULTY:

Easy

### PREREQUISITES:

Strings, Greedy algorithms

### PROBLEM:

A string is K-good if for every two letters in the string, if the first appears x times and the second y times, then |x - y| \le K. Given K and a string w, what is the minimum number of letters to remove to make w K-good?

### QUICK EXPLANATION:

For every m, define F[m] as follows:

- Remove every letter that appears < m times.
- For every letter that appears > m+K times, remove just enough letters so that it appears exactly m+K times.
- Let the number of removed letters be F[m].

The answer is then the smallest among \{F[1], F[2], \ldots, F[N]\}. To compute this quickly, simply precompute the letter frequencies in w beforehand, and operate on that array when computing F[m].

### EXPLANATION:

Let’s use the notation w_{(c)} to denote the number of appearances of the letter c in the string w.

# K-goodness

By definition, a string is K-good if for every two characters c and d, \left|w_{(c)} - w_{(d)}\right| \le K. But for a given string w, the value of \left|w_{(c)} - w_{(d)}\right| is maximized if c and d are the most and least occurring characters in w (or vice versa). Thus, we have a potentially simpler criterion for telling whether a string is K-good:

**If c and d are the most- and least- occurring characters in w, then w is K-good if and only if w_{(c)} - w_{(d)} \le K.**

Notice that we have removed the absolute value because w_{(c)} \ge w_{(d)}.

# Removing letters

This immediately gives us a **greedy** algorithm for turning w into a K-good string in a few letter removals:

- Compute the frequency counts for each of the 26 letters.
- Find the least-occurring character d. (Note that we only consider characters that actually appear in w, that is, whose frequency counts are positive.)
- For every character c appearing more than w_{(d)} + K times, remove w_{(c)} - (w_{(d)} + K) instances of c.

Clearly this algorithm yields a K-good string if you use our criterion above. Unfortunately though, *it doesn’t yield the minimum number* of removals! For example, consider the input `boooooooooo 0`

. The algorithm above will have us remove 9 instances of the letter ‘`o`

’, yielding the string “`bo`

” which is 0-good. But it’s possible to do it with just 1 removal: just remove the ‘`b`

’!

So how do we fix this? Clearly, we can’t assume that the least-occurring character in w will appear in the final string. In fact, it isn’t obvious what the minimum frequency count must be in the final string. So what do we do? We just *try them all*!

Our adjusted algorithm would be:

- Decide on the minimum frequency count on the final string, and call it m.
- Remove every letter that appears less than m times.
- For every letter c that appears more than m + K times, remove w_{(c)} - (m + K) instances of c.

The answer is then the minimum number of removals for all such values m in the range m \in [1,N]. Now, clearly this is correct, but is it fast enough? Well, it depends on your implementation. Obviously, if you process the string for every m, you’ll get an O(N^2) running time which is pretty bad. But thankfully, we can reduce it by noticing that *every iteration of the above procedure only requires the frequency counts of all the letters!* By computing the frequency counts just once and then using that array instead, we get a running time of O(N)!

Here’s a pseudocode:

```
// compute counts
counts['a'..'z'] // initialized to zero
for i=0..|w|-1:
counts[w*]++
// compute answer
answer = N
for m=1..N:
current = 0
for c='a'..'z':
if counts[c] < m:
current += counts[c]
else if counts[c] > m + K:
current += counts[c] - (m + K)
if answer > current:
answer = current
print answer
```

# Optimization

We can actually still optimize the algorithm above, by noticing that we don’t have to try all values m in the range [1,N]. Instead, we only try the m values that are equal to some w_{(c)}!

Why is that? Suppose we chose a value m and it’s not equal to any w_{(c)}. Then if we instead choose m+1, then there aren’t any additional characters whose frequency counts are < m+1, so we don’t have to remove any additional characters. However, we potentially save a few removals, particularly those whose frequency counts are > m+K! Thus, the result of the procedure above for m cannot be strictly less than its result for m+1, so it follows we don’t have to check m if it’s not equal to any w_{(c)}!

With this optimization, the number of times we perform the procedure reduces from N to (at most) 26! To formalize the optimization, notice that our original O(N) algorithm actually runs in O(N\alpha) where \alpha is the size of the alphabet. This optimization reduces that to O(N + \alpha^2), which is usually better because N \gg \alpha usually!

Here’s a pseudocode:

```
// compute counts
counts['a'..'z'] // initialized to zero
for i=0..|w|-1:
counts[w*]++
// compute answer
answer = N
for d='a'..'z':
m = counts[d] // consider only the relevant m's
current = 0
for c='a'..'z':
if counts[c] < m:
current += counts[c]
else if counts[c] > m + K:
current += counts[c] - (m + K)
if answer > current:
answer = current
print answer
```

### Time Complexity:

O(N)