SORTSTR - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: laxman2002
Testers: iceknight1093, rivalq
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Greedy algorithms or topological sorting

PROBLEM:

You’re given a string S of length N.
You can swap adjacent characters of it as long as the characters differ by at most one (i.e, a and b, b and c, \ldots, y and z).

Find the lexicographically smallest string that you can obtain.

EXPLANATION:

There are a couple of different ways to solve this problem. I’ll detail one that converts it to a graph task.

Consider a directed graph G on N vertices, where we add the edge u \to v (where u \lt v) if and only if the character S_u will always remain before the character S_v in the final string.
In other words, we add the edge u \to v if and only if we cannot (or will not, in case S_u = S_v) swap S_u and S_v.

For example, if S = \texttt{cbadba}, we have the edges 1\to 3, 1\to 6, 2\to 4, 2\to 5, 3\to 4, 3\to 6, 4\to 5, 4\to 6.

We only add edges from a lower index to a higher index, so this graph is of course acyclic.

Notice that any valid string we can form is obtained from a topological ordering of this graph we created.
Our objective is thus to find the lexicographically smallest topological ordering of G.

Lexicographically smallest toposort

This can be done fairly easily by modifying the standard topological sort algorithm.
Recall that the algorithm repeatedly picks a vertex with indegree zero, and then processes its outedges to update the indegrees of its neighbors.

To obtain the lexicographically smallest ordering, we only need to fix our choice of vertex: instead of choosing an arbitrary zero-indegree vertex, pick the one with lowest value.
This can be done quickly by storing available vertices in a set or heap.

For a graph with E edges and V vertices, the complexity of this is \mathcal{O}(E + V\log V).

We are left with only one problem now: our graph is too large!
There are only N edges, but there can be upto order N^2 edges, which is way too much.

However, notice that many of our edges are redundant.
If we have edges u\to v and u\to w but S_v = S_w, then we also have the edge v \to w.
This makes the u\to w edge pointless (from the perspective of topological ordering) since u \to v and v\to w together give us the same information anyway.

In particular, this means that for a given index, we only need at most one edge for each character — connect it to the closest occurrence of that character to the right.

This immediately brings the number of edges down to at most 26N, after which we can easily run our toposort algorithm as outlined above to obtain the answer.

There are other ways to solve this problem as well: for example, it can be solved by building the answer greedily from left to right, but this takes some careful implementation to not TLE.
The setter’s code linked below does this.

TIME COMPLEXITY

\mathcal{O}(26\cdot N + N\log N) per test case.

CODE:

Setter's code (Python, greedy)
for _ in range(int(input())):
    n = int(input())
    s  = input()
    curr = {}
    groups = []
    for i in range(n):
        c = s[i]
        new = {}
        if c=='a':
            if 'a' in curr:
                groups[curr['a']][0]+=1
                new['a'] = curr['a']
                groups.append([0,0])
            else:
                groups.append([0,1])
            if 'b' in curr:
                new['b'] = curr['b']     

        elif c=='z':
            if 'y' not in curr:
                new['y'] = i
                groups.append([0,1])
            else:
                groups.append([0,1])
                new['y'] = curr['y']

        else:
            groups.append([0,0])
            pre,nxt = chr(ord(c)-1) , chr(ord(c)+1)
            f = False
            if c in curr:
                groups[curr[c]][0]+=1
                new[c] = curr[c]
                if c==min(curr):
                    curr = new
                continue

            else:
                groups[-1][1] = 1

            if pre in curr:
                new[pre] = curr[pre]

            else:
                new[pre] = i 

            if nxt in curr:
                new[nxt] = curr[nxt]
        curr = new

    ans = ""
    for i in range(n):
        if s[i]=='a':
            ans+='a'*groups[i][1]
        else:
            ans+=chr(ord(s[i])-1)*groups[i][0]+s[i]*groups[i][1]    

    print(ans)
Editorialist's code (C++, toposort)
#include "bits/stdc++.h"
using namespace std;

// Toposort - doesn't return nodes reachable from cycles
// https://github.com/kth-competitive-programming/kactl/blob/main/content/graph/TopoSort.h
vector<int> topo_sort(const auto& gr, const string &s) {
	vector<int> indeg(size(gr)), ret;
	for (auto &li : gr) for (auto &x : li) indeg[x]++;
	auto cmp = [&] (int i, int j) {
		return s[i] > s[j];
	};
	priority_queue<int, vector<int>, decltype(cmp)> q(cmp);
	for (int i = 0; i < (int)size(gr); ++i) if (indeg[i] == 0) q.push(i);
	while (!q.empty()) {
		int i = q.top();
		ret.push_back(i);
		q.pop();
		for (auto &x : gr[i])
			if (--indeg[x] == 0) q.push(x);
	}
	return ret;
}

int main()
{
	ios::sync_with_stdio(false); cin.tie(0);

	int t; cin >> t;
	while (t--) {
		int n; cin >> n;
		string s; cin >> s;
		vector<vector<int>> adj(n);
		array<int, 26> nxt;
		for (int i = 0; i < 26; ++i) nxt[i] = n;
		for (int i = n-1; i >= 0; --i) {
			int c = s[i] - 'a';
			for (int x = 0; x < 26; ++x) {
				if (nxt[x] == n) continue;
				if (x != c-1 and x != c+1) adj[i].push_back(nxt[x]);
			}
			nxt[c] = i;
		}
		auto ord = topo_sort(adj, s);
		for (int x : ord) cout << s[x];
		cout << '\n';
	}
}
3 Likes

Hello,
nice problem
the graph approach can still be used if we want to know also the number of operations required to get the result ?
Thanks,
David