DWW19F - Editorial

PROBLEM LINK:

Practice
Contest

Author: Ankush Khanna

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Online Algorithms, Data Structures like Heaps or height balanced trees.

PROBLEM:

Create a sequence of size K from a stream A of N (N \geq K) integers (online) such that at each insertion the element is inserted at the end of the sequence and if after insertion the size of the sequence becomes greater than K, then the element with smallest value is removed. If multiple such elements are there (for deletion), then remove the one which entered the sequence the earliest.

EXPLANATION:

This is a very trivial problem which shows us the power of some wonderful data structures. Handling the queries online is a mandate here, that is why you have to XOR the query input with the currently deleted element. This condition forces online solutions, and hence causes the need of online and efficient data structures here.

The first sub-task can be solved trivially by inserting the element at the back to the list data structure (with supported Random Access) in O(1) time and, if needed, finding the suitable element to be deleted in O(N) time using a linear search and then deleting it in O(N) time in worst case (due to re-shifting of non-deleted elements in the list). That sums up to O(N) time per query and hence O(N^{2}) time per test.

For passing the second sub-task, we necessarily need an efficient data structure which can handle each query with a better performance. You can use any efficient data structure like Priority Queue (with heap implementation) or a Multiset (height balanced tree implementation) and other such sorted data structures which can maintain their order very efficiently and online. Both of the before mentioned data structures can perform an online query in O(\log_{2}{N}) time, which is fast enough.

We also need to keep a track of the entry time of each element so that we can easily remove the element which entered the earliest (in case of multiple choices for deletion). Also, at the end, we need to print the elements in input order. So, after solving all the queries, we can order the final K elements in the sequence according to their entry time. Which is a very trivial task.

What if the problem were offline?

Had it been the case, then simple sorting would have done the trick. We could have sorted A_{1 \dots (N - 1)} and picked the K - 1 largest elements from it and order them according to their entry time. In case there are insertion conflicts between elements with same value, the elements with a later entry time are preferred. And then we could simply insert A_{N} at the end.

This works because in each query, the input element always enters the sequence. If the size of the sequence becomes K + 1 then the smallest element (excluding the currently inserted element) among the K elements is removed. From this, we can satisfactorily say that the largest K - 1 elements in A_{1 \dots (N - 1)} can never be removed from a sequence of size K as there will always be an element present in the K sized sequence smaller than them. Also, A_{N} will get inserted any which ways.


This shows us how drastically the online queries changed the solution. Where offline problem is just greedy observations and simple sorting, the online problem necessarily requires use of efficient data structures.

SOLUTION:

Setter's Solution
#include <bits/stdc++.h>

using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);
    auto &&f = [&](const pair<int, int> &x, const pair<int, int> &y) -> bool {
        return x.second < y.second;
    };
    int tt;
    cin >> tt;
    while (tt--) {
        int n, k;
        cin >> n >> k;
        multiset<pair<int, int>> s;
        for (int i = 0; i < k; i++) {
            int x;
            cin >> x;
            s.insert(make_pair(x, i));
        }
        for (int i = k; i < n; i++) {
            int x;
            cin >> x;
            x ^= (*s.begin()).first;
            s.erase(s.begin());
            s.insert(make_pair(x, i));
        }
        vector<pair<int, int>> ans(s.begin(), s.end());
        sort(ans.begin(), ans.end(), f);
        cout << ans[0].first;
        for (int i = 1; i < k; i++) {
            cout << ' ' << ans[i].first;
        }
        cout << '\n';
    }
    return 0;
}

COMPLEXITY:

Time complexity: O(N \log_{2}{N}) per test.
Space Complexity: O(N) auxiliary space per test.

Feel free to share your approach. If you have any queries, they are always welcome.

Two words: priority queue
Complexity is O(N log K)