SSS7 - Editorial

iceknight1093 · August 20, 2025, 5:30pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

Medium

PREREQUISITES:

Binary lifting, K-th element in range queries

PROBLEM:

For an array B and parameter K, define f(B) as follows:

Partition B into several subarrays, each with length \leq K.
Sort each subarray.
Let C denote the concatenation of these sorted subarrays, in the same relative order.
f(B) is the lexicographically smallest possible array C.

You’re given an array A and the parameter K.
Answer Q queries. Each query gives you L, R, X and you must report the X-th element of f([A_L, \ldots, A_R]).

EXPLANATION:

To begin, we need to understand how to compute f(B) for a fixed array B.

The aim is to minimize (lexicographically) the result of the concatenation of the sorted arrays.
This means we want to minimize the first element as much as possible, then the second, then the third, and so on.
The first chosen subarray should have length \leq K.
Suppose we choose its length to be x. Then, observe that if there are indices i and j satisfying 1 \leq i \leq x and x \lt j \leq K such that B_j \lt B_i, it’s strictly better to extend the chosen subarray to have length at least j (that is, extend the subarray to include the smaller element).

So, the optimal choice of x must be such that B_i \lt B_j for all 1 \leq i \leq x and x \lt j \leq K.
Note that this is equivalent to saying \max(B_1, \ldots, B_x) \lt \min(B_{x+1}, \ldots, B_K).

If there are multiple such indices x, it’s not hard to see that it’s always ideal to choose the leftmost among them - after all, if x_1 and x_2 both satisfy the condition, then choosing just x_2 or choosing both x_1 and x_2 will result in the same sorted array.

Thus, to find f(B), we simply find the smallest index x such that \max(B_1, \ldots, B_x) \lt \min(B_{x+1}, \ldots, B_K) and make the first subarray [1, x].
Then the remaining |B|-x elements can be recursively solved for using the same process.

Let’s try and relate the above observation to answering range queries.

Suppose we want to answer the range query (L, R, X).
Then,

There’s a unique smallest index i_0 \ge L such that the subarray [L, i_0] will be chosen.
If X \leq i_0 - L + 1, then the answer is just the X-th smallest element in this range.
Otherwise, we can just subtract i_0 - L + 1 from X, and recursively solve for [i_0+1, R] instead.

In essence, we’re repeatedly “jumping” across subarrays, with each jump being defined by only the left endpoint, till we reach at least X elements.
Once that’s done, the answer is just the (X-d)-th smallest element of the final segment, d being the total length of previous segments.

Importantly, note that the jump is determined solely by the left endpoint - the right endpoint doesn’t have much of a role to play (unless the jump takes you past the right endpoint, which can only happen for the very last subarray).

So, suppose we’re able to compute, for each index i, the index \text{jump}[i] such that the left endpoint being i means we take the subarray [i, \text{jump}[i]-1].
Then, answering a query (L, R, X) would correspond to starting at L and following the \text{jump} array till at least X elements are reached, and then finding the appropriate element in the resulting range.

Knowing how far to go can be done quickly using binary lifting on the \text{jump} array.
Once this is done, we’ll have some range [i, j] with the answer being the (X-d)-th smallest element on the range (d being the total length of jumped-over subarrays).
Answering “k-th element on range” queries is a standard problem, and can be either solved offline by sweeping one endpoint and using a segment tree, or online using a persistent segment tree/wavelet tree.

All in all, as long as we know all the \text{jump}[i] values, the answer to a single query can be obtained in \mathcal{O}(\log N) time.

All we’re left with now is computing the \text{jump}[i] values.
Recall that \text{jump}[i] is the smallest index such that \max(A_i, \ldots, A_{\text{jump}[i]-1}) \lt \min(A_{\text{jump}[i]}, \ldots, A_{i+K-1})

Equivalently, it can be thought of as the smallest integer M such that all elements \leq M that are present among [A_i, \ldots, A_{i+K-1}] form a prefix starting at i.

The second formulation is useful to come up with a way of computing all the jumps.

Let’s fix the integer M and consider which starting indices i allow M as a valid maximum.
Suppose M is at index x.
Let l_1 \lt x be the largest index such that A_{l_1} \gt M.
Let r_1 \gt x be the smallest index such that A_{r_1} \gt M.
Let r_2 \gt r_1 be the smallest index such that A_{r_2} \lt M.

Now, note that M can only possibly be ‘valid’ for the subarrays starting at indices \max(l_1+1, x-K+1), \ldots, \min(x, r_2-K).

The lower bound of x-K+1 and upper bound of x are obvious, since we’re limited to subarrays of length \leq K.
The lower bound of l_1 + 1 comes from the fact that all elements in the chosen prefix must be \leq M (given that we’re fixing the maximum to be M); and including A_{l_1} to the left of M would break that.
The upper bound of r_2-K has similar reasoning: if the subarray includes A_{r_2} then it also includes some element \gt M before it (because we ensured r_1 \lt r_2, and A_{r_1} \gt M).

We need to be able to find these indices l_1, r_1, r_2 quickly.
One way to do that is as follows:

Consider two sets S_1 and S_2.
S_1 will contain the indices of all elements \leq M, and S_2 contains everything else.
Initially, S_1 is empty and S_2 contains all of \{1, 2, \ldots, N\}.
Iterate M from 1 to N. Suppose M is at position x.
Delete x from S_2 and insert it into S_1.
Now,
- l_1 is the largest element of S_2 that’s \lt x.
  This can be found using binary search on S_2 (set::lower_bound in C++).
- r_1 is the smallest element of S_2 that’s \gt x.
  Again this can be found using binary search on the set.
- r_2 is the smallest element of S_1 that’s \gt r_1, which is yet another binary search.

Once these are known, you also know the range of indices M is valid for.
Suppose this range is [i, j].
For each index p in this range, we’d like to set \text{jump}[p] \gets \min(M, \text{jump}[p]).
However, recall that we’re processing M in ascending order anyway - so if \text{jump}[p] was previously set for any index, M surely won’t be smaller than it.
This allows us to just store those indices which haven’t had their \text{jump} values set, and update the value for only those of them that are in [i, j] while removing them from said set.

This entire pre-processing stage takes only \mathcal{O}(N\log N) time, after which building the binary lifting table also takes \mathcal{O}(N\log N) time too.
After that, queries can be answered in \mathcal{O}(\log N) time, as described above, so we’re done.

TIME COMPLEXITY:

\mathcal{O}((N+Q)\log N) per testcase.

CODE:

Author's code (C++)

#include <bits/stdc++.h>
using namespace std;
#define int long long
#define INF (int)1e18

mt19937_64 RNG(chrono::steady_clock::now().time_since_epoch().count());

struct segtree{
    int n;
    vector <int> seg;
    
    // point update + query max + find first >= x after index l 
    
    segtree (int _n){
        n = _n;
        seg.resize(4 * n + 1);
    }
    
    void upd(int l, int r, int pos, int qp, int v){
        if (l == r){
            seg[pos] = v;
            return;
        }
        
        int mid = (l + r) / 2;
        if (qp <= mid) upd(l, mid, pos * 2, qp, v);
        else upd(mid + 1, r, pos * 2 + 1, qp, v);
        
        seg[pos] = max(seg[pos * 2], seg[pos * 2 + 1]);
    }
    
    int query(int l, int r, int pos, int ql, int qr){
        if (l >= ql && r <= qr){
            return seg[pos];
        } else if (l > qr || r < ql){
            return 0;
        }
        
        int mid = (l + r) / 2;
        return max(query(l, mid, pos * 2, ql, qr), query(mid + 1, r, pos * 2 + 1, ql, qr));
    }
    
    int find(int l, int r, int pos, int ql, int x){
        if (seg[pos] < x){
            return -1;
        }
        if (r < ql){
            return -1;
        }
        
        if (l == r){
            return l;
        }
        
        int mid = (l + r) / 2;
        int ans = find(l, mid, pos * 2, ql, x);
        if (ans == -1) ans = find(mid + 1, r, pos * 2 + 1, ql, x);
        
        return ans;
    }
};

struct FenwickTree{
    int n;
    vector <int> f;
    vector <int> b;

    inline void add(int i, int x){
        b[i] += x;
        for (int j = i; j <= n; j += j & (-j)){
            f[j] += x;
        }
    }

    inline void modify(int i, int x){
        add(i, x - b[i]);
    }

    inline void init(int nn, vector <int> a){
        n = nn;
        if (a.size() == n){
            vector <int> a2;
            a2.push_back(0);
            for (auto x : a) a2.push_back(x);
            a = a2;
        }

        f.resize(n + 1);
        b.resize(n + 1);

        for (int i = 0; i <= n; i++) f[i] = 0, b[i] = 0;

        for (int i = 1; i <= n; i++){
            modify(i, a[i]);
        }
    }

    inline int query(int x){
        int ans = 0;
        for (int i = x; i; i -= i & (-i)){
            ans += f[i];
        }
        return ans;
    }

    inline int query(int l, int r){
        return query(r) - query(l - 1);
    }
};

void Solve() 
{
    int n, k, q; cin >> n >> k >> q;
    
    vector <int> a(n + 1);
    for (int i = 1; i <= n; i++){
        cin >> a[i];
    }
    
    vector <int> f(n + 1);
    
    segtree rmq(n);
    for (int i = 1; i <= n; i++){
        rmq.upd(1, n, 1, i, a[i]);
    }
    
    segtree min_seg(n);
    for (int i = 1; i <= n; i++){
        min_seg.upd(1, n, 1, i, -a[i]);
    }
    
    segtree seg(n);
    
    stack <pair<int, int>> st;
    st.push({INF, n + 1});
    
    for (int i = n; i >= 1; i--){
        while (!st.empty() && st.top().first < a[i]){
            st.pop();
            seg.upd(1, n, 1, st.top().second - 1, 0);
        }
        
        int index = st.top().second - 1;
        st.push({a[i], i});
        
        int rmax = rmq.query(1, n, 1, i, index);
        int id = min_seg.find(1, n, 1, index + 1, -rmax);
        if (id == -1) id = n + 1;
        
        seg.upd(1, n, 1, index, id);
        
        if (i + k - 1 <= n){
            int R = i + k - 1;
            f[i] = seg.find(1, n, 1, i, R + 1);
            f[i] = min(f[i], R);
        } else {
            f[i] = n;
        }
    }
    
    vector<vector<int>> lift(n + 2, vector<int>(20, 0));
    lift[n + 1][0] = n + 1;
    for (int i = 1; i <= n; i++){
        lift[i][0] = f[i] + 1;
    }
    
    for (int j = 1; j < 20; j++){
        for (int i = 1; i <= n + 1; i++){
            lift[i][j] = lift[lift[i][j - 1]][j - 1];
        }
    }
    
    vector <array<int, 3>> qs(q);
    
    for (int i = 0; i < q; i++){
        int l, r, x; cin >> l >> r >> x;
        x += l - 1;
        
        for (int j = 19; j >= 0; j--){
            if (lift[l][j] <= x){
                l = lift[l][j];
            }
        }
        
        r = min(r, f[l]);
        x -= (l - 1);
        
        qs[i] = {l, r, x};
    }
    
    vector <int> lo(q, 1), hi(q, n);
    vector <int> pos(n + 1);
    for (int i = 1; i <= n; i++){
        pos[a[i]] = i;
    }
    
    while (lo != hi){
        vector<vector<int>> adj(n + 1);
        for (int i = 0; i < q; i++) if (lo[i] != hi[i]){
            int mid = (lo[i] + hi[i]) / 2;
            adj[mid].push_back(i);
        }
        
        FenwickTree ft;
        vector <int> vv(n);
        ft.init(n, vv);
        
        for (int i = 1; i <= n; i++){
            ft.add(pos[i], 1);
            
            for (auto j : adj[i]){
                int ss = ft.query(qs[j][0], qs[j][1]);
                if (ss < qs[j][2]) lo[j] = i + 1;
                else hi[j] = i;
            }
        }
    }
    
    for (int i = 0; i < q; i++){
        cout << lo[i] << " \n"[i + 1 == q];
    }
}

int32_t main() 
{
    auto begin = std::chrono::high_resolution_clock::now();
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    int t = 1;
    // freopen("in",  "r", stdin);
    // freopen("out", "w", stdout);
    
    cin >> t;
    for(int i = 1; i <= t; i++) 
    {
        //cout << "Case #" << i << ": ";
        Solve();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n"; 
    return 0;
}

Editorialist's code (C++)

// #include <bits/allocator.h>
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 RNG(chrono::high_resolution_clock::now().time_since_epoch().count());

// https://github.com/Aeren1564/CP/blob/master/Library/Succinct_Structures/succinct_dictionary.sublime-snippet
struct succinct_dictionary{
	static constexpr unsigned int wsize = 64;
	static unsigned int rank64(unsigned long long x, unsigned int i){
		return __builtin_popcountll(x & ((1ULL << i) - 1));
	}
#pragma pack(4)
	struct block_t{
		unsigned long long bit;
		unsigned int sum;
	};
#pragma pack()
	unsigned int n, zeros;
	vector<block_t> block;
	succinct_dictionary(unsigned int n = 0) : n(n), block(n / wsize + 1){}
	// O(1)
	int operator[](unsigned int i) const{
		return block[i / wsize].bit >> i % wsize & 1;
	}
	// O(1)
	void set(unsigned int i){
		block[i / wsize].bit |= 1ULL << i % wsize;
	}
	// O(n/w)
	void build(){
		for(auto i = 0; i < n / wsize; ++ i) block[i + 1].sum = block[i].sum + __builtin_popcountll(block[i].bit);
		zeros = rank0(n);
	}
	// O(1)
	unsigned int rank0(unsigned int i) const{
		return i - rank1(i);
	}
	// O(1)
	unsigned int rank1(unsigned int i) const{
		auto &&e = block[i / wsize];
		return e.sum + rank64(e.bit, i % wsize);
	}
	// O(log(n))
	unsigned int select0(unsigned int k) const{
		unsigned int low = 0, high = n;
		while(high - low >= 2){
			unsigned int mid = low + high >> 1;
			(rank0(mid) <= k ? low : high) = mid;
		}
		return low;
	}
	// O(log(n))
	unsigned int select1(unsigned int k) const{
		unsigned int low = 0, high = n;
		while(high - low >= 2){
			unsigned int mid = low + high >> 1;
			(rank1(mid) <= k ? low : high) = mid;
		}
		return low;
	}
};
 
// https://github.com/Aeren1564/CP/tree/master/Library/Data_Structure
// nor orz
// Requires succinct_dictionary
template<bool HAS_QUERY, class B, class T, class F, class I>
struct wavelet_matrix_base{
	int n, lg;
	B sigma;
	vector<succinct_dictionary> data;
	vector<vector<T>> aggregate;
	F TT; // commutative group operation
	T T_id; // commutative group identity
	I Tinv; // commutative group inverse
	wavelet_matrix_base(F TT, T T_id, I Tinv): TT(TT), T_id(T_id), Tinv(Tinv){ }
	wavelet_matrix_base &operator=(const wavelet_matrix_base &wm){
		n = wm.n;
		lg = wm.lg;
		sigma = wm.sigma;
		data = wm.data;
		return *this;
	}
	// O(n * log(sigma)) time and O(n * log(sigma) / w) memory
	void build(const vector<B> &key, B sigma){
		static_assert(!HAS_QUERY);
		assert(sigma > 0);
		for(auto x: key) assert(0 <= x && x < sigma);
		n = (int)key.size();
		this->sigma = sigma;
		lg = __lg(sigma) + (B(1) << lg != sigma) + 1;
		data.assign(lg, succinct_dictionary(n));
		vector<B> cur = key, next(n);
		for(auto h = lg; h --;){
			for(auto i = 0; i < n; ++ i) if(cur[i] >> h & 1) data[h].set(i);
			data[h].build();
			array it{next.begin(), next.begin() + data[h].zeros};
			for(auto i = 0; i < n; ++ i) *it[data[h][i]] ++ = cur[i];
			swap(cur, next);
		}
	}
	// O(n * log(sigma)) time and O(n * log(sigma)) memory
	template<class U>
	void build(const vector<B> &key, B sigma, const vector<U> &value){
		static_assert(HAS_QUERY);
		assert(sigma > 0);
		for(auto x: key) assert(0 <= x && x < sigma);
		n = (int)key.size();
		this->sigma = sigma;
		lg = __lg(sigma) + (B(1) << lg != sigma) + 1;
		data.assign(lg, succinct_dictionary(n));
		aggregate.assign(lg + 1, vector<T>(n + 1, T_id));
		vector<pair<B, T>> cur(n), next(n);
		for(auto i = 0; i < n; ++ i) cur[i] = {key[i], value[i]};
		for(auto h = lg; h --;){
			for(auto i = 0; i < n; ++ i) if(cur[i].first >> h & 1) data[h].set(i);
			data[h].build();
			array it{next.begin(), next.begin() + data[h].zeros};
			for(auto i = 0; i < n; ++ i){
				*it[data[h][i]] ++ = cur[i];
				aggregate[h + 1][i + 1] = data[h][i] ? aggregate[h + 1][i] : TT(aggregate[h + 1][i], cur[i].second);
			}
			swap(cur, next);
		}
		for(auto i = 0; i < n; ++ i) aggregate[0][i + 1] = TT(aggregate[0][i], cur[i].second);
	}
	// Returns the frequency of x in the interval [ql, qr)
	// O(log(sigma))
	int freq(int ql, int qr, int x) const{
		assert(0 <= ql && ql <= qr && qr <= n);
		assert(0 <= x);
		if(ql == qr || sigma <= x) return 0;
		for(auto h = lg; h --; ){
			auto lcnt = data[h].rank0(ql), rcnt = data[h].rank0(qr);
			if(~x >> h & 1) ql = lcnt, qr = rcnt;
			else ql += data[h].zeros - lcnt, qr += data[h].zeros - rcnt;
		}
		return qr - ql;
	}
	// Returns the frequency of x in the interval [ql, qr), along with the sum of their values
	// O(log(sigma))
	pair<int, T> freq_query(int ql, int qr, int x) const{
		static_assert(HAS_QUERY);
		assert(0 <= ql && ql <= qr && qr <= n);
		assert(0 <= x);
		if(ql == qr || sigma <= x) return {0, T_id};
		for(auto h = lg; h --; ){
			auto lcnt = data[h].rank0(ql), rcnt = data[h].rank0(qr);
			if(~x >> h & 1) ql = lcnt, qr = rcnt;
			else ql += data[h].zeros - lcnt, qr += data[h].zeros - rcnt;
		}
		return {qr - ql, TT(aggregate[0][qr], Tinv(aggregate[0][ql]))};
	}
	// Returns the number of occurrences of numbers in [0, xr) in the interval [ql, qr)
	// O(log(sigma))
	int count(int ql, int qr, B xr) const{
		assert(0 <= ql && ql <= qr && qr <= n);
		assert(0 <= xr);
		if(sigma <= xr) return qr - ql;
		if(xr == 0) return 0;
		int cnt = 0;
		for(auto h = lg; h --; ){
			auto lcnt = data[h].rank0(ql), rcnt = data[h].rank0(qr);
			if(~xr >> h & 1) ql = lcnt, qr = rcnt;
			else{
				cnt += rcnt - lcnt;
				ql += data[h].zeros - lcnt, qr += data[h].zeros - rcnt;
			}
		}
		return cnt;
	}
	// Returns the number of occurrences of numbers in [0, xr) in the interval [ql, qr), along with the sum of their values
	// O(log(sigma))
	pair<int, T> count_query(int ql, int qr, B xr) const{
		static_assert(HAS_QUERY);
		assert(0 <= ql && ql <= qr && qr <= n);
		assert(0 <= xr);
		if(xr == 0) return {0, T_id};
		xr = min(sigma, xr);
		int cnt = 0;
		T sum = T_id;
		for(auto h = lg; h --; ){
			auto lcnt = data[h].rank0(ql), rcnt = data[h].rank0(qr);
			if(~xr >> h & 1) ql = lcnt, qr = rcnt;
			else{
				cnt += rcnt - lcnt;
				sum = TT(sum, TT(aggregate[h + 1][qr], Tinv(aggregate[h + 1][ql])));
				ql += data[h].zeros - lcnt, qr += data[h].zeros - rcnt;
			}
		}
		return {cnt, sum};
	}
	// Returns the number of occurrences of numbers in [xl, xr) in the interval [ql, qr)
	// O(log(sigma))
	int count(int ql, int qr, B xl, B xr) const{
		assert(xl <= xr);
		if(xl == xr) return 0;
		return count(ql, qr, xr) - count(ql, qr, xl);
	}
	// Returns the number of occurrences of numbers in [xl, xr) in the interval [ql, qr), along with the sum of their values
	// O(log(sigma))
	pair<int, T> count_query(int ql, int qr, B xl, B xr) const{
		static_assert(HAS_QUERY);
		assert(xl <= xr);
		if(xl == xr) return {0, T_id};
		auto [lcnt, lsum] = count_query(ql, qr, xl);
		auto [rcnt, rsum] = count_query(ql, qr, xr);
		return {rcnt - lcnt, TT(rsum, Tinv(lsum))};
	}
	// Find the k-th smallest element in the interval [ql, qr), sigma if no such element
	// O(log(sigma))
	B find_by_order(int ql, int qr, int k) const{
		assert(0 <= k);
		if(k >= qr - ql) return sigma;
		B x = 0;
		for(auto h = lg; h --; ){
			auto lcnt = data[h].rank0(ql), rcnt = data[h].rank0(qr);
			if(k < rcnt - lcnt) ql = lcnt, qr = rcnt;
			else {
				k -= rcnt - lcnt;
				x |= (B)1 << h;
				ql += data[h].zeros - lcnt;
				qr += data[h].zeros - rcnt;
			}
		}
		return x;
	}
	// Find the k-th smallest element in the interval [ql, qr), sigma if no such element, along with the sum of values of the k smallest elements (prioritizing smaller index)
	// O(log(sigma))
	pair<B, T> find_by_order_query(int ql, int qr, int k) const{
		assert(0 <= k);
		k = min(k, qr - ql);
		B x = 0;
		T sum = T_id;
		for(auto h = lg; h --; ){
			auto lcnt = data[h].rank0(ql), rcnt = data[h].rank0(qr);
			if(k < rcnt - lcnt) ql = lcnt, qr = rcnt;
			else {
				k -= rcnt - lcnt;
				x |= (B)1 << h;
				sum = TT(sum, TT(aggregate[h + 1][qr], Tinv(aggregate[h + 1][ql])));
				ql += data[h].zeros - lcnt;
				qr += data[h].zeros - rcnt;
			}
		}
		return {x, TT(sum, TT(aggregate[0][ql + k], Tinv(aggregate[0][ql])))};
	}
	// Find the k-th smallest element in the interval [ql, qr) among elements >= xl, sigma if no such element
	// O(log(sigma))
	B find_by_order(int ql, int qr, B xl, int k) const{
		assert(0 <= ql && ql <= qr && qr <= n);
		assert(0 <= xl && 0 <= k);
		if(xl >= sigma) return sigma;
		k += count(ql, qr, 0, xl);
		if(k >= qr - ql) return sigma;
		return find_by_order(ql, qr, k);
	}
	// Find the k-th smallest element in the interval [ql, qr) among elements >= xl, sigma if no such element, along with the sum of values of the k smallest elements (prioritizing smaller index)
	// O(log(sigma))
	pair<B, T> find_by_order_query(int ql, int qr, B xl, int k) const{
		assert(0 <= ql && ql <= qr && qr <= n);
		assert(0 <= xl && 0 <= k);
		if(xl >= sigma) return {sigma, T_id};
		auto [cnt, sum] = count_query(ql, qr, 0, xl);
		k += cnt;
		auto [x, sum2] = find_by_order_query(ql, qr, k);
		return {x, TT(sum2, Tinv(sum))};
	}
	// Find the smallest element >= x, sigma if no such element
	// O(log(sigma))
	B lower_bound(int ql, int qr, B x) const{
		assert(0 <= x);
		return find_by_order(ql, qr, x, 0);
	}
	// Find the smallest element > x, sigma if no such element
	// O(log(sigma))
	B upper_bound(int ql, int qr, B x) const{
		assert(0 <= x);
		return find_by_order(ql, qr, x + 1, 0);
	}
	// Find the largest element <= x, -1 if no such element
	// O(log(sigma))
	B reverse_lower_bound(int ql, int qr, B x) const{
		assert(0 <= x);
		int cnt = count(ql, qr, x);
		return cnt ? find_by_order(ql, qr, cnt - 1) : -1;
	}
	// Find the largest element < x, -1 if no such element
	// O(log(sigma))
	B reverse_upper_bound(int ql, int qr, B x) const{
		assert(0 <= x);
		int cnt = count(ql, qr, x + 1);
		return cnt ? find_by_order(ql, qr, cnt - 1) : -1;
	}
};
 
template<class B>
auto make_wavelet_matrix(){
	return wavelet_matrix_base<false, B, int, plus<>, negate<>>(plus<>(), 0, negate<>());
}
// Supports query
template<class B, class T = long long, class F = plus<>, class I = negate<>>
auto make_Q_wavelet_matrix(F TT = plus<>(), T T_id = 0, I Tinv = negate<>()){
	return wavelet_matrix_base<true, B, T, F, I>(TT, T_id, Tinv);
}

int main()
{
    ios::sync_with_stdio(false); cin.tie(0);

    int t; cin >> t;
    while (t--) {
        int n, k, q; cin >> n >> k >> q;
        vector a(n, 0);
        for (int &x : a) cin >> x;

        set<int> used, unused;
        for (int i = 0; i < n; ++i)
            unused.insert(i);
        
        vector pos(n+1, 0);
        for (int i = 0; i < n; ++i)
            pos[a[i]] = i;

        vector prv(n, -1), nxt(n, n);
        vector events(n+1, vector<int>());
        for (int i = 1; i <= n; ++i) {
            unused.erase(pos[i]);
            used.insert(pos[i]);

            auto it = unused.lower_bound(pos[i]);
            if (it != end(unused)) nxt[pos[i]] = *it;
            if (it != begin(unused)) prv[pos[i]] = *prev(it);

            auto it2 = end(used);
            if (it != end(unused)) {
                it2 = used.lower_bound(*it);
            }

            int L = max(pos[i]-k, prv[pos[i]]) + 1;
            int R = pos[i];
            if (it2 != end(used)) R = min(R, *it2 - k);
            else R = min(R, n - k);
            if (L <= R) {
                events[L].push_back(i);
                events[R+1].push_back(-i);
            }
        }

        const int LG = 20;
        vector<array<int, LG>> jump(n+1);
        set<int> allowed;
        for (int i = 0; i < n; ++i) {
            for (int x : events[i]) {
                if (x > 0) allowed.insert(x);
                else allowed.erase(-x);
            }

            if (allowed.empty()) jump[i][0] = min(n, i+k);
            else {
                int m = *begin(allowed);
                int j = pos[m];
                jump[i][0] = min(i+k, nxt[j]);
            }
        }
        jump[n][0] = n;
        for (int j = 1; j < LG; ++j) {
            for (int i = 0; i <= n; ++i)
                jump[i][j] = jump[jump[i][j-1]][j-1];
        }

        auto wm = make_wavelet_matrix<int>();
        wm.build(a, n+1);
        while (q--) {
            int l, r, x; cin >> l >> r >> x;
            --l, --r;
            for (int i = LG-1; i >= 0; --i) {
                if (jump[l][i] - l < x) {
                    x -= jump[l][i] - l;
                    l = jump[l][i];
                }
            }

            // x-th smallest element in [l, min(r, jump[l][0]-1)]
            cout << wm.find_by_order(l, min(r+1, jump[l][0]), x-1) << ' ';
        }
        cout << '\n';
    }
}