×

# SHKSTR - Editorial

Practice

Contest

Author: Jitender

Tester: Misha Chorniy

Editorialist: Bhuvnesh Jain

EASY-MEDIUM

# Prerequisites

Tries, Offline querying

# Problem

You a given $N$ strings and $Q$ queries. For each query, given $R$ and string $P$, you need to find the lexicographically smallest string which has the largest common prefix with $S$.

# Explanation

## Subtask 1: N, Q ≤ 1000

A simple brute force which checks each string from index $1$ to $R$ and stores the answer at each step will suffice. Below is a pseudo-code for it:


def find_max_prefix_match(string a, string b):
ans = 0
for i in [0, len(a), len(b) - 1]:
if a[i] == b[i]:
ans += 1
else:
break
return ans

def solve_query(index R, string P):
ans = ""
prefix = 0
for i in [1, n]:
if find_max_prefix_match(S[i], P) > prefix:
prefix = find_max_prefix_match(S[i], P)
ans = S[i]
else if find_max_prefix_match(S[i], P) == prefix:
ans = min(ans, S[i])
return ans



The complexity of the above approach is $O(N * Q * 10)$ in the worst case as the maximum size of the string can be at most 10.

## Subtask 2: N, Q ≤ 100000

The first idea which comes whenever you see string problems deal with prefixes if using tries or hashing. In this problem too, we will use trie for solving the problem. In case you don't know about it, you can read it here.

Let us first try to understand how to find the lexicographically smallest string with largest common prefix with $P$. Assume we have the trie of all the strings build with us. We just start traversing the Trie from the root, one level at a time. Say we are at level $i$, we will try to greedily go to the node whose character matches with our current character, $P[i]$. This will help us to maximise the longest common prefix. The moment we find a mismatch, i.e. a node with current character doesn't exist, we will try to now greedily find the lexicographically find the smallest string. For this, we just keep on traversing down the left-most node in the trie till a complete work is found.

But the above approach works when $R = N$ in all queries as without it we can't identify whether the string we are traversing lies in the required range of the query or not.

There are 2 different approaches to the full solution, Online solution and Offline solution.

### Author/Tester Solution: Offline Approach

The problems where we can easily solve a problem given a full array but need to query for a prefix of the array can be easily handled using offline queries. The idea is as follows:

We first sort the queries based on the index of the array given. We now build out data structure (here trie), incrementally. Say the data structure is built using all the first $i$ elements, we now answer every query which has an index as $i$ in the query.

The pseudo-code for it is below:


queries = []
for i in [1, q]:
r, p = input()
queries.push((r, p, i))
queries.sort()

cur_index = 0
for (r, p, i) in queries:
while (cur_index <= r):
insert_element_to_ds_trie
cur_index += 1
ans[i] = query_from_ds_trie(S)      //parameter r is not required

for i in [1, q]:
print ans[i]



For more details, you can refer to the author's or tester's solution below.

### Editorialist Solution: Online Solution

The idea is simple. With every node in the trie, we keep a vector of indices which it is a part of. Using this vector we can easily decide whether the string we are traversing lies within our required range or not. But before discussing the full solution, we need to be sure that this will fit into memory limits because seeing it in a naive manner seems to consume quadratic memory as each node can have a vector of length $N$.

To prove that the above-modified trie also uses linear memory in order of sum of the length of strings, we see that each index appears in any vector of a node in trie as many characters are there in the string. So, out trie just uses twice the memory that the normal trie (the one in author or tester solution) uses.

Once, the above modified Trie is built, we can answer our queries easily. Since the strings are added incrementally, we are sure that the vector containing the indices will always be in sorted order. To check whether any string at a given node lies in modified range, we can use binary search. But, we can be clever here too, as the binary search will be an overkill. Since the range we want is always a prefix of the array we can just check the first element of the vector and decide whether any string lies in the required range or not. To get a clear picture of the above, you can see the below picture of the trie build from the sample case in the problem. It also contains how the answer to different queries are arrived at.

Once, you are clear with the above idea, you can see the editorialist implementation below for help.

Feel free to share your approach, if it was somewhat different.

# Time Complexity

$O(Q\log{Q} + \text{Sum of length of strings} * \text{ALPHABET})$ for offline solution

$O(\text{Sum of length of strings} * \text{ALPHABET})$ for online solution

where $\text{ALPHABET} =$ number of distinct english character (26 for this problem).

# Space Complexity

$O(\text{Sum of length of strings})$

# AUTHOR'S AND TESTER'S SOLUTIONS:

Author's solution can be found here.

Editorialist's solution can be found here.

This question is marked "community wiki".

6★likecs
3.7k2078
accept rate: 9%

 3 In case if you don't want to use data structure,Sort N+q strings with keep track of indices of q strings and then for every query check strings above and below it in the sorted list upto a condition and find the lexicographically smallest with longest LCP. It works fine. answered 11 Jun, 18:15 39●4 accept rate: 0%
 1 There is no need to store all indices for every index, just store the smallest index which is a part of that node. For suppose at some node we have 2 5 7 as indices, and we have a query for 5, string. If we just store 2, still we can proceed down in the trie as 2 < 5 (and we are searching for string between 1 and 5). It will be memory efficient. answered 11 Jun, 16:10 815●1●12 accept rate: 13% 1 Yes, it will consume less memory but in big $O$ notation it will be same. Just wanted to explain through the example that it will still be same and such modified trie is helpful in general. Actually using it we can solve the problem for general range $[l, r]$ instead of $[1, r]$ but with additional $O(\log{n})$ factor. (11 Jun, 16:17) likecs6★
 1 I used a hashtable to hash all the substrings and keep track of all the indices of the strings having the substring. Sort all the substrings lexicographically. Then when the queries come, iterate through the substrings of P from largest to smallest. For each substring, break once you get an index <= r. Solution here. answered 11 Jun, 18:30 3★potatio 14●1●3 accept rate: 0%
 1 I used an entirely different approach, working in C#. As each string is no more than 10 characters, each one of 26 lower case letters, each entire string can be encoded into a 'long', with 5 bits per letter. Comparisons like 'CompareTo' are then fast. Define a class StringIndex, consisting of an encoded string and its index in the array. Build a sorted list of StringIndex, sorted by the integer encoded string. For each query, find its place in the sorted list by a binary search. From there search forwards until an index within range, and set the common prefix with that string. Then search backwards. Check whether the first one with an index in range has a longer common prefix. Search backwards until there is a shorter common prefix. When no common prefix is found, set to the first string in the list with an index in range. As the number being searched may be much less than the number of strings supplied, extract a series of shorter lists before starting any queries, and then choose the appropriate list to search for each query. My submission may be found at https://www.codechef.com/viewsolution/18780484 It earned 100 points in 0.43 seconds. answered 17 Jun, 06:15 4★david_s 91●1 accept rate: 13%
 0 Just a small memory optimization: We only need to store a vector for the nodes at which strings end. For other nodes, we could just store the index of the first string from the left which passes through that node, since we are anyway going to take the min. Nice problem! And good editorial! :) answered 11 Jun, 16:17 3★akamoha 126●4 accept rate: 20% Similar optimisation as above comment. (11 Jun, 16:42) likecs6★
 0 I implemented a trie for this problem but only for task #3 it is giving WA and for rest 5 test cases it is giving AC can anybody help me out with what is wrong in my code https://www.codechef.com/viewsolution/18767595 answered 11 Jun, 17:58 1●1 accept rate: 0% use insert code option and then place your code (11 Jun, 18:04) Please give link to your submission. Pasting code is tedious and takes lot of visible space. (11 Jun, 18:14)
 0 can anyone explain me traverse_down function of testers code please.....thanks in advance answered 11 Jun, 23:13 59●4 accept rate: 0%
 0 I used similar approach as the second one. Just instead of storing vector of indices I stored lowest index of string which the character is part of. Unfortunately My code is erring out in task 3. Can any body point the bug ? Here's the link. Also nice editorial. Thanks. Just few small corrections though - 1) The tester has actually used approach 2 in his/her code. 2) The editorials' code cannot be opened - Access denied in XML tree. answered 11 Jun, 23:30 3★qwpad 1 accept rate: 0% There was an issue while linking the code. It is updated now. (12 Jun, 08:23) likecs6★ Okay. The editorialists' code was being displayed while clicking on testers' solution. Thanks. Btw Any comments regarding my code ? (12 Jun, 11:30) qwpad3★
 0 my code failed for test case 4 .I did it the same way by storing the lowest index of string. please can anyone check what is wrong. here is the link to my solution- link text answered 11 Jun, 23:46 2★n_kroma 1 accept rate: 0%
 0 I have used almost the same approach as the online one. Instead of a vector, i just stored the first occurence of the alphabet and another field which told me the index of the first word which ends there. But, I am getting a TLE for it. Could you help me find why it is happening? The link to my solution https://www.codechef.com/viewsolution/18809047 answered 12 Jun, 10:23 1●1 accept rate: 0% The test data for small test case for weak. Below is a counter test case for your solution: 2 a a 1 1 b  (17 Jun, 14:45) likecs6★
 0 Why not use map instead of implementing a trie? I understand the space requirement will be more, but it works for the above constraints (also works for most questions related to string queries). Is there any other reason why map is not generally considered? Should there be test cases which only pass for actual tries and not map? That forces the user to learn/understand how to implement them. Just my thoughts on this. Here's my attempt at the offline solution but with map: https://www.codechef.com/viewsolution/18808837 answered 12 Jun, 10:33 42●2 accept rate: 0% @arvindpunk, using map is fine and works in most cases. But beware of the additional $O(\log{N})$ factor it adds to your complexity and can give TLE in some problem. Learning and implementing tries is quite easy. I definitely recommend you to learn it as it is also the basis of harder string algorithms like aho-corasick, suffix tree etc. which use the same structure as its base. Also, tries are useful for xor-minimisation problems. (17 Jun, 14:47) likecs6★
 0 Can you please post the test cases file for this particular problem so that i can know where specifically my program / logic went wrong in the bruteforce method . answered 12 Jun, 10:56 1 accept rate: 0%
 0 How is the leftmost node guaranteed to be lexographically smaller? answered 12 Jun, 18:03 1 accept rate: 0% Read about tries xD (12 Jun, 18:43) Yeah I got that part! (12 Jun, 19:49)
 0 I used trie with storing the minimum index of the inserted string but I am also getting wrong answer for Test Case 4 My code link. Rest of the test cases are correct and fast enough. I might be missing a corner case but still unable to find it though. answered 13 Jun, 08:43 10●1 accept rate: 0%
 0 @arnaagyeya: You can compare your solution with mine. I find your code quite unreadable to find the actual mistake. I'm sure you can understand your mistake after looking at my code. I just read about trie and implemented it so it may have some bugs but the solution worked for this problem. Link to my code: https://www.codechef.com/viewsolution/18893009 answered 13 Jun, 17:57 2★decagon 34●4 accept rate: 0%
 0 How did the setter decide the upper-bound on the number of nodes of trie as (1<<20)?? Can anyone please explain... answered 13 Jun, 22:01 71●6 accept rate: 0% each string can have 10 letters..and there are 10^5 strings.. total characters possible are 10^6 at max... And (1<<20)== 2^20 is smallest multiple of 2 greater than 10^6 (14 Jun, 01:13) @l_returns why we looked for next multiple of 2... isn't the trie a 26-nary tree instead of a binary tree. If we use the formula given in this https://en.wikipedia.org/wiki/K-ary_tree ,parameters should be h=10,k=26 to calculate total number of nodes in trie (14 Jun, 11:29) U r doing it wrong way.. I had just calculated that answer before writing this comment of mine... It takes about 1.468*10^14 or something nodes to complete whole tree... But u need to realize that if we use nodes then we make them as much as we need.. so at max we need 10^6 nodes...so why to waste other nodes... So setter must be using nodes which are needed.. he won't create whole tree when it's not needed.. and constrainsts prove we need 10^6 at max.. I didn't knew the formula u said but I proved that formula using this example.. (14 Jun, 14:00)
 0 @decagon: Thanks for the answer. I could not get where I was wrong as mine is online implementation but your implementation was very efficient. A very nice offline approach for this problem. answered 14 Jun, 08:44 10●1 accept rate: 0%
 0 Is it necessary to store the entire vector of the indices of the strings at every node? Can't we just keep the minimum index of the string at each node along that path and then the actual index of the string at it's leaf node. Would be much more space efficient. answered 14 Jun, 09:19 37●1 accept rate: 0% He talks about that. Read the editorial and discussions above. (14 Jun, 12:05)
 0 Nice explanation. answered 17 Jun, 11:26 1 accept rate: 0%
 0 My code is not passing only the first test case .Can any one tell what is the first testcase? answered 18 Jun, 20:33 1 accept rate: 0%
 0 Using same approach as editorialist,still getting wa. https://ideone.com/eFFVqJ Here I am using unordered_map in trie to reduce unnecessary space but I have tried it directly also still getting wa. I am adding string no. to vector of strings at that position only if string having greater index than the last element in vector is lexicographically smaller than it answered 19 Jun, 18:27 5●2 accept rate: 0%
 0 Even brute force is passing under 1s in pypy... answered 21 Jun, 15:14 1 accept rate: 0%
 0 Can someone please explain the use of the member "leaf_id" in the "trie node" in editorialist's solution (I know it may be a lame question, but a beginner here) answered 02 Jul, 19:29 36●3 accept rate: 9%
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×15,264
×1,626
×591
×179
×68
×42
×26