ANGRY_CHEF - Editorial



Author: darknight_1729
Tester: darknight_1729
Editorialist: darknight_1729




Trie, DP


You have N words. You need to choose K distinct words out of those. To choose a word you need a non empty prefix such that it is not a prefix of any word in the dictionary. You need to minimize the letters we have to type.


This problem can be approached in a few different ways, but the intended algorithm consists of inserting all N words into a trie (while marking which nodes in the trie represent the end of a word), and then breaking the problem down into subproblems with dynamic programming. Let F(i,k) be the minimum number of letters to type in order to text k different words which end at nodes in node i's subtree (or rather, subtrie). We can compute F by recursively traversing the trie, with the final answer being F(root,K).

Let’s establish that D(i) = depth of node i. Note that D(i) is then also the length of the prefix (and possibly complete word) represented by node i.

To establish some simple base cases, F(i,0)=0 for any node i, and F(i,1)=D(i) for any node i besides the root. This is because, if only 1 word will be used from node i's subtree, then the prefix represented by node i won’t be a prefix of any other chosen word, meaning that it can be used to type this word.

For any internal node i and any positive k≤K, we can compute F(i,k) by considering various distributions of k words amongst node i and its children’s subtries. Consider that node i has children c_1,c_2,…,c_m, and we use v_1,v_2,…,v_m words from each of those children’s subtries respectively. If node i represents the end of one of the N given words, then F(i,k)=F(c_1,v_1)+⋯+F(c_m,v_m)+D(i), with v_1+⋯+v_m=k−1. Otherwise, F(i,k)=F(c_1,v_1)+⋯+F(c_m,v_m) with v_1+⋯+v_m=k. Either way, we can consider all of the possible combinations of v_1…_m with a classic instance of dynamic programming across the children. In this fashion, we can in fact compute the values F(i,1…K) all at once in O(mK^2) time.

Every node is the child of at most one other node, meaning that the overall time complexity of this algorithm is O(CK^2), where C is the number of nodes in the trie (which is at most the sum of the lengths of the words).