GENETICS - Editorial






The solution is based on using some data structure which allows to perform all the operations in the problem in O(logn) time. One of those can be Cartesian trees (or treaps). We will use treaps with implicit keys.

Treap with implicit keys are good for this problem. Let’s describe how we can use them to implement all operations needed in the task. We will keep the following information in each node: the base, the sum of each of the bases in the subtrees of this node. The main operations on the Cartesian trees are split and merge. Using those operations we can implement all other such as inserting elements in the tree, deleting elements, updating elements. Also many other rather powerful operations can be perfomed with good complexity. The treap with implicit keys actually resembles an array. The keys being the order of the element in the treap. So such treaps can easily be split and then merged again.

First we need O(nlogn) time to construct a treap (There are also O(n) algorithms, but O(nlogn) is ok as well) for each initial DNA. We will use implicit keys and random priorities. With this the expected depth of each DNA will be logarithmic. Then we can each of the operations in logarithmic expected time in the following way.

• Cross operation: we call D1 = split(DNA1, k1) and D2 = split(DNA2, k2)and then merge(D1[1], D2[2]) and merge(D2[1], D1[2]). Both split and merge operations can be implemented with O(depth) complexity which is O(logn) in our case. Also important thing is that we need to implement persistent treaps, because we need to store all previous DNAs as well. But each cross operation will produce two new DNAs of about the same length. So when we have 10000 DNAs of length less than 2000000000 we can easily run out of memory. However while performing split and merge operation we create about O(depth) new nodes. So the increase in memory is onlu logarithmic.

• Mutate operation: this can easily be performed by deleting the element and then inserting a new one in the same position or just implementing update operation carefully. Once again this can be done in O(depth) time. No new DNA should be produced.

• Count operation: we call D1 = split(DNA, k1), then D2 = split(D1[2], k2-k1+1). Now we have the needed part of the DNA in D2[1]. So we just take the sum of each of the bases from the root of this treap. Once again the complexity is O(depth).

So the whole problem can be solved in O(nlogn + qlnMAXL) time, where MAXL is the maximum length of the DNAs. It must be noted the length of the DNAs can grow exponentially after cross operations so this can be an issue. But treap can give us O(lnMAXL) expected depth so the mentioned above expression can be used and this is good enough.

Cartesian tree is very powerful data structure. I suggest you read more about it in the literature and on the internet.


Can anyone completely explain implicit treaps??
With code(preferably in java)?