Longest Common subsequence

Supposing that we have a string not in the classic form but in a form like that

where the numbers are the frequencies of chars.

How can i work the longest common subsequence? I have worked with arrays of struct but i find some difficulty. Can someone provide code for the LCS part working also with frequencies?

1 Like

Assuming that you are using a classic implementation for LCS, (I’m using Wikipedia as a reference), this is what you could do:

  1. Store the character and frequency as a key:value pair, say in a struct. Hence each string could be represented as a linked list of such structs.
  2. Make the traditional 2D array used in LCS.
  3. When there is a match at a particular cell, instead of appending only the character, append a struct that has the character and the minimum of the two frequencies. (Each cell could be a linked list as well).

Small example: Consider the two strings:

alt text

Legend (in terms of steps followed): Orange -> Blue -> Pink -> Red -> Black

  1. Whenever there is no match, simply take the longer string from the top and left cells.
  2. If there is a match, append the longest possible match to the top left cell value and store in current cell.

Reference: https://en.wikipedia.org/wiki/Longest_common_subsequence_problem

You can use the same DP as for LCS of characters (dp[i][j] is the LCS of the first i characters of the first string and the first j characters of the second one), but on arrays of different objects than characters - those “repeated character” objects in this case.

Where you had dp[i][j] =max(dp[i][j],dp[i-1][j-1]+1), you should replace that 1 by LCS(object i, object j), which is 0 if they have different repeated characters and minimum of their lengths otherwise.

What’s the issue in converting this encoded string back to its original form and then applying LCS algo?

That would be really slower wouldn’t be?