Compressed Trie Tree

Hello, people! In this post, we will discuss a commonly used data structure for storing strings, the Compressed Trie Tree, also known as a Radix Tree or Patricia (Practical Algorithm to Retrieve Information Coded in Alphanumeric) Tree. If you remember, the problem with a regular Trie Tree is that it consumes a lot of memory. So, in memory-critical situations such as Android apps, developers prefer a compressed trie tree, which offers the same functionality at the same runtime complexity.

“Compress”-ing the tree

Now, the idea of the compressed trie tree is to convert long chains of single-child edges into one single edge. For example, suppose we have the two words “facebook” and “facepalm” in our regular trie tree. It would look like this –

[Figure: regular trie tree holding “facebook” and “facepalm”]

It’s a pretty long one, isn’t it? Now, how about this one below?

[Figure: compressed trie tree holding the same two words]

This one surely looks a lot more compact! This is the compressed trie tree. As you can see, what we do here is turn long chains of single-child edges into one edge. Unlike a regular trie tree, in a compressed trie tree the edge labels are strings.

Node in Compressed Trie Tree

For a regular trie tree, our tree node looked something like this –

class Node {
    Node[] children = new Node[26];
    boolean isWordEnd;
}

So, in every node, we had an array of references. The first reference corresponded to ‘a’, the second to ‘b’, and so on. So, essentially, we had a mapping of letters to references. We had a way of saying, “An edge ‘a’ is denoted by this particular element in the array of references”. Now, in a compressed trie tree, the edges are labeled by strings, so we need a way of saying, “An edge ‘face’ is denoted by this particular element in the array of references”. To accomplish this, we re-design our tree node as follows –

class Node {
    Node[] children = new Node[26];
    StringBuilder[] edgeLabel = new StringBuilder[26];
    boolean isEnd;
}

So, what we did is add an array of StringBuilders alongside the array of references. Now edgeLabel[0] will hold the label starting with ‘a’, edgeLabel[1] will hold the label starting with ‘b’, and correspondingly, children[0] will be the child reached through the edge labelled edgeLabel[0].

For example, in the above diagram, the root node will have edgeLabel[5] = “face” and children[5] will point to the internal node. The internal node will have edgeLabel[1] = “book” and children[1] will point to the leaf node which denotes the occurrence of the word “facebook”. The same internal node will also have edgeLabel[15] = “palm” and children[15] will point to the leaf node which denotes the occurrence of the word “facepalm”. The rest of the values of edgeLabel and children in the internal node will be null.
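To make the index arithmetic concrete, here is a small sketch that builds the “facebook”/“facepalm” tree by hand using the node layout above. The class name RadixNodeSketch and the helpers link and buildFaceTree are names I made up for illustration, and lowercase ‘a’ to ‘z’ input is assumed.

```java
class RadixNodeSketch {
    static class Node {
        Node[] children = new Node[26];
        StringBuilder[] edgeLabel = new StringBuilder[26];
        boolean isEnd;
    }

    // Hypothetical helper: attaches child to parent through an edge with the given label.
    // The slot is chosen by the first character of the label ('a' -> 0, 'b' -> 1, ...).
    static Node link(Node parent, String label, Node child) {
        int index = label.charAt(0) - 'a';
        parent.edgeLabel[index] = new StringBuilder(label);
        parent.children[index] = child;
        return child;
    }

    // Builds the two-word tree from the diagram by hand.
    static Node buildFaceTree() {
        Node root = new Node();
        Node internal = link(root, "face", new Node());
        link(internal, "book", new Node()).isEnd = true;
        link(internal, "palm", new Node()).isEnd = true;
        return root;
    }

    public static void main(String[] args) {
        Node root = buildFaceTree();
        System.out.println(root.edgeLabel[5]);              // face  (slot 5 is 'f')
        System.out.println(root.children[5].edgeLabel[15]); // palm  (slot 15 is 'p')
    }
}
```

Only the first character of a label decides its slot, which is why two edges leaving the same node can never start with the same letter.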

The above code is written in Java. For Java, it is much better to use StringBuilder rather than String because we will be doing a lot of string manipulation. Strings are immutable in Java, so every modification creates a new object, which can slow your program down considerably. If you are not familiar with StringBuilder, you can refer to my post.

insert() operation

All operations in the compressed trie tree are similar to what we would do in a regular trie tree. The insert operation is the one which differs the most. In the insert operation, we need to take care of a few cases –

  1. Inserting new node for a new word – This occurs when the starting character of the word is new and there’s no edge corresponding to it. This may occur at the root, or after traversing to an internal node.
  2. Inserting a prefix of an existing word – This occurs when, for example, we insert “face” when “facebook” is already in the tree. The existing edge has to be split so that a node marks the end of the prefix.
  3. Inserting a word which has a partial match to an existing edge – This occurs when we are trying to insert “this” when “there” is already inserted into the tree. Remember that “there” can have further children too, for instance if “thereafter” and “therein” are already inserted.

So, for these cases, we would have to break the existing word or the newly inserted word accordingly. The faster we perform these string operations, the faster the insert operation will be.
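The three cases can be sketched as one loop. This is a minimal sketch under my own assumptions (lowercase ‘a’ to ‘z’ words, the node layout shown earlier, and the made-up class name RadixInsertSketch), not a definitive implementation.

```java
class RadixInsertSketch {
    static class Node {
        Node[] children = new Node[26];
        StringBuilder[] edgeLabel = new StringBuilder[26];
        boolean isEnd;
    }

    static void insert(Node root, String word) {
        Node node = root;
        int i = 0; // next unmatched position in word
        while (i < word.length()) {
            int index = word.charAt(i) - 'a';
            StringBuilder label = node.edgeLabel[index];
            if (label == null) {
                // Case 1: no edge starts with this character, so hang the rest of the word here.
                node.edgeLabel[index] = new StringBuilder(word.substring(i));
                node.children[index] = new Node();
                node.children[index].isEnd = true;
                return;
            }
            // Count how many characters of the edge label match the word.
            int j = 0;
            while (j < label.length() && i + j < word.length()
                    && label.charAt(j) == word.charAt(i + j)) {
                j++;
            }
            if (j == label.length()) {
                // Whole label matched: move down and continue with the rest of the word.
                node = node.children[index];
                i += j;
                continue;
            }
            // Cases 2 and 3: the match stops inside the label, so split the edge at position j.
            Node middle = new Node();
            String remainder = label.substring(j); // unmatched tail of the old label
            int remIndex = remainder.charAt(0) - 'a';
            middle.edgeLabel[remIndex] = new StringBuilder(remainder);
            middle.children[remIndex] = node.children[index];
            label.setLength(j); // the old edge keeps only the common part
            node.children[index] = middle;
            node = middle;
            i += j;
        }
        // The word ends exactly on this node (the prefix case, or a path that already existed).
        node.isEnd = true;
    }

    public static void main(String[] args) {
        Node root = new Node();
        insert(root, "facebook");
        insert(root, "facepalm");
        insert(root, "face");
        System.out.println(root.edgeLabel[5]);      // face
        System.out.println(root.children[5].isEnd); // true
    }
}
```

After a split, the loop simply continues from the new middle node: for case 3 the next iteration falls into case 1 and adds the new branch, and for case 2 the loop ends and the middle node is marked as a word ending.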

search() operation

Searching in a compressed trie tree is much like searching in a regular trie tree. Here, instead of comparing a single character, we compare strings. The following cases will arise –

  • The string we want to search does not exist. For example, searching for “owl” when the tree only has “crow” in it.
  • The string we want to search exists as a prefix. For example, searching for “face” when the tree only has “facebook”.
  • Only the prefix of the target string exists. This is the converse of the previous case, such as searching for “facebook” when the tree only has “face”.
  • The string we want to search matches partially with an existing string. For example, searching for “this” when the tree only has “there”.
  • The edge label fully matches the starting portion of the string. For example, searching for “thereafter” when “thereafter” and “therein” exist in the tree. For this, after a full match with “there”, we traverse to the node which corresponds to that label and then resume our search (searching for “after”).

If we are able to fully traverse the tree via the target string and arrive on a tree node, we check if that node is a word ending or not. If yes, we return true; otherwise, we return false. For the rest of the cases, return false.
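The cases above collapse into a short loop. Here is a minimal sketch, assuming lowercase ‘a’ to ‘z’ words and the node layout shown earlier; the class name RadixSearchSketch and the hand-built demoTree helper are my own names for illustration.

```java
class RadixSearchSketch {
    static class Node {
        Node[] children = new Node[26];
        StringBuilder[] edgeLabel = new StringBuilder[26];
        boolean isEnd;
    }

    static boolean search(Node root, String word) {
        Node node = root;
        int i = 0; // next unmatched position in word
        while (i < word.length()) {
            int index = word.charAt(i) - 'a';
            StringBuilder label = node.edgeLabel[index];
            if (label == null) {
                return false; // no edge for this character
            }
            // The rest of the word must contain the whole edge label; a partial
            // match, or a word that ends in the middle of the label, is a miss.
            if (!word.startsWith(label.toString(), i)) {
                return false;
            }
            i += label.length();
            node = node.children[index];
        }
        return node.isEnd; // fully traversed: a hit only if a word ends here
    }

    // Hand-built tree holding "facebook" and "facepalm", as in the diagram.
    static Node demoTree() {
        Node root = new Node(), mid = new Node(), book = new Node(), palm = new Node();
        book.isEnd = true;
        palm.isEnd = true;
        root.edgeLabel[5] = new StringBuilder("face");
        root.children[5] = mid;
        mid.edgeLabel[1] = new StringBuilder("book");
        mid.children[1] = book;
        mid.edgeLabel[15] = new StringBuilder("palm");
        mid.children[15] = palm;
        return root;
    }

    public static void main(String[] args) {
        Node root = demoTree();
        System.out.println(search(root, "facebook")); // true
        System.out.println(search(root, "face"));     // false: no word ends at that node
    }
}
```

Calling label.toString() allocates a String on every step; comparing character by character avoids that and is one of the places where you can shave runtime later.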

startsWith() operation

The startsWith() operation is a popular operation performed on the compressed trie tree which checks if there’s any word in the tree which starts with the target word. This method is almost the same as the search method. The minor change is that here we only check whether the target string can be fully consumed while traversing the tree. Note that in a compressed trie tree the target may run out either exactly at a node (which may be the root) or partway along an edge label, for example “fac” against the edge “face”; both count as success. If we can consume it, we return true regardless of whether the current node is a word ending or not. This is because, even if it is not a word ending, its children will lead to a node which is a word ending.
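A minimal sketch of startsWith() under the same assumptions as before (lowercase ‘a’ to ‘z’ input, the node layout from this post); RadixPrefixSketch and demoTree are names I made up. It also accepts a target that runs out in the middle of an edge label, since every word below that edge still starts with it.

```java
class RadixPrefixSketch {
    static class Node {
        Node[] children = new Node[26];
        StringBuilder[] edgeLabel = new StringBuilder[26];
        boolean isEnd;
    }

    static boolean startsWith(Node root, String prefix) {
        Node node = root;
        int i = 0; // next unmatched position in prefix
        while (i < prefix.length()) {
            int index = prefix.charAt(i) - 'a';
            StringBuilder label = node.edgeLabel[index];
            if (label == null) {
                return false; // no edge for this character
            }
            // Compare until either the label or the prefix runs out.
            int j = 0;
            while (j < label.length() && i + j < prefix.length()) {
                if (label.charAt(j) != prefix.charAt(i + j)) {
                    return false; // mismatch inside the edge label
                }
                j++;
            }
            i += j;
            node = node.children[index];
        }
        return true; // the whole prefix was consumed, at a node or inside an edge
    }

    // Hand-built tree holding "facebook" and "facepalm", as in the diagram.
    static Node demoTree() {
        Node root = new Node(), mid = new Node(), book = new Node(), palm = new Node();
        book.isEnd = true;
        palm.isEnd = true;
        root.edgeLabel[5] = new StringBuilder("face");
        root.children[5] = mid;
        mid.edgeLabel[1] = new StringBuilder("book");
        mid.children[1] = book;
        mid.edgeLabel[15] = new StringBuilder("palm");
        mid.children[15] = palm;
        return root;
    }

    public static void main(String[] args) {
        Node root = demoTree();
        System.out.println(startsWith(root, "fac"));   // true: ends inside the edge "face"
        System.out.println(startsWith(root, "facex")); // false
    }
}
```

One design choice to be aware of: with this sketch the empty prefix returns true even on an empty tree, so add a check if your use case needs otherwise.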

Printing the compressed trie tree

For each edge traversed, append its label to a buffer and recurse into the child node. Before moving on to the next edge, remove the label that was just appended so that the buffer again holds only the path to the current node. Print the buffer only if the current node is a word ending.

If the edges are visited in alphabetical order, this recursive method prints all the words in the compressed trie tree in lexicographical order.
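The traversal described above can be sketched as follows. To keep it easy to check, this version collects the words into a list instead of printing them directly; the names RadixPrintSketch, collect, words and demoTree are my own, and the node layout is the one from this post.

```java
import java.util.ArrayList;
import java.util.List;

class RadixPrintSketch {
    static class Node {
        Node[] children = new Node[26];
        StringBuilder[] edgeLabel = new StringBuilder[26];
        boolean isEnd;
    }

    // current holds the concatenated labels on the path from the root to node.
    static void collect(Node node, StringBuilder current, List<String> out) {
        if (node.isEnd) {
            out.add(current.toString());
        }
        for (int i = 0; i < 26; i++) { // slots are in alphabetical order
            if (node.children[i] == null) {
                continue;
            }
            int before = current.length();
            current.append(node.edgeLabel[i]); // add this edge's label
            collect(node.children[i], current, out);
            current.setLength(before);         // remove it before trying the next edge
        }
    }

    static List<String> words(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, new StringBuilder(), out);
        return out;
    }

    // Hand-built tree holding "facebook" and "facepalm", as in the diagram.
    static Node demoTree() {
        Node root = new Node(), mid = new Node(), book = new Node(), palm = new Node();
        book.isEnd = true;
        palm.isEnd = true;
        root.edgeLabel[5] = new StringBuilder("face");
        root.children[5] = mid;
        mid.edgeLabel[1] = new StringBuilder("book");
        mid.children[1] = book;
        mid.edgeLabel[15] = new StringBuilder("palm");
        mid.children[15] = palm;
        return root;
    }

    public static void main(String[] args) {
        System.out.println(words(demoTree())); // [facebook, facepalm]
    }
}
```

The order comes out lexicographic because a node's own word is emitted before its children, and sibling edges always start with different letters.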

Code

Start with your existing trie tree code, modify the node definition and then work on the insert method cases. Once you get the insert right, the rest will work out easily. For the insert cases, you just have to do the string operations and change the references carefully. Try to code those cases. Come back and have a look at the diagrams if you need to.

You can check your code’s correctness with LeetCode’s Implement Trie problem. Try to solve that question using a compressed trie tree. Once you solve it, try to reduce the runtime of your program.

You can refer to my code if you get stuck. 🙂

    

This is the Java implementation. I will update this post with the C/C++ implementation soon.

In terms of runtime complexity, a compressed trie tree is the same as a regular trie tree. In terms of memory, a compressed trie tree uses far fewer nodes, which gives you a huge memory advantage, especially for long strings with long common prefixes. In terms of speed, a regular trie tree would be slightly faster because its operations don’t involve any string operations; they are simple loops.

I hope my post has demystified everything regarding a compressed trie tree. Tutorial and code for a compressed trie tree are hard to find. I hope my post saved you the effort of finding further tutorials. Do comment below if you have any doubts. Keep practising! Happy Coding!! 😀

Trie Tree Practise – SPOJ – DICT

Hello people..! In this post we will talk about solving another competitive programming question based on trie tree. I will take up the SPOJ problem – DICT. This is a little harder than the previous trie tree practise problem, PHONELST. Now, read the problem statement carefully a couple of times. Just so that you don’t need to open SPOJ in another tab, I have posted the problem statement below –

Problem Statement
Input Specification
Output Specification
Sample Input
Sample Output

Problem in terms of Trie Tree

Firstly, we will insert the N words into a trie tree. Then, for each of the K prefix words –

  • We will traverse the trie tree for this word.
  • If it exists, we will print the trie tree lexicographically, with that node as the root. And obviously, we prepend the prefix to whatever we print.
  • If the word doesn’t exist at all, that is, if while traversing we find no edge for the next character before the prefix word is fully processed, we will simply return from our traversal and print “No match.”.

So, what all do you need to solve this?

  • Trie Tree insertion method.
  • Trie Tree depth-first traversal method that visits children in alphabetical order (lexicographical print)

We can discard any other methods such as delete. We will need another method for searching whether a given word is present in the trie tree or not, in O(L) time, where L is the length of the word. So, take your implementation of trie tree and get it ready for solving the question by making these changes.

searchWord() Method

This is a simple trie tree traversal method, where we traverse the trie tree based on a given word. We look at each character of the word and go to the corresponding edge, in the trie tree. If there is no edge, we return null. If we have reached the end successfully, we return the node where the word ends. Try to code this procedure, you can refer to my code if you are stuck.

C++
struct Node * searchWord(struct Node * TreeNode, char * word)
{
    while (*word != '\0') {		// while there are alphabets to process
        if (TreeNode->children[*word - CASE] != NULL) {
        	// there is an edge corresponding to the alphabet
            TreeNode = TreeNode->children[*word - CASE];
            ++word;
        } else {
        	// there is no edge corresponding to the alphabet
            break;
        }
    }
 
    if (*word == '\0') {
    	// the word was completely processed
        return TreeNode;
    } else {
    	// word is not there in trie tree
        return NULL;
    }
}

lexicographPrint() Method

This method will have very minor changes from your original method. We will call this method based on the output of the searchWord method. If it is null, then we print “No match.”. If it is not null, then we begin the lexicographPrint from that node.
This method will carry an extra parameter, which will be the prefix word. Every time we hit a node where a word ends, we first print the prefix word and then the remaining part of the word traversed in this method.
For example, in the sample test case, the prefix word was “set”, so searchWord would return the location of the T node in the S → E → T traversal. Then we begin our lexicographPrint, and when we hit the end of the word “setter”, we print the prefix “set” followed by “ter”, which we gathered in the lexicographPrint method.
Try to code these modifications in your code, you can refer to my code if you are stuck.

C++
 
void lexicographPrint(struct Node * trieTree, vector<char> word, char * prefix)
{
    int i;

    // Print only when a word ends here and it is longer than the prefix itself
    if (trieTree->wordEnding && word.size() != 0) {
        vector<char>::iterator itr = word.begin();

        printf("%s", prefix);	// print the prefix

        while (itr != word.end()) {
            // print the rest of the word
            printf("%c", *itr);
            ++itr;
        }

        printf("\n");
    }

    // Visit the children in alphabetical order
    for (i = 0; i < ALPHABETS; ++i) {
        if (trieTree->children[i] != NULL) {
            word.push_back(CASE + i);
            lexicographPrint(trieTree->children[i], word, prefix);
            word.pop_back();	// undo before trying the next child
        }
    }

    // word is a by-value copy, so nothing needs undoing on return
}
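For readers following along in Java, the same two steps can be sketched on a regular trie. This is not the code from this post: the class name DictSketch, the query helper and the test words other than “set” and “setter” are my own, lowercase ‘a’ to ‘z’ input is assumed, and printing “No match.” when the prefix exists but no longer word follows it is an assumption about the expected output.

```java
import java.util.ArrayList;
import java.util.List;

class DictSketch {
    static class Node {
        Node[] children = new Node[26];
        boolean wordEnding;
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (node.children[i] == null) {
                node.children[i] = new Node();
            }
            node = node.children[i];
        }
        node.wordEnding = true;
    }

    // Returns the node where prefix ends, or null if the path does not exist.
    static Node searchWord(Node root, String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children[c - 'a'];
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    // Collects prefix + suffix for every word strictly below node, in alphabetical order.
    static void collect(Node node, String prefix, StringBuilder suffix, List<String> out) {
        if (node.wordEnding && suffix.length() > 0) {
            out.add(prefix + suffix);
        }
        for (int i = 0; i < 26; i++) {
            if (node.children[i] == null) {
                continue;
            }
            suffix.append((char) ('a' + i));
            collect(node.children[i], prefix, suffix, out);
            suffix.setLength(suffix.length() - 1); // undo before trying the next child
        }
    }

    // Hypothetical helper: the lines to print for one prefix query.
    static List<String> query(Node root, String prefix) {
        List<String> out = new ArrayList<>();
        Node start = searchWord(root, prefix);
        if (start != null) {
            collect(start, prefix, new StringBuilder(), out);
        }
        if (out.isEmpty()) {
            out.add("No match."); // assumption: also used when only the prefix itself exists
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node();
        insert(root, "set");
        insert(root, "setter");
        insert(root, "setting");
        System.out.println(query(root, "set")); // [setter, setting]
    }
}
```

Sharing one StringBuilder across the recursion avoids copying the partial word at every call, which helps with tight time limits.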

Putting the pieces together

Now combine your modules and prepare your main function as per the problem statement. You can refer to my code if you are stuck.

    

Word of Caution –

  • The output in the case of a mismatch is “No match.”, not “No match”.
  • The time limits are pretty tight, so your methods should be tidy.

I hope that you were able to solve this problem using a trie tree. Feel free to comment if you have any doubts. If you have any bugs in your code, I’d be glad to help, but don’t comment your entire code in the comment, please leave Ideone or PasteBin links, or if you don’t want to show your code publicly, you can fill up the response form below to mail your code to me. I will respond as soon as I can. Keep practising… Happy Coding…! 🙂

Trie Tree Practise – SPOJ – PHONELST

Hello people..! In this post I will show you how to get started with solving Trie Tree based questions in competitive programming. Learning a data structure is different from solving competitive coding questions based on that data structure. In this post, I take up a very simple question so that your confidence is boosted.
We will look at the SPOJ problem – Phone List. It is a very straightforward problem. Just so that you don’t need to go to SPOJ in a new tab, I’m putting the problem statement here.

Problem Statement

Phone List

Given a list of phone numbers, determine if it is consistent in the sense that no number is the prefix of another. Let’s say the phone catalogue listed these numbers:

• Emergency 911

• Alice 97 625 999

• Bob 91 12 54 26

In this case, it’s not possible to call Bob, because the central would direct your call to the emergency line as soon as you had dialled the first three digits of Bob’s phone number. So this list would not be consistent.

Input

The first line of input gives a single integer, 1 <= t <= 40, the number of test cases. Each test case starts with n, the number of phone numbers, on a separate line, 1 <= n <= 10000. Then follows n lines with one unique phone number on each line. A phone number is a sequence of at most ten digits.

Output

For each test case, output “YES” if the list is consistent, or “NO” otherwise.

The Problem in terms of Trie Tree

We are supposed to check if any word is a prefix of any other or not. Now, there might be a hundred ways to solve this problem, but we will do this using a trie tree so that we can get confident in using the data structure, and so that we can attempt tougher ones based on a trie tree. All throughout my explanation, I will be referring to the implementation in my post on Trie Tree.
What we will do to solve this problem is that, we will insert the words into the trie tree one-by-one, and we will check for the prefix word criteria as we are inserting. Now, there are 2 cases.

Case 1 – Prefix word is already inserted

This is the sample test case. Consider this test case –

Input
face
facebook

So, what I said we will do is that, we will be inserting the words one-by-one. So, when we insert the word “face”, no problems occur. But while inserting the word “facebook”, we would travel to the nodes F → A → C → E. And at the node E, we would have some indication that this node is a leaf node, that is, some word ends here. In my implementation, this can be indicated by –

if (temp->occurrences.size() == 0) {
	// not a leaf node
} else {
	// a leaf node, thus this is
	// the end of the prefix word
}

If we encounter this situation, we know that the result will be NO.

Case 2 – Prefix word is about to be inserted

This is just the opposite of the previous case, consider the test case –

Input
facebook
face

We won’t have any issues while inserting “facebook”. But when inserting “face”, we traverse F → A → C → E, and at the end of the insertion we see that the node E has a child node. This means that a longer word ends somewhere deeper in the trie tree, so the word we just inserted is a prefix of it. We could check this by traversing the children array –

for (i = 0; i < ALPHABETS; ++i) {
	if (temp->children[i] != NULL) {
		// The newly inserted word is
		// prefix to another word
	}
}

In this case too, the answer would be a NO. For every other case, the answer would be a YES.
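Both checks can be folded into the insert routine itself. Below is a minimal Java sketch of that idea, not the implementation referred to in this post: it uses a boolean isEnd flag instead of the occurrences list, digit-indexed children, and the made-up class name PhoneListSketch. It relies on the problem's guarantee that the numbers are unique; a repeated number would be reported as a conflict.

```java
class PhoneListSketch {
    static class Node {
        Node[] children = new Node[10]; // one slot per digit
        boolean isEnd;
    }

    // Inserts number and returns false if the insertion reveals a prefix conflict.
    static boolean insert(Node root, String number) {
        Node node = root;
        boolean consistent = true;
        for (int k = 0; k < number.length(); k++) {
            int d = number.charAt(k) - '0';
            if (node.children[d] == null) {
                node.children[d] = new Node();
            }
            node = node.children[d];
            // Case 1: an earlier number already ends somewhere on our path.
            if (node.isEnd) {
                consistent = false;
            }
        }
        node.isEnd = true;
        // Case 2: the number we just inserted is a prefix of an earlier, longer number.
        for (Node child : node.children) {
            if (child != null) {
                consistent = false;
            }
        }
        return consistent;
    }

    public static void main(String[] args) {
        Node root = new Node();
        System.out.println(insert(root, "911"));      // true
        System.out.println(insert(root, "91125426")); // false: 911 is a prefix of it
    }
}
```

The routine always finishes the insertion even after it has spotted a conflict, so the tree stays in a valid state whatever the caller decides to do next.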

So, try to code this problem. Take your code of the trie tree, remove the unnecessary things first, like the delete function or print function or any others which we won’t need. As far as I know, the insert function is all that you will need. And try to make the changes required for the test cases.
Your insert function could return a true or a false, depending on whether the insertion encountered a prefix test case or not. Take time to think about it and try to code it. If you get stuck, you can refer to my code below –

    

A word of caution –

  • Even if you did encounter a prefix word very early, don’t break out of the input, as you will disturb the input for the next test case.
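To illustrate the caution above, here is a hedged sketch of a driver loop that keeps consuming the numbers of a test case even after a conflict has been found. The class name PhoneListDriverSketch and the solve helper are my own, the trie uses a boolean isEnd flag rather than the layout from my trie tree post, and each phone number is assumed to be a single whitespace-free token.

```java
import java.util.Scanner;

class PhoneListDriverSketch {
    static class Node {
        Node[] children = new Node[10];
        boolean isEnd;
    }

    // Returns false if inserting number reveals a prefix conflict.
    static boolean insert(Node root, String number) {
        Node node = root;
        boolean consistent = true;
        for (int k = 0; k < number.length(); k++) {
            int d = number.charAt(k) - '0';
            if (node.children[d] == null) {
                node.children[d] = new Node();
            }
            node = node.children[d];
            if (node.isEnd) {
                consistent = false; // an earlier number is a prefix of this one
            }
        }
        node.isEnd = true;
        for (Node child : node.children) {
            if (child != null) {
                consistent = false; // this number is a prefix of an earlier one
            }
        }
        return consistent;
    }

    // Hypothetical helper: takes the whole input as text and returns the output.
    static String solve(String input) {
        Scanner in = new Scanner(input);
        StringBuilder out = new StringBuilder();
        int t = in.nextInt();
        while (t-- > 0) {
            int n = in.nextInt();
            Node root = new Node();
            boolean consistent = true;
            for (int k = 0; k < n; k++) {
                String number = in.next(); // always consume the token
                if (consistent) {
                    consistent = insert(root, number); // skip the work once we know the answer
                }
            }
            out.append(consistent ? "YES" : "NO").append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(solve("1\n2\n911\n91125426\n")); // prints NO
    }
}
```

If the inner loop used break instead, the leftover numbers of the first test case would be read as the n of the next one.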

I hope that you were able to solve this problem using a trie tree. It is simple and a perfect problem to start your trie tree streak in competitive coding. Feel free to comment if you have any doubts. If you have any bugs in your code, I’d be glad to help, but don’t comment your entire code in the comment, please leave Ideone or PasteBin links, or if you don’t want to show your code publicly, you can fill up the response form below to mail your code to me. I will respond as soon as I can. Keep practising… Happy Coding…! 😀