[Data Structures] Tries (with Full Code)
To watch the youtube video that goes over this topic, click here.
A Trie is a tree data structure that stores sequences in a way such that overlapping elements (in the same position) can share the same node.
For example, "apple" and "apply" are two distinct sequences of characters, but they both share the initial sequence "appl".
A Trie is implemented as a tree. Each node of the tree holds:
Trie's Underlying Data Model
As the picture above shows, a Trie is represented as a tree of nodes. However, unlike a binary trees where each node can just have a left and right pointer, you have to decide how many pointers each Trie node is going to have.
For example, a Trie that holds characters only containing the lowercase English alphabet will need exactly 26 pointers to other nodes. Or maybe our Trie holds phone numbers, in which case we expect only digits and each node holds 10 pointers (one for each digit).
We could hardcode each pointer if we know exactly what our Trie is going to hold.
We can clean this up by using an array (which still requires knowing the mapping. Ex. next maps to the character 'a').
A more flexible approach (but potentially less efficient) is using a hash map, which means your Trie can scale to as many nodes as required without needing to know exactly how many ahead of time.
Now we can handle unexpected character inputs like punctuation (! ? &) or special characters.
We initialize our Trie with an empty root TrieNode, and from that rootnode we add/update nodes as necessary with methods designed to traverse and edit TrieNodes.
Trie Operations (with Pseudocode)
As with most tree data structures, traversing nodes is best done with recursion or iteration. Most of the operations we'll see utilize iteration, but recursion is also fine.
Initializing the Trie
Assuming we have our TrieNode structure set up, we start by initializing an empty node to be the root of our tree.
If we want to insert a sequence into our Trie, we iteratively traverse the Trie for each element in the sequence. If the node doesn't exist for a certain element in the Trie, then we create it. At the last element of the sequence, we mark the node as endOfSequence=true to indicate that node ends a sequence.
To check if a sequence was inserted into our Trie, we iteratively traverse the Trie. If we ever find that one of the elements in the sequence doesn't have a node for it in the Trie, we return false. If we get to the end of the sequence and the node in Trie doesn't have endOfSequence==true then we return false. (This can happen in cases where we inserted longer sequences. Ex. Insert "antler" but check if our Trie contains "ant".)
To check if there's any sequence in our Trie that starts with a sequence, we can iteratively traverse the Trie and make sure each element of the sequence has a node in the Trie. For example, check if any sequences begin with "ant".
(This is very similar to the Contains operation so code is omitted in this blog post. See the full code on github to see how it's implemented.)
The time and space complexity for all of these operations are O(number of elements in the sequence). Tries are very efficient.
To see full Trie code, see the code I wrote on github that creates a Trie class.
When To Use Tries
Tries are really good data structures if you need to:
To test your knowledge, Leetcode has a problem on implementing the methods of a Trie.
Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels.
Comments are closed.
Hi, I'm srcmake. I play video games and develop software.
Pro-tip: Click the "DIRECTORY" button in the menu to find a list of blog posts.
License: All code and instructions are provided under the MIT License.