by Arne Sommer

# An Inverted Prefix with Raku

[69] Published 25. April 2020.

This is my response to the Perl Weekly Challenge #57.

## Challenge #57.1: Invert Tree

 You are given a full binary tree of any height, similar to the one below: Write a script to invert the tree, by mirroring the children of every node, from left to right. The expected output from the tree above would be: The input can be any sensible machine-readable binary tree format of your choosing, and the output should be the same format.

The program «path-sum-conf2», from my answer to last week's Challenge #56.2: Path Sum is a good starting point for this challenge. The first part of the program can be reused, with some minor changes:

File: invert-tree (partial) ```unit sub MAIN (Str :\$tree = "1 | 2 3 | 4 5 6 7", # [1] :\$verbose, :\$quotes); # [2] class BinaryNode { has Int \$.value; has BinaryNode \$.left is rw; # [3] has BinaryNode \$.right is rw; # [3] method swap # [4] { (self.left, self.right) = (self.right, self.left); } } my @btree = \$tree.split("|")>>.words; my @old-nodes; my @new-nodes; for @btree.reverse -> \$row { my @current = @\$row; @old-nodes = @new-nodes; @new-nodes = (); for @current -> \$value { if \$value eq "*" { @new-nodes.push("*"); next; } my \$left = @old-nodes.shift // "*"; \$left = Nil if \$left eq "*"; my \$right = @old-nodes.shift // "*"; \$right = Nil if \$right eq "*"; @new-nodes.push(BinaryNode.new(value => \$value.Int, left => \$left // Nil, right => \$right // Nil)); } } my \$btree = @new-nodes[0]; say \$btree, "\n" if \$verbose; ```

[1] A new default tree.

[2] Use this option if you want quotes around the output.

[3] The task is to swap the left and right children, so these class variables must be writeable (after we have created the object). That is enabled with `is rw`.

[4] Swapping the to children, using lists (instead of a temporary variable) and three single assignments).

See docs.raku.org/routine/is rw for more information about `is rw`.

The challenge states that the tree is a full binary tree, meaning that all the child pointers are in use, except on the bottom (and last) row. The procedure that prints the tree, after the swapping, relies on this fact.

File: invert-tree (the rest) ```traverse(\$btree); # [5] say \$btree, "\n" if \$verbose; say \$quotes ?? '"' ~ tree2string(\$btree) ~ '"' # [6] !! tree2string(\$btree); sub traverse (\$current) # [5] { \$current.swap; # [5a] traverse(\$current.left) if \$current.left.defined; # [5b] traverse(\$current.right) if \$current.right.defined; # [5c] } sub tree2string (\$tree) # [7] { my @level; # [8] my \$level = 0; # [9] sub do-it(\$current, \$level) # [10] { say ":: " ~ \$current.value if \$verbose; @level[\$level].push(\$current.value); # [11] do-it(\$current.left, \$level +1) if \$current.left.defined; # [12] do-it(\$current.right, \$level +1) if \$current.right.defined; # [13] } do-it(\$tree, \$level); # [14] return @level.join(" | ").join(" "); # [15] } ```

[5] Start the show with this call on the top of the tree. It swaps the left and right child (which will work regardless of the actual value, so one, both or none can be defined) [5a]. Then it calls itself recursively on the left child if it is defined [5b], and the right one if it is defined [5c].

[6] Print the result, with quotes if requested.

[7] This procedure traverses the tree and generates an output string identical to the one we used for the input.

[8] This one is a list of lists, where the row number (starting from 0) is the first dimension.

[9] The level (or row number, again starting from 0).

[10] A recursive procedure, called for each node in the tree. Note the arguments, the current node (to register), and the level (row number).

[11] Add the value to the correct row in the list of lists.

[12] Recursively do the left child, if any.

[13] Recursively do the right child, if any.

[14] Kick it off.

[15] Return the string. The first `join` inserts a bar between the rows, and the seconds merges the values in the rows.

Running it:

```\$ raku invert-tree 1 | 3 2 | 7 6 5 4 \$ raku invert-tree --tree="1 | 2 3 | 4 5 6 7" 1 | 3 2 | 7 6 5 4 \$ raku invert-tree -tree="5 | 4 8 | 11 * 13 9 | 7 2 * * * 1" 5 | 8 4 | 9 13 11 | 1 2 7 ```

The last one shows that an unbalanced tree screws up the algorithm. The result is a string that we cannot give back to the program, as it has too few values on the third and fourth row (given the numbers 1,2,3 and 4 for the rows). And it is not an inverted version of the original tree.

### Simplification

We can simplify this swapping by just working on the input string, skipping the tree detour:

File: invert-tree-oneliner ```unit sub MAIN (\$tree = "1 | 2 3 | 4 5 6 7"); say \$tree.split(" | ")>>.words>>.reverse>>.join(" ").join(" | "); ### # 1 # 2 # 3 # 4 # 5 # 6 ```

[1] Starting with the input string,

[2] • Split it on the `|` character (with a space on each side). This gives three strings (if we use the default tree): «1», «2 3», and «4 5 6 7».

[3] • Split each of those three strings into words (i.e. split on space).

[4] • Reverse the three lists (each representing a row in the tree).

[5] • Join the values in the lists with a space between each value. This gives three strings again.

[6] • Join the three strings with « | » between them.

That was an impressive simplification... (We got rid of about 90% of the file size. And even more if we remove the crash bang line in both versions.)

See docs.raku.org/routine/words for more information about `words`.

Running it:

```\$ raku invert-tree-oneliner 1 | 3 2 | 7 6 5 4 ```

This one actually handles unbalanced trees:

```\$ raku invert-tree-oneliner "5 | 4 8 | 11 * 13 9 | 7 2 * * * 1" 5 | 8 4 | 9 13 * 11 | 1 * * * 2 7 ```

Not bad for a one liner...

## Challenge #57.1 Bonus: Pretty Print

 In addition to the above, you may wish to pretty-print your binary tree in a human readable text-based format similar to the following: ``` 1 / \ 3 2 / \ / \ 7 6 5 4 ```

The tricky part is getting the indentation (spaces before the first value) and spacing (spaces between the values), both shown as `-` in the illustrations, right. I have chosen to place the arrows (so to speak) at the same position as the value below them. The challenge did not.

Let us start with figuring out the indentation for binary trees of different heights:

```-1 # row 1 | Indent: 1 /-\ 2-3 # row 2 | Indent: 0 ``` ```---1 # row 1 | Indent: 3 -/---\ -2---3 # row 2 | Indent: 1 /-\-/-\ 4-5-6-7 # row 3 | Indent: 0 ``` ```-------1 # row 1 | Indent: 7 ---/-------\ ---2-------3 # row 2 | Indent: 3 -/---\---/---\ -4---5---6---7 # row 3 | Indent: 1 /-\-/-\-/-\-/-\ 8-9-0-0-0-0-0-0 # row 4 | Indent: 0 ```

The next value in the sequence (0,1,3,7,...) isn't obvious, so we do another one:

```---------------1-------- # row 1 | Indent: 15 -------/---------------\ -------2---------------3 # row 2 | Indent: 7 ---/-------\-------/-------\ ---4-------5-------6-------7 # row 3 | Indent: 3 -/---\---/---\---/---\---/---\ -8---9---0---0---0---0---0---0 # row 4 | Indent: 1 /-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\ 0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0 # row 5 | Indent: 0 ```

I still didn't get it, so I ran the program (which I'll show later) with 6 rows trying out different values. 31 worked. Then I saw the pattern, which can be expressed as a Raku sequence like this:

```my \$indents := (0, { (\$_ * 2) +1 } ... *); # [1] ```

[1] Start with 0, then each value is the previous one multiplied with 2, and finally with 1 added. If we do the row numbering from the bottom, starting with 0, we can use that number as an index on the sequence to get the indentation.

Now if you study the graphs carefully, you should notice that the spacing (the number of spaces between two numbers) is the same value as the indentation on the previous (above) row of numbers. That simplifies matters.

Then the program is pretty easy, as long as we insist that the tree is full, and that the values only use one digit (or character).

File: bonus-tree ```unit sub MAIN (Str \$tree = "1 | 2 3 | 4 5 6 7", :\$dashes); # [1] my \$space = \$dashes ?? "-" !! " "; # [1] my @btree = \$tree.split("|"); # [2] my \$rows = @btree.elems; # [3] my \$row = \$rows -1; # [4] my \$count = 1; # [5] my \$indents := (0, { (\$_ * 2) +1 } ... *); # [6] my \$indent = \$indents[\$row]; # [7] my \$spacing = 0; # [8] for @btree -> \$level # [9] { my @values = \$level.words; # [10] die "wrong number of element on row { \$rows - \$row }: { @values.elems } \ (should have been \$count)" if @values.elems != \$count; # [11] print \$space x \$indent; # [12] say @values.join(\$space x \$spacing); # [13] last if \$row == 0; # [14] \$row--; # [15] \$spacing = \$indent; # [16] \$indent = \$indents[\$row]; # [17] print \$space x \$indent; # [18] my \$line = "/{ \$space x \$spacing}\\{ \$space x \$spacing }" x \$count; # [19] say \$line.trim-trailing; # [19a] \$count *= 2; # [5a] } ```

[1] Specify «--dashes» on the command line to get dashes instead of spaces in the output.

[2] Get the rows (as space separated strings, one for each row).

[3] The number of rows.

[4] Where to start in the indentation sequence (see [6]). We count the rows backwards.

[5] The number of «/\» sequences to print. It doubles for each row [5a].

[6] The indentation sequence.

[7] Get the intial indentation.

[8] The spacing is the indent on the previous row. The first one doesn't need it, as there is only one value (the top node).

[9] For each row in the tree,

[10] Get the values as a list

[11] Terminate the program if the number of elements is wrong.

[12] The indentation.

[13] And the values.

[14] We are finished if this is the last row (so that we don't get a line with arrows at the end).

[15] Count down (towards 0).

[16] The spacing.

[17] Get the next indentation value.

[18] Print the indentation for the arrow line,

[19] and the arrows- without any trailing spaces [19a].

See docs.raku.org/routine/x for more information about the string repetition operator `x`.

Running it:

```\$raku bonus-tree "1 | 2 3" 1 / \ 2 3 \$ raku bonus-tree "1 | 2 3 | 4 5 6 7" 1 / \ 2 3 / \ / \ 4 5 6 7 \$ raku bonus-tree "1 | 2 3 | 4 5 6 7 | 8 9 0 0 0 0 0 0" 1 / \ 2 3 / \ / \ 4 5 6 7 / \ / \ / \ / \ 8 9 0 0 0 0 0 0 \$ raku bonus-tree "1 | 2 3 | 4 5 6 7 | 8 9 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 1 \ 1 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 \ 6 7 8" 1 / \ 2 3 / \ / \ 4 5 6 7 / \ / \ / \ / \ 8 9 0 0 0 0 0 0 / \ / \ / \ / \ / \ / \ / \ / \ 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 ```

It works, but the top lines are not pretty for big trees. I could have added more rows with arrows to make it pretty, but that would increase the height considerably. And this pyramid isn't really suitable for ascii art anyway.

Now we can run one of them, with dashes enabled, so that we can count them if we are so inclined. Note the trailing dashes on the arrow lines. It isn't nice, but isn't noticeable when the default space character is used.

```\$ raku bonus-tree --dashes "1 | 2 3 | 4 5 6 7 | 8 9 0 0 0 0 0 0 | \ 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1" ---------------1 -------/---------------\--------------- -------2---------------3 ---/-------\-------/-------\------- ---4-------5-------6-------7 -/---\---/---\---/---\---/---\--- -8---9---0---0---0---0---0---0 /-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\- 0-0-0-0-0-0-0-0-1-1-1-1-1-1-1-1 ```

We can fix this by collecting the string (in [24]) in a variable, and apply `trim-trailing` to get rid if the trailing spaces before printing it.

There is also a `trim-leading` function that removes spaces from the beginning only, and a `trim` function that removes spaces from both the beginning and the end.

And we should present the error message if the tree is not full up front, instead of after printing the rows of the tree that are fine:

```\$ raku bonus-tree "1 | 2 3 | 4 5 6 7 8" 1 / \ 2 3 / \ / \ wrong number of element on row 3: 5 (should have been 4) \$ raku bonus-tree-fixed "1 | 2 3 | 4 5 6 7 8" wrong number of element on row 3: 5 (should have been 4) ```

And let us support values using more than one character. The result, for once shown before the code:

```\$ raku bonus-tree-fixed "11 | 22 33 | 44 55 66 77" 11 / \ 22 33 / \ / \ 44 55 66 77 \$ raku bonus-tree-fixed "111 | 222 333 | 444 555 666 777" 111 / \ 222 333 / \ / \ 444 555 666 777 ```

It isn't 100% nice (the two spaces between the values on the last line), but I'll leave it at that. Even if the challenge asked for «pretty print».

File: bonus-tree-fixed ```unit sub MAIN (Str \$tree = "1 | 2 3 | 4 5 6 7", :\$dashes); my \$space = \$dashes ?? "-" !! " "; my @btree = \$tree.split("|"); my \$rows = @btree.elems; my \$row = \$rows -1; my \$count = 1; my \$width = \$tree.split("|")>>.words>>.chars>>.max.max; # [1] ############# 1a ############ 1b #### 1c #### 1e ## 1f ## my \$indents := (0, { (\$_ * 2) +1 } ... *); my \$indent = \$indents[\$row] * \$width; # [2] my \$spacing = 0; for @btree -> \$level # [3] { state \$row = 1; # [3a] state \$target = 1; # [3b] my @values = \$level.words; die "wrong number of element on row \$row: { @values.elems } \ (should have been \$target)" if @values.elems != \$target; \$row++; \$target *=2; } for @btree -> \$level { my @values = \$level.words; print \$space x \$indent; say @values>>.fmt("%-{\$width}s").join(\$space x \$spacing); last if \$row == 0; \$row--; \$spacing = \$indent; \$indent = \$indents[\$row] * \$width; # [2] print \$space x \$indent; my \$line = ( "/".fmt("%-{\$width}s") ~ \$space x \$spacing ~ "\\".fmt("%-{\$width}s") ~ \$space x \$spacing) x \$count; # [4] say \$line.trim-trailing; \$count *= 2; } ```

[1] Get the length (number of characters) of the longest value in the tree. We start with splitting the tree into lines with `split` [1a], then those lines into individual values with `words` [1b], giving a list of lists. Then we apply `chars` on all those values [1c], giving the length instead of the original value in the list of lists. Then we use `max` to get the highest value on each row [1e], thus reducing the data structure to a simple list. The final `max` reduces those values to a single value [1f], and we are done.

[2] Add in the width from [1].

[3] A new loop, where we check that each row in the tree has the expected number of elements, before printing anything. Note the `state` variables ([3a] and [3b]), so that we can declare them inside the loop. They are only initialised the first time the program runs into them.

[4] Note the use of `fmt` to ensure that the values all use the same (the maximum) space. `fmt` is the method version of `sprintf`.

See docs.raku.org/syntax/state for more information about `state`.

### Over Engineering

You may have noticed that the code in [1] can be shortened, as we have the first part of it already:

```my \$width = @btree>>.words>>.chars>>.max.max; ```

That was a red herring. We can actually simplify it considerably, using the initial input string:

```my \$width = \$tree.words>>.chars.max; ```

This one treats the bar characters (`|`) as a value, but as it has a size of 1 that doesn't matter.

The program has been updated with this change.

## Challenge #57.2: Shortest Unique Prefix

 Write a script to find the shortest unique prefix for each each word in the given list. The prefixes will not necessarily be of the same length. Sample Input ```[ "alphabet", "book", "carpet", "cadmium", "cadeau", "alpine" ] ``` Expected Output ```[ "alph", "b", "car", "cadm", "cade", "alpi" ] ```

We start with a hash of the first letters in the words (`%top`), and stack on new letters recursively until we have placed them all. Each node (letter) has the letter, a hash with pointers to next letters (with the letter as the key), and a counter. The counter is important, as it tells us how many words we have registered at any given point in the data structure.

The words given in the challenge gives this data structure (or directed graph):

We go from «alphabet» to «alpha» by following the first graph, stopping when we arrive at a node where the counter is «1» (marked with blue). The rest of the word is in the graph (marked with red), and is ignored as we have arrived at the «shortest unique prefix».

The program has two separate parts, the registration and the lookup. I'll take them one at a time.

### The Registration

File: shortest-unique-prefix (partial) ```unit sub MAIN ( Str \$words = "alphabet book carpet cadmium cadeau alpine", # [1] :\$verbose); my @words = \$words.words; # [2] class Letter # [3] { has \$.letter; # [3a] has \$.count is rw; # [3b] has %.next is rw; # [3c] } my %top; # [4] for @words -> \$word # [5] { my @letters = \$word.comb; # [5a] my \$current = @letters.shift; # [6] my \$pointer; # [7] if ! %top{\$current}.defined # [8] { %top{\$current} = Letter.new(letter => \$current); # [8a] } \$pointer = %top{\$current}; # [9] \$pointer.count++; # [10] for @letters -> \$next # [11] { if ! \$pointer.next{\$next}.defined # [12] { \$pointer.next{\$next} = Letter.new(letter => \$next); # [12a} } \$pointer = \$pointer.next{\$next}; # [13] \$pointer.count++; # [14] } } if \$verbose # [15] { for %top.keys -> \$key { say ":: Top \$key -> " ~ %top{\$key}.count; } } say ":: %top{} " if \$verbose; # [16] ```

[1] A default list of words, if none are given.

[2] Get the words.

[3] We need this class to store the words. Each node has a letter [3a], a count of how many words that share the path so far [3b], and a hash of following letters [3c].

[4] The words do not all start with the same letter, so we need a hash to cellect the first letters.

[5] Iterate over the words, and get the individual letters [5a].

[6] Get the first letter in the word.

[7] A pointer to where we are (in the tree) at any given time when we traverse a word.

[8] If the first letter doesn't exist in our hash of starting letters, create a new node [8a]. Note that we do not touch the counter, as the following code (in [10]) does that.

[9] Set the current pointer.

[10] Increase the counter.

[11] Then we iterate over the rest of the letters in the word.

[12] • if the next letter isn't already available in the tree (as a next letter at the current position), create it [12a]. This is similar to [8].

[13] Update the current pointer. This is similar to [9]

[14] Increase the counter. This is similar to [10]

[15] Some verbose output.

[16] Some more.

Running it, with verbose mode (see [15] and [16] above), to see what we have at this time. I have added an awful lot of newlines (and indentation) to make it sort of readable:

```:: Top a -> 2 :: Top c -> 3 :: Top b -> 1 ::{ :a(Letter.new( letter => "a", count => 2, next => { :l(Letter.new( letter => "l", count => 2, next => { :p(Letter.new( letter => "p", count => 2, next => { :h(Letter.new( letter => "h", count => 1, next => { :a(Letter.new( letter => "a", count => 1, next => { :b(Letter.new( letter => "b", count => 1, next => { :e(Letter.new( letter => "e", count => 1, next => { :t(Letter.new( letter => "t", count => 1, next => {}))}))}))}))})), :i(Letter.new( letter => "i", count => 1, next => { :n(Letter.new( letter => "n", count => 1, next => { :e(Letter.new( letter => "e", count => 1, next => {}))}))}))}))}))})), :b(Letter.new( letter => "b", count => 1, next => { :o(Letter.new( letter => "o", count => 1, next => { :o(Letter.new( letter => "o", count => 1, next => { :k(Letter.new( letter => "k", count => 1, next => {}))}))}))})), :c(Letter.new( letter => "c", count => 3, next => { :a(Letter.new( letter => "a", count => 3, next => { :d(Letter.new( letter => "d", count => 2, next => { :e(Letter.new( letter => "e", count => 1, next => { :a(Letter.new( letter => "a", count => 1, next => { :u(Letter.new( letter => "u", count => 1, next => {}))}))})), :m(Letter.new( letter => "m", count => 1, next => { :i(Letter.new( letter => "i", count => 1, next => { :u(Letter.new( letter => "u", count => 1, next => { :m(Letter.new( letter => "m", count => 1, next => {}))}))}))}))})), :r(Letter.new( letter => "r", count => 1, next => { :p(Letter.new( letter => "p", count => 1, next => { :e(Letter.new( letter => "e", count => 1, next => { :t(Letter.new( letter => "t", count => 1, next => {}))}))}))}))}))}))} ```

Note that I have ignored the input format specified in the challenge, as the one I have used is easier both to type on the command line and to parse.

You can add the following code to the program now, if you want to see that we actually can get the data. It is not present in the file:

```if \$verbose { say %top<a>.count; say %top<a>.next<l>.count; say %top<a>.next<l>.next<p>.count; say %top<a>.next<l>.next<p>.next<h>.count; say %top<a>.next<l>.next<p>.next<i>.count; } ```

It will fail if you specify a different set of words that don't match, so do remove the code from the program afterwards.

### The Lookup

File: shortest-unique-prefix (the rest) ```print "[ "; # [17] print '"' ~ look-up(\$_) ~ '" ' for @words; # [18] say "]"; # [17] sub look-up(\$word) # [19] { my @letters = \$word.comb; # [20] my \$letter = @letters.shift; # [21] my \$shortest = \$letter; # [22] my \$pointer = %top{\$letter}; # [23] for @letters -> \$letter # [24] { last if \$pointer.count == 1; # [25] \$pointer = \$pointer.next{\$letter}; # [26] \$shortest ~= \$letter; # [27] } return \$shortest; # [28] } ```

[17] The challenge wanted brackets around the words, so here they are.

[18] Print the result.

[19] This one does the job of looking them up, one at a time.

[20] Get the individual letters.

[21] Get the first letter.

[22] This string holds the letters so far (the path).

[23] This variable now points to the first letter.

[24] For each of the following letters,

[25] • we are done if the counter is 1, meaning that the word we are looking up is the only one to have reached the current node.

[26] • advance the pointer to the next letter.

[27] • add the letter to the string (see [22]).

[28] We have a shortest match, return it.

Running it:

```\$ raku shortest-unique-prefix [ "alph" "b" "car" "cadm" "cade" "alpi" ] \$ raku shortest-unique-prefix "this is not a very informative text alas" [ "th" "is" "n" "a" "v" "in" "te" "al" ] ```

Note the word «a» that is reported correctly, as we have reached the last letter in the word. Even if the count is not 1 at that point in the tree.

But what happens if we repeat a word?

```\$ raku shortest-unique-prefix "the end is the best" [ "the" "e" "i" "the" "b" ] ```

Oops. The algorithm doesn't catch this, and gives the whole word. This is easy to fix, though:

File: shortest-unique-prefix-fixed (changes only) ```for @words.unique -> \$word # [1] ```

[1] Remove duplicate values from the list.

See docs.raku.org/routine/unique for more information about `unique`.

Running it:

```\$ raku shortest-unique-prefix-fixed "the end is the best" [ "t" "e" "i" "t" "b" ] ```

And that's it.