United States of Anagrams
Multigrams

by Arne Sommer

United States of Anagrams with Raku - Part 3: Multigrams

[200.3] Published 11. September 2022.

[ Index | Permutations | Canonical | Multigrams | The Program | On My Wig ]

Multi Word Anagrams

We have already dealt with multiple words in the source of the anagram (e.g. translated «West Virginia» to «westvirginia»). But what about multiple words in the result?

Let us illustrate the difficulty, starting with a three letter word (e.g. «ABC»). Each of the boxes represents a set of characters (or a potential word on canonical form, i.e. sorted). The rows (after the arrows) represent the possibilites. The number of combinations (5) is not too bad.

Then let us take a look at a four letter word (e.g. «ABCD»). Now we get 15 possible combinations.

The number of combinations will increase pretty fast. How fast is not something that I care to compute. Suffice to say that it does not scale well when we increase the length of the word (the subject).

We are going to try to solve this with recursion.

A First Test

Let us start with a program taking a word, and giving us back all the possible two word combinations (including the original word, thus actually a one word combination) - all on canonical form.

File: pick-partial
#! /usr/bin/env raku

unit sub MAIN ($string, :v(:$verbose));

pick-partial($string.comb.sort.join).map({ "@_[0];@_[1]".say });  # [1]

sub pick-partial ($string)                                        # [2]
{
  my %seen;                                                       # [3]

  return gather                                                   # [4]
  {
    for 1 .. $string.chars -> $c                                  # [5]
    {
      _pick_partial("", $string, $c);                             # [6]
    }
  }

  sub _pick_partial ($done, $todo, $length)                       # [7]
  {
    if $length == 0                                               # [8]
    {
      take ($done, $todo) unless %seen{$done};                    # [9]
      %seen{$done} = True;                                        # [10]
      return;                                                     # [11]
    }

    my @todo = $todo.comb;                                        # [12]

    for ^@todo.elems -> $index                                    # [13]
    {
      my @copy = @todo.clone;                                     # [14]
      my $add = @copy.splice($index, 1);                          # [15]
      my $new = $done ~ $add;                                     # [16]
      return if $new.comb.sort.join ne $new;                      # [17]
      _pick_partial($new, @copy.join, $length -1);                # [18]
    }
  }
}

[1] Get all the two word combinations, and print each one on a separate line with a semicolon between the words (using map).

[2] The procedure doing the work.

[3] A list of already seen words.

[4] I have chosen to use gather/take to collect (return) the values.

See my Raku Gather, I Take article or docs.raku.org/syntax/gather take for more information about gather/take.

[5] Iterate over the length of the input string; thus giving the number of characters to place in the first word.

[6] Calling a recursive helper procedure that does the actual work. The first argument is the already computed word (initially ""). The second is the leftover letters (i.e. unused in the first word) The third is the number of characters to use when setting up the next words.

[7] The recursive helper function.

[8] No more characters to select?

[9] If so, return (with take) the words, unless seen already. This kicks in if we have repeating characters - as e.g. «AA» is the same as «AA».

[10] Enure that we do not get more instances of this sentence later on.

[11] We are done with this recursion (as there are no more letters left).

[12] The remaining letters, as a list.

[13] Iterate over the indices.

[14] Get a copy of the list from [12], so that we do not clobber up the original (with splice in [15]).

[15] Remove one character from the array (and remember that we iterate over all of them), with splice.

See docs.raku.org/routine/splice for more information about splice.

[16] Add the new letter to the word.

[17] Return from this recursion if the word is on non-canonical form (i.e. «BA» instead of «AB») as they are identical.

[18] Then we use recursion to get the remaining letters.

Running it:

$ ./pick-partial abc
a;bc
b;ac
c;ab
ab;c
ac;b
abc;

$ ./pick-partial cba
a;bc
b;ac
c;ab
ab;c
ac;b
abc;

$ ./pick-partial abcd
a;bcd
b;acd
c;abd
d;abc
ab;cd
ac;bd
ad;bc
abc;d
abd;c
abcd;

$ ./pick-partial aabb
a;abb
b;aab
aa;bb
ab;ab
aab;b
aabb;

$ ./pick-partial aaaa
a;aaa
aa;aa
aaa;a
aaaa;

How can we handle the rest of the daunting challenge? The trick is recursion. Well, even more recursion. Let us consider the two word solution for «ABCD»:

The idea is to regard the second word (shown in gray) as a remainder instead of a second word. Then we can do a recursive call on those remainders - with the first word (in yellow) at the front of a list of resulting words. When we run out of recursive parts to add, we have a list of words that form a (potential) sentence.

We can show this by filling in the missing part (or a part of it) of the illustration above:

Instead of returning the gray boxes (in the first illustration), the program recursively replaces them with the result shown in the second illustration. The green «CD» will also be replaced, but this will not lead to duplicates as they start out with different permutations: (AB CD vs A B CD).

File: pick-even-more-partial
#! /usr/bin/env raku

unit sub MAIN ($string, :v(:$verbose));

pick-partial($string.comb.sort.join)>>.say;

sub pick-partial ($string)
{
  my %seen;

  return gather
  {
    for 1 .. $string.chars -> $c
    {
      _pick_partial( (), "", $string, $c);                  # [1]
    }
  }

  sub _pick_partial (@done, $done is copy, $todo, $length)  # [1]
  {
    if $length == 0
    {
      if $todo eq ""
      {
        my @new = @done.clone;
	@new.push: $done;

        my $new = @new.join(" ");
	unless %seen{$new}
	{
	  take $new if $new eq @new.sort.join(" ");
	  %seen{$new} = True;
	}
      }
      else
      {
        my @done2 = @done.clone;
	@done2.push: $done;
	for 1 .. $todo.chars -> $c
	{
          _pick_partial(@done2, "", $todo, $c);             # [1]
        }
      }
    }

    my @todo = $todo.comb;

    for ^@todo.elems -> $index
    {
      my @copy = @todo.clone;
      my $add = @copy.splice($index, 1);
      my $new = $done ~ $add;
      return if $new.comb.sort.join ne $new;
      _pick_partial(@done, $new, @copy.join, $length -1);   # [1]
    }
  }
}

[1] The extra argument @done is the list of words we have picked so far, and $done is the current one.

The rest of the program should be straight forward.

Running it:

$ ./pick-even-more-partial abc
a b c
a bc
ab c
ac b
abc

$ ./pick-even-more-partial abcd
a b c d
a b cd
a bc d
a bd c
a bcd
ab c d
ab cd
ac b d
ac bd
ad b c
ad bc
abc d
abd c
abcd

$ ./pick-even-more-partial aaa
a a a
a aa
aaa

Note that all the word candidates are on canonical form, so it is easy to check them against the (also canonical) dictionary.

We can (and indeed should) do the dictionary lookup for each word that we add to @done, so that we can abort computing the rest of the words if we have a non-word up front.

We'll do just that, in the next part.

[ Index | Permutations | Canonical | Multigrams | The Program | On My Wig ]