United States of Anagrams
The Program

by Arne Sommer

United States of Anagrams with Raku - Part 4: The Program

[200.4] Published 11. September 2022.

[ Index | Permutations | Canonical | Multigrams | The Program | On My Wig ]

I had a hunch (ouch) that this program would be extremely slow on long words (and yes - it sure is), so chose to skip the automatic lookup of US states this time. Specify the word(s) manually on the command line, lean back and wait for the result. The upside is that you can specify anything, not just a US state...

The Program

File: multigrams
#! /usr/bin/env raku

unit sub MAIN
(
  $string,
  :d(:$dictionary) where $dictionary.IO.r = 'dict-US.txt',
  :v(:$verbose)
);

my %anagram;

for $dictionary.IO.lines -> $word
{
  %anagram{$word.lc.comb.sort.join}.push: $word;
}

my $length = $string.chars;

my @sentences = pick-all($string);

for @sentences -> $sentence
{
  say $sentence.words.map({ %anagram{$_}.join("|") }).join(" ");
}

sub pick-all ($string)
{
  my $canonical = $string.lc.comb.grep( * ~~ /<[a .. z]>/ ).sort.join;

  say ":Canonical: $string -> $canonical" if $verbose;

  my %seen;

  return gather
  {
    for 1 .. $canonical.chars -> $c
    {
      _pick_partial( (), "", $canonical, $c);
    }
  }

  sub _pick_partial (@done, $done is copy, $todo, $length)
  {
    if $length == 0
    {
      if $todo eq ""
      {
        if %anagram{$done}
	{
          my @new = @done.clone;
	  @new.push: $done;

          my $new = @new.join(" ");
	  unless %seen{$new}
	  {
	    take $new if $new eq @new.sort.join(" ");
	    %seen{$new} = True;
	  }
        }
      }
      else
      {
        my @done2 = @done.clone;
	if %anagram{$done}
	{
	  @done2.push: $done;
	  for 1 .. $todo.chars -> $c
	  {
            _pick_partial(@done2, "", $todo, $c);
          }
	}
      }
    }

    my @todo = $todo.comb;

    for ^@todo.elems -> $index
    {
      my @copy = @todo.clone;
      my $add = @copy.splice($index, 1);
      my $new = $done ~ $add;
      next if $new.comb.sort.join ne $new; # AB -> BA

      _pick_partial(@done, $new, @copy.join, $length -1);
    }
  }
}

This program should not need any explanation, as long as you have read the prior parts of thsi article.

Let us run it, even though that is the topic of the next part.

I have chosen to show only some of the permutations (with a proposed sentence highlighted in white). The first one gave 61 suggestions, and the second 130:

$ ./multigrams "arne sommer"
man remorse          # -> remorse man
armor semen          # -> armor semen
earner|nearer moms   # -> nearer moms
manors|ransom mere   # -> mere ransom

$ ./multigrams "raku and perl"
a par|rap den|end lurk
a lard punker                # -> a punker lard
a lark pruned                # -> a pruned lark
a paler|pearl drunk          # -> a paler drunk
a plank ruder                # -> a ruder plank
a prank lured|ruled          # -> ruled a prank
and pare|pear|rape|reap lurk # -> lurk and rape
parked lunar                 # -> parked lunar

If the canonical form only gives one real word, the program prints that. If we get several (alternate) words, they are printed with a vertical bar between them.

The actual number of suggestions will depend on the dictionary and the words you have chosen to ignore (added to the «ignore list»).

Note that duplicate words in the output is ok:

$ ./multigrams toto
to to
toot

The Ketchup Principle

A problem with the program is that it will compute all the anagrams, before printing them (all at once). We really should fix this, as it is annoying when the waiting time is large.

I will do so in a future follow up article. Unless somebody else has a go at it first... (Hint: I tackled a similar issue some weekly challenges ago.)

I have not forgotten the US states. We'll look into them in the next and final part.

[ Index | Permutations | Canonical | Multigrams | The Program | On My Wig ]