[ Index | Permutations | Canonical | Multigrams | The Program | On My Wig ]
This first part is basically based on Rakugrams, my response to the Perl Weekly Challenge #5 from April 2019.
A
A's
AMD
AMD's
AOL
AOL's
AWS
AWS's
Aachen
Aachen's
The procedure reading the dictionary looks like this:
sub get-dictionary ($file where $file.IO.r)
{
return $file.IO.lines.grep(* !~~ /\W/)>>.lc.Set; # [1]
}
[1] We do not want any words ending with 's
, and
grep( * !~~ /\W/)
will remove any word with one or more non-word characters.
The remaining words are coerced to lowercase (with >>.lc
) for easier
comparison. Then we coerce the list of words into a Set
, which is a hash
like structure. A real hash would have required a value for each key (which typically
would be «1» in Perl or «True» in Raku).
See
docs.raku.org/type/Set
for more information about the Set
type, and
docs.raku.org/routine/Set
for more information about the Set
method.
We can use the 50 states of USA as the words (subjects), and this csv file saves us the job of typing them. Note that it includes Washington DC, even though it is not a state.
File: states.csv (first 10 lines only)
"State","Abbreviation"
"Alabama","AL"
"Alaska","AK"
"Arizona","AZ"
"Arkansas","AR"
"California","CA"
"Colorado","CO"
"Connecticut","CT"
"Delaware","DE"
"District of Columbia","DC"
Here is a procedure reading this file, returning a list of states (without the quotes). Note that the header line (the first line in the state file) will result in «State» in the list of states. We'll get rid of that later.
sub get-states ($file where $file.IO.r)
{
return $file.IO.lines.map({ /\" (.*!) \"\,/ && $0.Str });
}
permutations
method:
> say "abc".comb.permutations>>.join.join(", ");
abc, acb, bac, bca, cab, cba
> say "abcd".comb.permutations>>.join.join(", ");
abcd, abdc, acbd, acdb, adbc, adcb, bacd, badc, bcad, bcda, bdac, bdca, cabd,\
cadb, cbad, cbda, cdab, cdba, dabc, dacb, dbac, dbca, dcab, dcba
A lot of gibberish, so using a dictionary as filter is essential.
See
docs.raku.org/routine/permutations for more information about permutations
.
(«West Virginia» can give «west virgin i a» - which are all legal words, but they do not make sense as a sentence, regardless of how we reorder the words. So this is in fact not a legal anagram.)
This is brute force, but it works. The number of letters in the subject is an issue, as the number of permutations increases exponentially:
> say "abc".comb.permutations>>.join.elems; # -> 6
> say "abcd".comb.permutations>>.join.elems; # -> 24
> say "abcde".comb.permutations>>.join.elems; # -> 120
> say "abcdef".comb.permutations>>.join.elems; # -> 720
We'll get back to this issue when we run the program. Hope for the best, expect the worst.
(Note that the numbers above are the faculty of the number of characters. Raku does not have a
faculty operator, but it is easy to define one. On the other hand, you can type it as this
[*] 1..6
(in the case of length 6) - instead of actually generating all the
permutations as I did above.)
Note that the number of unique combinations will be lower if we have repeating characters. E.g.:
> say "AAcdef".comb.permutations>>.join.unique.elems; # -> 360
> say "AACCef".comb.permutations>>.join.unique.elems; # -> 180
#! /usr/bin/env raku
unit sub MAIN
(
:c(:$csv) where $csv.IO.r = 'states.csv',
:d(:$dictionary) where $dictionary.IO.r = '/usr/share/dict/american-english',
:v(:$verbose)
);
my $dict = get-dictionary($dictionary); # [1]
my @states = get-states($csv)[1..Inf]; # [2]
say ":States: { @states.join(", ")} " if $verbose;
for @states -> $state
{
my $compact = $state.lc.split(" ").join; # [3]
my @anagrams;
say ":State: $state ($compact)" if $verbose;
for $compact.comb.permutations>>.join.unique -> $candidate # [4]
{
next if $candidate eq $compact; # [5]
@anagrams.push: $candidate if $dict{$candidate}; # [6]
}
say "$state ($compact): { @anagrams.elems } { @anagrams.elems ?? \
"({ @anagrams.join(", ") })" !! "" }"; # [7]
}
sub get-states ($file where $file.IO.r)
{
return $file.IO.lines.map({ /\" (.*!) \"\,/ && $0.Str });
}
sub get-dictionary ($file where $file.IO.r)
{
return $file.IO.lines.grep(* !~~ /\W/)>>.lc.Set;
}
[1] Load the dictionary.
[2] Load the list of states. Note the starting index of «1», so that we skip the «State» from the header line.
[3] Coece the word(s) to lower case, and remove spaces. (There are more efficient ways of removing spaces, but this is probably efficient enough.)
[4] We use .unique
to get rid of duplicate candidates, which we will
get if we have duplicate letters in the input word.
[5] Skip the anagram if it is the word itself.
[6] A legal word (in the dictionary)? if so, add it to the list of anagrams.
[7] Print the state name, and the list of anagrams - if any.
Running it:
$ ./usa-permutations-unique
- Alabama (alabama)
- Alaska (alaska)
- Arizona (arizona)
- Arkansas (arkansas)
- California (california)
- Colorado (colorado)
^C
«California» took some time, and I killed the program when it got to «Connecticut» (which has one more letter than «California»; 11 vs 10). This means that «Connecticut» has 11 times more permututations for the program to go through (if we disregard repeating letters). Even if we had the patience to wait for «Connecticut», we would run out of it when we come to «District of Columbia» (at 18 characters when we remove the spaces).
The program clearly needs a cut off point, so that we can get some of the states at least. Let us make the program a little bit faster as well, while we are at it:
File: usa-permutations-squish
#! /usr/bin/env raku
unit sub MAIN
(
:c(:$csv) where $csv.IO.r = 'states.csv',
:d(:$dictionary) where $dictionary.IO.r = '/usr/share/dict/american-english',
:v(:$verbose),
:l(:$limit) = 10
);
my $dict = get-dictionary($dictionary);
my @states = get-states($csv)[1..Inf];
say ":States: { @states.join(", ")} " if $verbose;
for @states -> $state
{
my $compact = $state.lc.split(" ").join;
my @anagrams;
if $state.chars > $limit
{
say "? $state ($compact) skipped";
next;
}
for $compact.comb.sort.permutations>>.join.squish -> $candidate # [1]
{
next if $candidate eq $compact;
@anagrams.push: $candidate if $dict{$candidate};
}
@anagrams = @anagrams.unique;
say @anagrams.elems
?? "+ $state ($compact): { @anagrams.join(", ") }"
!! "- $state ($compact)";
}
sub get-states ($file where $file.IO.r)
{
return $file.IO.lines.map({ /\" (.*!) \"\,/ && $0.Str });
}
sub get-dictionary ($file where $file.IO.r)
{
return $file.IO.lines.grep(* !~~ /\W/)>>.lc.Set;
}
[1] The problem with unique
, as used in the original program, is
that it has to store the values as they occur, so that duplicates can be removed. The list can
grow pretty large. It does not affect the status as a Sequence, as shown here:
> say (1...Inf).unique.head(10); # -> (1 2 3 4 5 6 7 8 9 10)
If we sort the input, the output will also be sorted - and any duplicates will
come sequentially. Then we can use squish
instead to get rid of them.
See
docs.raku.org/routine/unique
for more information about unique
.
See
docs.raku.org/routine/squish
for more information about squish
.
Running it:
$ ./usa-permutations-squish
- Alabama (alabama)
- Alaska (alaska)
- Arizona (arizona)
- Arkansas (arkansas)
- California (california)
- Colorado (colorado)
? Connecticut (connecticut) skipped
- Delaware (delaware)
? District of Columbia (districtofcolumbia) skipped
- Florida (florida)
- Georgia (georgia)
- Hawaii (hawaii)
- Idaho (idaho)
- Illinois (illinois)
- Indiana (indiana)
- Iowa (iowa)
- Kansas (kansas)
- Kentucky (kentucky)
- Louisiana (louisiana)
+ Maine (maine): anime
- Montana (montana)
- Nebraska (nebraska)
- Nevada (nevada)
? New Hampshire (newhampshire) skipped
- New Jersey (newjersey)
- New Mexico (newmexico)
- New York (newyork)
? North Carolina (northcarolina) skipped
? North Dakota (northdakota) skipped
- Ohio (ohio)
- Oklahoma (oklahoma)
- Oregon (oregon)
- Maryland (maryland)
? Massachusetts (massachusetts) skipped
- Michigan (michigan)
+ Minnesota (minnesota): nominates
? Mississippi (mississippi) skipped
- Missouri (missouri)
? Pennsylvania (pennsylvania) skipped
? Rhode Island (rhodeisland) skipped
? South Carolina (southcarolina) skipped
? South Dakota (southdakota) skipped
- Tennessee (tennessee)
+ Texas (texas): taxes
- Utah (utah)
- Vermont (vermont)
- Virginia (virginia)
- Washington (washington)
? West Virginia (westvirginia) skipped
- Wisconsin (wisconsin)
- Wyoming (wyoming)
We got 3 anagrams, highlighted manully with white text. (Maine => anime, Minnesota => nominates, Texas => taxes). The program skipped 12 states. Those states may - or may not - result in an anagram, if we choose to have a go at them (and have the patience to wait for the result). But for now, we simply do not know now.
Permutations are way too time consuming when we have many letters in the subject of the anagram. But we can do it much more efficiently, with a radically different approach... And we will do just that in the next part.
[ Index | Permutations | Canonical | Multigrams | The Program | On My Wig ]