This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.
This is my response to the Perl Weekly Challenge #14.
Write a script to generate Van Eck's sequence that starts with 0. For more information, please check out the Wikipedia page. This challenge was proposed by team member Andrezgz.
The first value in the Van Eck sequence is zero. Each subsequent value is calculated like this: if the previous value has occurred earlier in the sequence, the next value is the distance back to that earlier occurrence; if not, the next value is zero.
A walk-through of the first ten values may make it easier to understand:
Sequence | Search for | Found | The next value |
0 | 0 | No | 0 |
0, 0 | 0 | Yes | 1 |
0, 0, 1 | 1 | No | 0 |
0, 0, 1, 0 | 0 | Yes | 2 |
0, 0, 1, 0, 2 | 2 | No | 0 |
0, 0, 1, 0, 2, 0 | 0 | Yes | 2 |
0, 0, 1, 0, 2, 0, 2 | 2 | Yes | 2 |
0, 0, 1, 0, 2, 0, 2, 2 | 2 | Yes | 1 |
0, 0, 1, 0, 2, 0, 2, 2, 1 | 1 | Yes | 6 |
0, 0, 1, 0, 2, 0, 2, 2, 1, 6 |
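The rule in the walk-through can also be written directly as a small helper that searches backwards for the previous occurrence. This is just a sketch to illustrate the definition; the name «next-value» is made up for this example, and is not part of the programs below:

```raku
# The rule, written directly: search backwards for the last value.
sub next-value (@seq) {
    my $last = @seq[*-1];
    for 1 ..^ @seq.elems -> $back {
        return $back if @seq[*-1 - $back] == $last;  # distance to previous occurrence
    }
    return 0;                                        # first occurrence of $last
}

say next-value([0]);           # 0
say next-value([0, 0]);        # 1
say next-value([0, 0, 1, 0]);  # 2
```

The three calls reproduce rows one, two and four of the table above.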
I have chosen to delegate the actual calculation to a helper function «van-help»:
File: van-eck-list
unit sub MAIN (Int $limit = 10);                     # [1]

my @van-eck = (0);                                   # [2]

@van-eck.push(van-help(@van-eck)) for ^($limit - 1); # [3]

say @van-eck.join(", ");                             # [4]

sub van-help (@array is copy)                        # [5]
{
  my $last = @array.pop;                             # [6]

  if any(@array) == $last                            # [7]
  {
    for @array.reverse                               # [8]
    {
      state $delta++;                                # [9]
      return $delta if $_ == $last;                  # [10]
    }
  }
  return 0;                                          # [11]
}
[1] The sequence is infinite, so we have to stop at an explicit location. This will give us the first 10 numbers by default.
[2] The first number in the sequence is zero.
[3] Add the next number, one at a time, nine times by default. Combined with the initial one (from [2]), we get ten.
[4] Print them.
[5] This helper function takes a partial Van Eck sequence as input, and returns the next number. The input is set up with «is copy» so that we can change it (in [6]).
[6] Remove the last element from the sequence.
[7] If any of the remaining elements are equal to the one we removed,
[8] Loop through the values, from the end.
[9] • Keep a count of how many iterations of the loop we have done.
[10] • Return the distance between the two values when we find the first match.
[11] If the test in [7] fails, we return 0 (meaning that it is the first time the number occurs in the sequence).
A variable declared with «state» gives us a state variable. The assignment in the declaration (the initialization) is only done once, so we get a counter. See docs.raku.org/syntax/state for more information about «state».
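A minimal example of a «state» counter; the sub name here is made up for illustration:

```raku
sub ticket-number {
    state $n = 0;    # initialized once, on the first call only
    return ++$n;
}

say ticket-number;   # 1
say ticket-number;   # 2
say ticket-number;   # 3
```

The counter survives between the calls, unlike a «my» variable, which would be reset to zero each time.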
Running it:
$ raku van-eck-list
0, 0, 1, 0, 2, 0, 2, 2, 1, 6
$ raku van-eck-list 20
0, 0, 1, 0, 2, 0, 2, 2, 1, 6, 0, 5, 0, 2, 6, 5, 4, 0, 5, 3
And that's it, really. As a list, as shown in the Wikipedia article the challenge refers to.
But this list form is extremely unhelpful if we dig a little deeper and discover the list of the first 100,000 elements at https://oeis.org/A181391/b181391.txt - and we want to check that we have gotten e.g. the 100th value right.
Adding the indices is straightforward(ish):
File: van-eck-verbose (changes only)
unit sub MAIN (Int $limit = 10, :$verbose = False);

my @van-eck = (0);

@van-eck.push(van-help(@van-eck)) for ^($limit - 1);

if $verbose
{
  my $count = 1;
  say "{ $count++ }: $_" for @van-eck;
}
else
{
  say @van-eck.join(", ");
}
The program behaves as before, unless we use the «--verbose» flag:
$ raku van-eck-verbose --verbose 20
1: 0
2: 0
3: 1
4: 0
5: 2
6: 0
7: 2
8: 2
9: 1
10: 6
11: 0
12: 5
13: 0
14: 2
15: 6
16: 5
17: 4
18: 0
19: 5
20: 3
We should let the program download the sequence file, and compare the values with the ones from our sequence generator one by one.
But before doing that, let us see how the program handles long sequences. I'm not optimistic (and will show why in the next section).
$ raku van-eck-verbose --verbose 10000
$ raku van-eck-verbose --verbose 100000
The first one takes 9 minutes, and the second one takes 13 hours (and some minutes). I have shown the commands only, not the output.
There are several reasons for the slowness: the helper function copies the entire (and ever-growing) array on each call (because of «is copy»), and the search for the previous occurrence is linear. The solution to the first problem is to inline the helper function, and the second is fixed by storing the last position of the values we have in the sequence in a hash (lookup by the number, and the value is the position in the array). Then it is simply a hash lookup, and not a search. Hashes are optimized for lookup, so this is extremely fast.
File: van-eck
unit sub MAIN (Int $limit where 2 <= $limit <= 100000 = 10,  # [1]
               :$verbose = False, :$test = False);           # [1]

my @van-eck = (0);                                           # [2]
my %seen;                                                    # [3]

for ^($limit - 1) -> $pos                                    # [4]
{
  %seen{@van-eck[*-1]}.defined                               # [5]
    ?? @van-eck.push: $pos - %seen{@van-eck[*-1]}            # [5a]
    !! @van-eck.push: 0;                                     # [5b]

  %seen{@van-eck[*-2]} = $pos;                               # [6]
}

if $test                                                     # [7]
{
  use LWP::Simple;                                           # [8]

  my $ok    = 0;
  my $error = 0;
  my $index = 1;

  for LWP::Simple.get('https://oeis.org/A181391/b181391.txt').lines -> $line
  {                                                          # [8]
    last if $index > $limit;                                 # [9]

    my $my  = @van-eck[$index - 1];                          # [10]
    my $def = $line.words[1];                                # [10a]

    if $my == $def                                           # [10b]
    {
      say "$index: $my == $def (OK)";
      $ok++;
    }
    else
    {
      say "$index: $my != $def (Error)";
      $error++;
    }
    $index++;
  }
  say "\nOK: $ok";                                           # [11]
  say "Error: $error";                                       # [11]
}
elsif $verbose                                               # [7]
{
  my $count = 1;
  say "{ $count++ }: $_" for @van-eck;
}
else                                                         # [7]
{
  say @van-eck.join(", ");
}
[1] The sequence file has 100,000 entries, so I have set that as the upper limit. Invoke the test with the new «--test» argument.
[2] The first value in the sequence.
[3] This hash keeps the last position of every value we have in the sequence as we generate values one at a time.
[4] If the limit is 10, we get the values 0 .. 8 in this loop.
[5] If we have seen the value in question,
[5a] • use the distance to the current position as the new value,
[5b] • and if not, use 0.
[6] Update the position of the last value. It must not be used in the distance calculation in step [5a], so we do it here, after the fact.
[7] I have used «if/elsif/else» instead of multiple dispatch this time.
[8] I use the «LWP::Simple» module to fetch the sequence. (On my Ubuntu/Linux machine this failed because of a missing system library; installing the «libssl-dev» package resolved the problem.) Iterate through the values, one at a time.
[9] Stop when we have reached the specified limit.
[10] Get the value from my sequence, and compare it with the one we downloaded [10a, 10b]. Count the correct values and errors, and report them as we go along.
[11] Print the result.
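Each line of the OEIS «b-file» contains the index and the value, separated by whitespace, which is why «.words[1]» picks out the value in [10a]. A tiny illustration, with a made-up line:

```raku
my $line = "10 6";   # a typical b-file line: index, then value
say $line.words;     # (10 6)
say $line.words[1];  # 6
```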
Running it:
$ raku van-eck --test
1: 0 == 0 (OK)
2: 0 == 0 (OK)
3: 1 == 1 (OK)
4: 0 == 0 (OK)
5: 2 == 2 (OK)
6: 0 == 0 (OK)
7: 2 == 2 (OK)
8: 2 == 2 (OK)
9: 1 == 1 (OK)
10: 6 == 6 (OK)
OK: 10
Error: 0
Running it, and testing the full set of 100,000 values (without showing the individual values):
$ raku van-eck --test 100000
...
OK: 100000
Error: 0
So I think we can conclude that my algorithm generates the correct values. (Both programs do, actually, but disregard the first one.) The program took about 5 seconds to run on my machine.
Running the minimalistic program is of course faster (as it doesn't download the file from the Internet, and has less output). This takes about 1.5 seconds:
$ raku van-eck 100000
Lesson learned: A poorly designed algorithm can cause incredibly inefficient code.
Using only the official postal (2-letter) abbreviations for the 50 U.S. states, write a script to find the longest English word you can spell. Here is the list of U.S. state abbreviations as per the Wikipedia page. This challenge was proposed by team member Neil Bowers.
The first task is obtaining the states, as a data structure (a hash). This page https://www.perlmonks.org/?node_id=60745 gives it in Perl.
I have modified it for Raku:
File: us-states
my %states =
  "AK  Alaska       LA  Louisiana       OH  Ohio
   AL  Alabama      MA  Massachusetts   OK  Oklahoma
   AR  Arkansas     MD  Maryland        OR  Oregon
   AZ  Arizona      ME  Maine           PA  Pennsylvania
   CA  California   MI  Michigan        RI  Rhode Island
   CO  Colorado     MN  Minnesota       SC  South Carolina
   CT  Connecticut  MO  Missouri        SD  South Dakota
   DE  Delaware     MS  Mississippi     TN  Tennessee
   FL  Florida      MT  Montana         TX  Texas
   GA  Georgia      NC  North Carolina  UT  Utah
   HI  Hawaii       ND  North Dakota    VA  Virginia
   IA  Iowa         NE  Nebraska        VT  Vermont
   ID  Idaho        NH  New Hampshire   WA  Washington
   IL  Illinois     NJ  New Jersey      WI  Wisconsin
   IN  Indiana      NM  New Mexico      WV  West Virginia
   KS  Kansas       NV  Nevada          WY  Wyoming
   KY  Kentucky     NY  New York".split(/\s\s+/).hash;
say %states.keys.sort;
We split on two or more spaces, so that the single space inside a two-word state name is ignored by «split».
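We can check the behaviour on a made-up fragment with two of the entries:

```raku
my @fields = "RI  Rhode Island   SC  South Carolina".split(/\s\s+/);
say @fields.elems;        # 4
say @fields.join(" | ");  # RI | Rhode Island | SC | South Carolina
```

The single space inside «Rhode Island» survives, while the wider gaps between the columns act as separators.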
Running it (with a newline inserted to make it fit the article width):
$ raku us-states
(AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS \
MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY)
Then we need a dictionary, and we did just that back in Challenge #5.1 (See my Aruk Masgnara (or «Rakugrams») article).
The «dictionary-lookup2» program is almost what we want (look it up in the article if you want to know more), and it is easy to modify it to fit the new requirements.
This time we don't need a hash (or Set, actually). The challenge wanted the longest word, so we sort the dictionary by word size:
File: dictionary-list-by-length
my $dictionary = "/usr/share/dict/british-english";  # [1]

.say for get-dictionary($dictionary);                # [2]

sub get-dictionary ($file where $file.IO.r)          # [3]
{
  return $file.IO.lines.grep(* !~~ /\W/)>>.uc.sort({ $^b.chars cmp $^a.chars });
}        # 4 ########### 5 ############## 6 ## 7 ###############################
[1] The dictionary file. Change the file name if you want to use another file.
[2] Load the dictionary (in [3]) and print it, one word at a time.
[3] Note the «where» clause to get a better error message if the file doesn't exist, or isn't readable.
[4] Read the file, one line at a time.
[5] • only use lines with nothing but letters.
[6] • turn the words to uppercase, as the state codes are in uppercase.
[7] • sort them by length. We have swapped the placeholder variables «$^a» and «$^b» around so that the longest word comes first.
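The effect of swapping the placeholders can be seen on a small made-up list:

```raku
my @words = <ox gnu ibex wildebeest>;
say @words.sort({ $^a.chars cmp $^b.chars });  # (ox gnu ibex wildebeest) - shortest first
say @words.sort({ $^b.chars cmp $^a.chars });  # (wildebeest ibex gnu ox) - longest first
```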
Running it (and showing the first 10 lines only):
$ raku dictionary-list-by-length
COUNTERREVOLUTIONARIES
ELECTROENCEPHALOGRAPHS
ELECTROENCEPHALOGRAMS
ELECTROENCEPHALOGRAPH
ANDRIANAMPOINIMERINA
COUNTERREVOLUTIONARY
ELECTROENCEPHALOGRAM
UNCHARACTERISTICALLY
CHLOROFLUOROCARBONS
COUNTERINTELLIGENCE
...
Now we can do the full program:
File: state-words
unit sub MAIN ($dictionary where $dictionary.IO.e && $dictionary.IO.r  # [1]
                 = "/usr/share/dict/british-english", :$all);

my %states =
  "AK  Alaska       LA  Louisiana       OH  Ohio
   AL  Alabama      MA  Massachusetts   OK  Oklahoma
   AR  Arkansas     MD  Maryland        OR  Oregon
   AZ  Arizona      ME  Maine           PA  Pennsylvania
   CA  California   MI  Michigan        RI  Rhode Island
   CO  Colorado     MN  Minnesota       SC  South Carolina
   CT  Connecticut  MO  Missouri        SD  South Dakota
   DE  Delaware     MS  Mississippi     TN  Tennessee
   FL  Florida      MT  Montana         TX  Texas
   GA  Georgia      NC  North Carolina  UT  Utah
   HI  Hawaii       ND  North Dakota    VA  Virginia
   IA  Iowa         NE  Nebraska        VT  Vermont
   ID  Idaho        NH  New Hampshire   WA  Washington
   IL  Illinois     NJ  New Jersey      WI  Wisconsin
   IN  Indiana      NM  New Mexico      WV  West Virginia
   KS  Kansas       NV  Nevada          WY  Wyoming
   KY  Kentucky     NY  New York".split(/\s\s+/).hash;

sub get-dictionary ($file where $file.IO.r)          # [2]
{
  return $file.IO.lines>>.uc                         # [2a]
    .grep({ .chars %% 2 })                           # [2b]
    .grep(* !~~ /\W/)                                # [2c]
    .sort({ $^b.chars cmp $^a.chars });              # [2d]
}

my $found = 0;

for get-dictionary($dictionary) -> $word             # [3]
{
  last if $word.chars < $found;                      # [3a]
  check-word($word);                                 # [4]
  last if $found && !$all;                           # [5]
}

say "\nWord length: $found.";                        # [6]

sub check-word ($word)                               # [4]
{
  my @parts = $word.comb(2);                         # [7]

  return unless %states{$_} for @parts;              # [8]

  say "{ @parts.map({ %states{$_} }).join(" + ") } = $word";  # [9]

  $found = $word.chars;                              # [10]
}
[1] I use an English dictionary file by default, but specify another file if needed. Use the «--all» flag to get all the matches (with the same length), and not just the first one.
[2] Read the dictionary file, force the words to uppercase [2a], remove words with an odd number of characters [2b] (as the state codes are two characters long, and any combination of them has an even length), remove any word containing a non-letter character [2c], and finally sort them by size, with the longest first [2d].
[3] Iterate through the words in the dictionary (starting with the longest word), and abort if the current word length is smaller than the length of the first match [3a].
[4] «check-word» does the job (and sets «$found»).
[5] Abort if we have shown the first match, and the user hasn't asked for all («--all»).
[6] Report the length of the word(s) as well.
[7] Split the word into strings of two characters with «comb(2)». We have already ensured that the word length is even, so all the parts will have the correct length.
[8] Iterate through the two-character parts of the word, and return if the current one doesn't match a state identifier.
[9] If they all match, we have a matching word. Print it.
[10] And record the word length.
See docs.raku.org/routine/comb for more information about «comb».
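Applied to two eight-letter words (taken from the output below):

```raku
say "CONCORDE".comb(2);  # (CO NC OR DE)
say "GANYMEDE".comb(2);  # (GA NY ME DE)
```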
Running it:
$ raku state-words
Colorado + North Carolina + Oregon + Delaware = CONCORDE
Word length: 8.
$ raku state-words --all
Colorado + North Carolina + Oregon + Delaware = CONCORDE
Georgia + New York + Maine + Delaware = GANYMEDE
Massachusetts + North Dakota + Arkansas + Indiana = MANDARIN
California + Louisiana + Michigan + Nebraska = CALAMINE
Massachusetts + Indiana + Louisiana + North Dakota = MAINLAND
Massachusetts + Louisiana + Rhode Island + Alabama = MALARIAL
Massachusetts + North Dakota + Arkansas + Indiana = MANDARIN
Maine + Missouri + Rhode Island + Alabama = MEMORIAL
Missouri + Oregon + Louisiana + North Dakota = MOORLAND
Word length: 8.
The duplicate hits for «MANDARIN» are caused by two entries in the dictionary: «Mandarin» and «mandarin».
The easiest way of removing duplicate values from a list is with the «unique» method. The change is trivial:
sub get-dictionary ($file where $file.IO.r)
{
  return $file.IO.lines>>.uc.grep({ .chars %% 2 })
    .grep(* !~~ /\W/).unique
    .sort({ $^b.chars cmp $^a.chars });
}
See docs.raku.org/routine/unique for more information about «unique».
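In our case the duplicate disappears after the uppercasing; a small made-up sample shows the idea:

```raku
say <Mandarin mandarin tangerine>.map(*.uc).unique;
# (MANDARIN TANGERINE)
```

Note that «unique» must come after «>>.uc», as «Mandarin» and «mandarin» are distinct values before the uppercasing.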
The dictionary I use doesn't have the word «Calamari» (which is shown exactly like this in the challenge, inconsistent with the other words which are in upper case letters only).
The actual result(s) depend on the dictionary file.
$ raku state-words --all "/usr/share/dict/american-english"
It gives exactly the same result as the British one.
The French dictionary:
$ raku state-words --all /usr/share/dict/french
Arkansas + Missouri + Rhode Island + California + Indiana = ARMORICAIN
Massachusetts + North Dakota + Arkansas + Indiana + Alabama = MANDARINAL
Word length: 10.
Longer words than in English.
The German dictionary:
$ raku state-words --all "/usr/share/dict/ngerman"
Colorado + North Carolina + Oregon + Delaware = CONCORDE
Massachusetts + North Dakota + Arkansas + Indiana = MANDARIN
Massachusetts + South Carolina + Hawaii + Nebraska = MASCHINE
Virginia + North Dakota + Alabama + Indiana = VANDALIN
Word length: 8.
The Italian dictionary:
$ raku state-words --all /usr/share/dict/italian
Rhode Island + Colorado + Michigan + North Carolina + Iowa + Virginia \
+ Missouri = RICOMINCIAVAMO
Rhode Island + Colorado + North Carolina + Illinois + Iowa + Virginia \
+ Missouri = RICONCILIAVAMO
Word length: 14.
14 characters is quite impressive. It would have been even more impressive if I had recognized them as words...