Datefinder General, A Raku Wordgame

by Arne Sommer

Datefinder General, A Raku Wordgame

[46] Published 14. December 2019

This is my response to the Perl Weekly Challenge #38.

Challenge #38.1 - Date Finder

Create a script to accept a 7 digits number, where the first number can only be 1 or 2. The second and third digits can be anything 0-9. The fourth and fifth digits corresponds to the month i.e. 01,02,03…,11,12. And the last 2 digits respresents the days in the month i.e. 01,02,03….29,30,31. Your script should validate if the given number is valid as per the rule and then convert into human readable format date.

Rules

  1. If 1st digit is 1, then prepend 20 otherwise 19 to the 2nd and 3rd digits to make it 4-digits year.
  2. The 4th and 5th digits together should be a valid month.
  3. The 6th and 7th digits together should be a valid day for the above month.
For example, the given number is 2230120, it should print 1923-01-20.

Substrings

The parts of the date are substrings of the specified input string, so we can collect them with «substr»:

File: datefinder-substr
multi sub MAIN (Str $date where                                  # [1] [2]
  $date.chars == 7 &&                                            # [2a]
  $date.substr(0,1) eq '1'|'2' &&                                # [2b]
  try Date.new("{ $date.substr(0,1) eq '1' ?? '20' !! '19' }" ~  # [2c]
               "{ $date.substr(1,2) }-{ $date.substr(3,2) }-" ~
               "{ $date.substr(5,2) }").defined)
{
  say "{ $date.substr(0,1) == 1 ?? '20' !! '19' }{ $date.substr(1,2) }-" ~
      "{ $date.substr(3,2) }-{ $date.substr(5,2) }";             # [3]
}

multi sub MAIN (Str $invalid)                                    # [1a]
{
  say "Not a valid date.";
}

[1] The first «multi sub MAIN» kicks in if we have a valid date, and the second one (1a) kicks in otherwise. (The challenge doesn't say what to do when we have an illegal date, so I have chosen this output.)

[2] The «where» clauses check for: correct length (2a), that the first character is a 1 or 2 (with an or-junction) (2b) and that it is accepted by «Date.new» as a valid date (2c). «try» is there to prevent the program from raising an Exception if the date is invalid. The final «.defined» was added to make the code work, but I don't know why it is required.

[3] Print the (valid) date.

See docs.raku.org/routine/substr for more information about «substr».

Running it:

$ raku datefinder-substr 0230120
Not a valid date.

$ raku datefinder-substr 1230120
2023-01-20

$ raku datefinder-substr 2230120
1923-01-20

$ raku datefinder-substr 2240229
1924-02-29

$ raku datefinder-substr  2230229
Not a valid date.

The last one would have been «1923-02-20», which fails as the date doesn't exist.

The duplication of the all the «substr» clauses doesn't look good, but we can fix that with a procedure. I have moved the «where» clauses to a «subset» (a custom type) at the same time:

File: datefinder-subset
my $string;                           # [1]

subset PerlWeeklyDate of Str where
  $_.chars == 7 &&
  $_.substr(0,1) eq '1'|'2' &&
  ( $string = string2date($_) ) &&   # [2]
  try Date.new($string).defined;     # [3]

multi sub MAIN (PerlWeeklyDate $date)
{
  say $string;
}

multi sub MAIN (Str $invalid)
{
  say "Not a valid date.";
}

sub string2date (Str $string)
{
  return
    "{ $string.substr(0,1) eq '1' ?? '20' !! '19' }{ $string.substr(1,2) }-" ~
    "{ $string.substr(3,2) }-{ $string.substr(5,2) }";
}

[1] I have introduced a global variable,

[2] where the «subset» stores the resulting date string - which may be illegal),

[3] check if we have a valid date,

[4] and print the date string.

Grammar

Substrings certainly get the job done, but Raku has Grammars that are ideal for splitting strings up into smaller parts:

File: datefinder-grammar
unit sub MAIN (Str $date);

grammar PerlWeeklyDate                      # [1]
{
  token TOP       { <century> <year> <month> <day> }                      # [3]
  token century   { [ 1 | 2 ] }                                           # [3a]
  token year      { <digit> <digit> }                                     # [3b]
  token month     { 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 |09|10|11|12 }  # [3c]
  token day       { [ 0 <pos-digit> | 1 <digit> | 2 <digit> | 30 | 31 ] } # [3d]
  token digit     { [ 0 | <pos-digit> ] } 
  token pos-digit { [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ] } 
}

my $result = PerlWeeklyDate.parse($date);    # [2]

if $result                                   # [2]
{
  my $datestring =
    "{ $result<century> == 1 ?? '20' !! '19' }{ $result<year> }" ~
    ## # 3a ################################### # 3b ####

    "{ $result<month> }-{ $result<day> }";
    ## # 3c ############## # 3d ################

  if try Date.new($datestring)                # [4]
  {
    say $datestring;                          # [4a]
    exit;                                     # [4b]
  }
}

say "Not a valid date.";                      # [5]

[1] The Grammar is defined like this, with «TOP» as the straing point.

[2] Invoking the Grammar.

[3] The different parts are fetched by name.

[4] Using «Date.new» to validate the date string. We have to wrap it in «try» to prevent a failure from aborting the program. Note the «exit» so that the error message (5) is printed in both error cases (where the Grammar failed (2) and where «Date.new» (4) failed).

[5] Not a valid date, say so.

I have used explicit digits instead of «\d» as that character class will match all sorts of Unicode digits.

See docs.raku.org/language/grammars for more information about Grammars.

Regex

It is easy to replace the Grammar with a Regex:

File: datefinder-regex
unit sub MAIN (Str $date);

if $date ~~ /^
            $<century> = (<[01]>) 
            $<year>    = (<[0..9]><[0..9]>) 
	    $<month>   = (<[01]><[0..9]>)
	    $<day>     = (<[0123]><[0..9]>)
         $/
{
  my $datestring =
    = "{ $<century> == 1 ?? '20' !! '19' }{ $<year> }-{ $<month> }-{ $<day> }";
				  
  if try Date.new($datestring)
  {
    say $datestring;
    exit;
  }
}

say "Not a valid date.";

The named matches makes it easier to see what is going on (compraed with normal numbered match objects), and makes it similar looking to the Granmmar version. Also note the part for month (1c) that will match values from 01 to 19, and day that will match values from 01 to 39.

The Grammar version had better validation; month: 01 to 12 and day: 01 to 31. This problem doesn't really matter as both versions will validate 31st February. The «Date.new» call does the final validation for us.

The Regex version is shorter than the Grammar version, and is quite good as I used named matches. In this case I prefer the Regex version.

Testing

You may have noticed that I only tested the first program. Running several programs with all sorts of legal and illegal dates manually is cumbersome, and it is easy to miss one special case (and typically that one will be wrong). So of course I wrote an automated test framework.

File: datefinder-test
sub wrap ($program, $arg)                  # [7]
{
  my $proc = run $program, $arg, :out;     # [8]
  return $proc.out.slurp(:close).chomp;    # [9]
}

my @programs =                             # [2]
<
  ./datefinder-substr
  ./datefinder-subset
  ./datefinder-grammar
  ./datefinder-regex
>;

my %to-test =                              # [3]
(
  '0230120'  => "Not a valid date.",
  '1230120'  => "2023-01-20", 
  '1230120s' => "Not a valid date.",
  '2230120'  => "1923-01-20",
  '2240229'  => "1924-02-29", 
  '2230229'  => "Not a valid date.",
  '111111'   => "Not a valid date.",
  '111111s'  => "Not a valid date.",
);
  
use Test;                                   # [1]

for @programs -> $program                   # [4]
{
  for %to-test.keys -> $arg                 # [5]
  {
    is(wrap($program, $arg), %to-test{$arg}, "$program $arg %to-test{$arg}");
  }                                         # [6]
}

done-testing;

[1] Load the «Test» module.

[2] The programs to run. Note the leading «./» as they reside in the current directory.

[3] The tests, as arguments to the programs - and the expected answer. Note that the hash gives is the keys in random order, so the test will not run in the same order if you run the program a second time.

[4] Iterate over the programs,

[5] and the dates.

[6] Run and verify the test.

[7] The test suit has been written to call procedures (usually in modules) in the program, so we must apply this wrapper to execute external programs.

[8] «run» returns a «Proc» (for Process) object. Output that would normally go to the screen are redirected to the «Proc» object by specifying the «:out» argument.

[9] The we can read all the output (with «slurp») from the «$proc.out» filehandle. The «:close» arguments instructs the program to close the filehandle automatically when we have reached the end. And finally, we add a trailing «chomp» to get rid of the trailing newline in the outpout.

See docs.raku.org/routine/run for more information about «run».

See docs.raku.org/language/testing for information about writing and running test, and docs.raku.org/type/Test for more information about the «Test» module.

Running it:

$ raku datefinder-test
ok 1 - ./datefinder-substr 2230120 1923-01-20
ok 2 - ./datefinder-substr 1230120 2023-01-20
ok 3 - ./datefinder-substr 2240229 1924-02-29
ok 4 - ./datefinder-substr 111111 Not a valid date.
ok 5 - ./datefinder-substr 111111s Not a valid date.
ok 6 - ./datefinder-substr 1230120s Not a valid date.
ok 7 - ./datefinder-substr 2230229 Not a valid date.
ok 8 - ./datefinder-substr 0230120 Not a valid date.
ok 9 - ./datefinder-subset 2230120 1923-01-20
ok 10 - ./datefinder-subset 1230120 2023-01-20
ok 11 - ./datefinder-subset 2240229 1924-02-29
ok 12 - ./datefinder-subset 111111 Not a valid date.
ok 13 - ./datefinder-subset 111111s Not a valid date.
ok 14 - ./datefinder-subset 1230120s Not a valid date.
ok 15 - ./datefinder-subset 2230229 Not a valid date.
ok 16 - ./datefinder-subset 0230120 Not a valid date.
ok 17 - ./datefinder-grammar 2230120 1923-01-20
ok 18 - ./datefinder-grammar 1230120 2023-01-20
ok 19 - ./datefinder-grammar 2240229 1924-02-29
ok 20 - ./datefinder-grammar 111111 Not a valid date.
ok 21 - ./datefinder-grammar 111111s Not a valid date.
ok 22 - ./datefinder-grammar 1230120s Not a valid date.
ok 23 - ./datefinder-grammar 2230229 Not a valid date.
ok 24 - ./datefinder-grammar 0230120 Not a valid date.
ok 25 - ./datefinder-regex 2230120 1923-01-20
ok 26 - ./datefinder-regex 1230120 2023-01-20
ok 27 - ./datefinder-regex 2240229 1924-02-29
ok 28 - ./datefinder-regex 111111 Not a valid date.
ok 29 - ./datefinder-regex 111111s Not a valid date.
ok 30 - ./datefinder-regex 1230120s Not a valid date.
ok 31 - ./datefinder-regex 2230229 Not a valid date.
ok 32 - ./datefinder-regex 0230120 Not a valid date.
1..32

All 4 programs pass the 8 tests. Feel free to add more tests.

Note that the tests will take some time (about 6 seconds on my computer), as each one fires up a new instance of Raku. But it is faster than running them manually.

Challenge #38.2 - Word Game

Lets assume we have tiles as listed below, with an alphabet (A..Z) printed on them. Each tile has a value, e.g. A (1 point), B (4 points) etc. You are allowed to draw 7 tiles from the lot randomly. Then try to form a word using the 7 tiles with maximum points altogether. You don’t have to use all the 7 tiles to make a word. You should try to use as many tiles as possible to get the maximum points.

For example, A (x8) means there are 8 tiles with letter A.

1 point: A (x8), G (x3), I (x5), S (x7), U (x5), X (x2), Z (x5)

2 points: E (x9), J (x3), L (x3), R (x3), V (x3), Y (x5)

3 points: F (x3), D (x3), P (x5), W (x5)

4 points: B (x5), N (x4)

5 points: T (x5), O (x3), H (x3), M (x4), C (x4)

10 points: K (x2), Q (x2)

We need a dictionary, but thankfully I already looked into that in my response to Challenge 5.1. This stand alone program is copied from there without changes. It gives us a copy of a dictionary more suitable for lookups:

File: mkdictionary
my %source =
  <UK> => "/usr/share/dict/british-english",
  <US> => "/usr/share/dict/american-english",
  <DE> => "/usr/share/dict/ngerman";

unit sub MAIN (Str $language where %source{$language}.defined);

my @lines = %source{$language}.IO.lines.grep(* !~~ /\W/);

spurt "dict-$language.txt", $language eq "DE"
 ?? @lines.join("\n") ~ "\n"
 !! "A\nI\n" ~ @lines.grep( {.chars > 1 } ).join("\n") ~ "\n";

Running it with a valid language argument (UK) and an invalid (EN):

$ raku mkdictionary UK

$ raku mkdictionary EN
Usage:
  mkdictionary <language>

See my Perl 6 Anagrams article for a description of this program.

The word game program is quite straight forward:

File: wordgame (partial)
unit sub MAIN (:$length = 7,       # [1]
               :$verbose, 
               :$dictionary where $dictionary.IO.r = "dict-UK.txt");

my %value =                        # [2]
(
  A =>  1, G =>  1, I => 1, S => 1, U => 1, X => 1, Z => 1,
  E =>  2, J =>  2, L => 2, R => 2, V => 2, Y => 2,
  F =>  3, D =>  3, P => 3, W => 3,
  B =>  4, N =>  4,
  T =>  5, O =>  5, H => 5, M => 5, C => 5,
  K => 10, Q => 10
);

my %count =                        # [3]
(
  A => 8, G => 3, I => 5, S => 7, U => 5, X => 2, Z => 5,
  E => 9, J => 3, L => 3, R => 3, V => 3, Y => 5, F => 3,
  D => 3, P => 5, W => 5, B => 5, N => 4, T => 5, O => 3,
  H => 3, M => 4, C => 4, K => 2, Q => 2
);

sub get-dictionary ($file where $file.IO.r)  # [4]
{
  return $file.IO.lines.grep(* !~~ /\W/)>>.uc.Set;
}

my %dict = get-dictionary($dictionary);      # [4]

[1] The program takes three named optional parameters. «--length» allows us to override the default maximum length of 7 characters, «--verbose» gives verbose (or debug) output, and «--dictionary» allows us to specify another dictionary file (than tehe default english one).

[2] This hash has the letters and their values.

[3] This hash has the letters and their numbers.

[4] This procedure loads the dictionary file. The «grep» XXXXXX. We apply «uc» to get the words in upper case, and the «>>.» syntax (instead of «.») says that the method shall be applied to all the elements. The final «Set» coerces the list (given by «IO.lines») to a Set (which is a hash where the values are True or False only; and False is given for non-existing keys).

File: wordgame (partial)
my @letters = %value.keys.map({ $_ xx %count{$_} }).sort.flat; # [5]

say "- Letters: " ~ @letters.join if $verbose;                 # [6]

my @selection = @letters.pick($length);                        # [7]

say "- Selected: { @selection.join } with length $length" if $verbose; # [8]

[5] Get the list of letters to choose from. Each letters occurs as many times as the value in «%count». The «map» gives us a list for each letter, with as many copies of the letter as required.

[6] Print the list (as a string) to show that we got it right.

[7] Select (or indeed, pick) the required number of characters from the list of letters.

[8] Print the selected (pick'ed) letters (and the length).

See docs.raku.org/routine/xx for more information about the list repetition operator «xx».

See docs.raku.org/routine/pick for more information about «pick».

Note that we could have used the «Bag» type (see docs.raku.org/type/Bag) to store the letters, and have the count as the weight. Using «pick» on a «Bag» will take the weight into account. But this will not give us multiple copies of the same letter - so is useless in this case.

File: wordgame (partial)
my @legal-words = @selection.combinations>>.join.unique.grep({ %dict{$_} });
                                                            # [9]
say "- Legal words: @legal-words[]" if $verbose;

my %candidates;

for @legal-words -> $word                                   # [10]
{
  %candidates{$word} = $word.comb.map({ %value{$_} }).sum;  # [11]
}

my $max-val = %candidates.values.max;                       # [12]

say "Most valuable word(s) at $max-val points:";            # [12a]

say %candidates.keys.grep({ %candidates{$_} == $max-val }).sort.join(", "), ".";
                                                            # [13]

[9] Start with the selected letters (in «@selection»), apply «combinations» to get all possible combinations (from length 0 to the size of «@selection») as a list of lists. Then we apply «join» on each sublist to turn them into strings (and thus a list of strings). On that list we apply «unique» to get rid of duplicates, and «grep» to get rid og unknown words, leaving us with a list of valid words.

[10] Iterate over the valid words,

[11] • Calculate the value (in points) for each word. We start with a string (in «$word»), applies «comb» to get a list of single characters, uses «map» to turn each eletter into the value (in points), and finally «sum» on the list of values to get the grand total.

[12] Get the hightest value, and print it (12a).

[13] We start with the legal words («@candidates.keys»), applies «grep» to keep only the words with this length (12), sorts the list, prints the result with a «,» between the words, and terminates with a «.». Not that we can have one or more equally valid words as the result.

Running it:

$ raku wordgame
Most valuable word(s) at 12 points:
BMW.

$ raku wordgame
Most valuable word(s) at 6 points:
CI, CS, YIP.

$ raku wordgame
Most valuable word(s) at 7 points:
BP.

$ raku wordgame
Most valuable word(s) at 6 points:
NSA.

$ raku wordgame
Most valuable word(s) at 13 points:
COP.

We got at most a three letter word, so where did the seven letter words go? Verbose mode to the rescue:

$ raku wordgame --verbose
- Letters: AAAAAAAABBBBBCCCCDDDEEEEEEEEEFFFGGGHHHIIIIIJJJKKLLLMMMMNNNN\
OOOPPPPPQQRRRSSSSSSSTTTTTUUUUUVVVWWWWWXXYYYYYZZZZZ
- Selected: TVSVSNH with length 7
- Legal words: TS TH SN SH SVN
Most valuable word(s) at 10 points:
TH.

$ raku wordgame --verbose
- Letters: AAAAAAAABBBBBCCCCDDDEEEEEEEEEFFFGGGHHHIIIIIJJJKKLLLMMMMNNNN\
OOOPPPPPQQRRRSSSSSSSTTTTTUUUUUVVVWWWWWXXYYYYYZZZZZ
- Selected: BIURDKF with length 7
- Legal words: I ID IF UR BID BUR BUD IRK BIRD
Most valuable word(s) at 13 points:
IRK.

$ raku wordgame --verbose
- Letters: AAAAAAAABBBBBCCCCDDDEEEEEEEEEFFFGGGHHHIIIIIJJJKKLLLMMMMNNNN\
OOOPPPPPQQRRRSSSSSSSTTTTTUUUUUVVVWWWWWXXYYYYYZZZZZ
- Selected: OFXABFZ with length 7
- Legal words: A OF OX OB OZ FA OFF OAF
Most valuable word(s) at 11 points:
OFF.

The problem is simply that the randomly chosen letters doesn't make for good words. (Note that the first and third running have duplicate letters; «S» (the first) and «F» (the third).)

We can try with more letters:

$ raku wordgame --verbose --length=10
- Letters: AAAAAAAABBBBBCCCCDDDEEEEEEEEEFFFGGGHHHIIIIIJJJKKLLLMMMMNNNN\
OOOPPPPPQQRRRSSSSSSSTTTTTUUUUUVVVWWWWWXXYYYYYZZZZZ
- Selected: CGBTWEEMUB with length 10
- Legal words: CU GE BE TB WE WM WU EM MU CUB GTE GEE GEM BTU BEE TEE \
TUB WEE WEB EMU TWEE TEEM
Most valuable word(s) at 14 points:
TEEM.

$ raku wordgame --verbose --length=10
- Letters: AAAAAAAABBBBBCCCCDDDEEEEEEEEEFFFGGGHHHIIIIIJJJKKLLLMMMMNNNN\
OOOPPPPPQQRRRSSSSSSSTTTTTUUUUUVVVWWWWWXXYYYYYZZZZZ
- Selected: JSSZTFPIAU with length 10
- Legal words: I A SI TI FA PI PA PU AU STU SPA TIA
Most valuable word(s) at 7 points:
STU, TIA.

$ raku wordgame --verbose --length=10
- Letters: AAAAAAAABBBBBCCCCDDDEEEEEEEEEFFFGGGHHHIIIIIJJJKKLLLMMMMNNNN\
OOOPPPPPQQRRRSSSSSSSTTTTTUUUUUVVVWWWWWXXYYYYYZZZZZ
- Selected: ESFPHEESTY with length 10
- Legal words: ES EH SH SE FE PH PT HE TY SPY SHE SHY SEE SET STY FEE \
FEY PEE PET HES HEY ESPY SHES SEES FEES FEET FEST PEES PEST SHEET
Most valuable word(s) at 15 points:
SHEET.

Not much better. Note the middle one, where we have two words with the same value (15 points).

Crazy Bonus

The problem (so to speak) with the «wordgame» is that it is so random. It selects 7 (or another number, if given) letters at random, and the result is different each time we run the program.

We got a high score of 15 points above, but what is the best value we possibly can obtain for a word (withf maximum length 7) where we can choose from the entire list of letters?

We can amend the program do do just that...

File: wordgame-absolute (changes only)
say "- Letters: " ~ @letters.join if $verbose;

my @legal-words
  = @letters.combinations(1 .. $length)>>.join.unique.grep({ %dict{$_} });

I have removed the step involving «@selection» (and the two lines of code between the lines shown above). And added the lengths to the «combinations» call, as we do not want longer words than the specified upper limit.

Note that «combinations» without arguments will give us every combination from size zero up to the length of the list we applied it on. If we specify a single value as argument, we will get the combinations with exactly that number of elemensts. To get shorter lists, we specify the entire interval by a Range (in this case «1 .. $length»).

I am fully aware that the higher the number the longer time this will take, so I'll start easy:

$ raku wordgame-absolute --length=1
Most valuable word(s) at 1:
A, I

$ raku wordgame-absolute --length=2
Most valuable word(s) at 15:
KW.

$ raku wordgame-absolute --length=3
Most valuable word(s) at 15:
COO, COT, HOT, KW, MOO.

$ raku wordgame-absolute --length=4
Most valuable word(s) at 26:
KNOT.

$ raku wordgame-absolute --length=5
Most valuable word(s) at 26:
KNOT.

The timings are also of interest:

Length   Time   Increase   WordsPoints
10.9s-A, I1
20.95s5%KW13
31.5s50%COO, COT, HOT, MOO     15
431s2000%KNOT24
595m18000%KNOT24

I stopped at 5 letters, as the time usage for 6 letters would be excessive. (If we assume a 100_000% increase, the time usage would be about 2 months. For 7 letters we get about 160 years if we assume a 1_000_000% increase.) The program is simply not up to the task for higher numbers.

You may have heard that Raku supports parallelism out if the box, and a program can use that with a little care. (The «>>.join» call used in the code can be executed in parallell, so we have actually taken the first tiny step.) But even if we come up with a clever way of parallelising the entire program, it wouldn't give us a useful result. A machine with 1800 cores would still use 1 month on the task.

And that't it.