by Arne Sommer

# Mirrored Hash with Raku

[148] Published 3. October 2021.

This is my response to the Perl Weekly Challenge #132.

## Challenge #132.1: Mirror Dates

You are given a date (yyyy/mm/dd).

Assuming, the given date is your date of birth. Write a script to find the mirror dates of the given date.

`Dave Cross` has built cool site that does something similar.

Assuming today is 2021/09/22.

Example 1: ```Input: 2021/09/18 Output: 2021/09/14, 2021/09/26 On the date you were born, someone who was your current age, would have been born on 2021/09/14. Someone born today will be your current age on 2021/09/26. ``` Example 2: ```Input: 1975/10/10 Output: 1929/10/27, 2067/09/05 On the date you were born, someone who was your current age, would have been born on 1929/10/27. Someone born today will be your current age on 2067/09/05. ``` Example 3: ```Input: 1967/02/14 Output: 1912/07/08, 2076/04/30 On the date you were born, someone who was your current age, would have been born on 1912/07/08. Someone born today will be your current age on 2076/04/30. ```

The notion of the day you were born is a red herring. The only useful information to be had from that herring is that the specified date must be before the current date, as you would not have been born yet otherwise.

Let is use an illustration: We start with the current date (in black) and the user specified date (in red). We get the difference (in days), and subtract it from the user specified date to get the lower date (in green), and add it to the current date to get the upper date (in blue):

Then the program:

File: mirror-dates ```#! /usr/bin/env raku subset SlashDate where * ~~ /^\d\d\d\d\/\d\d\/\d\d\$/; # [1] my \$fmt = { sprintf "%04d/%02d/%02d", .year, .month, .day }; # [2] unit sub MAIN (SlashDate \$date = "2021/09/18", # [3] SlashDate :\$today = Date.new(DateTime.now, formatter => \$fmt).Str, # [4] :v(\$verbose)); my \$date-iso = \$date.subst("/", "-", :g); # [5] my \$earlier = Date.new(\$date-iso, formatter => \$fmt); # [6] my \$now = Date.new(\$today.subst("/", "-", :g), formatter => \$fmt); # [7] die "The date (\$earlier) should be earlier than today (\$now)" unless \$earlier < \$now; # [8] my \$diff = \$now - \$earlier; # [9] say ": Today: \$now" if \$verbose; say ": Difference: \$diff days" if \$verbose; my \$first = \$earlier.earlier(day => \$diff); # [10] my \$last = \$now.later( day => \$diff); # [11] say "\$first, \$last"; # [12] ```

[1] A custom type, set up `subset`, for the peculiar (non-iso) date format specified in the challenge.

[2] Anonymous procedure used when printing a Date object, using the reqired format.

[3] The specified date. Note the custom type (from [1]) handling compliance issues.

[4] Use this optionally (named) argument to specify the current date, as the program is difficult to test otherwise. Default to the currenr date. Note the formatter specification, so that simply printing the object will give us what the challenge requested.

[5] Convert the date string to iso standard, as the Date object requires it.

[6] Get the date object for the specified date.

[7] Get today's date.

[8] The specified date should be in the past.

[9] The number of days between the two dates. Subtraction on two `Date` objects does the trick.

[10] Get the first date; in the past.

[11] Get the second date, in the future.

[12] Print the dates.

See docs.raku.org/language/typesystem#index-entry-subset-subset for more information about `subset`.

Running it:

```\$ ./mirror-dates 2021/09/18 2021/09/05, 2021/10/14 ```

Not what you expected? Verbose mode to the rescue:

```\$ ./mirror-dates -v 2021/09/18 : Today: 2021/10/01 : Difference: 13 days 2021/09/05, 2021/10/14 ```

The problem is the notion of today. But the program can overcome that problem, removing the element of entropy. Let us try again, using the same current date as the challenge:

```\$ ./mirror-dates -today=2021/09/22 -v 2021/09/18 : Today: 2021/09/22 : Difference: 4 days 2021/09/14, 2021/09/26 \$ ./mirror-dates -today=2021/09/22 -v 1975/10/10 : Today: 2021/09/22 : Difference: 16784 days 1929/10/27, 2067/09/05 \$ ./mirror-dates -today=2021/09/22 -v 1967/02/14 : Today: 2021/09/22 : Difference: 19944 days 1912/07/08, 2076/04/30 ```

Looking good.

## Challenge #132.2: Hash Join

Write a script to implement Hash Join algorithm as suggested by wikipedia.

1. For each tuple r in the build input R
1. Add r to the in-memory hash table
2. If the size of the hash table equals the maximum in-memory size:
1. Scan the probe input S, and add matching join tuples to the output relation
2. Reset the hash table, and continue scanning the build input R
2. Do a final scan of the probe input S and add the resulting join tuples to the output relation
Example: ```Input: @player_ages = ( [20, "Alex" ], [28, "Joe" ], [38, "Mike" ], [18, "Alex" ], [25, "David" ], [18, "Simon" ], ); @player_names = ( ["Alex", "Stewart"], ["Joe", "Root" ], ["Mike", "Gatting"], ["Joe", "Blog" ], ["Alex", "Jones" ], ["Simon","Duane" ], ); Output: Based on index = 1 of @players_age and index = 0 of @players_name. 20, "Alex", "Stewart" 20, "Alex", "Jones" 18, "Alex", "Stewart" 18, "Alex", "Jones" 28, "Joe", "Root" 28, "Joe", "Blog" 38, "Mike", "Gatting" 18, "Simon", "Duane" ```

I have ignored the «in-memory size», as it does not make (much) sense with the amount of the memory used in modern computers.

File: hash-join ```#! /usr/bin/env raku unit sub MAIN (:v(:\$verbose)); my @player_ages = ( # [1] [20, "Alex" ], [28, "Joe" ], [38, "Mike" ], [18, "Alex" ], [25, "David" ], [18, "Simon" ], ); my @player_names = ( # [2] ["Alex", "Stewart"], ["Joe", "Root" ], ["Mike", "Gatting"], ["Joe", "Blog" ], ["Alex", "Jones" ], ["Simon","Duane" ], ); hash-join(@player_ages, 1, @player_names, 0); # [3] sub hash-join (@table1 is copy, \$index1 is copy, @table2 is copy, \$index2 is copy) # [4] { if @table1.elems > @table2.elems # [5] { (@table1, @table2) = (@table2, @table1); # [5a} (\$index1, \$index2) = (\$index2, \$index1); # [5b] } my %table1; # [6] for @table1 -> \$row # [7] { my @data = @\$row; # [7a] my \$id = @data.splice(\$index1, 1); # [8] say ": Table 1 Row (add): \$id -> { @data.join(", ") }" if \$verbose; %table1{\$id}.push: @data; # [9] } my %result; # [10] for @table2 -> \$row # [11] { my @data = @\$row; # [11a] my \$id = @data.splice(\$index2, 1); # [12] if \$verbose { say ": Table 1 Row (scan): \$id -> { @(%table1{\$id}).join(", ") }"; say ": Table 2 Row (scan): \$id -> { @data.join(", ") }"; } for @(%table1{\$id}) -> \$row # [13] { %result{\$id}.push: (@\$row, @data).flat; # [14] say ": Result Table Row: \$id -> { @\$row.join(", ") } + \ { @data.join(", ") }" if \$verbose; } } for %result.keys.sort -> \$id # [15] { my @rows = @(%result{\$id}); # [16] for @rows -> \$row # [17] { say nicify(|(\$id, @\$row)); # [18] } } } sub nicify (*@data) # [19} { return @data.map({ \$_ ~~ Numeric ?? \$_ !! '"' ~ \$_ ~ '"' }).join(", "); } ```

[1] The first table.

[2] The second table.

[3] Join (and print) the two specified tables.

[4] Note the use of `is copy` so that we can change the variables. The default behaviour is a read-only value.

See docs.raku.org/type/Signature#index-entry-trait_is_copy more information about `is copy`.

[5] The wikipedia article advises starting with the smallest relation. The procedure will start with the table with the lowest number of rows, by swapping them [5a, 5b] if the second one is smaller.

[6] The first table, as a hash, will end up here.

[7] Iterate over the first table, row by row. Get the row content as an array [7a].

[8] Extract (remove) the ID value from the list.

See docs.raku.org/routine/splice more information about `splice`.

[9] Add the data (sans ID) to the hash, with the ID as key. Note the value, `@data`. This is a list, but lists are fine as values. Also note the `push` which will set up the has value to be a list (if not already a list), with the given list as the first entry. This makes it possible to have several values, which we need.

[10] The result of joining the two tables will end up here.

[11] Iterate over the second table.

[12] Extract the ID.

[13] Iterate over the values (rows) from the first table with the current ID.

[14] • Add the two lists to the new has, with the ID as key. Note the `flat` to merge the two lists (into a single list).

See docs.raku.org/routine/flat for more information about `flat`.

[15] Iterate over the results.

[16] Get the row(s) (the values).

[17] For each row,

[18] • pass it on to «nicify» to format it. Note the prefix `|` that flattens the content of the parens.

[19] `map` is a fancy loop, quite useful for showing off. The idea is just to quote non numeric numbers - as in the example.

See docs.raku.org/routine/map for more information about `map`.

Running it:

```\$ ./hash-join "Alex", 20, "Stewart" "Alex", 18, "Stewart" "Alex", 20, "Jones" "Alex", 18, "Jones" "Joe", 28, "Root" "Joe", 28, "Blog" "Mike", 38, "Gatting" "Simon", 18, "Duane" ```

The order of the columns is not quite right, but that is fixable. But before tackling that, let us have a go at verbose mode:

```\$ ./hash-join -v : Table 1 Row (add): Alex -> 20 : Table 1 Row (add): Joe -> 28 : Table 1 Row (add): Mike -> 38 : Table 1 Row (add): Alex -> 18 : Table 1 Row (add): David -> 25 : Table 1 Row (add): Simon -> 18 : Table 1 Row (scan): Alex -> 20, 18 : Table 2 Row (scan): Alex -> Stewart : Result Table Row: Alex -> 20 + Stewart : Result Table Row: Alex -> 18 + Stewart : Table 1 Row (scan): Joe -> 28 : Table 2 Row (scan): Joe -> Root : Result Table Row: Joe -> 28 + Root : Table 1 Row (scan): Mike -> 38 : Table 2 Row (scan): Mike -> Gatting : Result Table Row: Mike -> 38 + Gatting : Table 1 Row (scan): Joe -> 28 : Table 2 Row (scan): Joe -> Blog : Result Table Row: Joe -> 28 + Blog : Table 1 Row (scan): Alex -> 20, 18 : Table 2 Row (scan): Alex -> Jones : Result Table Row: Alex -> 20 + Jones : Result Table Row: Alex -> 18 + Jones : Table 1 Row (scan): Simon -> 18 : Table 2 Row (scan): Simon -> Duane : Result Table Row: Simon -> 18 + Duane "Alex", 20, "Stewart" "Alex", 18, "Stewart" "Alex", 20, "Jones" "Alex", 18, "Jones" "Joe", 28, "Root" "Joe", 28, "Blog" "Mike", 38, "Gatting" "Simon", 18, "Duane" ```

Then we can have a go at the column order. The problem is that we print the ID first, regardless of the original order. That is easy to fix:

File: hash-join-sorted ```#! /usr/bin/env raku unit sub MAIN (:v(:\$verbose)); my @player_ages = ( [20, "Alex" ], [28, "Joe" ], [38, "Mike" ], [18, "Alex" ], [25, "David" ], [18, "Simon" ], ); my @player_names = ( ["Alex", "Stewart"], ["Joe", "Root" ], ["Mike", "Gatting"], ["Joe", "Blog" ], ["Alex", "Jones" ], ["Simon","Duane" ], ); hash-join-sorted(@player_ages, 1, @player_names, 0); sub hash-join-sorted (@table1, \$index1, @table2, \$index2) # [1] { my %table1; for @table1 -> \$row { my @data = @\$row; my \$id = @data[\$index1]; # [2] say ": Table 1 Row (add): \$id -> { @data.join(", ") }" if \$verbose; %table1{\$id}.push: @data; } my %result; for @table2 -> \$row { my @data = @\$row; my \$id = @data.splice(\$index2, 1); if \$verbose { say ": Table 1 Row (scan): \$id -> { @(%table1{\$id}).join(", ") }"; say ": Table 2 Row (scan): \$id -> { @data.join(", ") }"; } for @(%table1{\$id}) -> \$row { %result{\$id}.push: (@\$row, @data).flat; say ": Result Table Row: \$id -> { @\$row.join(", ") } + \ { @data.join(", ") }" if \$verbose; } } for %result.keys.sort -> \$id { my @rows = @(%result{\$id}); for @rows -> \$row { say nicify(@\$row); # [3] } } } sub nicify (*@data) { return @data.map({ \$_ ~~ Numeric ?? \$_ !! '"' ~ \$_ ~ '"' }).join(", "); } ```

[1] The `is copy` and the swapping of the arrays has gone, as it did not really give that much. Except the possibility to have the column order royally screwed up.

[2] Get the ID, but leave it intact in the array.

[3] The ID is now part of the first table (`@\$row`), so we skip adding it.

Running it:

```\$ ./hash-join-sorted 20, "Alex", "Stewart" 18, "Alex", "Stewart" 20, "Alex", "Jones" 18, "Alex", "Jones" 28, "Joe", "Root" 28, "Joe", "Blog" 38, "Mike", "Gatting" 18, "Simon", "Duane" ```

That's better.

Note that the order of the rows differ from the one given in the challenge. I do not think that it matters very much, but I'll look into it later anyway.

With verbose mode:

```\$ ./hash-join-sorted -v : Table 1 Row (add): Alex -> 20, Alex : Table 1 Row (add): Joe -> 28, Joe : Table 1 Row (add): Mike -> 38, Mike : Table 1 Row (add): Alex -> 18, Alex : Table 1 Row (add): David -> 25, David : Table 1 Row (add): Simon -> 18, Simon : Table 1 Row (scan): Alex -> 20 Alex, 18 Alex : Table 2 Row (scan): Alex -> Stewart : Result Table Row: Alex -> 20, Alex + Stewart : Result Table Row: Alex -> 18, Alex + Stewart : Table 1 Row (scan): Joe -> 28 Joe : Table 2 Row (scan): Joe -> Root : Result Table Row: Joe -> 28, Joe + Root : Table 1 Row (scan): Mike -> 38 Mike : Table 2 Row (scan): Mike -> Gatting : Result Table Row: Mike -> 38, Mike + Gatting : Table 1 Row (scan): Joe -> 28 Joe : Table 2 Row (scan): Joe -> Blog : Result Table Row: Joe -> 28, Joe + Blog : Table 1 Row (scan): Alex -> 20 Alex, 18 Alex : Table 2 Row (scan): Alex -> Jones : Result Table Row: Alex -> 20, Alex + Jones : Result Table Row: Alex -> 18, Alex + Jones : Table 1 Row (scan): Simon -> 18 Simon : Table 2 Row (scan): Simon -> Duane : Result Table Row: Simon -> 18, Simon + Duane 20, "Alex", "Stewart" 18, "Alex", "Stewart" 20, "Alex", "Jones" 18, "Alex", "Jones" 28, "Joe", "Root" 28, "Joe", "Blog" 38, "Mike", "Gatting" 18, "Simon", "Duane" ```

Placing the procedures in a module is a good idea:

File: lib/HashJoin.rakumod ```unit module HashJoin; our sub hash-join-sorted (@table1, \$index1, @table2, \$index2, :\$verbose) { … } our sub hash-join (@table1 is copy, \$index1 is copy, @table2 is copy, \$index2 is copy, :\$verbose) { … } sub nicify (*@data) { – } ```

You have seen the procedure content before.

Then the program, doing the same as «hash-join-sorted» (so nothing really exciting):

File: hash-join-module ```#! /usr/bin/env raku use lib "lib"; use HashJoin; unit sub MAIN (:v(:\$verbose)); my @player_ages = ( [20, "Alex" ], [28, "Joe" ], [38, "Mike" ], [18, "Alex" ], [25, "David" ], [18, "Simon" ], ); my @player_names = ( ["Alex", "Stewart"], ["Joe", "Root" ], ["Mike", "Gatting"], ["Joe", "Blog" ], ["Alex", "Jones" ], ["Simon","Duane" ], ); HashJoin::hash-join-sorted(@player_ages, 1, @player_names, 0, :\$verbose); ```

Running it gives the expected result. Feel free to take my word for it.

The versions so far print the result, which on the face of it is ok. But it makes it impossible to do something with the result, as in merge a third table. So let us do a slight rewrite, supporting that.

And have a go at the order of the rows at the same time...

The row order rewrite actually makes the task easier, as we can skip the first step and do them both in one fell swoop.

File: lib/HashJoin.rakumod (changes only) ```our sub merge (@table1, \$index1, @table2, \$index2, :\$verbose) { my %table1; my @ids; # [3] my %ids; my %result; for @table1 -> \$row1 # [1] { my @data1 = @\$row1; my \$id1 = @data1[\$index1]; @ids.push: \$id1 unless any(@ids) eq \$id1; # [3a] say ": Table 1 Row (scan): \$id1 -> { @data1.join(", ") }" if \$verbose; for @table2 -> \$row2 # [2] { my @data2 = @\$row2; my \$id2 = @data2.splice(\$index2, 1); next unless \$id1 eq \$id2; # [2a] %ids{\$id2} = True; # [3b] %result{\$id1}.push: (@data1, @data2).flat; # [4] say ": Result Table Row: \$id1 -> { @data1.join(", ") } + \ { @data2.join(", ") }" if \$verbose; } } my @result; for @ids -> \$id # [5] { next unless %ids{\$id}; # [5b] @result.append: @(%result{\$id}); } return @result; # [6] } our sub print-it(*@data) # [7] { for @data -> \$row { say nicify(@\$row); } } ```

[1] Iterate over the first table.

[2] Iterate over the second table, skipping rows that does not match (the same ID) [2a].

[3] A list of unique IDs, in the order we find them in the first table [3a]. The hash variant holds the IDs we get from joining the two tables [3b].

[4] Add the combined row to the result, as an array element in the hash with the specified ID.

[5] Iterate over the IDs from the first table (i.e. the correct order), and skip those without a joined row [5a].

[6] Return the data, instead of printing it.

[7] The printing part, as a separate procedure.

The program:

File: hash-join-module2 ```#! /usr/bin/env raku use lib "lib"; use HashJoin; unit sub MAIN (:v(:\$verbose)); my @player_ages = ( [20, "Alex" ], [28, "Joe" ], [38, "Mike" ], [18, "Alex" ], [25, "David" ], [18, "Simon" ], ); my @player_names = ( ["Alex", "Stewart"], ["Joe", "Root" ], ["Mike", "Gatting"], ["Joe", "Blog" ], ["Alex", "Jones" ], ["Simon","Duane" ], ); my @joined = HashJoin::merge(@player_ages, 1, @player_names, 0, :\$verbose); HashJoin::print-it(@joined); ```

Running it:

```\$ ./hash-join-module2 -v 20, "Alex", "Stewart" 20, "Alex", "Jones" 18, "Alex", "Stewart" 18, "Alex", "Jones" 28, "Joe", "Root" 28, "Joe", "Blog" 38, "Mike", "Gatting" 18, "Simon", "Duane" ```

The order is correct now.

With verbose mode:

```\$ ./hash-join-module2 -v : Table 1 Row (scan): Alex -> 20, Alex : Result Table Row: Alex -> 20, Alex + Stewart : Result Table Row: Alex -> 20, Alex + Jones : Table 1 Row (scan): Joe -> 28, Joe : Result Table Row: Joe -> 28, Joe + Root : Result Table Row: Joe -> 28, Joe + Blog : Table 1 Row (scan): Mike -> 38, Mike : Result Table Row: Mike -> 38, Mike + Gatting : Table 1 Row (scan): Alex -> 18, Alex : Result Table Row: Alex -> 18, Alex + Stewart : Result Table Row: Alex -> 18, Alex + Jones : Table 1 Row (scan): David -> 25, David : Table 1 Row (scan): Simon -> 18, Simon : Result Table Row: Simon -> 18, Simon + Duane 20, "Alex", "Stewart" 20, "Alex", "Jones" 18, "Alex", "Stewart" 18, "Alex", "Jones" 28, "Joe", "Root" 28, "Joe", "Blog" 38, "Mike", "Gatting" 18, "Simon", "Duane" ```

Let us join three tables, two at a time. I have added additional columns to the tables as well, just to show off.

File: hash-join-module3 ```#! /usr/bin/env raku use lib "lib"; use HashJoin; unit sub MAIN (:v(:\$verbose)); my @player_ages = ( [20, "Alex", "May", 29 ], [28, "Joe", "August", 2 ], [38, "Mike", "June", 12 ], [18, "Alex", "January", 29 ], [25, "David", "March", 14 ], [18, "Simon", "June", 10 ], ); my @player_names = ( ["Alex", "Stewart", "Oslo" ], ["Joe", "Root", "Berlin" ], ["Mike", "Gatting", "London" ], ["Joe", "Blog", "Bern" ], ["Alex", "Jones", "Zürich" ], ["Simon","Duane", "Oxford" ], ); my @countries = ( ["Norway", "Oslo" ], ["Germany", "Berlin" ], ["England", "London" ], ["Switzerland", "Bern" ], ["Switzerland", "Zürich" ], ["England", "Oxford" ], ); my @joined = HashJoin::merge(@player_ages, 1, @player_names, 0, :\$verbose); my @joined2 = HashJoin::merge(@joined, 5, @countries, 1); HashJoin::print-it(@joined2); ```

Running it:

```\$ ./hash-join-module3 20, "Alex", "May", 29, "Stewart", "Oslo", "Norway" 18, "Alex", "January", 29, "Stewart", "Oslo", "Norway" 20, "Alex", "May", 29, "Jones", "Zürich", "Switzerland" 18, "Alex", "January", 29, "Jones", "Zürich", "Switzerland" 28, "Joe", "August", 2, "Root", "Berlin", "Germany" 28, "Joe", "August", 2, "Blog", "Bern", "Switzerland" 38, "Mike", "June", 12, "Gatting", "London", "England" 18, "Simon", "June", 10, "Duane", "Oxford", "England" ```

Let us try with a second duplicate (Birmingham):

File: hash-join-module4 ``` #! /usr/bin/env raku use lib "lib"; use HashJoin; unit sub MAIN (:v(:\$verbose)); my @player_ages = ( [20, "Alex", "May", 29 ], [28, "Joe", "August", 2 ], [38, "Mike", "June", 12 ], [18, "Alex", "January", 29 ], [25, "David", "March", 14 ], [18, "Simon", "June", 10 ], ); my @player_names = ( ["Alex", "Stewart", "Oslo" ], ["Joe", "Root", "Berlin" ], ["Mike", "Gatting", "London" ], ["Joe", "Blog", "Bern" ], ["Alex", "Jones", "Zürich" ], ["Simon","Duane", "Birmingham" ], ); my @countries = ( ["Norway", "Oslo" ], ["Germany", "Berlin" ], ["England", "London" ], ["Switzerland", "Bern" ], ["Switzerland", "Zürich" ], ["England", "Birmingham" ], ["USA", "Birmingham" ], ); my @joined = HashJoin::merge(@player_ages, 1, @player_names, 0, :\$verbose); my @joined2 = HashJoin::merge(@joined, 5, @countries, 1); HashJoin::print-it(@joined2); ```

Running it:

```\$ ./hash-join-module4 20, "Alex", "May", 29, "Stewart", "Oslo", "Norway" 18, "Alex", "January", 29, "Stewart", "Oslo", "Norway" 20, "Alex", "May", 29, "Jones", "Zürich", "Switzerland" 18, "Alex", "January", 29, "Jones", "Zürich", "Switzerland" 28, "Joe", "August", 2, "Root", "Berlin", "Germany" 28, "Joe", "August", 2, "Blog", "Bern", "Switzerland" 38, "Mike", "June", 12, "Gatting", "London", "England" 18, "Simon", "June", 10, "Duane", "Birmingham", "England" 18, "Simon", "June", 10, "Duane", "Birmingham", "USA" ```

And that's it.