Hexa Diff
with Raku

by Arne Sommer

[185] Published 29. May 2022.

This is my response to the Perl Weekly Challenge #166.

Challenge #166.1: Hexadecimal Words

As an old systems programmer, whenever I needed to come up with a 32-bit number, I would reach for the tired old examples like 0xDeadBeef and 0xC0dedBad. I want more!

Write a program that will read from a dictionary and find 2- to 8-letter words that can be “spelled” in hexadecimal, with the addition of the following letter substitutions:
  • o ⟶ 0 (e.g., 0xf00d = “food”)
  • l ⟶ 1
  • i ⟶ 1
  • s ⟶ 5
  • t ⟶ 7
You can use your own dictionary or you can simply open ../../../data/dictionary.txt (relative to your script’s location in our GitHub repository) to access the dictionary of common words from Week #161.

Optional Extras (for an 0xAddedFee, of course!)
  1. Limit the number of “special” letter substitutions in any one result to keep that result at least somewhat comprehensible. (0x51105010 is an actual example from my sample solution you may wish to avoid!)
  2. Find phrases of words that total 8 characters in length (e.g., 0xFee1Face), rather than just individual words.
File: hexawords-olist       # [0]
#! /usr/bin/env raku

unit sub MAIN (:d(:$dictionary) where $dictionary.IO.r = 'dictionary.txt'); # [1]

my @dict
  = $dictionary.IO.lines.grep( 1 < *.chars <= 8 ).grep( * ~~ /^<[abcdefolist]>+$/);
                                                                            # [2]

@dict.map({ say "0x" ~ TR/olist/01157/ });                                  # [3]

[0] See if you can figure out the «olist» part of the filename.

[1] I have placed (a copy of) the dictionary in the current directory. Use the «d» command line option to override the location and/or filename.

[2] Read the file into an array, with one entry per row in the file (each row happens to contain exactly one word). Then we use grep to remove words with more than 8 characters, followed by another grep that keeps only the words made up of valid letters (a-f, i, l, o, s and t).

[3] Then we use map to translate the non-hex letters to a digit. Note the use of non-destructive transliteration (TR//) instead of in-place transliteration (tr//). Also note that we do not use the return value (of the map) at all, but print the result (with say) inside the block.

See docs.raku.org/language/operators#TR///_non-destructive_transliteration for more information about TR///.

See docs.raku.org/syntax/tr/// for more information about the in-place transliteration operator tr///.
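The filtering in [2] and the transliteration in [3] can be tried out on a handful of words without a dictionary file. This is only a minimal sketch: the helper names hexify and hex-words and the sample words are my own assumptions, and the length test is spelled out in a block (instead of a chained comparison on *.chars) so that both bounds are visibly applied to the same value.

```raku
# Hypothetical helper: non-destructive transliteration with TR///.
sub hexify(Str $word) {
    my $copy = $word;                       # TR/// works on the topic ($_)
    given $copy { return TR/olist/01157/ }  # returns a new string; $copy is untouched
}

# Hypothetical helper: keep 2- to 8-letter words with legal letters, then hexify.
sub hex-words(@words) {
    my @short = @words.grep({ 2 <= .chars <= 8 });
    my @legal = @short.grep(/^ <[a..f olist]>+ $/);
    return @legal.map({ '0x' ~ hexify($_) }).List;
}

say hex-words(<a food list zebra deadbeefs>);   # (0xf00d 0x1157)

# In-place transliteration with tr/// changes the variable itself.
my $word = 'food';
$word ~~ tr/olist/01157/;
say $word;                                      # f00d
```

Here «a» is too short, «zebra» has illegal letters, and «deadbeefs» is one letter too long.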

Running it:

$ ./hexawords-olist
0xa
0xaba7e
 ...
0x7075
0x7077ed

We got 1463 words, which is way too many to wade through. Most of them are unreadable, and I dislike the mapping between some digits and letters (s => 5, t => 7). I also think that «1» should be «l» (lowercase L) only, and not «i» (lowercase I).

Here is a modified version that can do just that, as well as the prescribed translations:

File: hexawords
#! /usr/bin/env raku

unit sub MAIN
(
  :d(:$dictionary) where $dictionary.IO.r = 'dictionary.txt',
  :p(:$pure),      # [1]
);

my @dict = $pure   # [2]
  ?? $dictionary.IO.lines.grep( 1 < *.chars <= 8 ).grep( * ~~ /^<[abcdeflo]>+$/)
  !! $dictionary.IO.lines.grep( 1 < *.chars <= 8 ).grep( * ~~ /^<[abcdefilost]>+$/);

@dict.map({ say "0x" ~ TR/olist/01157/ });  # [3]

[1] Use the «p» (pure) option to get the (in my view) obvious letter to digit mapping.

[2] The result is a list of words with legal letters only, and the letters depend on pure mode.

[3] Do not let this confuse you. If we use pure mode, the «unpure» letters «i», «s» and «t» do not occur in the filtered word list (courtesy of [2]), so we will not end up with «unpure» digits in the words.

Running it without the «p» option gives the same result as before; 1463 words.

Running it with the «p» option gives 199 words:

$ ./hexawords -p
0xab1e
0xab0de
0xaccede
0xacceded
0xacc01ade
0xace
0xaced
0xad
0xadd
0xadded
0xad0
0xad0be
0xaffab1e
0xa1e
0xa11
0xa100f
0xbabb1e
0xbabb1ed
0xbabe
0xbad
0xbade
0xbaff1e
0xbaff1ed
0xba1d
0xba1ded
0xba1e
0xba1ed
0xba11
0xba11ad
0xba11ed
0xbe
0xbead
0xbeaded
0xbed
0xbedded
0xbee
0xbeef
0xbeefed
0xbefa11
0xbefe11
0xbe11
0xbe11ed
0xb1ab
0xb1abbed
0xb1ade
0xb1ed
0xb1eed
0xb10b
0xb10bbed
0xb10c
0xb100d
0xb100ded
0xb0a
0xb0b
0xb0bbed
0xb0de
0xb0ded
0xb01d
0xb00
0xb00ed
0xcab
0xcabbed
0xcab1e
0xcab1ed
0xcaca0
0xcad
0xca1f
0xca11
0xca11ab1e
0xca11ed
0xcc
0xcede
0xceded
0xce11
0xce110
0xc1ad
0xc1ef
0xc10d
0xc0a1
0xc0a1ed
0xc0b
0xc0bb1e
0xc0c0a
0xc0d
0xc0dded
0xc0de
0xc0ded
0xc0ffee
0xc01
0xc01d
0xc00
0xc00ed
0xc001
0xc001ed
0xdab
0xdabbed
0xdabb1e
0xdabb1ed
0xdad
0xdead
0xdeaf
0xdea1
0xdecade
0xdec0de
0xdec0ded
0xdeed
0xdeeded
0xdeface
0xdefaced
0xd0
0xd0d0
0xd0e
0xd01e
0xd01ed
0xd011
0xd011ed
0xd00d1e
0xd00d1ed
0xebb
0xebbed
0xee1
0xe1f
0xfab1e
0xfacade
0xface
0xfaced
0xfad
0xfade
0xfaded
0xfa11
0xfed
0xfee
0xfeeb1e
0xfeed
0xfee1
0xfe11
0xfe11ed
0xf1ea
0xf1ed
0xf1ee
0xf1eece
0xf1eeced
0xf100d
0xf100ded
0xf0a1
0xf0a1ed
0xf0ca1
0xf0e
0xf01d
0xf01ded
0xf00d
0xf001
0xf001ed
0x1ab
0x1abe1
0x1abe1ed
0x1ace
0x1aced
0x1ad
0x1ade
0x1aded
0x1ad1e
0x1ad1ed
0x1ead
0x1eaded
0x1eaf
0x1eafed
0x1ed
0x1ee
0x10ad
0x10adab1e
0x10aded
0x10af
0x10afed
0x10b
0x10bbed
0x10be
0x10ca1
0x10ca1e
0x1011
0x1011ed
0x0af
0x0b0e
0x0dd
0x0de
0x0f
0x0ff
0x0ffed
0x0ff10ad
0x01d

I'll leave it at that.

Challenge #166.2: K-Directory Diff

Given a few (three or more) directories (non-recursively), display a side-by-side difference of files that are missing from at least one of the directories. Do not display files that exist in every directory.

Since the task is non-recursive, if you encounter a subdirectory, append a /, but otherwise treat it the same as a regular file.

Example:
Given the following directory structure:
dir_a:
Arial.ttf  Comic_Sans.ttf  Georgia.ttf  Helvetica.ttf  Impact.otf
Verdana.ttf  Old_Fonts/

dir_b:
Arial.ttf  Comic_Sans.ttf  Courier_New.ttf  Helvetica.ttf  Impact.otf
Tahoma.ttf  Verdana.ttf

dir_c:
Arial.ttf  Courier_New.ttf  Helvetica.ttf  Impact.otf  Monaco.ttf
Verdana.ttf
The output should look similar to the following:
dir_a          | dir_b           | dir_c
-------------- | --------------- | ---------------
Comic_Sans.ttf | Comic_Sans.ttf  |
               | Courier_New.ttf | Courier_New.ttf
Georgia.ttf    |                 |
               |                 | Monaco.ttf
Old_Fonts/     |                 |
               | Tahoma.ttf      |

Note that we are not concerned about file sizes, so we can get away with empty files, which we can create with the Unix «touch» command. A little shell script can set up a directory structure, exactly as given in the challenge. This makes testing easier, as the challenge gives the expected result.

File: setup.sh
#! /bin/sh

mkdir dir_a dir_a/Old_Fonts/ dir_b dir_c 
cd dir_a
touch Arial.ttf Comic_Sans.ttf Georgia.ttf Helvetica.ttf Impact.otf Verdana.ttf

cd ../dir_b
touch Arial.ttf Comic_Sans.ttf Courier_New.ttf Helvetica.ttf Impact.otf Tahoma.ttf \
  Verdana.ttf

cd ../dir_c
touch Arial.ttf Courier_New.ttf Helvetica.ttf Impact.otf Monaco.ttf Verdana.ttf

Let us start experimenting in the REPL:

> "dir_a".IO.dir.join("\n")
dir_a/Impact.otf
dir_a/Comic_Sans.ttf
dir_a/Old_Fonts
dir_a/Arial.ttf
dir_a/Helvetica.ttf
dir_a/Verdana.ttf
dir_a/Georgia.ttf

The IO.dir method gives us the content of a directory, as a list of IO objects. They stringify (when we print them) to the filename. The path is included, as we did not run the command in the directory itself.

See docs.raku.org/routine/dir for more information about IO.dir.

So far, so good. But we do not want the directory part (in the output). We can get rid of that with the IO.basename method:

> "dir_a".IO.dir>>.basename.join("\n")
Impact.otf
Comic_Sans.ttf
Old_Fonts
Arial.ttf
Helvetica.ttf
Verdana.ttf
Georgia.ttf

See docs.raku.org/routine/basename for more information about IO.basename.

That is better.

Note that the indir function could have been used here instead. See docs.raku.org/routine/indir for more information about indir.

We were asked to add a slash after directories, and stringification (of IO objects) does not do that for us. But we can do it manually (with map):

> "dir_a".IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).join("\n")
Impact.otf
Comic_Sans.ttf
Old_Fonts/
Arial.ttf
Helvetica.ttf
Verdana.ttf
Georgia.ttf
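The same expression can be wrapped in a small routine and tried without the setup script. This is a sketch under my own assumptions: the helper name entries, the scratch directory name and the two sample entries are hypothetical, and the result is sorted only to make the output predictable.

```raku
# Hypothetical helper: directory entries as basenames, with '/' after subdirectories.
sub entries(IO() $dir) {
    $dir.dir.map({ .d ?? .basename ~ '/' !! .basename }).sort.List;
}

# Build a tiny scratch directory (name is an assumption) to try it on.
my $base = $*TMPDIR.add("kdir-demo-{$*PID}");
mkdir $base.add('Old_Fonts');          # also creates $base itself
$base.add('Arial.ttf').spurt('');      # an empty file, like «touch»

my @found = entries($base);
say @found.join(' ');                  # Arial.ttf Old_Fonts/

# Clean up the scratch directory again.
$base.add('Arial.ttf').unlink;
rmdir $base.add('Old_Fonts');
rmdir $base;
```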

Then we do this for the three directories, and compare the result. Using hashes is the obvious way (but a Set saves us some typing):

File: k-dir-diff
#! /usr/bin/env raku

unit sub MAIN ($dir1 where $dir1.IO.d && $dir1.IO.r = 'dir_a',   # [1]
               $dir2 where $dir2.IO.d && $dir2.IO.r = 'dir_b',
               $dir3 where $dir3.IO.d && $dir3.IO.r = 'dir_c',
);

my %dir1 = $dir1.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set; # [2]
my %dir2 = $dir2.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set;
my %dir3 = $dir3.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set;

my %all = %dir1 (|) %dir2 (|) %dir3;         # [3]

my $max-length = (%all.keys>>.chars.max,     # [4]
                  $dir1.chars, $dir2.chars, $dir3.chars).max;

say "{ $dir1.fmt("%-{$max-length}s") } | \
     { $dir2.fmt("%-{$max-length}s") } | \
     { $dir3.fmt("%-{$max-length}s") }";     # [5]

say "-" x $max-length * 3 + 6;               # [6]

for sort keys %all -> $file                  # [7]
{
  next if %dir1{$file} && %dir2{$file} && %dir3{$file};  # [8]

  say "{ (%dir1{$file} ?? $file !! '').fmt("%-{$max-length}s") } | "  # [9]
    ~ "{ (%dir2{$file} ?? $file !! '').fmt("%-{$max-length}s") } | "
    ~ "{ (%dir3{$file} ?? $file !! '').fmt("%-{$max-length}s") }";
}

[1] The three directories, with default values as in the example. Ensure that they are directories (IO.d) and readable (IO.r).

[2] Get the list of files/directories, and turn it into a Set, a hash-like structure. The assignment coerces the Set to a hash for us, so the Set does not live very long.

[3] Get a list of all the files/directories, using the Set Union operator (|).

See docs.raku.org/routine/(|), infix ∪ for more information about the Set Union operator (|).

[4] The longest file name, used for padding purposes. Note that the trailing slash for directories is included, and the directory names (used in the header) are included as well.

[5] Print the directories, nicely padded.

[6] Note the use of the string repetition operator x to generate the separator line.

See docs.raku.org/routine/x for more information about the string repetition operator x.

[7] Iterate over the files, alphabetically.

[8] Skip files that are present in all three hashes (and directories).

[9] Print the file, if present and spaces otherwise, nicely padded.
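The union in [3] and the skip test in [8] can also be expressed with set operators alone. The following is a sketch of that alternative and not what the program above does: the three sample sets are made up, and the intersection (&) and set difference (-) operators are my additions.

```raku
# Made-up sample data, standing in for the three directories.
my $dir-a = set <Arial.ttf Comic_Sans.ttf Georgia.ttf>;
my $dir-b = set <Arial.ttf Comic_Sans.ttf Tahoma.ttf>;
my $dir-c = set <Arial.ttf Monaco.ttf>;

my $all     = $dir-a (|) $dir-b (|) $dir-c;   # every file seen anywhere
my $common  = $dir-a (&) $dir-b (&) $dir-c;   # files present in all three
my $partial = $all (-) $common;               # files missing from at least one

say $partial.keys.sort;   # (Comic_Sans.ttf Georgia.ttf Monaco.ttf Tahoma.ttf)
```

Only «Arial.ttf» is in all three sets, so it is the only file left out.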

Running it:

$ ./k-dir-diff
dir_a           | dir_b           | dir_c          
---------------------------------------------------
Comic_Sans.ttf  | Comic_Sans.ttf  |                
                | Courier_New.ttf | Courier_New.ttf
Georgia.ttf     |                 |                
                |                 | Monaco.ttf     
Old_Fonts/      |                 |                
                | Tahoma.ttf      |                

Spot on.

And that's it.

Perhaps not.

The challenge specified «three or more» directories. So we should do that:

We need some more files and directories. This shell script will supply them for us:

File: setup.sh
#! /bin/sh

mkdir dir_d dir_e 
cd dir_d
touch  Arial.ttf FrutigerBold.ttf

cd ../dir_e
touch Arial.ttf Impact.otf HelveticaModern.ttf

Note that «Arial.ttf» is present in all the directories (including the previous lot).

File: k-dir-diff-multi
#! /usr/bin/env raku

unit sub MAIN (*@dirs where @dirs.elems >= 3 && all(@dirs) ~~ .IO.d    # [1]
                                             && all(@dirs) ~~ .IO.r);

my $dir-count = @dirs.elems;                                           # [2]

my %dir;                                                               # [3]

for @dirs -> $dir                                                      # [4]
{
  %dir{$dir} = $dir.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set;
}

my %all = %dir.values>>.List.flat.Bag;                                 # [5]

my $max-length = (%all.keys>>.chars.max, @dirs>>.chars.max).max;       # [6]

my @head;

for @dirs -> $dir
{
  @head.push: $dir.fmt("%-{$max-length}s");
}

say @head.join(" | ");

say "-" x (3 + $max-length) * $dir-count;

for sort keys %all -> $file
{
  next if %all{$file} == $dir-count;                                   # [7]

  my @row;

  for @dirs -> $dir
  {
    @row.push: "{ (%dir{$dir}{$file} ?? $file !! '').fmt("%-{$max-length}s") }";
  }

  say @row.join(" | ");
}

[1] A slurpy array (the *@) to hold the directory names. Note the where clauses; the first one enforces at least three arguments, and the second and third ensure that they are directories that we can read.

[2] The number of directories, to be used when we check if the file is present in all the directories.

[3] This variable holds all the files. The keys are the directory names (so duplicates will sort of work).

[4] Iterate over the directories, and place the Set of files in the hash. Note that this does not coerce the Set to a hash, as in the previous program.

[5] All the files. Using .values on the hash gives us a list (a sequence, really) with one Set of files for each directory. We want a flattened list, but flat on its own will not work here. Applying .List before flattening does the trick (as explained in the documentation for flat). Then we coerce the list to a Bag, which is a hash-like structure where the list elements are the keys and the frequencies are the values. Thus we have an easy way of checking for files present in all the directories.

See docs.raku.org/routine/flat for more information about flat.

See docs.raku.org/type/Bag for more information about the Bag type.

[6] The longest file and directory name. The first part gives us the length of the longest key, i.e. the longest file name (including embedded directories, if any), and the second part gives the same for the directory names. I have applied max three times, to avoid the need for flattening.

[7] Skip files present in all the directories (using the Bag set up in [5]).
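The counting trick in [5] and [7] can be seen in isolation with a small hash of Sets. This is a sketch with made-up data; the helper name partial-files is hypothetical, and it slips the keys of each Set into one list (instead of the .List and flat combination used above) before building the Bag.

```raku
# Hypothetical helper: files that are missing from at least one directory.
sub partial-files(%dir) {
    my $count     = %dir.values.map({ .keys.Slip }).Bag;  # file => number of directories
    my $dir-count = %dir.elems;
    return $count.keys.grep({ $count{$_} < $dir-count }).sort.List;
}

# Made-up sample data: directory name => Set of files.
my %dir =
    dir_a => <Arial.ttf Georgia.ttf>.Set,
    dir_b => <Arial.ttf Tahoma.ttf>.Set,
    dir_c => <Arial.ttf Georgia.ttf>.Set;

say partial-files(%dir);   # (Georgia.ttf Tahoma.ttf)
```

«Arial.ttf» has a count of 3, which equals the number of directories, so it is skipped.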

Running it, first on the old three directories to see that we get the same result as before:

$ ./k-dir-diff-multi dir_a dir_b dir_c
dir_a           | dir_b           | dir_c          
---------------------------------------------------
Comic_Sans.ttf  | Comic_Sans.ttf  |                
                | Courier_New.ttf | Courier_New.ttf
Georgia.ttf     |                 |                
                |                 | Monaco.ttf     
Old_Fonts/      |                 |                
                | Tahoma.ttf      |                

(We do.)

Then with the two additional directories added to the mix:

$ ./k-dir-diff-multi dir_a dir_b dir_c dir_d dir_e
dir_a               | dir_b               | dir_c               | dir_d               | dir_e              
--------------------------------------------------------------------------------------------------------------
Comic_Sans.ttf      | Comic_Sans.ttf      |                     |                     |                    
                    | Courier_New.ttf     | Courier_New.ttf     |                     |                    
                    |                     |                     | FrutigerBold.ttf    |                    
Georgia.ttf         |                     |                     |                     |                    
Helvetica.ttf       | Helvetica.ttf       | Helvetica.ttf       |                     |                    
                    |                     |                     |                     | HelveticaModern.ttf
Impact.otf          | Impact.otf          | Impact.otf          |                     | Impact.otf         
                    |                     | Monaco.ttf          |                     |                    
Old_Fonts/          |                     |                     |                     |                    
                    | Tahoma.ttf          |                     |                     |                    
Verdana.ttf         | Verdana.ttf         | Verdana.ttf         |                     |                    

Note that the number of dashes in the second row is slightly wrong. Also note that we have used a lot of padding for the first three directories, even though the long filenames only occur in the last two.
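The dashes can be fixed by counting the separators between the columns only, as there is no « | » after the last column. A minimal sketch, where the helper name separator is my own:

```raku
# Hypothetical helper: one dash per character of a row with $dir-count columns,
# each $max-length wide, joined by the three-character separator " | ".
sub separator(Int $max-length, Int $dir-count) {
    "-" x ($max-length * $dir-count + 3 * ($dir-count - 1));
}

say separator(15, 3).chars;   # 51, as in the three directory version
say separator(19, 5).chars;   # 107, the width of the five directory rows
```

The expression in the program gives (3 + 19) * 5 = 110 dashes for the five directories, which is three too many.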

And that's it.