This is my response to the Perl Weekly Challenge #166.
0xDeadBeef and 0xC0dedBad. I want more!
../../../data/dictionary.txt (relative to your script’s location in our GitHub repository) to access the dictionary of common words from Week #161.
0xFee1Face), rather than just individual words.
File: hexawords-olist # [0]
#! /usr/bin/env raku
unit sub MAIN (:d(:$dictionary) where $dictionary.IO.r = 'dictionary.txt'); # [1]
my @dict = $dictionary.IO.lines.grep( 1 < *.chars <= 8 ).grep( * ~~ /^<[abcdefolist]>+$/); # [2]
@dict.map({ say "0x" ~ TR/olist/01157/ }); # [3]
[0] See if you can figure out the «olist» part of the filename.
[1] I have placed (a copy of) the dictionary in the current directory. Use the «d» command line option to override the location and/or filename.
[2] Read the file into an array (one entry per row in the file, and each row happens to contain one word). Then we use grep to remove words with more than 8 characters, followed by another grep that ensures valid letters (a-f, i, l, o, s and t) only.
[3] Then we use map to translate the non-hex letters to digits. Note the use of non-destructive transliteration (TR///) instead of in-place transliteration (tr///). Also note that we do not use the return value (of the map) at all, but print the result (with say) inside the block.
See docs.raku.org/language/operators#TR///_non-destructive_transliteration for more information about TR///.
See docs.raku.org/syntax/tr/// for more information about the in-place transliteration operator tr///.
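The difference between the two operators can be seen in a small sketch. The sample words are chosen for illustration only; they are not claimed to be in the dictionary.

```raku
my @words = <coffee toast>;

# TR/// works on the topic ($_) and returns a changed copy,
# so the original words are left untouched.
my @hex = @words.map({ "0x" ~ TR/olist/01157/ });
say @hex;      # [0xc0ffee 0x70a57]
say @words;    # [coffee toast]

# tr/// changes the variable itself.
my $word = 'toast';
$word ~~ tr/olist/01157/;
say $word;     # 70a57
```

Using TR/// inside the map block means that @dict would still hold the original words if we wanted to use them for something else afterwards.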
Running it:
$ ./hexawords-olist
0xa
0xaba7e
...
0x7075
0x7077ed
We got 1463 words, which is far too many to wade through. Most of them are unreadable, and I dislike the mapping between some letters and digits (s => 5, t => 7). I also think that «1» should stand for «l» (lowercase L) only, and not «i» (lowercase I).
Here is a modified version that can do just that, as well as the prescribed translations:
File: hexawords
#! /usr/bin/env raku
unit sub MAIN
(
:d(:$dictionary) where $dictionary.IO.r = 'dictionary.txt',
:p(:$pure), # [1]
);
my @dict = $pure # [2]
?? $dictionary.IO.lines.grep( 1 < *.chars <= 8 ).grep( * ~~ /^<[abcdeflo]>+$/)
!! $dictionary.IO.lines.grep( 1 < *.chars <= 8 ).grep( * ~~ /^<[abcdefilost]>+$/);
@dict.map({ say "0x" ~ TR/olist/01157/ }); # [3]
[1] Use the «p» (pure) option to get the (in my view) obvious letter to digit mapping.
[2] The result is a list of words with legal letters only, and the letters depend on pure mode.
[3] Do not let this confuse you. If we use pure mode, the «unpure» letters «i», «s» and «t» do not occur in the word list (courtesy of [2]), so we will not end up with «unpure» digits in the words.
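The two filters can also be folded into a single helper. This is only a sketch: the sub name «is-hexaword» and its named «pure» argument are made up for illustration, and are not part of the programs above.

```raku
# Hypothetical helper: True if the word can be written as a hex number.
sub is-hexaword (Str $word, Bool :$pure = False --> Bool) {
    # Same length limits as in the programs above.
    return False unless 1 < $word.chars <= 8;

    # Pure mode only allows «o» and «l» on top of the hex letters.
    my $letters = $pure ?? /^ <[abcdeflo]>+ $/ !! /^ <[abcdefilost]>+ $/;
    return so $word ~~ $letters;
}

say is-hexaword('coffee', :pure);   # True
say is-hexaword('toast',  :pure);   # False
say is-hexaword('toast');           # True
say is-hexaword('dictionary');      # False (10 letters)
```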
Running it without the «p» option gives the same result as before; 1463 words.
Running it with the «p» option gives 199 words:
$ ./hexawords -p
0xab1e
0xab0de
0xaccede
0xacceded
0xacc01ade
0xace
0xaced
0xad
0xadd
0xadded
0xad0
0xad0be
0xaffab1e
0xa1e
0xa11
0xa100f
0xbabb1e
0xbabb1ed
0xbabe
0xbad
0xbade
0xbaff1e
0xbaff1ed
0xba1d
0xba1ded
0xba1e
0xba1ed
0xba11
0xba11ad
0xba11ed
0xbe
0xbead
0xbeaded
0xbed
0xbedded
0xbee
0xbeef
0xbeefed
0xbefa11
0xbefe11
0xbe11
0xbe11ed
0xb1ab
0xb1abbed
0xb1ade
0xb1ed
0xb1eed
0xb10b
0xb10bbed
0xb10c
0xb100d
0xb100ded
0xb0a
0xb0b
0xb0bbed
0xb0de
0xb0ded
0xb01d
0xb00
0xb00ed
0xcab
0xcabbed
0xcab1e
0xcab1ed
0xcaca0
0xcad
0xca1f
0xca11
0xca11ab1e
0xca11ed
0xcc
0xcede
0xceded
0xce11
0xce110
0xc1ad
0xc1ef
0xc10d
0xc0a1
0xc0a1ed
0xc0b
0xc0bb1e
0xc0c0a
0xc0d
0xc0dded
0xc0de
0xc0ded
0xc0ffee
0xc01
0xc01d
0xc00
0xc00ed
0xc001
0xc001ed
0xdab
0xdabbed
0xdabb1e
0xdabb1ed
0xdad
0xdead
0xdeaf
0xdea1
0xdecade
0xdec0de
0xdec0ded
0xdeed
0xdeeded
0xdeface
0xdefaced
0xd0
0xd0d0
0xd0e
0xd01e
0xd01ed
0xd011
0xd011ed
0xd00d1e
0xd00d1ed
0xebb
0xebbed
0xee1
0xe1f
0xfab1e
0xfacade
0xface
0xfaced
0xfad
0xfade
0xfaded
0xfa11
0xfed
0xfee
0xfeeb1e
0xfeed
0xfee1
0xfe11
0xfe11ed
0xf1ea
0xf1ed
0xf1ee
0xf1eece
0xf1eeced
0xf100d
0xf100ded
0xf0a1
0xf0a1ed
0xf0ca1
0xf0e
0xf01d
0xf01ded
0xf00d
0xf001
0xf001ed
0x1ab
0x1abe1
0x1abe1ed
0x1ace
0x1aced
0x1ad
0x1ade
0x1aded
0x1ad1e
0x1ad1ed
0x1ead
0x1eaded
0x1eaf
0x1eafed
0x1ed
0x1ee
0x10ad
0x10adab1e
0x10aded
0x10af
0x10afed
0x10b
0x10bbed
0x10be
0x10ca1
0x10ca1e
0x1011
0x1011ed
0x0af
0x0b0e
0x0dd
0x0de
0x0f
0x0ff
0x0ffed
0x0ff10ad
0x01d
I'll leave it at that.
/, but otherwise treat it the same as a regular file.
dir_a:
Arial.ttf Comic_Sans.ttf Georgia.ttf Helvetica.ttf Impact.otf
Verdana.ttf Old_Fonts/
dir_b:
Arial.ttf Comic_Sans.ttf Courier_New.ttf Helvetica.ttf Impact.otf
Tahoma.ttf Verdana.ttf
dir_c:
Arial.ttf Courier_New.ttf Helvetica.ttf Impact.otf Monaco.ttf
Verdana.ttf
The output should look similar to the following:
dir_a          | dir_b           | dir_c
-------------- | --------------- | ---------------
Comic_Sans.ttf | Comic_Sans.ttf  |
               | Courier_New.ttf | Courier_New.ttf
Georgia.ttf    |                 |
               |                 | Monaco.ttf
Old_Fonts/     |                 |
               | Tahoma.ttf      |
Note that we are not concerned about file sizes, so we can get away with empty files, which we can create with the Unix «touch» command. A little shell script can set up the directory structure, exactly as given in the challenge. This makes testing easier, as the challenge gives the expected result.
File: setup.sh
#! /bin/sh
mkdir dir_a dir_a/Old_Fonts/ dir_b dir_c
cd dir_a
touch Arial.ttf Comic_Sans.ttf Georgia.ttf Helvetica.ttf Impact.otf Verdana.ttf
cd ../dir_b
touch Arial.ttf Comic_Sans.ttf Courier_New.ttf Helvetica.ttf Impact.otf Tahoma.ttf \
Verdana.ttf
cd ../dir_c
touch Arial.ttf Courier_New.ttf Helvetica.ttf Impact.otf Monaco.ttf Verdana.ttf
Let us start experimenting in the REPL:
> "dir_a".IO.dir.join("\n")
dir_a/Impact.otf
dir_a/Comic_Sans.ttf
dir_a/Old_Fonts
dir_a/Arial.ttf
dir_a/Helvetica.ttf
dir_a/Verdana.ttf
dir_a/Georgia.ttf
The IO.dir method gives us the content of a directory, as a list of IO objects. They stringify (when we print them) to the filename. The path is included, as we did not run the command in the directory itself.
See docs.raku.org/routine/dir for more information about IO.dir.
So far, so good. But we do not want the directory part (in the output). We can get rid of that with the IO.basename method:
> "dir_a".IO.dir>>.basename.join("\n")
Impact.otf
Comic_Sans.ttf
Old_Fonts
Arial.ttf
Helvetica.ttf
Verdana.ttf
Georgia.ttf
See docs.raku.org/routine/basename for more information about IO.basename.
That is better.
Note that the indir function could have been used here instead. See docs.raku.org/routine/indir for more information about indir.
We were asked to add a slash after directories, and stringification (of IO objects) does not do that for us. But we can do it manually (with map):
> "dir_a".IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).join("\n")
Impact.otf
Comic_Sans.ttf
Old_Fonts/
Arial.ttf
Helvetica.ttf
Verdana.ttf
Georgia.ttf
Then we do this for the three directories, and compare the result. Using hashes is the obvious way (but a Set saves us some typing):
File: k-dir-diff
#! /usr/bin/env raku
unit sub MAIN ($dir1 where $dir1.IO.d && $dir1.IO.r = 'dir_a', # [1]
$dir2 where $dir2.IO.d && $dir2.IO.r = 'dir_b',
$dir3 where $dir3.IO.d && $dir3.IO.r = 'dir_c',
);
my %dir1 = $dir1.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set; # [2]
my %dir2 = $dir2.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set;
my %dir3 = $dir3.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set;
my %all = %dir1 (|) %dir2 (|) %dir3; # [3]
my $max-length = (%all.keys>>.chars.max, # [4]
$dir1.chars, $dir2.chars, $dir3.chars).max;
say "{ $dir1.fmt("%-{$max-length}s") } | \
{ $dir2.fmt("%-{$max-length}s") } | \
{ $dir3.fmt("%-{$max-length}s") }"; # [5]
say "-" x $max-length * 3 + 6; # [6]
for sort keys %all -> $file # [7]
{
next if %dir1{$file} && %dir2{$file} && %dir3{$file}; # [8]
say "{ (%dir1{$file} ?? $file !! '').fmt("%-{$max-length}s") } | " # [9]
~ "{ (%dir2{$file} ?? $file !! '').fmt("%-{$max-length}s") } | "
~ "{ (%dir3{$file} ?? $file !! '').fmt("%-{$max-length}s") }";
}
[1] The three directories, with default values as in the example. Ensure that they are directories (IO.d) and readable (IO.r).
[2] Get the list of files/directories, and turn it into a Set, a hash-like structure. The assignment coerces the Set to a hash for us, so the Set does not live very long.
[3] Get a list of all the files/directories, using the Set Union operator (|).
See docs.raku.org/routine/(|), infix ∪ for more information about the Set Union operator (|).
[4] The longest file name, used for padding purposes. Note that the trailing slash for directories is included, and the directory names (used in the header) are included as well.
[5] Print the directories, nicely padded.
[6] Note the use of the string repetition operator x to generate the separator line.
See docs.raku.org/routine/x for more information about the string repetition operator x.
[7] Iterate over the files, alphabetically.
[8] Skip files (and directories) that are present in all three hashes.
[9] Print the file, if present and spaces otherwise, nicely padded.
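The set operators can be tried out without touching the file system. The following sketch uses hardcoded listings (a hypothetical subset of the file names above) and keeps the Sets as Sets instead of coercing them to hashes. It also uses the intersection (&) and difference (-) operators, which the program above does not use, as an alternative way of finding the rows to print.

```raku
# Hypothetical directory listings, hardcoded so that no files are needed.
my $dir-a = set <Arial.ttf Comic_Sans.ttf Georgia.ttf>;
my $dir-b = set <Arial.ttf Comic_Sans.ttf Tahoma.ttf>;
my $dir-c = set <Arial.ttf Monaco.ttf>;

my $all    = $dir-a (|) $dir-b (|) $dir-c;   # union: every name seen anywhere
my $common = $dir-a (&) $dir-b (&) $dir-c;   # intersection: names present everywhere

# The rows to print are the names that are not common to all.
my @rows = ($all (-) $common).keys.sort;
say @rows;   # [Comic_Sans.ttf Georgia.ttf Monaco.ttf Tahoma.ttf]

# A membership lookup looks like a hash lookup, and gives True or False.
say $dir-a<Georgia.ttf>;   # True
say $dir-b<Georgia.ttf>;   # False
```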
Running it:
$ ./k-dir-diff
dir_a           | dir_b           | dir_c
---------------------------------------------------
Comic_Sans.ttf  | Comic_Sans.ttf  |
                | Courier_New.ttf | Courier_New.ttf
Georgia.ttf     |                 |
                |                 | Monaco.ttf
Old_Fonts/      |                 |
                | Tahoma.ttf      |
Spot on.
And that's it.
Perhaps not.
The challenge specified «three or more» directories. So we should do that:
We need some more files and directories. This shell script will supply them for us:
File: setup.sh
#! /bin/sh
mkdir dir_d dir_e
cd dir_d
touch Arial.ttf FrutigerBold.ttf
cd ../dir_e
touch Arial.ttf Impact.otf HelveticaModern.ttf
Note that «Arial.ttf» is present in all the directories (including the previous lot).
File: k-dir-diff-multi
#! /usr/bin/env raku
unit sub MAIN (*@dirs where @dirs.elems >= 3 && all(@dirs) ~~ .IO.d # [1]
&& all(@dirs) ~~ .IO.r);
my $dir-count = @dirs.elems; # [2]
my %dir; # [3]
for @dirs -> $dir # [4]
{
%dir{$dir} = $dir.IO.dir.map({ .d ?? .basename ~ '/' !! .basename }).Set;
}
my %all = %dir.values>>.List.flat.Bag; # [5]
my $max-length = (%all.keys>>.chars.max, @dirs>>.chars.max).max; # [6]
my @head;
for @dirs -> $dir
{
@head.push: $dir.fmt("%-{$max-length}s");
}
say @head.join(" | ");
say "-" x (3 + $max-length) * $dir-count;
for sort keys %all -> $file
{
next if %all{$file} == $dir-count; # [7]
my @row;
for @dirs -> $dir
{
@row.push: "{ (%dir{$dir}{$file} ?? $file !! '').fmt("%-{$max-length}s") }";
}
say @row.join(" | ");
}
[1] A slurpy array (the *@) to hold the directory names. Note the where clauses; the first one enforces at least three arguments, and the second and third ensure that they are directories that we can read.
[2] The number of directories, to be used when we check if the file is present in all the directories.
[3] This variable holds all the files. The keys are the directory names (so duplicates will sort of work).
[4] Iterate over the directories, and place the Set of files in the hash. Note that this does not coerce the Set to a hash, as in the previous program.
[5] All the files. We get them by using .values on the hash - or rather, this gives us a list (a sequence, really) with a list of values for each directory. We want a flattened list, but flat will not work here. Applying .List before flattening does the trick (as explained in the documentation for flat). Then we coerce the list to a Bag, which is a hash-like structure where the list elements are the keys - and the frequency is the value. Thus we have an easy way of checking for files present in all the directories.
See docs.raku.org/routine/flat for more information about flat.
See docs.raku.org/type/Bag for more information about the Bag type.
[6] The longest file and directory name. The first part gives us the length of the longest key among all the files (and embedded directories, if any), and the second part gives the same for the directory names. I have applied max three times, to avoid the need for flattening.
[7] Skip files present in all the directories (using the Bag set up in [5]).
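The counting trick in [5] and [7] can be shown in isolation. This is a sketch with hardcoded, hypothetical listings, and it flattens with .map(*.Slip) rather than the >>.List.flat combination used in the program.

```raku
# Hypothetical listings for three directories.
my @listings = <Arial.ttf Georgia.ttf>, <Arial.ttf Tahoma.ttf>, <Arial.ttf Monaco.ttf>;

# Flatten the three lists into one, then count each name.
my $count = @listings.map(*.Slip).Bag;

say $count<Arial.ttf>;    # 3
say $count<Monaco.ttf>;   # 1

# Names that do not occur in every directory.
my @rows = $count.keys.grep({ $count{$_} < @listings.elems }).sort;
say @rows;   # [Georgia.ttf Monaco.ttf Tahoma.ttf]
```

A name with a count equal to the number of directories is present everywhere, so a single numeric comparison replaces the chain of && tests used in the three-directory version.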
Running it, first on the old three directories to see that we get the same result as before:
$ ./k-dir-diff-multi dir_a dir_b dir_c
dir_a           | dir_b           | dir_c
---------------------------------------------------
Comic_Sans.ttf  | Comic_Sans.ttf  |
                | Courier_New.ttf | Courier_New.ttf
Georgia.ttf     |                 |
                |                 | Monaco.ttf
Old_Fonts/      |                 |
                | Tahoma.ttf      |
(We do.)
Then with the two additional directories added to the mix:
$ ./k-dir-diff-multi dir_a dir_b dir_c dir_d dir_e
dir_a               | dir_b               | dir_c               | dir_d               | dir_e
--------------------------------------------------------------------------------------------------------------
Comic_Sans.ttf      | Comic_Sans.ttf      |                     |                     |
                    | Courier_New.ttf     | Courier_New.ttf     |                     |
                    |                     |                     | FrutigerBold.ttf    |
Georgia.ttf         |                     |                     |                     |
Helvetica.ttf       | Helvetica.ttf       | Helvetica.ttf       |                     |
                    |                     |                     |                     | HelveticaModern.ttf
Impact.otf          | Impact.otf          | Impact.otf          |                     | Impact.otf
                    |                     | Monaco.ttf          |                     |
Old_Fonts/          |                     |                     |                     |
                    | Tahoma.ttf          |                     |                     |
Verdana.ttf         | Verdana.ttf         | Verdana.ttf         |                     |
Note that the number of dashes in the second row is slightly wrong. Also note that we have used a lot of padding for the first three directories, even though the long filenames only occur in the last two.
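The dash count follows from the header layout: each of the columns is padded to the same width, and there is a three character separator (« | ») between the columns, but not after the last one. A sketch of that calculation, with a hypothetical sub name («separator-width»):

```raku
# Width of a header with $columns cells of $column-width characters each,
# joined by the three character separator " | ".
sub separator-width (Int $column-width, Int $columns --> Int) {
    $column-width * $columns + 3 * ($columns - 1);
}

say separator-width(15, 3);   # 51, the same as «$max-length * 3 + 6»
say separator-width(19, 5);   # 107

say '-' x separator-width(19, 5);
```

The program multiplies the three extra characters by the number of directories, and not by the number of separators, which gives (3 + 19) * 5 = 110 dashes for the five directories. That is three too many.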
And that's it.