This is my response to the Perl Weekly Challenge #32.
Contributed by Neil Bowers
Create a script that either reads standard input or one or more files specified on the command-line. Count the number of times and then print a summary, sorted by the count of each entry. So with the following input in file example.txt
the script would display something like:
For extra credit, add a -csv option to your script, which would generate:
I'll start with a file only version, and look into the standard input support later:
File: line-counter-file
unit sub MAIN ($file where $file.IO.f && $file.IO.r = "example.txt");
# [1] ############################### # [2] ########
my %input = $file.IO.lines.Bag; # [3]
say "$_\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
# [6] ############# # [4] ######### # [5] ################################
[1] The input is a filename, and the file must exist (IO.f) and be readable (IO.r).
[2] If we don't specify a file name, use the default value.
[3] We read in the entire file (.IO.lines), and turn the resulting list of lines into a Bag. A Bag is a version of a hash where the values in the list end up as the keys, and the values are the number of times they occurred in the list. So we don't have to count them ourselves.
[4] We iterate over the keys (in the Bag).
[5] A Bag is an unordered data structure, so the default behaviour is returning the keys in random order. I have chosen to sort them on the values (the counters). I have reversed the placeholder variables ($^a and $^b) to get the largest value first.
[6] Print the keys and values, with a tab added for cheap tabulation. Note that this will not work out if the keys differ greatly in size.
See docs.perl6.org/type/Bag for more information about the Bag type.
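Notes [3] to [5] can be tried without a file. This is only a minimal sketch: the list below is an assumption, chosen to give the same counts as example.txt, and the names «@lines» and «@by-count» are made up for the illustration.

```raku
# Assumed stand-in for the lines of example.txt (same counts as in the article).
my @lines = <apple banana apple cherry cherry apple>;

# Coercing to a Bag does the counting; assigning it to a hash gives line => count.
my %input = @lines.Bag;

# Swapping $^b and $^a turns the ascending numeric sort into a descending one.
my @by-count = %input.keys.sort({ %input{$^b} <=> %input{$^a} });

# Same output statement as in «line-counter-file».
say "$_\t%input{$_}" for @by-count;
```

The three counts differ, so the order is stable here. With equal counts the order of the tied keys is not guaranteed.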
Running it, with the sample file given in the challenge (either explicitly or implicitly), gives the expected result:
$ raku line-counter example.txt
apple 3
cherry 2
banana 1
$ raku line-counter
apple 3
cherry 2
banana 1
Standard input support is available through the $*ARGFILES special (dynamic) variable:
File: line-counter-argfiles
my %input = $*ARGFILES.lines.Bag;
say "$_:\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
Running it:
$ raku line-counter-argfiles example.txt
apple: 3
cherry: 2
banana: 1
$ cat example.txt example.txt | raku line-counter-argfiles
apple: 6
cherry: 4
banana: 2
$*ARGFILES gives us the content of the files specified on the command line, if any, and whatever arrives on standard input otherwise. If we specify neither, it waits for user input, which we can enter until we press «Control-d». This isn't very user friendly.
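One possible way to soften this, which I have not used in the programs in this article, is to print a hint when the program is about to wait on a terminal. The sketch below assumes a MAIN-less script like «line-counter-argfiles»; «needs-hint» is a hypothetical helper name, and «.t» is the IO::Handle method that tells us if a handle is attached to a terminal.

```raku
# Hypothetical helper: True when a hint is warranted, i.e. no files were
# given and standard input is a terminal rather than a pipe or a file.
sub needs-hint (@args, $is-tty)
{
    return so !@args && $is-tty;
}

# «note» writes to standard error, so the hint does not pollute the summary.
note "Reading from standard input; finish with Control-d." if needs-hint(@*ARGS, $*IN.t);
```

When input is piped in, standard input is not a terminal, so the hint stays away.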
Note that if we wrap $*ARGFILES in a MAIN procedure, it will ignore any files specified on the command line, and insist on standard input. See docs.perl6.org/type/IO::ArgFiles#$*ARGFILES for details. This was added in Raku version 6.d, so earlier versions (which you definitely shouldn't use) treat $*ARGFILES the same inside and outside of MAIN.
In addition to the seemingly mute mode (when we don't specify a file, and don't pipe standard in), we got rid of the default filename when we ditched MAIN. Let's rectify both:
File: line-counter-main
multi sub MAIN ($file where $file.IO.f && $file.IO.r = "example.txt")
{
say "[file]";
my %input = $file.IO.lines.Bag;
say "$_:\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
}
multi sub MAIN ()
{
say "[argfiles]";
my %input = $*ARGFILES.lines.Bag;
say "$_:\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
}
Yes, the program is silly as it replicates way too much code. But it doesn't work as intended, and that is a greater concern. The second «multi MAIN» is never called. That is because the first one also accepts a call without arguments, and uses the default filename in that case.
The solution is probably to go back to «line-counter-argfiles», and accept the problem of no default filename. (And we can probably get away with that, as the challenge doesn't specify what to do in such a situation.)
So let's do that, but factor out the duplicated code to a new procedure:
File: line-counter-csv
multi sub MAIN ($file where $file.IO.f && $file.IO.r, :$csv = False)
{
line-counter($file.IO.lines.Bag, $csv);
}
multi sub MAIN (:$csv = False)
{
line-counter($*ARGFILES.lines.Bag, $csv);
}
sub line-counter (%input, $csv)
{
my $separator = $csv ?? "," !! "\t";
say "$_$separator%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
}
Running it:
$ raku line-counter-csv example.txt
apple 3
cherry 2
banana 1
$ raku line-counter-csv --csv example.txt
apple,3
cherry,2
banana,1
$ raku line-counter-csv -csv example.txt
apple,3
cherry,2
banana,1
$ cat example.txt example.txt | raku line-counter-csv
apple 6
cherry 4
banana 2
$ cat example.txt example.txt | raku line-counter-csv -csv
apple,6
cherry,4
banana,2
$ raku line-counter-csv
^C
Note that named Boolean arguments can be specified with either one or two leading hyphens; e.g. «-csv» and «--csv».
The last example seemingly hangs as we haven't specified any files or given it any input. So I killed the program with «Control-c».
File: example2.txt
apple
banana
apple
cherry
cherry
apple
junkfood
junkfood
$ raku line-counter-csv example2.txt
apple 3
junkfood 2
cherry 2
banana 1
$ raku line-counter-csv example2.txt
apple 3
cherry 2
junkfood 2
banana 1
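Entries with the same count («cherry» and «junkfood») come out in random order, as the two runs show, so I'll add a secondary sort on the label. The cheap tabulation looks really cheap here, so I'll take a look at that as well: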
File: line-counter-csv-fixed
multi sub MAIN ($file where $file.IO.f && $file.IO.r, :$csv = False)
{
line-counter($file.IO.lines.Bag, $csv);
}
multi sub MAIN (:$csv = False)
{
line-counter($*ARGFILES.lines.Bag, $csv);
}
sub line-counter (%input, $csv)
{
my $max = %input.keys>>.chars.max; # [1]
######### # [1a] #### # [1b] # [1c]
for %input.keys.sort({ %input{$^b} <=> %input{$^a} || $^a cmp $^b }) # [2]
# [2a] #################### # [2b] ####
{
say $csv # [3]
?? "$_,%input{$_}" # [3a]
!! "{ $_ }{ " " x ($max - .chars) } { %input{$_} }"; # [3b]
}
}
[1] The length of the longest identifier. We start with all the keys (1a), then apply «chars» on each element in the list (1b), and collapse it into a single value with «max» (1c). Note that >>. invokes a method on all the elements separately, and not on the whole list as with a normal . call.
[2] If the first part (2a) gives equal, we go on to the second part (2b).
[3] If csv, use a comma (3a). If not, pad with spaces (3b).
See docs.perl6.org/routine/x for information about the String repetition operator «x», used here to get the padding.
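The three steps can also be tried in isolation. This is a sketch with the counts from «example2.txt» hardcoded; the names «@ordered» and «@rows» are made up for the illustration.

```raku
my %input = apple => 3, cherry => 2, junkfood => 2, banana => 1;

# Hyper method call: .chars runs on every key, and max picks the largest (8).
my $max = %input.keys>>.chars.max;

# Highest count first; on equal counts, fall back to alphabetical order.
my @ordered = %input.keys.sort({ %input{$^b} <=> %input{$^a} || $^a cmp $^b });

# Pad each label with spaces up to $max characters so the counts line up.
my @rows = @ordered.map({ $_ ~ " " x ($max - .chars) ~ " " ~ %input{$_} });

.say for @rows;
```

The tie between «cherry» and «junkfood» is now always resolved the same way, and every count starts in the same column.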
Running it to show that it works as intended:
$ raku line-counter-csv-fixed example2.txt
apple 3
cherry 2
junkfood 2
banana 1
$ raku line-counter-csv-fixed example2.txt
apple 3
cherry 2
junkfood 2
banana 1
Contributed by Neil Bowers
Write a function that takes a hashref where the keys are labels and the values are integer or floating point values. Generate a bar graph of the data and display it to stdout. The input could be something like:
And would then generate something like this:
If you fancy then please try this as well: (a) the function could let you specify
whether the chart should be ordered by (1) the labels, or (2) the values.
Let's dive straight in. The challenge gives part of the code, which works equally well in Perl and Raku:
File: abc-unsorted
my $data = { apple => 3, cherry => 2, banana => 1 };
generate_bar_graph($data);
sub generate_bar_graph ($data)
{
my $max = %($data).keys>>.chars.max;
for %($data).kv -> $label, $count
{
say "{ " " x ($max - $label.chars) }$label | { "#" x 4 * $count }";
}
}
Running it:
$ raku abc-unsorted
banana | ####
cherry | ########
apple | ############
$ raku abc-unsorted
banana | ####
apple | ############
cherry | ########
$ raku abc-unsorted
cherry | ########
banana | ####
apple | ############
unit sub MAIN (Str :$sort where $sort eq any("", "values", "labels") = ""); # [1]
my $data = { apple => 3, cherry => 2, banana => 1 };
generate_bar_graph($data, $sort); # [2]
sub generate_bar_graph ($data, $sort) # [2]
{
my $max = %($data).keys>>.chars.max; # [3]
my @keys = %($data).keys; # [4]
if $sort eq "values" # [5]
{
@keys = @keys.sort({ %($data){$^b} cmp %($data){$^a} });
}
elsif $sort eq "labels" # [6]
{
@keys = @keys.sort;
}
for @keys -> $label # [7]
{
say "{ " " x ($max - $label.chars) }$label | { "#" x 4 * %($data){$label} }";
}
}
[1] Use the named parameter «sort» to set the sort order. The values are «values» and «labels» (in addition to an empty string for random order). I use a junction (any) to list the legal values, and the assignment is the default value (which is used if we don't specify one on the command line).
[2] We pass on the sort parameter.
[3] The length of the longest identifier. See note [1] of «line-counter-csv-fixed» above for a detailed explanation.
[4] Get the keys, in random order. (And note that the line above could be simplified to my $max = @keys>>.chars.max; if we move it after this line.)
[5] Sort by values, if requested.
[6] Sort by labels, if requested.
[7] We use the length of the longest identifier to prefix the labels with spaces, giving a right justified column.
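The junction constraint and the two sort orders can be tested without the command line. The sketch below moves them into a hypothetical helper, «ordered-labels», which is not part of the program above. It uses «<=>» for the values to make the numeric comparison explicit.

```raku
# Hypothetical helper: the labels of %data in the requested order.
# The «where» clause rejects anything but the three legal values.
sub ordered-labels (%data, Str $sort where $sort eq any("", "values", "labels") = "")
{
    return %data.keys.sort({ %data{$^b} <=> %data{$^a} }) if $sort eq "values";
    return %data.keys.sort if $sort eq "labels";
    return %data.keys;   # hash order, i.e. effectively random
}

my %data = apple => 3, cherry => 2, banana => 1;

say ordered-labels(%data, "values").join(" ");
say ordered-labels(%data, "labels").join(" ");
```

Calling it with an illegal value, such as «foo», fails when the argument is bound to the parameter. That is the same mechanism that makes MAIN print the usage message.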
Running it:
$ raku abc --sort=values
apple | ############
cherry | ########
banana | ####
$ raku abc --sort=values
apple | ############
cherry | ########
banana | ####
$ raku abc --sort=labels
apple | ############
banana | ####
cherry | ########
$ raku abc --sort=labels
apple | ############
banana | ####
cherry | ########
$ raku abc
cherry | ########
banana | ####
apple | ############
$ raku abc
banana | ####
apple | ############
cherry | ########
$ raku abc --sort=foo
Usage:
abc [--sort=]
We should test floating point numbers:
File: abc-float (changes only):
my $data = { apple => pi, cherry => e, banana => 0.3, junkfood => 0.6 };
Running it:
$ raku abc-float
cherry | ##########
banana | #
apple | ############
junkfood | ##
«pi» and «e» are numerical constants. Try this in REPL mode if you want to see the actual values:
$ raku
To exit type 'exit' or '^D'
> pi
3.141592653589793
> e
2.718281828459045
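The bar lengths in the «abc-float» output follow from these values. Going by that output, «x» only uses the integer part of the scaled value, so the bars are truncated and not rounded. Here is a small sketch to check this; «bar-width» is a hypothetical helper name.

```raku
# Hypothetical helper: the number of "#" characters drawn for a value,
# using the same expression as in generate_bar_graph.
sub bar-width ($value)
{
    return ("#" x 4 * $value).chars;
}

for (apple => pi, cherry => e, banana => 0.3, junkfood => 0.6) -> $pair
{
    say $pair.key, ": ", (4 * $pair.value).fmt('%.2f'), " gives ", bar-width($pair.value), " characters";
}
```

So «cherry» ends up with 10 characters even though 4 times e is about 10.87. Replacing «4 * $count» with «(4 * $count).round» would be one way to round instead, if that is preferred.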
And that's it.