The Raku Instance Bar

[40] Published 2. November 2019

This is my response to the Perl Weekly Challenge #32.

Challenge 32.1: Count Instances

Contributed by Neil Bowers

Create a script that either reads standard input or one or more files specified on the command-line. Count the number of times and then print a summary, sorted by the count of each entry.

So with the following input in file example.txt

apple
banana
apple
cherry
cherry
apple

the script would display something like:

apple     3
cherry    2
banana    1

For extra credit, add a -csv option to your script, which would generate:

apple,3
cherry,2
banana,1

I'll start with a file only version, and look into the standard input support later:

File: line-counter-file

unit sub MAIN ($file where $file.IO.f && $file.IO.r = "example.txt");
               # [1] ###############################  # [2] ########

my %input = $file.IO.lines.Bag; # [3]

say "$_\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
# [6] #############  # [4] ######### # [5] ################################

[1] The input is a filename, and the file must exist (IO.f) and be readable (IO.r).

[2] If we don't specify a file name, use the default value.

[3] We read in the entire file (.IO.lines), and turn the resulting list of lines into a Bag. A Bag is a version of a hash, where the values in the list end up as the keys, and the values are the number of times they occurred in the list. So we don't have to count them ourselves.

[4] We iterate over the keys (in the Bag).

[5] A Bag is an unsorted data structure, so the default behaviour is returning the keys in random order. I have chosen to sort them on the values (the counters). I have reversed the placeholder variables ($^a and $^b) to get the largest value first.

[6] Print the keys and values, with a tab added for cheap tabulation. Note that this will not work out if the keys differ greatly in size.

See docs.perl6.org/type/Bag for more information about the Bag type.
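
As a minimal sketch of what the Bag coercion does, here is the same counting on an in-memory list. The @lines array is a stand-in for the file content, and is not part of the program above:

```raku
# Hypothetical in-memory list standing in for the lines of example.txt.
my @lines = <apple banana apple cherry cherry apple>;

# Coercing to a Bag counts the occurrences of each distinct value.
my $bag = @lines.Bag;

say $bag<apple>;    # 3
say $bag<cherry>;   # 2
say $bag<durian>;   # 0 (a key that is not in the Bag has the count 0)

# Sort the keys by descending count, as the program does.
my @sorted = $bag.keys.sort({ $bag{$^b} <=> $bag{$^a} });
say @sorted;        # [apple cherry banana]
```

The counts are all different here, so the sorted order is the same on every run.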

Running it with the sample file given in the challenge (either explicitly or implicitly) gives the expected result:

$ raku line-counter-file example.txt
apple	3
cherry	2
banana	1

$ raku line-counter-file
apple	3
cherry	2
banana	1

Standard Input

This is quite straightforward with the $*ARGFILES special (dynamic) variable:

File: line-counter-argfiles

my %input = $*ARGFILES.lines.Bag;

say "$_:\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });

Running it:

$ raku line-counter-argfiles example.txt 
apple:	3
cherry:	2
banana:	1

$ cat example.txt example.txt | raku line-counter-argfiles 
apple:	6
cherry:	4
banana:	2

$*ARGFILES gives us the content of the files specified on the command line, if any, and whatever is entered on standard input otherwise. If we specify neither, it waits for user input, which we can enter until we press «Control-d». This isn't very user friendly.
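
One possible way to make the silent wait friendlier is to print a hint when the program is about to read from a terminal. This is only a sketch: the check and the wording of the message are my assumptions, and not part of «line-counter-argfiles». It uses the «t» method on $*IN, which tells us if the handle is connected to a terminal:

```raku
# Assumption: a hint is only useful when no files were given and standard
# input is an interactive terminal (not a pipe or a redirected file).
if !@*ARGS && $*IN.t
{
  note "Reading from standard input. Finish with Control-d.";
}

my %input = $*ARGFILES.lines.Bag;

say "$_:\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
```

The hint is written with «note», so it goes to standard error and does not end up in the output if that is piped to another program.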

Note that if we wrap $*ARGFILES in a MAIN procedure, it will ignore any files specified on the command line, and insist on standard input. See docs.perl6.org/type/IO::ArgFiles#$*ARGFILES for details. This was added in Raku version 6.d, so earlier versions (which you definitely shouldn't use) treat $*ARGFILES the same inside and outside of MAIN.

In addition to the seemingly mute mode (when we don't specify a file, and don't pipe anything to standard input), we got rid of the default filename when we ditched MAIN. Let's rectify both:

File: line-counter-main

multi sub MAIN ($file where $file.IO.f && $file.IO.r = "example.txt")
{
  say "[file]";
  my %input = $file.IO.lines.Bag;
  say "$_:\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
}

multi sub MAIN ()
{
  say "[argfiles]";
  my %input = $*ARGFILES.lines.Bag;
  say "$_:\t%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
}

Yes, the program is silly as it replicates way too much code. But it doesn't work as intended, and that is a greater concern. The second «multi MAIN» is never called. That is because the first one also accepts a call without arguments, and uses the default filename in that case.

The solution is probably to go back to «line-counter-argfiles», and accept the problem of no default filename. (And we can probably get away with that, as the challenge doesn't specify what to do in such a situation.)

CSV

Except that the CSV option screams out for a «MAIN» to handle the optional argument...

So let's do that, but factor out the duplicated code to a new procedure:

File: line-counter-csv

multi sub MAIN ($file where $file.IO.f && $file.IO.r, :$csv = False)
{
  line-counter($file.IO.lines.Bag, $csv);
}

multi sub MAIN (:$csv = False)
{
  line-counter($*ARGFILES.lines.Bag, $csv);
}

sub line-counter (%input, $csv)
{
  my $separator = $csv ?? "," !! "\t";
  say "$_$separator%input{$_}" for %input.keys.sort( { %input{$^b} <=> %input{$^a} });
}

Running it:

$ raku line-counter-csv example.txt 
apple	3
cherry	2
banana	1

$ raku line-counter-csv --csv example.txt 
apple,3
cherry,2
banana,1

$ raku line-counter-csv -csv example.txt 
apple,3
cherry,2
banana,1

$ cat example.txt example.txt | raku line-counter-csv 
apple	6
cherry	4
banana	2

$ cat example.txt example.txt | raku line-counter-csv -csv
apple,6
cherry,4
banana,2

$ raku line-counter-csv
^C

Note that named Boolean arguments can be specified with either one or two leading hyphens; e.g. «-csv» and «--csv».

The last example seemingly hangs as we haven't specified any files or given it any input. So I killed the program with «Control-c».

Sorted, but not sorted

Note that the nice sort by value will not know what to do if we have two or more identical values, so they will come out in random order. As shown here:

File: example2.txt

apple
banana
apple
cherry
cherry
apple
junkfood
junkfood

$ raku line-counter-csv example2.txt
apple	3
junkfood	2
cherry	2
banana	1

$ raku line-counter-csv example2.txt
apple	3
cherry	2
junkfood	2
banana	1

The cheap tabulation looks really cheap here, so I'll take a look at that as well:

File: line-counter-csv-fixed

multi sub MAIN ($file where $file.IO.f && $file.IO.r, :$csv = False)
{
  line-counter($file.IO.lines.Bag, $csv);
}

multi sub MAIN (:$csv = False)
{
  line-counter($*ARGFILES.lines.Bag, $csv);
}

sub line-counter (%input, $csv)
{
  my $max = %input.keys>>.chars.max;    # [1]
  ######### # [1a] #### # [1b]  # [1c]

  for %input.keys.sort({ %input{$^b} <=> %input{$^a} || $^a cmp $^b }) # [2]
                         # [2a] ####################    # [2b] #### 
  {
    say $csv                                                 # [3]
      ?? "$_,%input{$_}"                                     # [3a]
      !! "{ $_ }{ " " x ($max - .chars) } { %input{$_} }";   # [3b]
  }    
}

[1] The length of the longest identifier. We start with all the keys (1a), then apply «chars» on each element in the list (1b), and collapse it into a single value with «max» (1c). Note that >>. invokes a method on all the elements separately, and not on the whole list as with a normal . call.

[2] If the first comparison (2a) reports the two counts as equal, we go on to the second one (2b), which compares the keys themselves. Entries with the same count are thus sorted alphabetically.

[3] If csv, use a comma (3a). If not, pad with spaces (3b).

See docs.perl6.org/routine/x for information about the String repetition operator «x», used here to get the padding.
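
The three building blocks used in «line-counter-csv-fixed» can be tried out in isolation. This is a small sketch on a hard coded hash (the %count variable is made up for the illustration, with the same content as example2.txt):

```raku
# Hypothetical counts, matching what example2.txt would give.
my %count = apple => 3, junkfood => 2, cherry => 2, banana => 1;

# The length of the longest key; «>>.chars» is applied to each key.
my $max = %count.keys>>.chars.max;
say $max;      # 8

# Sort by descending count. On a tie the «<=>» part gives «Same»,
# which is false, so «||» moves on to the alphabetical comparison.
my @sorted = %count.keys.sort({ %count{$^b} <=> %count{$^a} || $^a cmp $^b });
say @sorted;   # [apple cherry junkfood banana]

# The repetition operator «x» gives the padding.
my $padded = "apple" ~ " " x ($max - "apple".chars) ~ "|";
say $padded;   # apple   |
```

The tie between «cherry» and «junkfood» is now resolved the same way on every run.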

Running it to show that it works as intended:

$ raku line-counter-csv-fixed example2.txt
apple    3
cherry   2
junkfood 2
banana   1

$ raku line-counter-csv-fixed example2.txt
apple    3
cherry   2
junkfood 2
banana   1

Challenge 32.2: ASCII bar chart

Contributed by Neil Bowers

Write a function that takes a hashref where the keys are labels and the values are integer or floating point values. Generate a bar graph of the data and display it to stdout.

The input could be something like:

$data = { apple => 3, cherry => 2, banana => 1 };
generate_bar_graph($data);

And would then generate something like this:

 apple | ############
cherry | ########
banana | ####

If you fancy then please try this as well: (a) the function could let you specify whether the chart should be ordered by (1) the labels, or (2) the values.

Let's dive straight in. The challenge gives part of the code, which works equally well in Perl and Raku:

File: abc-unsorted

my $data = { apple => 3, cherry => 2, banana => 1 };

generate_bar_graph($data);

sub generate_bar_graph ($data)
{
  my $max = %($data).keys>>.chars.max;

  for %($data).kv -> $label, $count
  {
    say "{ " " x ($max - $label.chars) }$label | { "#" x 4 * $count }"; 
  }
}

Running it:

$ raku abc-unsorted
banana | ####
cherry | ########
 apple | ############

$ raku abc-unsorted
banana | ####
 apple | ############
cherry | ########

$ raku abc-unsorted
cherry | ########
banana | ####
 apple | ############
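
The manual padding in «generate_bar_graph» can also be written with a format string. The following sketch compares the two approaches. The use of «fmt» is my suggestion, and not what the program above does:

```raku
# Hypothetical labels, the same as the keys in the program above.
my @labels = <apple cherry banana>;
my $max    = @labels>>.chars.max;   # 6

my @same;

for @labels -> $label
{
  # Manual padding, as in the program above.
  my $manual = " " x ($max - $label.chars) ~ $label;

  # Assumption: the same right justification with «fmt» and a width.
  my $formatted = $label.fmt('%' ~ $max ~ 's');

  @same.push: $manual eq $formatted;
  say "$formatted |";
}

say @same.all.so;   # True
```

Both versions give right justified labels, so which one to use is a matter of taste.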

Sorted output

The data is stored in a hash, so the output order is random. I didn't add sorting (as done in challenge 32.1), as this challenge asked for it as a bonus. But I'll do that now:

File: abc

unit sub MAIN (Str :$sort where $sort eq any("", "values", "labels") = ""); # [1]

my $data = { apple => 3, cherry => 2, banana => 1 };

generate_bar_graph($data, $sort);       # [2]

sub generate_bar_graph ($data, $sort)   # [2]
{
  my $max = %($data).keys>>.chars.max;  # [3]

  my @keys = %($data).keys;             # [4]

  if $sort eq "values"                  # [5]
  {
    @keys = @keys.sort({ %($data){$^b} cmp %($data){$^a} });
  } 
  elsif $sort eq "labels"               # [6]
  {
    @keys = @keys.sort;
  }                  

  for  @keys -> $label                  # [7]
  {
    say "{ " " x ($max - $label.chars) }$label | { "#" x 4 * %($data){$label} }"; 
  }
}

[1] Use the named parameter «sort» to set the sort order. The legal values are «values» and «labels» (in addition to an empty string for random order). I use a junction (any) to list the legal values, and the assignment is the default value (which is used if we don't specify one on the command line).

[2] We pass on the sort parameter.

[3] The length of the longest identifier. See note #1 of «line-counter-csv-fixed» above for a detailed explanation.

[4] Get the keys, in random order. (And note that the line above could be simplified to my $max = @keys>>.chars.max; if we move it after this line.)

[5] Sort by values, if requested.

[6] Sort by labels, if requested.

[7] We use the length of the longest identifier to prefix the labels with spaces, giving a right justified column.
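
The junction test from [1] can be tried outside of a signature as well. Here is a minimal sketch, where the helper procedure «is-legal-sort» is made up for the illustration:

```raku
# Hypothetical helper: True if the value is one of the legal sort orders.
sub is-legal-sort (Str $sort)
{
  # «eq» against an «any» junction compares with each value in turn,
  # and «so» collapses the resulting junction to a single Boolean.
  return so $sort eq any("", "values", "labels");
}

say is-legal-sort("values");   # True
say is-legal-sort("");         # True
say is-legal-sort("foo");      # False
```

In the «where» clause the collapse to a Boolean value is done for us, so «so» is not needed there.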

Running it:

$ raku abc --sort=values
 apple | ############
cherry | ########
banana | ####

$ raku abc --sort=values
 apple | ############
cherry | ########
banana | ####

$ raku abc --sort=labels
 apple | ############
banana | ####
cherry | ########

$ raku abc --sort=labels
 apple | ############
banana | ####
cherry | ########

$ raku abc 
cherry | ########
banana | ####
 apple | ############

$ raku abc 
banana | ####
 apple | ############
cherry | ########

$ raku abc --sort=foo
Usage:
  abc [--sort=]

We should test floating point numbers:

File: abc-float (changes only):

my $data = { apple => pi, cherry => e, banana => 0.3, junkfood => 0.6 };

Running it:

$ raku abc-float
  cherry | ##########
  banana | #
   apple | ############
junkfood | ##

«pi» and «e» are numerical constants. Try this in REPL mode if you want to see the actual values:

$ raku
To exit type 'exit' or '^D'
> pi
3.141592653589793
> e
2.718281828459045
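
The bar lengths shown for «abc-float» come from «x» coercing its right hand side to an integer, which discards the decimals. The sketch below shows that, and a rounded variant. The rounding is a possible alternative that I have added for comparison, and not something the program does:

```raku
# «x» truncates: 4 * pi is about 12.57, and we get 12 characters.
say "#" x 4 * pi;                    # ############

say ("#" x 4 * e).chars;             # 10 (4 * e is about 10.87)
say ("#" x 4 * 0.6).chars;           # 2  (4 * 0.6 is 2.4)

# Assumption: rounding instead of truncating, if that is preferred.
say ("#" x (4 * e).round).chars;     # 11
say ("#" x (4 * pi).round).chars;    # 13
```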

And that's it.
