Read 'n' Search with Raku and Perl

[114] Published 6. February 2021.

This is my response to the Perl Weekly Challenge #098.

Challenge #098.1: Read N-characters

You are given file $FILE.

Create subroutine readN($FILE, $number) returns the first n-characters and moves the pointer to the (n+1)th character.

Example:

Input: Suppose the file (input.txt) contains "1234567890"
Output:
    print readN("input.txt", 4); # returns "1234"
    print readN("input.txt", 4); # returns "5678"
    print readN("input.txt", 4); # returns "90"

The Not-Given File

We can use the shell to generate the input file for us:

$ echo "1234567890" > input.txt

Note that the file will end up with a trailing newline character. Editing it in e.g. emacs does not help. The trailing newline is still there, as shown with my «hex-dump» program (from Small Inversions with Raku - my response to the Perl Weekly Challenge #024):

$ hex-dump input.txt
31 32 33 34 35 36 37 38 39 30 | 1234567890
0A                            | ␍

We can do it with spurt in Raku (and REPL):

> spurt 'input.txt', '1234567890'

And this time it works. No trailing newline:

$ hex-dump input.txt
31 32 33 34 35 36 37 38 39 30 | 1234567890

See docs.raku.org/routine/spurt for more information about spurt.

We need a way of keeping track of where in the file we are, and storing the filehandle is the obvious way. (We could store the position instead, but then we'd have to open-read-close the file again and again and use seek to fastforward past the already read characters. But that does not work, as seek count bytes and not graphemes (unicode characters).)

Storing the filehandle inside the procedure ensures that it isn't available outside of it. We can do that with a state variable.

The disadvantage of this approach is that we keep a lot of open filehandles.

File: read-n

#! /usr/bin/env raku

subset PosInt of Int where * > 0;            # [2]

unit sub MAIN (:f(:$file) where $file.IO.f && $file.IO.r = 'input.txt',  # [1]
               PosInt :s($size)  = 4,        # [2a]
	       PosInt :t($times) = 3);       # [2b]

say readN($file, $size) for ^$times;         # [3]

sub readN ($FILE, $number)                   # [4]
{
  state %handle;                             # [5]

  %handle{$FILE} = $FILE.IO.open unless %handle{$FILE};                  # [6]

  return %handle{$FILE}.readchars($number);  # [7]
}

[1] Specify the file as a named argument, or use the default filename. It must be a file (IO.f) and be readable (IO.r).

[2] We can set the number of characters to read each time [2a] and how many iterations [2b] we want - or accept the default values. The values must be positive integers.

[3] Read and print the specified number of characters, in a loop.

[4] The procedure.

[5] We store the filehandles in this state variable.

[6] Do we have a filehandle for the given file? If not open it and save the handle.

[7] Read the specified number of characters and return them.

See docs.raku.org/syntax/state for more information about state.

See docs.raku.org/routine/open for more information about open.

See docs.raku.org/routine/readchars for more information about readchars.

Running it:

$ ./read-n
1234
5678
90

Reading past the end of the file is not a problem (i.e. the program doesn't crash or protest i nany way). We get zero characters from readchars, but the say gives us a newline (an empty line) each time:

$ ./read-n -t=5
1234
5678
90

$ ./read-n -t=10
1234
5678
90

This is not very nice, and is easy to fix. We could use print, as the challenge implies, and avoid the newlines. But that would result in the complete file beeing printed on one single line (if the file itself does not contain newlines), which makes it impossible to see that it reads only a certain numbers of characters each time. So I'll refrain from implementing this.

Note that we keep all the filehandles, even after we have read to the end of the files, and we will not get rid of them until program termination. It is easy to close the files where we have reached the end, as long as the program does not continue reading from it:

File: read-n-eof (changes only)

sub readN ($FILE, $number)
{
  state %handle;

  %handle{$FILE} = $FILE.IO.open unless %handle{$FILE};

  my $string = %handle{$FILE}.readchars($number);  # [2a]

  %handle{$FILE}:delete if %handle{$FILE}.eof;     # [1]

  return $string;                                  # [2]
}

[1] Delete the filehandle (which will close it as well, as we have killed off the variable) if we have reached the end of the file. Note that it is probably a good idea to close it explicitly.

[2] Replace the green part with the non-green part of [2a] to get the previous version of the program.

See https://docs.raku.org/language/subscripts#index-entry-:delete_(subscript_adverb) for more information about the :delete adverb.

See docs.raku.org/routine/eof for more information about eof.

Running it:

$ ./read-n-eof -t=10
1234
5678
90
1234
5678
90
1234
5678
90
1234

This is probably not very useful.

We could have added another state variable called e.g. %finished, and added an entry here after reaching the end of file. And then refuse to open it again afterwards. The program would then behave just like the first version, adding trailing newlines en masse. The procedure returns the read characters, and we could check that we actually got something. It would mean rewriting the program, breaking the example. (Which may actually be a good idea..)

Here is a simpler version, that goes on printing as long as we get characters from the «readN» call. Note that we do not close the file here, so we are back where we started with potentially a lot of open files.

File: read-n-last

#! /usr/bin/env raku

subset PosInt of Int where * > 0;

unit sub MAIN (:f(:$file) where $file.IO.f && $file.IO.r = 'input.txt',
               PosInt :s($size)  = 4);         # [1]

say ( readN($file, $size) // last ) for ^Inf;  # [1a]

sub readN ($FILE, $number)
{
  state %handle;
  
  %handle{$FILE} = $FILE.IO.open unless %handle{$FILE};

  return %handle{$FILE}.readchars($number);
}

[1] Note that the «:t($times)» argument has gone, as it has been replaced by an eternal loop (in [1a]). Note the last call that terminates the loop when we get an undefined value from «readN» - which we do after reaching the end of file.

Running it:

$ ./read-n-last -s=2
12
34
56
78
90

$ ./read-n-last -s=3
123
456
789
0

$ ./read-n-last -s=5
12345
67890

$ ./read-n-last -s=9
123456789
0

A Perl Version

This is straight forward translation of the first Raku version.

File: read-n-perl

#! /usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use feature 'signatures';
use feature 'state';
use Getopt::Long;

no warnings "experimental::signatures";

my $file  = 'input.txt';
my $size  = 4;
my $times = 3;

GetOptions("file=s"  => \$file,
           "size=i"  => \$size,
	   "times=i" => \$times);

die "Not a file: " unless -f $file && -r $file;

say readN($file, $size) for 1 .. $times;

sub readN ($FILE, $number)
{
  state %handle;

  open($handle{$FILE}, "<", $FILE) unless $handle{$FILE};

  my $string;
  read $handle{$FILE}, $string, $number;
  return $string;
}

Running it gives the same result as «read-n»:

$ ./read-n-perl
1234
5678
90

$ ./read-n-perl -times=2
1234
5678

$ ./read-n-perl -size=2
12
34
56

$ ./read-n-perl -size=2 -times=4
12
34
56
78

$ ./read-n-perl -size=2 -times=5
12
34
56
78
90

Note that the =i in the GetOptions call requires integer values. We ought to check that they are positive, as the custom type did for us in the Raku version(s). Feel free to do so...

Challenge #098.2: Search Insert Position

You are given a sorted array of distinct integers @N and a target $N.

Write a script to return the index of the given target if found otherwise place the target in the sorted array and return the index.

Example 1:

Input: @N = (1, 2, 3, 4) and $N = 3
Output: 2 since the target 3 is in the array at the index 2.

Example 2:

Input: @N = (1, 3, 5, 7) and $N = 6
Output: 3 since the target 6 is missing and should be placed at the index 3.

Example 3:

Input: @N = (12, 14, 16, 18) and $N = 10
Output: 0 since the target 10 is missing and should be placed at the index 0.

Example 4:

Input: @N = (11, 13, 15, 17) and $N = 19
Output: 4 since the target 19 is missing and should be placed at the index 4.

We could start at the beginning of @N and go on until we have a match, we have passed it (i.e. should insert it), or have reached the end. That certainly works for the arrays given in the examples. But it is not efficient with very long arrays. So I'll start at the middle of the array (delta: halve the size). If we have not found the match, we add half the previous delta value (with the correct sign depending on the direction). This we do as long we haven't either found the value, found that it is missing, or have reached the end or beginning.

File: search-insert-position

#! /usr/bin/env raku

unit sub MAIN (*@N where (@N.elems && all(@N) ~~ Int && [<] @N), # [1]
               Int :$N,                                          # [2]
	       :v(:$verbose)
	      );

my $end = @N.end;                         # [3]

my $delta = ($end/2).round;               # [4]
my $try   = $delta;                       # [5]

loop                                      # [6]
{
  say ": try at $try" if $verbose;

  if $try == 0 && $N < @N[0]              # [7]
  {
    say 0;
    last;
  }
  elsif $try == $end && $N > @N[$end]     # [8]
  {
    say $end +1;
    last;
  }
  
  if @N[$try] == $N                       # [9]
  {
    say $try;
    last;
  }
  elsif @N[$try] < $N && @N[$try+1] > $N  # [10]
  {
    say $try +1;
    last;
  }

  $delta = ($delta/2).round;              # [11]

  if @N[$try] < $N                        # [12]
  {
    $try += $delta;
  }
  elsif @N[$try] > $N                     # [13]
  {
    $try -= $delta;
  }
}

[1] A slurpy array. Ensure at least one value (@N.elems), that all of them are integers (all(@N) ~~ Int), and finally that they are in increasing order ([<] @N, using the Reduction Metaoperator []).

[2] A named argument for this one.

[3] The index of the last element.

[4] Halve the delta value each time. Using round to get the an integer value ensures that we do not end up with zero (as we would have if we had removed the fractional part of «1/2»). Zero would have stopped the search (as adding - or subtracting - zero would not have advanced the index).

[5] Where to start (have a try).

[6] An eternal loop, with a lot of exits (the last calls).

[7] Have we reached the beginning, and the value we look for is lower than the one found there? (If so, place it as the new beginning.)

[8] Have we reached the end, and the value we look for is lower than the one found there? (If so, we place it after the current end.)

[9] Do we have a match? (If so, print the position.)

[10] Is the current value lower and the next one (to the right) higher than the one we look for? (If so, we place it at «position + 1».)

[11] Halve the delta value, and ensure (with round) that it is 1 or more.

[12] Add the delta (to the index), if we are below the value we search for.

[13] Or subtract it, if we are above.

Note that the elsifs in [8] and [9] can be replaced with ifs. I have chosen to write it like this to show that [7] ⇄ [8] and [9] ⇄ [10] are related.

See docs.raku.org/language/operators#Reduction_metaoperators for more information about the Reduction Metaoperator [].

See docs.raku.org/routine/end for more information about end.

See docs.raku.org/routine/round for more information about round.

Running it on the examples:

$ ./search-insert-position --N=3 1 2 3 4
2

$ ./search-insert-position --N=6 1 3 5 7
3

$ ./search-insert-position --N=10 12 14 16 18
0

$ ./search-insert-position --N=19 11 13 15 17
4

With verbose mode:

$ ./search-insert-position --N=3 -v 1 2 3 4
: try at 2
2

$ ./search-insert-position --N=6 -v 1 3 5 7
: try at 2
3

$ ./search-insert-position --N=10 -v 12 14 16 18
: try at 2
: try at 1
: try at 0
0

$ ./search-insert-position --N=19 -v 11 13 15 17
: try at 2
: try at 3
4

Looking good.

The program really is overkill on small arrays. So we should try it out on a large one. Instead of typing the array on the command line (which is tedious), writing a (shell script) wrapper doing it (which is also tedious), we can ask the program to do it for us.

Here it is with multiple dispatch (the multi keyword):

File: search-insert-position-range (changes only)

multi sub MAIN ($range where
   ($range ~~ /(\d+)\.\.(\d+)/ && $0 < $1),                 # [1]
                Int :$N, :v(:$verbose))
{
  $range ~~ /(\d+)\.\.(\d+)/;                               # [1a]
  MAIN($0 .. $1, :$N, :$verbose);                           # [1b]
}


multi sub MAIN ($sequence where
   ($sequence ~~ /(\d+)\,(\d+)\.\.(\d+)/ && $0 < $1 < $2),  # [2]
                Int :$N, :v(:$verbose))
{
  $sequence ~~ /(\d+)\,(\d+)\.\.(\d+)/;                     # [2a]
  MAIN(($0.Int, $1.Int ... $2.Int).eager, :$N, :$verbose);  # [2b]
}

multi sub MAIN (*@N where (@N.elems && all(@N) ~~ Int && [<] @N),
                Int :$N, :v(:$verbose))
{
…
}

[1] If we have specified one (positional) argument that satisfies the regex (<integer>..<integer>), get the two integer values [1a] and call the other multi MAIN with the Range.

[2] As above, but for a Sequence - where we specify the first two values and the upper limit (as <integer>,<integer>...<integer>).

See docs.raku.org/syntax/multi for more information about Multiple Dispatch and mutli.

Matches and Sequences

Note that the first match comes in $0, and not $1 as in Perl.

Also note that $0 (and so on) is a match object, and not the matching string itself. That isn't a problem when used in a context that coerces the value (to an integer or a string), but it is generally better to err on the side of caution.

This works, as the Range operator (..) coerces the operands to integers:

> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1);
12..29

Ranges (and Sequences) are lazy data structures, where the values are generated when actually needed (accessed). We can slap on an eager call to expand it:

> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1).eager;
(12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29)
						 
> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1).eager.WHAT
(List)

> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1).WHAT
(Range)

But doing this with the Sequence operator (...) does not work, as it does not coerce:

> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 ... $1);
Cannot get sequence start value from an empty list

With explicit coercion:

> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0.Int ... $1.Int);
(12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29)

See docs.raku.org/routine/eager for more information about eager

See e.g. my Centenary Sequences with Raku - Part 1: Raku article for more information about Ranges, Sequences and the differences betwen them.

Running it:

$ ./search-insert-position-range -N=10 1 100
1

$ ./search-insert-position-range -v -N=10 1 100
: @N: 1,100
: try at 1
: try at 0
1

$ ./search-insert-position-range -N=10 1..100
9

$ ./search-insert-position-range -v -N=10 1..100
: @N: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,\
27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,\
52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,\
77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
: try at 50
: try at 25
: try at 12
: try at 5
: try at 9
9

And with some sequences:

$ ./search-insert-position-range -v -N=10 "1,3...100"
: @N: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,\
51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99
: try at 25
: try at 12
: try at 5
: try at 1
: try at 3
: try at 4
5

$ ./search-insert-position-range -v -N=10 "2,4...100"
: @N: 2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,\
52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100
: try at 25
: try at 12
: try at 5
: try at 1
: try at 3
: try at 4
4

$ ./search-insert-position-range -v -N=10 "1,6...100"
: @N: 1,6,11,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96
: try at 10
: try at 5
: try at 2
: try at 0
: try at 1
2

$ ./search-insert-position-range -v -N=1000 1,3...100000
: @N: 1,3,5,7,9,11,13,15,17,…
: try at 25000
: try at 12500
: try at 6250
: try at 3125
: try at 1562
: try at 780
: try at 389
: try at 585
: try at 487
: try at 536
: try at 511
: try at 498
: try at 505
: try at 501
: try at 499
500

Perl

This is a straight forward translation of the first Raku version, except for the check that the array values are integers - and in increasing order. This is something that is hard to do in Perl.

File: search-insert-position-perl

#! /usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use Getopt::Long;

my $N;
my $verbose = 0;

GetOptions("N=i"      => \$N,
           "verbose"  => \$verbose);

my @N = @ARGV;

my $end = @N -1;

my $delta = int(0.5 + $end/2);
my $try   = $delta;

while (1)
{
  say ": try at $try" if $verbose;

  if ($try == 0 && $N < $N[0])
  {
    say 0;
    last;
  }
  elsif ($try == $end && $N > $N[$end])
  {
    say $end +1;
    last;
  }
  
  if ($N[$try] == $N)
  {
    say $try;
    last;
  }
  elsif ($N[$try] < $N && $N[$try+1] > $N)
  {
    say ++$try;
    last;
  }

  $delta = int(0.5 + $delta/2);

  if ($N[$try] < $N)
  {
    $try += $delta;
  }
  elsif ($N[$try] > $N)
  {
    $try -= $delta;
  }
}

Running it gives the same result as «search-insert-position»:

$ ./search-insert-position-perl -v -N=3 1 2 3 4
: try at 2
2

$ ./search-insert-position-perl -v -N=6 1 3 5 7
: try at 2
3

$ ./search-insert-position-perl -v -N=6 1 3 5 7
: try at 2
3

$ ./search-insert-position-perl -v -N=10 12 14 16 18
: try at 2
: try at 1
: try at 0
0

And that's it.

Links

About me
Keyword Index
Raku Courses
Beginning Raku (book)
RSS Feed
Code (zip)
Reddit Comments