This is my response to the Perl Weekly Challenge #098.
$FILE
.
(n+1)th
character.
Input: Suppose the file (input.txt) contains "1234567890"
Output:
print readN("input.txt", 4); # returns "1234"
print readN("input.txt", 4); # returns "5678"
print readN("input.txt", 4); # returns "90"
We can use the shell to generate the input file for us:
$ echo "1234567890" > input.txt
Note that the file will end up with a trailing newline character. Editing it in e.g. emacs does not help. The trailing newline is still there, as shown with my «hex-dump» program (from Small Inversions with Raku - my response to the Perl Weekly Challenge #024):
$ hex-dump input.txt
31 32 33 34 35 36 37 38 39 30 | 1234567890
0A | ␍
We can do it with spurt
in Raku (and REPL):
> spurt 'input.txt', '1234567890'
And this time it works. No trailing newline:
$ hex-dump input.txt
31 32 33 34 35 36 37 38 39 30 | 1234567890
See
docs.raku.org/routine/spurt
for more information about spurt
.
We need a way of keeping track of where in the file we are, and storing the
filehandle is the obvious way. (We could store the position instead, but then
we'd have to open-read-close the file again and again and use seek
to fastforward past the already read characters. But that does not work, as
seek
count bytes and not graphemes (unicode characters).)
Storing the filehandle inside the procedure ensures that it isn't available outside of it. We can do that with a state variable.
The disadvantage of this approach is that we keep a lot of open filehandles.
File: read-n
#! /usr/bin/env raku
subset PosInt of Int where * > 0; # [2]
unit sub MAIN (:f(:$file) where $file.IO.f && $file.IO.r = 'input.txt', # [1]
PosInt :s($size) = 4, # [2a]
PosInt :t($times) = 3); # [2b]
say readN($file, $size) for ^$times; # [3]
sub readN ($FILE, $number) # [4]
{
state %handle; # [5]
%handle{$FILE} = $FILE.IO.open unless %handle{$FILE}; # [6]
return %handle{$FILE}.readchars($number); # [7]
}
[1] Specify the file as a named argument, or use the default filename.
It must be a file (IO.f
) and be readable (IO.r
).
[2] We can set the number of characters to read each time [2a] and how many iterations [2b] we want - or accept the default values. The values must be positive integers.
[3] Read and print the specified number of characters, in a loop.
[4] The procedure.
[5] We store the filehandles in this state variable.
[6] Do we have a filehandle for the given file? If not open it and save the handle.
[7] Read the specified number of characters and return them.
See
docs.raku.org/syntax/state
for more information about state
.
See
docs.raku.org/routine/open
for more information about open
.
See
docs.raku.org/routine/readchars
for more information about readchars
.
Running it:
$ ./read-n
1234
5678
90
Reading past the end of the file is not a problem (i.e. the program
doesn't crash or protest i nany way). We get zero characters from
readchars
, but the say
gives us a newline (an
empty line) each time:
$ ./read-n -t=5
1234
5678
90
$ ./read-n -t=10
1234
5678
90
This is not very nice, and is easy to fix. We could use
print
, as the challenge implies, and avoid the newlines. But
that would result in the complete file beeing printed on one single line
(if the file itself does not contain newlines), which makes it impossible
to see that it reads only a certain numbers of characters each time. So
I'll refrain from implementing this.
Note that we keep all the filehandles, even after we have read to the end of the files, and we will not get rid of them until program termination. It is easy to close the files where we have reached the end, as long as the program does not continue reading from it:
File: read-n-eof (changes only)
sub readN ($FILE, $number)
{
state %handle;
%handle{$FILE} = $FILE.IO.open unless %handle{$FILE};
my $string = %handle{$FILE}.readchars($number); # [2a]
%handle{$FILE}:delete if %handle{$FILE}.eof; # [1]
return $string; # [2]
}
[1] Delete the filehandle (which will close it as well, as we have killed off the variable) if we have reached the end of the file. Note that it is probably a good idea to close it explicitly.
[2] Replace the green part with the non-green part of [2a] to get the previous version of the program.
See
https://docs.raku.org/language/subscripts#index-entry-:delete_(subscript_adverb)
for more information about the :delete
adverb.
See
docs.raku.org/routine/eof for
more information about eof
.
Running it:
$ ./read-n-eof -t=10
1234
5678
90
1234
5678
90
1234
5678
90
1234
This is probably not very useful.
We could have added another state variable called e.g. %finished
, and added
an entry here after reaching the end of file. And then refuse to open it again afterwards.
The program would then behave just like the first version, adding trailing newlines
en masse. The procedure returns the read characters, and we could check that we
actually got something. It would mean rewriting the program, breaking the example.
(Which may actually be a good idea..)
Here is a simpler version, that goes on printing as long as we get characters from the «readN» call. Note that we do not close the file here, so we are back where we started with potentially a lot of open files.
File: read-n-last
#! /usr/bin/env raku
subset PosInt of Int where * > 0;
unit sub MAIN (:f(:$file) where $file.IO.f && $file.IO.r = 'input.txt',
PosInt :s($size) = 4); # [1]
say ( readN($file, $size) // last ) for ^Inf; # [1a]
sub readN ($FILE, $number)
{
state %handle;
%handle{$FILE} = $FILE.IO.open unless %handle{$FILE};
return %handle{$FILE}.readchars($number);
}
[1] Note that the «:t($times)» argument has gone, as it has been replaced by an eternal
loop (in [1a]). Note the last
call that terminates the loop when we get an
undefined value from «readN» - which we do after reaching the end of file.
Running it:
$ ./read-n-last -s=2
12
34
56
78
90
$ ./read-n-last -s=3
123
456
789
0
$ ./read-n-last -s=5
12345
67890
$ ./read-n-last -s=9
123456789
0
#! /usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use feature 'signatures';
use feature 'state';
use Getopt::Long;
no warnings "experimental::signatures";
my $file = 'input.txt';
my $size = 4;
my $times = 3;
GetOptions("file=s" => \$file,
"size=i" => \$size,
"times=i" => \$times);
die "Not a file: " unless -f $file && -r $file;
say readN($file, $size) for 1 .. $times;
sub readN ($FILE, $number)
{
state %handle;
open($handle{$FILE}, "<", $FILE) unless $handle{$FILE};
my $string;
read $handle{$FILE}, $string, $number;
return $string;
}
Running it gives the same result as «read-n»:
$ ./read-n-perl
1234
5678
90
$ ./read-n-perl -times=2
1234
5678
$ ./read-n-perl -size=2
12
34
56
$ ./read-n-perl -size=2 -times=4
12
34
56
78
$ ./read-n-perl -size=2 -times=5
12
34
56
78
90
Note that the =i
in the
GetOptions
call requires integer values. We ought to check
that they are positive, as the custom type did for us in the Raku
version(s). Feel free to do so...
@N
and a target
$N
.
Input: @N = (1, 2, 3, 4) and $N = 3
Output: 2 since the target 3 is in the array at the index 2.
Example 2:
Input: @N = (1, 3, 5, 7) and $N = 6
Output: 3 since the target 6 is missing and should be placed at the index 3.
Example 3:
Input: @N = (12, 14, 16, 18) and $N = 10
Output: 0 since the target 10 is missing and should be placed at the index 0.
Example 4:
Input: @N = (11, 13, 15, 17) and $N = 19
Output: 4 since the target 19 is missing and should be placed at the index 4.
We could start at the beginning of @N
and go on until we have
a match, we have passed it (i.e. should insert it), or have reached the end.
That certainly works for the arrays given in the examples. But it is not
efficient with very long arrays. So I'll start at the middle of the
array (delta: halve the size). If we have not found the match, we add half
the previous delta value (with the correct sign depending on the direction).
This we do as long we haven't either found the value, found that it
is missing, or have reached the end or beginning.
#! /usr/bin/env raku
unit sub MAIN (*@N where (@N.elems && all(@N) ~~ Int && [<] @N), # [1]
Int :$N, # [2]
:v(:$verbose)
);
my $end = @N.end; # [3]
my $delta = ($end/2).round; # [4]
my $try = $delta; # [5]
loop # [6]
{
say ": try at $try" if $verbose;
if $try == 0 && $N < @N[0] # [7]
{
say 0;
last;
}
elsif $try == $end && $N > @N[$end] # [8]
{
say $end +1;
last;
}
if @N[$try] == $N # [9]
{
say $try;
last;
}
elsif @N[$try] < $N && @N[$try+1] > $N # [10]
{
say $try +1;
last;
}
$delta = ($delta/2).round; # [11]
if @N[$try] < $N # [12]
{
$try += $delta;
}
elsif @N[$try] > $N # [13]
{
$try -= $delta;
}
}
[1] A slurpy array. Ensure at least
one value (@N.elems
), that all of them are integers
(all(@N) ~~ Int
), and finally that they are in increasing order
([<] @N
, using the Reduction Metaoperator []
).
[2] A named argument for this one.
[3] The index of the last element.
[4] Halve the delta value each time. Using round
to get the an integer value ensures that we do not end up with zero (as we would
have if we had removed the fractional part of «1/2»). Zero would have stopped
the search (as adding - or subtracting - zero would not have advanced the
index).
[5] Where to start (have a try).
[6] An eternal loop, with a lot of exits (the last
calls).
[7] Have we reached the beginning, and the value we look for is lower than the one found there? (If so, place it as the new beginning.)
[8] Have we reached the end, and the value we look for is lower than the one found there? (If so, we place it after the current end.)
[9] Do we have a match? (If so, print the position.)
[10] Is the current value lower and the next one (to the right) higher than the one we look for? (If so, we place it at «position + 1».)
[11] Halve the delta value, and ensure (with round
) that it
is 1 or more.
[12] Add the delta (to the index), if we are below the value we search for.
[13] Or subtract it, if we are above.
Note that the elsif
s
in [8] and [9] can be replaced with if
s. I have chosen to
write it like this to show that [7] ⇄ [8] and [9] ⇄ [10] are related.
See
docs.raku.org/language/operators#Reduction_metaoperators for more
information about the Reduction Metaoperator []
.
See
docs.raku.org/routine/end
for more information about end
.
See
docs.raku.org/routine/round
for more information about round
.
Running it on the examples:
$ ./search-insert-position --N=3 1 2 3 4
2
$ ./search-insert-position --N=6 1 3 5 7
3
$ ./search-insert-position --N=10 12 14 16 18
0
$ ./search-insert-position --N=19 11 13 15 17
4
With verbose mode:
$ ./search-insert-position --N=3 -v 1 2 3 4
: try at 2
2
$ ./search-insert-position --N=6 -v 1 3 5 7
: try at 2
3
$ ./search-insert-position --N=10 -v 12 14 16 18
: try at 2
: try at 1
: try at 0
0
$ ./search-insert-position --N=19 -v 11 13 15 17
: try at 2
: try at 3
4
Looking good.
The program really is overkill on small arrays. So we should try it out on a large one. Instead of typing the array on the command line (which is tedious), writing a (shell script) wrapper doing it (which is also tedious), we can ask the program to do it for us.
Here it is with multiple dispatch (the multi
keyword):
multi sub MAIN ($range where
($range ~~ /(\d+)\.\.(\d+)/ && $0 < $1), # [1]
Int :$N, :v(:$verbose))
{
$range ~~ /(\d+)\.\.(\d+)/; # [1a]
MAIN($0 .. $1, :$N, :$verbose); # [1b]
}
multi sub MAIN ($sequence where
($sequence ~~ /(\d+)\,(\d+)\.\.(\d+)/ && $0 < $1 < $2), # [2]
Int :$N, :v(:$verbose))
{
$sequence ~~ /(\d+)\,(\d+)\.\.(\d+)/; # [2a]
MAIN(($0.Int, $1.Int ... $2.Int).eager, :$N, :$verbose); # [2b]
}
multi sub MAIN (*@N where (@N.elems && all(@N) ~~ Int && [<] @N),
Int :$N, :v(:$verbose))
{
…
}
[1] If we have specified one (positional) argument that satisfies the regex
(<integer>..<integer>
), get the two integer values [1a]
and call the other multi MAIN
with the Range.
[2] As above, but for a Sequence - where we specify the first two values
and the upper limit (as <integer>,<integer>...<integer>
).
See
docs.raku.org/syntax/multi for more
information about Multiple Dispatch and mutli
.
Note that the first match comes in $0
, and not $1
as
in Perl.
Also note that $0
(and so on) is a match object, and not the matching
string itself. That isn't a problem when used in a context that coerces the
value (to an integer or a string), but it is generally better to err on the side of
caution.
This works, as the Range operator (..
) coerces the operands to integers:
> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1);
12..29
Ranges (and Sequences) are lazy data structures, where the values are
generated when actually needed (accessed). We can slap on an eager
call to
expand it:
> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1).eager;
(12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29)
> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1).eager.WHAT
(List)
> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 .. $1).WHAT
(Range)
But doing this with the Sequence operator (...
) does not work, as it does not
coerce:
> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0 ... $1);
Cannot get sequence start value from an empty list
With explicit coercion:
> "12..29" ~~ /(\d+)\.\.(\d+)/; say ($0.Int ... $1.Int);
(12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29)
See docs.raku.org/routine/eager for
more information about eager
See e.g. my Centenary Sequences with Raku - Part 1: Raku article for more information about Ranges, Sequences and the differences betwen them.
Running it:
$ ./search-insert-position-range -N=10 1 100
1
$ ./search-insert-position-range -v -N=10 1 100
: @N: 1,100
: try at 1
: try at 0
1
$ ./search-insert-position-range -N=10 1..100
9
$ ./search-insert-position-range -v -N=10 1..100
: @N: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,\
27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,\
52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,\
77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
: try at 50
: try at 25
: try at 12
: try at 5
: try at 9
9
And with some sequences:
$ ./search-insert-position-range -v -N=10 "1,3...100"
: @N: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,\
51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99
: try at 25
: try at 12
: try at 5
: try at 1
: try at 3
: try at 4
5
$ ./search-insert-position-range -v -N=10 "2,4...100"
: @N: 2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,\
52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100
: try at 25
: try at 12
: try at 5
: try at 1
: try at 3
: try at 4
4
$ ./search-insert-position-range -v -N=10 "1,6...100"
: @N: 1,6,11,16,21,26,31,36,41,46,51,56,61,66,71,76,81,86,91,96
: try at 10
: try at 5
: try at 2
: try at 0
: try at 1
2
$ ./search-insert-position-range -v -N=1000 1,3...100000
: @N: 1,3,5,7,9,11,13,15,17,…
: try at 25000
: try at 12500
: try at 6250
: try at 3125
: try at 1562
: try at 780
: try at 389
: try at 585
: try at 487
: try at 536
: try at 511
: try at 498
: try at 505
: try at 501
: try at 499
500
#! /usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Getopt::Long;
my $N;
my $verbose = 0;
GetOptions("N=i" => \$N,
"verbose" => \$verbose);
my @N = @ARGV;
my $end = @N -1;
my $delta = int(0.5 + $end/2);
my $try = $delta;
while (1)
{
say ": try at $try" if $verbose;
if ($try == 0 && $N < $N[0])
{
say 0;
last;
}
elsif ($try == $end && $N > $N[$end])
{
say $end +1;
last;
}
if ($N[$try] == $N)
{
say $try;
last;
}
elsif ($N[$try] < $N && $N[$try+1] > $N)
{
say ++$try;
last;
}
$delta = int(0.5 + $delta/2);
if ($N[$try] < $N)
{
$try += $delta;
}
elsif ($N[$try] > $N)
{
$try -= $delta;
}
}
Running it gives the same result as «search-insert-position»:
$ ./search-insert-position-perl -v -N=3 1 2 3 4
: try at 2
2
$ ./search-insert-position-perl -v -N=6 1 3 5 7
: try at 2
3
$ ./search-insert-position-perl -v -N=6 1 3 5 7
: try at 2
3
$ ./search-insert-position-perl -v -N=10 12 14 16 18
: try at 2
: try at 1
: try at 0
0
And that's it.