Day Parser with Raku

by Arne Sommer

[279] Published 9. March 2024.

This is my response to The Weekly Challenge #259.

Challenge #259.1: Banking Day Offset

You are given a start date and an offset counter. Optionally you also get a bank holiday date list.

Given a number (of days) and a start date, return the number (of days) adjusted to take into account non-banking days. In other words: convert a banking day offset to a calendar day offset.

Non-banking days are:
  1. Weekends
  2. Bank holidays
Example 1:
Input: $start_date = '2018-06-28',
       $offset = 3,
       $bank_holidays = ['2018-07-03']
Output: '2018-07-04'

Thursday bumped to Wednesday (3 day offset, with Tuesday a bank holiday)
Example 2:
Input: $start_date = '2018-06-28',
       $offset = 3
Output: '2018-07-03'
File: banking-day-offset
#! /usr/bin/env raku

unit sub MAIN (:s(:$start), # [1] 
               Int :o(:$offset) where $offset > 0 = 1,  # [1a]
	       :b(:$bank) = "",                         # [1b]
               :v(:$verbose));

my $date = Date.new($start);                            # [2]

my %is-bank;                                            # [3]

for $bank.words -> $bank                                # [4]
{
  %is-bank{Date.new($bank).Str} = True;                 # [4a]
}

my $added = 0;                                          # [5]
my $todo  = $offset;                                    # [6]

loop                                                    # [7]
{
  if $date.day-of-week == 6|7                           # [8]
  {
    say ": Date $date ({ $date.day-of-week == 6 ?? 'Saturday' \
      !! 'Sunday' }) Add 1" if $verbose;
    $date++;                                            # [8a]
    $added++;                                           # [8b]
  }
  elsif %is-bank{$date.Str}                             # [9]
  {
    say ": Date $date (Bank Holiday) Add 1" if $verbose;
    $date++;                                            # [9a]
    $added++;                                           # [9b]
  }
  elsif $todo                                           # [10]
  {
    say ": Date $date (Todo { $todo }) Add 1" if $verbose;
    $date++;                                            # [10a]
    $added++;                                           # [10b]
    $todo--;                                            # [10c]
  }
  else                                                  # [11]
  {
    last;                                               # [11a]
  }
}

say ": Added a total of $added days" if $verbose;

say $date.Str;                                          # [12]

[1] A date string, given as a named argument, to be enforced by [2].

I tried adding a clause here (where Date.new($start) and where Date.new($start).day-of-week) but it did not work.

[1a,1b] The offset is also a named argument (with 1 as default value), as is the list of (space separated) bank holidays (with an empty string as default value).
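
A minimal stand-alone sketch (a made-up program, not part of the solution) of named arguments with aliases, a constraint and default values:

#! /usr/bin/env raku

unit sub MAIN (:n(:$name) = "world",                     # -n or --name, default value "world"
               Int :c(:$count) where * > 0 = 1);         # -c or --count, a positive Int, default 1

say "Hello, $name!" for ^$count;

Running it as e.g. ./hello -n=Raku -c=2 (assuming the file is called hello) prints the greeting twice.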

[2] Generate the Date object. This will fail if the date is illegal.
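
A small sketch of that failure mode, with dates of my own choosing:

my $ok  = Date.new('2018-06-28');         # a legal date; we get a Date object
say $ok.day-of-week;                      # -> 4 (Thursday)

my $bad = try Date.new('2018-06-31');     # June has only 30 days; try turns the exception into Nil
say $bad.defined;                         # -> False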

[3] A hash to be populated with bank holidays (as strings).

[4] Iterate over the bank holidays, if any, and add each one to the hash. Note the check for legal dates here [4a].

[5] Number of added days, from the initial one.

[6] Number of banking days still to add.

[7] An eternal loop, with an exit strategy in [11a].

[8] Are we on a Saturday or Sunday? If so, add one (day) to the date object [8a] and the counter [8b].
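
A short sketch of that Junction comparison, with dates chosen by me:

say so Date.new('2018-06-30').day-of-week == 6|7;   # -> True  (a Saturday; day 6)
say so Date.new('2018-07-02').day-of-week == 6|7;   # -> False (a Monday; day 1)

The == against the 6|7 Junction matches either value, and the so prefix collapses the result to a plain Bool.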

[9] Are we on a bank holiday? If so, add one (day) to the date object [9a] and the counter [9b].

[10] Still more days to add? If so, add one (day) to the date object [10a] and the counter [10b], and count down the days still to add [10c].

[11] Nothing done in this iteration? If so, we are done. Exit the loop with last.

See docs.raku.org/routine/last for more information about last.

[12] Print the date as string. (This is pedantic, as say would have stringified it anyway...)

Running it:

$ ./banking-day-offset -s=2018-06-28 -o=3 -b=2018-07-03
2018-07-04

$ ./banking-day-offset -s=2018-06-28 -o=3
2018-07-03

Looking good.

With verbose mode:

$ ./banking-day-offset -s=2018-06-28 -o=3 -b=2018-07-03 -v
: Date 2018-06-28 (Todo 3) Add 1
: Date 2018-06-29 (Todo 2) Add 1
: Date 2018-06-30 (Saturday) Add 1
: Date 2018-07-01 (Sunday) Add 1
: Date 2018-07-02 (Todo 1) Add 1
: Date 2018-07-03 (Bank Holiday) Add 1
: Added a total of 6 days
2018-07-04

$ ./banking-day-offset -s=2018-06-28 -o=3 -v
: Date 2018-06-28 (Todo 3) Add 1
: Date 2018-06-29 (Todo 2) Add 1
: Date 2018-06-30 (Saturday) Add 1
: Date 2018-07-01 (Sunday) Add 1
: Date 2018-07-02 (Todo 1) Add 1
: Added a total of 5 days
2018-07-03

Several bank holidays (fictitious) at once?

$ ./banking-day-offset -s="2020-01-04" -o=3 -b="2020-01-06 2020-01-07" -v
: Date 2020-01-04 (Saturday) Add 1
: Date 2020-01-05 (Sunday) Add 1
: Date 2020-01-06 (Bank Holiday) Add 1
: Date 2020-01-07 (Bank Holiday) Add 1
: Date 2020-01-08 (Todo 3) Add 1
: Date 2020-01-09 (Todo 2) Add 1
: Date 2020-01-10 (Todo 1) Add 1
: Date 2020-01-11 (Saturday) Add 1
: Date 2020-01-12 (Sunday) Add 1
: Added a total of 9 days
2020-01-13

Challenge #259.2: Line Parser

You are given a line like below:

{%  id   field1="value1"    field2="value2"  field3=42 %}
Where
  1. "id" can be \w+
  2. There can be 0 or more field-value pairs
  3. The names of the fields are \w+
  4. The values are either numbers, in which case we don't need double quotes, or strings, in which case we need double quotes around them
The line parser should return structure like below:
{
       name => id,
       fields => {
           field1 => value1,
           field2 => value2,
           field3 => value3,
       }
}
It should be able to parse the following edge cases too:
{%  youtube title="Title \"quoted\" done" %}
and
{%  youtube title="Title with escaped backslash \\" %}
BONUS: Extend it to be able to handle multiline tags:
{% id  filed1="value1" ... %}
LINES
{% endid %}
You should expect the following structure from your line parser:
{
       name => id,
       fields => {
           field1 => value1,
           field2 => value2,
           field3 => value3,
       }
       text => LINES
}
File: line-parser
#! /usr/bin/env raku

unit sub MAIN ($file = "example1.txt", :v($verbose));              # [1]

my %hash;                                                          # [2]
my $text;                                                          # [3]

for (slurp $file).lines -> $line                                   # [4]
{
  if $line ~~ /^\{\%(.*)\%\}$/ && ! %hash<name>                    # [5]
  {
    my $data    = $0.Str;                                          # [6]
    say ": Data: $data" if $verbose;

    my @data    = $data.words;                                     # [7]
    %hash<name> = @data.shift;                                     # [8]

    my %fields;                                                    # [9]
    for @data -> $data                                             # [10]
    {
      my ($k,$v) = $data.split('=');                               # [11]

      if $v.Numeric                                                # [12]
      {
         %fields{$k} = $v.Numeric;                                 # [12a]
      }
      else                                                         # [13]
      {
        $v ~~ /^\"(.*)\"$/;                                        # [13a]
	die "String $v must be quoted" unless $0;                  # [13b]
        %fields{$k} = $0.Str;                                      # [13c]
      }
    }

    %hash<fields> = %fields;                                       # [14]
  }
  elsif $line eq '{% end' ~ %hash<name> ~ ' %}'                    # [15]
  {
    %hash<text> = $text if $text;                                  # [15a]
    last;                                                          # [15b]
  }
  elsif %hash<name>                                                # [16]
  {
    $text ~= "$line\n";                                            # [16a]
  }
  else                                                             # [17]
  {
    die "Illegal data in input: $line";                            # [17a]
  }
}

say %hash.raku;                                                    # [18]

[1] Specify the file to read the example from.

[2] The parsed data structure will end up here.

[3] Any optional text.

[4] Read the file (with slurp), and iterate over the individual rows.

See docs.raku.org/routine/slurp for more information about slurp.
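
A small sketch of two (mostly equivalent) ways of getting at the lines, assuming the file exists:

for (slurp "example1.txt").lines -> $line { say $line }   # read the whole file, then split into lines

for "example1.txt".IO.lines -> $line { say $line }        # read it line by line instead

The program uses the first form; the second one does not have to hold the entire file in memory.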

[5] Do we have a starting row (and have not encountered one before, i.e. not yet saved the name in %hash)?

[6] Retrieve the data.

[7] The data as words (i.e. split on space characters).
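
A one-liner sketch of words on a string like the ones we get here (the exact .raku formatting may differ slightly between Rakudo versions):

say 'id   field1="value1"    field3=42'.words.raku;
# -> ("id", "field1=\"value1\"", "field3=42").Seq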

[8] The first part is the name. Record it.

[9] The fields, if any, will end up here.

[10] Iterate over the input data.

[11] Split on the equal sign, giving the key and value.

[12] Is the value numeric? If so, add it "as is" to the field list.

[13] If not, extract the value from inside the quotes [13a]. Fail if the regexp failed to match [13b]. Save the result [13c].
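
A stand-alone sketch of the split-and-unquote steps on a single field. Note that the limit of 2 on split is my addition (it would keep any = inside the value intact); the program above splits without a limit:

my ($k, $v) = 'field1="value1"'.split('=', 2);
say $k;                                     # -> field1

$v ~~ /^\"(.*)\"$/;                         # capture whatever sits between the quotes
say $0.Str;                                 # -> value1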

[14] Add the fields to the hash.

[15] Do we have an end line (i.e. the start tag name prefixed with "end")? If so, add any retrieved text and we are done (with last).

[16] Any other line (as long as we have recorded a name, from a starting line) is a text line.

[17] Error in the input. Say so.

[18] Print the result, using the raku method on the data structure.

Running it:

$ ./line-parser example1.txt 
{:fields(${:field1("value1"), :field2("value2"), :field3(42)}), :name("id")}

$ ./line-parser example2.txt 
String "Title must be quoted
  in block  at ./line-parser line 32
  in block  at ./line-parser line 21
  in sub MAIN at ./line-parser line 10
  in block <unit> at ./line-parser line 3

$ ./line-parser example3.txt 
String "Title must be quoted
  in block  at ./line-parser line 32
  in block  at ./line-parser line 21
  in sub MAIN at ./line-parser line 10
  in block <unit> at ./line-parser line 3

$ ./line-parser example4.txt 
{:fields(${:field1("valeu1"), :filed1("value1"), :num(3.14)}), :name("id"), \
 :text("LINE, the first\nLINE, the second\n")}

Examples 2 and 3 fail, as expected, as I have not implemented escape sequence support.

Let us fix that.

File: line-parser-quoted
#! /usr/bin/env raku

use JSON::Fast;                                                     # [1]

unit sub MAIN ($file = "example1.txt", :j(:$json), :v(:$verbose));  # [2]

my %hash;
my $text;

for (slurp $file).lines -> $line
{
  if $line ~~ /^\{\%(.*)\%\}$/ && ! %hash<name>
  {
    my $data    = $0.Str;
    my @data    = do-parse($data);                                  # [3]
    %hash<name> = @data.shift;

    my %fields;
    for @data -> $data                                              # [4]
    {
      my ($k,$v)  = $data.split('=');
      %fields{$k} = $v;
    }

    %hash<fields> = %fields;
  }
  elsif $line eq '{% end' ~ %hash<name> ~ ' %}'
  {
    %hash<text> = $text if $text;
    last;
  }
  elsif %hash<name>
  {
    $text ~= "$line\n";
  }
  else
  {
    die "Illegal data in input: $line";
  }
}

say $json                                                           # [5]
  ?? to-json %hash, :sorted-keys                                    # [5a]
  !! %hash.raku;                                                    # [5b]

sub do-parse ($string)                                              # [6]
{
  my @done;                                                         # [7]
  my $todo = $string.trim;                                          # [8]

  while $todo                                                       # [9]
  {
    $todo ~~ /^ \s* (\w+)(.*)/;                                     # [10]
    my $key = $0.Str;                                               # [10a]
    my $val = "";                                                   # [10b]
    $todo   = $1.Str;                                               # [10c]

    if $todo.starts-with('="')                                      # [11]
    {
      $todo ~~ /^\=\"(.*)/;                                         # [11a]

      my @todo = $0.Str.comb;                                       # [11b]

      while @todo                                                   # [12]
      {
        my $char = @todo.shift;                                     # [12a]
	$char ~= @todo.shift if $char eq "\\";                      # [12b]
	$val ~= $char;                                              # [13]
        if @todo[0] eq '"'                                          # [14]
	{
	  $todo = @todo[1..*].join;                                 # [14a]
	  last;                                                     # [14b]
	}
      }

      @done.push: "$key=$val";                                      # [15]
    }
    elsif $todo.starts-with('=')                                    # [16]
    {
      $todo ~~ /^\=(<[0..9.]>+)(.*)/;                               # [16a]
      $val    = $0.Str;                                             # [16b]
      $todo   = $1.Str;                                             # [16c]
      @done.push: "$key=$val";                                      # [16d]
    }
    else                                                            # [17]
    {
      @done.push: $key;                                             # [17a]
    }
  }

  say ": Parsed: { @done.raku }" if $verbose;
  return @done;                                                     # [18]
}

[1] I have added support for nicer presentation of the result, using JSON::Fast.

See raku.land/cpan:TIMOTIMO/JSON::Fast for more information about the JSON::Fast module.
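
A tiny sketch of the module in use, on made-up data:

use JSON::Fast;

my %data = name => "id", fields => { field1 => "value1", field3 => 42 };

say to-json %data, :sorted-keys;

The :sorted-keys argument gives a stable key order in the output, which makes it easier to compare runs.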

[2] Enable JSON output, if you want to.

[3] Parse the data (see [6]), i.e. split on quotes instead of spaces. The result is a list of fields, as before, but now with embedded whitespace support.

[4] Iterate over the fields.

[5] Print the result as JSON if requested, and with .raku if not.

[6] The parsing procedure.

[7] The parsed fields will end up here.

[8] Get rid of leading and trailing spaces with trim.

See docs.raku.org/routine/trim for more information about trim.
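
A one-liner sketch:

say '   {%  id   field1="value1" %}   '.trim.raku;   # -> "{%  id   field1=\"value1\" %}"

Leading and trailing whitespace is gone; whitespace inside the string is left alone.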

[9] As long as we have more unparsed data.

[10] Get the key part, and remove it from the unparsed data.

[11] If the unparsed data starts with =", we have a quoted value part. Get rid of the leading =", and turn the unparsed data into an array of individual characters.

[12] As long as we have unparsed data, get the first character [12a], and add the next one as well if the first one is a backslash (i.e. we have an escaped character) [12b].

[13] Add the character (or escaped character) to the value.

[14] We keep looping until we encounter the value ending ". Then we put the remaining characters (from the array) back in the unparsed data string, as a string [14a], and exit the inner loop [14b].
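
Here is the same character-walk on a single quoted value, outside the program, to make it easier to see what goes on (the input string is my own; this sketch only extracts the value, and does not put the rest back as [14a] does):

my @todo = 'Title \"quoted\" done" and the rest'.comb;   # single quotes, so the backslashes stay as-is
my $val  = '';

while @todo
{
  my $char = @todo.shift;
  $char ~= @todo.shift if $char eq "\\";                 # keep the backslash and the escaped character together
  $val  ~= $char;
  last if @todo[0] eq '"';                               # stop at the value ending quote
}

say $val;                                                # -> Title \"quoted\" done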

[15] Add the key value pair to the result.

[16] If the unparsed data starts with =, we have an unquoted numeric value. We support unsigned integers and floating point numbers only [16a]. Extract the value [16b] and update the unparsed data accordingly [16c]. Add the key value pair to the result [16d].
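
A quick sketch of that regex, on input I have made up:

'=3.14 and more' ~~ /^\=(<[0..9.]>+)(.*)/;
say $0.Str;                                 # -> 3.14
say $1.Str;                                 # ->  and more (note the leading space)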

[17] In this case we do not have a value part. Save the standalone key [17a].

[18] Return the result.

Running it, with JSON output:

$ ./line-parser-quoted -j example1.txt 
{
  "fields": {
    "field1": "value1",
    "field2": "value2",
    "field3": "42"
  },
  "name": "id"
}

$ ./line-parser-quoted -j example2.txt 
{
  "fields": {
    "title": "Title \\\"quoted\\\" done"
  },
  "name": "youtube"
}

$ ./line-parser-quoted -j example3.txt 
{
  "fields": {
    "title": "Title with escaped backslash \\\\"
  },
  "name": "youtube"
}

$ ./line-parser-quoted -j example4.txt 
{
  "fields": {
    "field1": "valeu1",
    "filed1": "value1",
    "num": "3.14"
  },
  "name": "id",
  "text": "LINE, the first\nLINE, the second\n"
}

The extra backslashes are a bit annoying, but I am unable to get rid of them. Also note that the JSON output comes with quoted numeric values, whereas the raku output (from the first program) did not.

Why not use Grammars, you may ask. I choose not to answer...

And that's it.