Roman Numerals with Raku

by Arne Sommer

Roman Numerals with Raku

[15] Published 1. June 2019

Perl 6 → Raku

This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.

The original title was «Roman Numerals with Perl Ⅵ».

This is my response to the Perl Weekly Challenge #10.

Challenge #10.1: Roman Numerals

Write a script to encode/decode Roman numerals. For example, given Roman numeral CCXLVI, it should return 246. Similarly, for decimal number 39, it should return XXXIX. Checkout wikipedia page for more information.

That is two tasks (and procedures), but with one common script as a wrapper. We can decide which procedure to call with two «multi MAIN»s:

File: roman-gather (partial)
multi MAIN (Int $number where $number > 0) # [1]
{
  say to-roman($number);
}
 
multi MAIN (Str $roman)                    # [2]
{
  say from-roman($roman);
}

[1] This version of MAIN is called if we pass an integer larger than zero to the program,

[2] and this version is called for all other values. (I have chosen to add the error checking in the «from-roman» procedure; which we'll discuss later.)

Note that we will get a run time error if we drop the «where» clause in the first «MAIN», and run the program with a number, as the number will be of the «IntStr» type. And the compiler cannot tell if we intend the number to be an integer or a string.

See my Raku from Zero to 35 article or docs.raku.org/type/IntStr for more information about «IntStr».

To Roman

The procedure to compute the roman numerals is quite straightforward (but long), given the rules (as specified in the wikipedia article):

File: roman-gather (partial)
sub to-roman (Int $number is copy)
{
  my $string = "";
  
  while $number >= 1000 { $string ~= "M";  $number -= 1000; }
  if $number >= 900     { $string ~= "CM"; $number -= 900; }
  if $number >= 500     { $string ~= "D";  $number -= 500; }
  if $number >= 400     { $string ~= "CD"; $number -= 400; }
  while $number >= 100  { $string ~= "C";  $number -= 100; }
  if $number >= 90      { $string ~= "XC"; $number -= 90; }
  if $number >= 50      { $string ~= "L";  $number -= 50; }
  if $number >= 40      { $string ~= "XL"; $number -= 40; }
  while $number >= 10   { $string ~= "X";  $number -= 10; }
  if $number >= 9       { $string ~= "IX"; $number -= 9; }
  if $number >= 5       { $string ~= "V";  $number -= 5; }
  if $number >= 4       { $string ~= "IV"; $number -= 4; }
  while $number >= 1    { $string ~= "I";  $number -= 1; }

  return $string;
}

Note the «while» loops for 1 (I), 10 (X), 100 (C) and M (1000), as they can appear several times. The other values ( 5 (V), 50 (L)) and 500 (D) - as well as the subtractive notation 4 (IV), 9 (IX), 40 (XL), 90 (XC), 400 (CD) and 900 (CM) ) can only appear once each.

From Roman

This one is a more complicated, as we must take care of the subtractive notation when we consider each Roman "digit":

File: roman-gather (partial)
my %value = ( I => 1, V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);
                                       # [1]
my Set $valid-roman = %value.keys.Set; # [2]

sub from-roman (Str $roman)
{
  my @digits = $roman.comb;            # [3]

  die "Non-Roman digit $_ detected." unless $valid-roman{$_} for @digits; # [4]

  my $numbers := gather                # [5]
  {
    while @digits                      # [6]
    {
      my $current = @digits.shift;     # [6]
      if @digits.elems                 # [7]
      {
        if %value{@digits[0]} > %value{$current}         # [7]
        {
          take %value{@digits.shift} - %value{$current}; # [8]
	  next;                                          # [9]
        }
      }
      take %value{$current};                             # [10]
    }
  }
  return $numbers.sum;                                   # [11]
}

[1] The roman digits and the their values, in a hash.

[2] The legal roman digits in a Set.

[3] The roman digits, one at a time, in an array.

[4] Abort if we encounter any non-roman digit in the input.

[5] Use gather/take to get the values.

[6] As long as there are more digits, take the next one.

[7] If there is a next digit, and the value is higher value than the current one,

[8] • Return the combined value (the highest minus the lowest).

[9] • And go to the next iteration.

[10] Else (to both the «if» tests), return the current value.

[11] Return the sum of all the digits.

I know that I am too fond of gather/take. It is better to write it without:

File: roman-nongather (partial)
sub from-roman (Str $roman)
{
  my @digits = $roman.comb;

  die "Non-Roman digit $_ detected." unless $valid-roman{$_} for @digits;

  my $number = 0; # [1]

  while @digits
  {
    my $current = @digits.shift;
    if @digits.elems
    {
      if %value{@digits[0]} > %value{$current}
      {
        $number += %value{@digits.shift} - %value{$current}; # [1]
        next;
      }
    }
    $number += %value{$current}; # [1]
  }

  return $number; # [1]
}

[1] This time we just add up the values as we go.

The algorithm isn't perfect, as these examples show:

$ raku roman-gather IM      # -> 999
$ raku roman-nongather IM   # -> 999

$ raku roman-gather IXM     # -> 1009
$ raku roman-nongather IXM  # -> 1009

The «to-roman» procedure have the only allowed subtractive notation. I'll add them:

File: roman (changes only, compared with «roman-nongather»)
my Set $subtractive = <CM CD XC XL IX IV>.Set;
    
      if %value{@digits[0]} > %value{$current}
      {
        die "Non-Roman Subtractive Notation"
          unless $subtractive{$current ~ @digits[0]};
        $number += %value{@digits.shift} - %value{$current};
        next;
      }

This takes care of the «IM» case:

$ raku roman IM
Non-Roman Subtractive Notation
  in sub from-roman at roman line 55
  in sub MAIN at roman line 10
  in block  at roman line 38

Then we can look at this one:

$ raku roman IXM     # -> 1009

That is (also) rather easy to fix. We keep track of the current digit value, and abort if we encounter a larger value:

File: roman (changes only)
my $current-value = Inf; # Placed before "sub from-roman"

    my $current = @digits.shift;
    
    die "Wrong order of the Roman digits" if $current-value < %value{$current};
    $current-value = %value{$current};
    
    if @digits.elems

Testing it:

$ raku roman IXM
Wrong order of the Roman digits
  in sub from-roman at roman line 54
  in sub MAIN at roman line 10
  in block  at roman line 40

It is (still) possible to do this, though:

$ raku roman MCMC  # -> 2000

This is trickier to fix, but I'll do it the easy way: simply check that the value round trips. If it doesn't, then we have an error:

File: roman (partial)
multi MAIN (Str $roman)
{
  my $int = from-roman($roman);
  $roman eq to-roman($int)
    ?? say $int
    !! die "Wrong order of the Roman digits"; 
}

This change makes the prior changes redundant, so I have removed them. (Or rather, commented them out.) The complete program looks like this now:

File: roman
multi MAIN (Int $number where $number > 0)
{
  say to-roman($number);
}
 
multi MAIN (Str $roman)
{
  my $int = from-roman($roman);
  $roman eq to-roman($int)
    ?? say $int
    !! die "Wrong order of the Roman digits"; 
}

sub to-roman (Int $number is copy)
{
  my $string = "";
  
  while $number >= 1000 { $string ~= "M";  $number -= 1000; }
  if $number >= 900     { $string ~= "CM"; $number -= 900; }
  if $number >= 500     { $string ~= "D";  $number -= 500; }
  if $number >= 400     { $string ~= "CD"; $number -= 400; }
  while $number >= 100  { $string ~= "C";  $number -= 100; }
  if $number >= 90      { $string ~= "XC"; $number -= 90; }
  if $number >= 50      { $string ~= "L";  $number -= 50; }
  if $number >= 40      { $string ~= "XL"; $number -= 40; }
  while $number >= 10   { $string ~= "X";  $number -= 10; }
  if $number >= 9       { $string ~= "IX"; $number -= 9; }
  if $number >= 5       { $string ~= "V";  $number -= 5; }
  if $number >= 4       { $string ~= "IV"; $number -= 4; }
  while $number >= 1    { $string ~= "I";  $number -= 1; }

  return $string;
}

# my Set $subtractive = <CM CD XC XL IX IV>.Set;

my %value = ( I => 1, V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);

my Set $valid-roman = %value.keys.Set;

my $current-value = Inf;

sub from-roman (Str $roman)
{
  my @digits = $roman.comb;

  die "Non-Roman digit $_ detected." unless $valid-roman{$_} for @digits;

  my $number = 0;

  while @digits
  {
    my $current = @digits.shift;
    
    # die "Wrong order of the Roman digits" if $current-value < %value{$current};
    # $current-value = %value{$current};
    
    if @digits.elems
    {
      if %value{@digits[0]} > %value{$current}
      {
        # die "Non-Roman Subtractive Notation" unless $subtractive{$current ~ @digits[0]};
        $number += %value{@digits.shift} - %value{$current};
        next;
      }
    }
    $number += %value{$current};
  }

  return $number;
}

We should test the values given in the challenge:

$ raku roman CCXLVI  # -> 246
$ raku roman 39      # -> XXXIX

We get the correct answers.

Roman Method

It is possible to use any procedure as a method, with the Ampersand Syntax:

sub to-roman ($int) { ... }

say to-roman(2019); # The same as the next one.
say 2019.&to-roman; # The same

Strictly speaking, we are not really using it as a method, but we use an alternate procedure invocation syntax - that just happens to look like a method call. Except for the & character.

If we had written a class, we could have called «to-roman» on a variable of that class.

It turns out that we have actually been using the builtin «Int» class, perhaps without knowing it.

It is actually possible to add the method to the «Int» class, but it is potentially dangerous (as we mess with internal classes, and any changes are global):

File: lib/Int-Roman.rakumod
unit module Int-Roman;  # [1]

use MONKEY-TYPING;      # [2]

augment class Int       # [3]
{
  method roman          # [4]
  {
    my $value = self;   # [4]
    my $string = "";

    while $value >= 1000 { $string ~= "M";  $value -= 1000; }
    if $value >= 900     { $string ~= "CM"; $value -= 900; }
    if $value >= 500     { $string ~= "D";  $value -= 500; }
    if $value >= 400     { $string ~= "CD"; $value -= 400; }
    while $value >= 100  { $string ~= "C";  $value -= 100; }
    if $value >= 90      { $string ~= "XC"; $value -= 90; }
    if $value >= 50      { $string ~= "L";  $value -= 50; }
    if $value >= 40      { $string ~= "XL"; $value -= 40; }
    while $value >= 10   { $string ~= "X";  $value -= 10; }
    if $value >= 9       { $string ~= "IX"; $value -= 9; }
    if $value >= 5       { $string ~= "V";  $value -= 5; }
    if $value >= 4       { $string ~= "IV"; $value -= 4; }
    while $value >= 1    { $string ~= "I";  $value -= 1; }

    return $string;
  }


[1] The name of the module.

[2] This directive is required, as we do something that is potentially dangerous.

[3] The «augment» keyword is used to extend (or augment) an existing class.

[4] The method. Note that the object itself is available as «self». And as the object is an Int, we get the value itself by accessing «self».

See docs.raku.org/syntax/augment for more information about «Augment».

See docs.raku.org/language/objects#self for more information about «self».

File: roman-int
use lib "lib";
use Int-Roman;

sub MAIN (Int $number)
{
  say $number.Int.roman; # as MAIN gives us an "IntStr" value.
}

Testing it:

$ raku roman-int 2019
MMXIX

Or we can use the module directly in REPL, on integers:

> use lib "lib";  # -> Nil
> use Int-Roman;  # -> Nil
> say 12.roman;   # -> XII
> say 999.roman;  # -> CMXCIX

Roman Base

The «base» method gives us the number in the specified base. E.g.:

> say 255.base: 16;  # -> FF        ## Hexadecimal
> say 255.base: 2;   # -> 11111111  ## Binary
> say 255.base: 8;   # -> 377       ## Octal

See my Raku from Zero to 35 article or docs.raku.org/routine/base for more information about «Base».

So let us extend it with the «r» argument to give us Roman numerals:

File: lib/Int-Roman2.rakumod
unit module Int-Roman2;

use MONKEY-TYPING;

augment class Int
{
  method roman { ... }     # [1]

  multi method base ("r")  # [2]
  {
    return self.roman;
  }
}

[1] The same code as above, but not shown here.

[2] The builtin «base» method has been set up as a «multi», so we can just plug in new versions like this.

And a program using the module:

File: roman-base
use lib "lib";
use Int-Roman2;

sub MAIN (Int $number)
{
  say $number.Int.roman; # as MAIN gives us an "IntStr" value.
  say $number.Int.base("r");
}

Testing it:

$ raku roman-base 2019
MMXIX
MMXIX

Roman Unicode

The Roman numerals (the "digits", as well as the combined numbers 2, 3, 4, 6, 7, 8, 9, 11 and 12) are available in Unicode, as described e.g. by this Wikipedia page.

What would it take for Raku to be able to recognise them? Absolutely nothing. Raku tries very hard to be Unicode compliant, and when a Unicode symbol is numeric, Raku recognises that value:

> say Ⅵ;       # -> 6
> say Ⅵ + Ⅻ;  # -> 18

This only works for a single Roman Unicode character (which we have above, even if it seems like more). If you add more characters, you will get a compile time error:

> say ⅭⅭ
===SORRY!=== Error while compiling:
Bogus postfix
------> say Ⅽ⏏Ⅽ
    expecting any of:
        infix
        infix stopper
        postfix
        statement end
        statement modifier
        statement modifier loop

This error is primarily a result of Roman numerals not beeing positional. They are really meant as strings, but we do get the numeric value as a bonus. When possible.

We could modify «from-roman» to use the Unicode characters, instead of the normal letters. But this will only annoy users of the script when they try to enter a roman number, as it is difficult to enter these characters. And it wouldn't be very user friendly.

We could add them to «from-roman» in addition to the normal letters, so that it supports both types. But this will break roundtripping, so would require some additional work.

Challenge #10.2

Write a script to find Jaro-Winkler distance between two strings. For more information check wikipedia page.

I'm on vacation this week, and do not have time enough to look into this challenge.