Caesarean Substrings
with Raku and Perl

by Arne Sommer

Caesarean Substrings with Raku and Perl

[113] Published 30. January 2021.

This is my response to the Perl Weekly Challenge #097.

Challenge #097.1: Caesar Cipher

You are given string $S containing alphabets A..Z only and a number $N.

Write a script to encrypt the given string $S using Caesar Cipher with left shift of size $N.

Example:
Input: $S = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG", $N = 3
Output: "QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD"

Plain:    ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher:   XYZABCDEFGHIJKLMNOPQRSTUVW

Plaintext:  THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Ciphertext: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

The expression «alphabets A..Z only» is wrong, as the example has several spaces as well. So they should be allowed.

File: caesar-cipher
#! /usr/bin/env raku

subset AZ-space of Str where /^ <[ A .. Z \s ]>+ $/;   # [1]
subset PosInt of Int where -25 <= $_ <= 25;            # [2]

unit sub MAIN (AZ-space $S = 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG',
               PosInt $N = 3);                         # [3]

say $S.comb.map({ caesar($_, $N) }).join;              # [4]

sub caesar ($char, $shift)
{
  return $char if $char eq " ";                        # [5]

  my $code = $char.ord;                                # [6]

  $code -= $shift;                                     # [7]

  $code += 26 if $code < 65;  # 'A'                    # [8]
  $code -= 26 if $code > 90;  # 'Z'                    # [8a]

  return $code.chr;                                    # [9]
}

[1] The allowed characters (or «domain specific alphabet»).

[2] The challenge says that the left shift value is a number. It does not make sense to allow anything other than integers, so I restrict the value to that type. Negative values should be ok, and they mean a right shift value (instead of left).

[3] The arguments, with default values as given in the challenge.

[4] Split the string into single charactes (with comb, apply the «caesar» function on each one (with map), join the characters together as a string again (with join), and print it.

[5] Do not shift spaces.

[6] Get the character codepoint.

[7] Subtract the shift value (as we shift to the left, or lower in the alphabet).

[8] Wrap around if we shift out of the A-Z range, here lower - or higher in [8b]

[9] Get the character with the specified codepoint.

See docs.raku.org/routine/ord for more information about ord.

See docs.raku.org/routine/chr for more information about chr.

Running it:

$ ./caesar-cipher 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' 3
QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

$ ./caesar-cipher 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' -3
WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ

$ ./caesar-cipher 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' 13
GUR DHVPX OEBJA SBK WHZCF BIRE GUR YNML QBT

$ ./caesar-cipher 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' -13
GUR DHVPX OEBJA SBK WHZCF BIRE GUR YNML QBT

Raku has a ords variant that takes a whole string, and not a single character as ord. And chrs which takes an array of codepoints and turns them into a string, and not a single codepoint to a character as chr. Let us use them to write a shorter program:

File: caesar-cipher-map
#! /usr/bin/env raku

subset AZ-space of Str where /^ <[ A .. Z \s ]>+ $/;
subset PosInt of Int where -25 <= $_ <= 25;

unit sub MAIN (AZ-space $S = 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG',
               PosInt $N = 3);

say caesar($S, $N);

sub caesar ($string, $shift)
{
  return $string.ords.map({$_ == 32 ?? 32 !! (($_ - $shift - 65) % 26 ) + 65}).chrs;
    # #################### # 1a ############# ############ # 1b  # 1c ## 1d
}

[1] We use map to change the individual codepoints. We let the spaces with codepoint 32 alone [1a]. Every other value we reduce to a number between 0 and 25 (by subtracting the codepoint of the first letter (A: 65) and the shift value [1b]. The modulo operator (%) takes care of negative values for us, doing the right thing. E.g. -2 % 26 -> 24 [1c]. Then we add adjust the values up to where they should be (A to Z) [1d] before we turn the whole array of codepints into a string.

See docs.raku.org/routine/ords for more information about ords.

See docs.raku.org/routine/chrs for more information about chrs.

Running it gives the same result as before:

$ ./caesar-cipher-map 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' 3
QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

$ ./caesar-cipher-map 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' -3
WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ

$ ./caesar-cipher-map 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' 13
GUR DHVPX OEBJA SBK WHZCF BIRE GUR YNML QBT

$ ./caesar-cipher-map 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' -13
GUR DHVPX OEBJA SBK WHZCF BIRE GUR YNML QBT

A Perl Version

This is straight forward translation of the first Raku version.

File: caesar-cipher-perl
#! /usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use feature 'signatures';

no warnings "experimental::signatures";

my $S = shift(@ARGV) // 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG';

die "Illegal characters" unless $S =~ /^[A-Z\s]+$/;

my $N = shift(@ARGV) // 3;

die "Illegal shift $N" if $N !~ /^\-?\d+$/ || $N < -25 || $N > 25;

say join("", map { caesar($_, $N) } split(//, $S));

sub caesar ($char, $shift)
{
  return $char if $char eq " ";

  my $code = ord($char);

  $code -= $shift;

  $code += 26 if $code < 65;  # 'A'
  $code -= 26 if $code > 90;  # 'Z'

  return chr($code);
}

Running it gives the same result as the Raku version:

$ ./caesar-cipher-perl 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' 3
QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

$ ./caesar-cipher-perl 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' -3
WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ

$ ./caesar-cipher-perl 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' 13
GUR DHVPX OEBJA SBK WHZCF BIRE GUR YNML QBT

$ ./caesar-cipher-perl 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' -13
GUR DHVPX OEBJA SBK WHZCF BIRE GUR YNML QBT

Challenge #097.2: Binary Substrings

You are given a binary string $B and an integer $S.

Write a script to split the binary string $B of size $S and then find the minimum number of flips required to make it all the same.

Example 1:
Input: $B = “101100101”, $S = 3
Output: 1

Binary Substrings:
    "101": 0 flip
    "100": 1 flip to make it "101"
    "101": 0 flip
Example 2:
Input $B = “10110111”, $S = 4
Output: 2

Binary Substrings:
    "1011": 0 flip
    "0111": 2 flips to make it "1011"

We start by chopping off 3-character chunks of the binary string from the first example.

> say "101100101".comb(3);   # -> (101 100 101)
> say "1011001010".comb(3);  # -> (101 100 101 0)

The second line shows what happens if the length does not match. That gives us an illegal value, as we cannot flip a single digit into a three digit value. So we have to add a check for that.

Then we compare the first substring with the rest of them, one at a time. Using the bitwise XOR (Exclusive OR) operator is a logical choice here. That gives us a binary value, where the number of 1s is the number of flips on that substring. Raku does indeed have an XOR operator: +^. But it «coerces both arguments to Int and does a bitwise XOR operation» (according to the documentation»; see docs.raku.org/language/operators#infix_+^).

We can get around that be converting the binary value to a decimal value, before applying the XOR. Let us try:

Obtaining the number of flips:

> say ("10101".parse-base(2) +^ "10111".parse-base(2)).base(2).comb.sum;  # -> 1
> say ("11101".parse-base(2) +^ "10111".parse-base(2)).base(2).comb.sum;  # -> 2

This certainly works, but it requires a lot of code. So I'll use a much simpler approach - comparing each digit, one by one.

File: binary-substring
#! /usr/bin/env raku

subset BinaryString where /^ <[01]>+ $/;                    # [1]
subset PosInt of Int where * > 0;                           # [2]

unit sub MAIN (BinaryString $B = '101100101',               # [1]
               PosInt $S where $B.chars %% $S = 3,          # [2]
	       :v(:$verbose));

my @B     = $B.comb($S.Int);                                # [3]
my $first = @B.shift;                                       # [4]
my $total = 0;                                              # [5]

for @B -> $current                                          # [6]
{
  my $flip = bit-diff($first, $current);                    # [7]
  $total += $flip;                                          # [8]
  say ": $first -> $current -> Flip: $flip" if $verbose;
}

say $total;                                                 # [9]

sub bit-diff ($a, $b)                                       # [7]
{
  my $flip = 0;                                             # [10]

  for ^$a.chars -> $index                                   # [11]
  {
    $flip++ if $a.substr($index,1) ne $b.substr($index,1);  # [12]
  }

  return $flip;
}

[1] Ensure that the binary string is legal (containing «0»s and «1»s only).

[2] Ensure a positive integer, and also that the string is evenly divisible by it. (E.g. "4" gives us substrings of length 4, and the program will abort if the last one is shorter.)

[3] comb is usually used to split a string into single characters, but we can get more than one character in each subtring by specifying the length like this.

[4] The examples start by comparing the first substring with itself, giving zero flips. That is silly(ish), so I skip this by shifting out the first substring.

[5] The result will go here.

[6] For each substring (except the first one, see [4]).

[7] Get the number of flips for each substring,

[8] and add it to the total.

[9] Print it.

[10] The number of flips will go here.

[11] For each index in the two substrings (which have the same length),

[12] • Add 1 to the total if the characters at the given position differ, implying a shift.

See docs.raku.org/routine/comb for more information about comb.

Running it:

$ ./binary-substring "101100101" 3
 1

$ ./binary-substring -v "101100101" 3
: 101 -> 100 -> Flip: 1
: 101 -> 101 -> Flip: 0
1

$ ./binary-substring "10110111" 4
2

$ ./binary-substring -v "10110111" 4
: 1011 -> 0111 -> Flip: 2
2

Looking good.

Perl

This is a straight forward translation of the Raku version, except that I had to implement «comb»:

File: binary-substring-perl
#! /usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use feature 'signatures';
use Getopt::Long;

no warnings "experimental::signatures";

my $verbose = 0;

GetOptions("verbose"  => \$verbose);

my $B = shift(@ARGV) // '101100101';

die "Not a binary number" unless $B =~ /^[01]+$/;

my $S = shift(@ARGV) // 3;

die "Not an integer" unless $S =~ /^[1-9][0-9]*$/;
die "Not a legal length" if length($B) % $S;

my @B     =  comb($B, $S);
my $first = shift(@B);
my $total = 0;

for my $current (@B)
{
  my $flip = bit_diff($first, $current);
  $total += $flip;
  say ": $first -> $current -> Flip: $flip" if $verbose;
}

say $total;

sub bit_diff ($a, $b)
{
  my $flip = 0;

  for my $index (0 .. length($a))
  {
    $flip++ if substr($a, $index,1) ne substr($b, $index,1);
  }

  return $flip;
}

sub comb ($string, $length = 1)  # [1]
{
  my @result;

  while ($string)
  {
    push(@result, substr($string, 0, $length));
    $string = substr($string, $length);
  }
  return @result;
}

[1] The missing Raku routine «comb». The optional second argument specifies the length of the substrings (of the first argument) to include in each substring returned by it.

Running it gives the same result as the Raku version(s):

$ ./binary-substring-perl "101100101" 3
1

$ ./binary-substring-perl -v "101100101" 3
: 101 -> 100 -> Flip: 1
: 101 -> 101 -> Flip: 0
1

$ ./binary-substring-perl "101100111" 3
2

$ ./binary-substring-perl -v "101100111" 3
: 101 -> 100 -> Flip: 1
: 101 -> 111 -> Flip: 1
2

And that's it.