Unicode Zip
with Raku

by Arne Sommer

Unicode Zip with Raku and Perl

[206] Published 16. October 2022.

This is my response to The Weekly Challenge #186.

Challenge #186.1: Zip List

You are given two lists @a and @b of the same size.

Create a subroutine sub zip(@a, @b) that merges the two lists as shown in the example below.

Example:
Input:  @a = qw/1 2 3/; @b = qw/a b c/;
Output: zip(@a, @b) should return qw/1 a 2 b 3 c/;
        zip(@b, @a) should return qw/a 1 b 2 c 3/;

Let us start with the Perl version this time.

A Perl Version

Note that Perl flattens arrays passed as arguments into one long list, so the procedure cannot tell where @a ends and @b begins. I use array references instead.

File: zip-list-perl
#! /usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use feature 'signatures';
no warnings 'experimental::signatures';

my @a    = qw/1 2 3/;
my @b    = qw/a b c/;
my @zip1 = zip(\@a, \@b);            # [1]
my @zip2 = zip(\@b, \@a);

say 'qw/' . join(" ", @zip1) . '/;'; # [2]
say 'qw/' . join(" ", @zip2) . '/;'; # [2]

sub zip ($a, $b)                     # [3]
{
  my @return;

  for my $index (0 .. @$a -1)        # [4]
  {
    push(@return, $a->[$index]);     # [5]
    push(@return, $b->[$index]);
  }

  return @return;
}

[1] Pass two array references as arguments.

[2] The challenge wanted this output, so here it is.

[3] The references are scalar variables.

[4] Iterate over the indices of the first array.

[5] Copy the array element with the current index.

Running it gives the exact same output as given by the example:

$ ./zip-list-perl
qw/1 a 2 b 3 c/;
qw/a 1 b 2 c 3/;
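The reason for passing references can be shown with a minimal sketch. The procedure «count_args» is a hypothetical helper (not part of the program above) that simply reports how many arguments it received:

```perl
#! /usr/bin/env perl

use strict;
use warnings;
use feature 'say';

# Hypothetical helper, for illustration only:
# return the number of arguments that arrived in @_.
sub count_args
{
  return scalar @_;
}

my @a = qw/1 2 3/;
my @b = qw/a b c/;

say count_args(@a, @b);    # -> 6 (the arrays are flattened into one list)
say count_args(\@a, \@b);  # -> 2 (the references are kept apart)
```

With six anonymous scalars the procedure has no way of knowing that the first three belong together, whereas the two references keep the arrays apart.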

The Raku version

Raku has a built-in zip function, so we do not actually have to program it:

> my @a = <1 2 3>
> my @b = <a b c>
> say zip(@a, @b);       # -> ((1 a) (2 b) (3 c))

The result is a list of lists (with one element from each of the input lists), so we have to flatten it to get the desired result:

> say zip(@a, @b).flat;  # -> (1 a 2 b 3 c)

See docs.raku.org/routine/zip for more information about zip.

But the task was to implement it ourselves, so let us have a go at that.

File: zip-list
#! /usr/bin/env raku

my @a    = qw/1 2 3/;
my @b    = qw/a b c/;
my @zip1 = my-zip(@a, @b);
my @zip2 = my-zip(@b, @a);

say "qw/@zip1[]/;"; 
say "qw/@zip2[]/;"; 

sub my-zip (@a is copy, @b is copy)  # [1]
{
  my @return;

  while @a
  {
    @return.push: @a.shift;
    @return.push: @b.shift;
  }

  return @return;
}

[1] Note the use of is copy so that the shifts work on a local copy, and not on the caller's arrays. Without it, the first call would empty both «@a» and «@b», and the second call would result in an empty list in «@zip2».

Running it:

$ ./zip-list
qw/1 a 2 b 3 c/;
qw/a 1 b 2 c 3/;

Looking good.

Challenge #186.2: Unicode Makeover

You are given a string with possible Unicode characters.

Create a subroutine sub makeover($str) that replaces the Unicode characters with their ASCII equivalents. For this task, let us assume it only contains alphabetic characters.

Example 1:
Input: $str = 'ÃÊÍÒÙ';
Output: 'AEIOU'
Example 2:
Input: $str = 'âÊíÒÙ';
Output: 'aEiOU'

Raku has built-in support for this, with the samemark method:

> say 'ÃÊÍÒÙ'.samemark('a')
AEIOU

> say 'ÃÊaqøæÍÒÙ'.samemark('a')
AEaqøæIOU

See docs.raku.org/routine/samemark for more information about samemark.

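The argument can either be a string or a single character, as used here. The mark/accent on each character of the argument is applied to the corresponding character in (a copy of) the string, and the last character of the argument covers the rest of the string if the argument is the shorter one. As we use «a», which does not have any marks, the result is a string devoid of them.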

Wrapped up as a program:

File: unicode-makeover
#! /usr/bin/env raku

say makeover('ÃÊÍÒÙ');
say makeover('âÊíÒÙ');

sub makeover ($str)
{
  return $str.samemark('a');
}

Running it:

$ ./unicode-makeover
AEIOU
aEiOU

Looking good.

Perl

Perl does not support this sort of trickery with a built-in function, but the module Unicode::Normalize (which is bundled with Perl) is helpful.

File: unicode-makeover-perl
#! /usr/bin/env perl

use strict;
use warnings;
use utf8;
use feature 'say';
use feature 'unicode_strings';
use feature 'signatures';
no warnings 'experimental::signatures';

use Unicode::Normalize;

say makeover('ÃÊÍÒÙ');
say makeover('âÊíÒÙ');

sub makeover ($str)
{
  my $nfkd = NFKD($str);             # [1]
  $nfkd =~ s/\p{NonspacingMark}//g;  # [2]
  return $nfkd;
}

[1] Split the characters and marks/accents into separate codepoints.

[2] Remove all the mark/accent codepoints.
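What the decomposition in [1] does can be made visible by printing the codepoints before and after. This is a minimal sketch; the procedure «codepoints» is a hypothetical helper introduced for illustration only:

```perl
#! /usr/bin/env perl

use strict;
use warnings;
use utf8;
use feature 'say';

use Unicode::Normalize;

# Hypothetical helper, for illustration only:
# format every character in the string as U+XXXX.
sub codepoints
{
  my ($str) = @_;
  return join(" ", map { sprintf("U+%04X", ord($_)) } split(//, $str));
}

say codepoints('Ã');        # -> U+00C3
say codepoints(NFKD('Ã'));  # -> U+0041 U+0303
```

The single codepoint «Ã» becomes a plain «A» (U+0041) followed by a combining tilde (U+0303), and it is the latter that the substitution in [2] removes.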

Running it gives the same result as the Raku version:

$ ./unicode-makeover-perl
AEIOU
aEiOU

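Note that the program will give unprintable output for non-ASCII letters that do not decompose into a base letter and a mark/accent, such as «ß», «Ø» and «Æ»:

say makeover('ßØÆÅøæåÖ'); # -> ���A��aO

That is unhelpful. The letters themselves are left as they were. The garbage appears because the program prints to STDOUT without an encoding layer, so Perl writes each of them as a single Latin-1 byte, and that is not valid UTF-8.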
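The unprintable bytes can be avoided by telling Perl how to encode the output. Here is a minimal sketch, assuming that the terminal expects UTF-8. The «makeover» procedure is the same as in the program above, and the only addition is the «binmode» line:

```perl
#! /usr/bin/env perl

use strict;
use warnings;
use utf8;
use feature 'say';
use feature 'unicode_strings';
use feature 'signatures';
no warnings 'experimental::signatures';

use Unicode::Normalize;

# Assumption: the terminal expects UTF-8. Encode everything printed to
# STDOUT as UTF-8, instead of writing raw Latin-1 bytes.
binmode(STDOUT, ':encoding(UTF-8)');

say makeover('ßØÆÅøæåÖ');  # -> ßØÆAøæaO

sub makeover ($str)
{
  my $nfkd = NFKD($str);             # Decompose into base letters and marks
  $nfkd =~ s/\p{NonspacingMark}//g;  # Remove the marks
  return $nfkd;
}
```

Letters such as «ß», «Ø» and «Æ» have no mark to remove, so they are still not ASCII, but they are at least printed correctly.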

And that's it.