Bracen C with Raku

[36] Published 11. October 2019

Perl 6 → Raku

This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.

This is my response to the Perl Weekly Challenge #029.

Challenge #29.1: Brace Expansion

Write a script to demonstrate brace expansion. For example, script would take command line argument Perl {Daily,Weekly,Monthly,Yearly} Challenge and should expand it and print like below:

Perl Daily Challenge
Perl Weekly Challenge
Perl Monthly Challenge
Perl Yearly Challenge

Using a regular expression seems like the right thing to do:

File: brace-expansion

sub MAIN ($string)                      # [1]
{
  if $string ~~ /(.*) \{ (.*) \} (.*)/  # [2] [3]
  {
    say "$0$_$2" for $1.Str.split(",")  # [3]
    # 3c ####### # 3b ##### # 3a #####
  }
  else                                  # [4]
  {
    say $string;                        # [4]
  }
}

[1] I use «MAIN» to get the argument.

[2] The regular expression is a three-way split: anything before a {, anything between that { and a }, and finally anything after that }.

[3] If the regex matched, split the middle part (between the { and the }) on comma (3a), iterate over the partial strings (3b), and print the brace expanded texts (3c).

[4] If the regex didn't match, just print the string.

Running it:

$ raku brace-expansion "Perl {Daily,Weekly,Monthly,Yearly} Challenge"
Perl Daily Challenge
Perl Weekly Challenge
Perl Monthly Challenge
Perl Yearly Challenge

With a missing brace we get the text back unchanged:

$ raku brace-expansion "Perl {Daily,Weekly,Monthly,Yearly Challenge"
Perl {Daily,Weekly,Monthly,Yearly Challenge

Multiple brace arguments don't work:

$ raku brace-expansion "{Perl, Ruby, Java} {Daily,Weekly,Monthly,Yearly} Challenge"
{Perl, Ruby, Java} Daily Challenge
{Perl, Ruby, Java} Weekly Challenge
{Perl, Ruby, Java} Monthly Challenge
{Perl, Ruby, Java} Yearly Challenge

The last brace block is expanded as I have used greedy regex matching (.*). Non-greedy matching (.*?) will expand the first brace block, which is just as wrong. I'll get back to this problem later.
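A minimal sketch of the difference, assuming a shortened input with two brace blocks. The sub names «greedy-list» and «non-greedy-list» are made up for this illustration and are not part of the programs in this article. Each returns the comma list that the regex captured:

```raku
# Assumption: a hypothetical input with two brace blocks. Single quotes are
# used so that Raku does not treat the braces as code to interpolate.
my $text = '{Perl,Ruby} {Daily,Weekly} Challenge';

# Greedy: the first (.*) takes as much as possible, so the LAST block is captured.
sub greedy-list (Str $s)     { $s ~~ /(.*)  \{ (.*)  \} (.*)/ ?? ~$1 !! Nil }

# Non-greedy: (.*?) takes as little as possible, so the FIRST block is captured.
sub non-greedy-list (Str $s) { $s ~~ /(.*?) \{ (.*?) \} (.*)/ ?? ~$1 !! Nil }

say greedy-list($text);      # Daily,Weekly
say non-greedy-list($text);  # Perl,Ruby
```

Either way only one block is captured per match, which is why a single match cannot handle several brace blocks.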

Cosmetic Changes

We can use «multi MAIN» to separate the two cases (and avoid the if/else block):

File: brace-expansion-multi

multi MAIN ($string where $string ~~ /(.*) \{ (.*) \} (.*)/)
{
  say "$0$_$2" for $1.Str.split(",")
}

multi MAIN ($string)
{
  say $string;
}

It behaves just like the previous version, and it is a matter of taste which version to use.
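The same dispatch mechanism works for ordinary subs as well. The following is a small sketch with a hypothetical «kind» sub (not part of the article's programs), where the «where» clause decides which candidate gets the call:

```raku
# Hypothetical helper, used here only to show how «where» steers multi dispatch.
multi kind ($s where /\{ .* \}/) { "has a brace block" }
multi kind ($s)                   { "plain text" }

say kind('Perl {Daily,Weekly} Challenge');  # has a brace block
say kind('Perl Challenge');                 # plain text
```

The candidate with the «where» clause is only chosen when its condition holds. Otherwise the unconstrained candidate takes over, just as the second «multi MAIN» does above.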

We can do it as a one-liner (shown with linebreaks to help readability):

File: brace-expansion-oneliner

@*ARGS && @*ARGS[0] ~~ /(.*) \{ (.*) \} (.*)/
  ?? $1.Str.split(",").map({ say "$0$_$2" })
  !! say @*ARGS[0];

We fetch the input string from the @*ARGS array directly, instead of using a MAIN wrapper which would have given a nice variable name. A «for» loop doesn't look good in a one-liner, so I have used «map» instead.

If we call it without an argument, we get a (confusing) result. The original program handled it automatically:

$ raku brace-expansion-oneliner 
(Any)

$ raku brace-expansion
Usage:
  brace-expansion <string>

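The «&&» operator binds tighter than «?? !!», so without an argument the whole condition is false and the «!!» branch prints the missing value. Adding a pair of parens makes the «(Any)» go away, but we still get no error message: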

File: brace-expansion-oneliner-fixedish

@*ARGS && (@*ARGS[0] ~~ /(.*) \{ (.*) \} (.*)/
  ?? $1.Str.split(",").map({ say "$0$_$2" })
  !! say @*ARGS[0]);

$ raku brace-expansion-oneliner-fixedish

That is probably ok.

Multiple Braces

I promised to get back to this, so here it is.

It turned out to be rather easy, with one minor (but important) change from «brace-expansion-multi»:

File: brace-expansion-turbo

multi MAIN ($string where $string ~~ /(.*) \{ (.*) \} (.*)/)
{
  MAIN("$0$_$2") for $1.Str.split(",");
}

multi MAIN ($string)
{
  say $string;
}

«MAIN» is a magical procedure, but we can call it ourselves. As done here, recursively. The first call will expand the last brace block to individual lines, and it calls itself to go on. If the expanded lines have more brace blocks, the next one (from the right) will be expanded. And so on, until there are no more brace blocks. Then the second «multi MAIN» kicks in, doing the output.

Running it:

$ raku brace-expansion-turbo "{Perl,Ruby,Java} {Daily,Weekly,Monthly,Yearly} \
    Challenge"
Perl Daily Challenge
Ruby Daily Challenge
Java Daily Challenge
Perl Weekly Challenge
Ruby Weekly Challenge
Java Weekly Challenge
Perl Monthly Challenge
Ruby Monthly Challenge
Java Monthly Challenge
Perl Yearly Challenge
Ruby Yearly Challenge
Java Yearly Challenge

It works quite well, but the sorting order could be better. We can fix that by using non-greedy matching:

File: brace-expansion-turbo-ng

multi MAIN ($string where $string ~~ /^(.*?) \{ (.*?) \} (.*)/)
{
  MAIN("$0$_$2") for $1.Str.split(",");
}

multi MAIN ($string)
{
  say $string;
}

Non-greedy matches for the first and second part ensure that we expand the first brace block only. Note the initial «^» to ensure that we match from the beginning of the string. This is actually not required here, but it doesn't hurt to be wary when dealing with non-greedy matching.

$ raku brace-expansion-turbo-ng "{Perl,Ruby,Java} {Daily,Weekly,Monthly,\
    Yearly} Challenge"
Perl Daily Challenge
Perl Weekly Challenge
Perl Monthly Challenge
Perl Yearly Challenge
Ruby Daily Challenge
Ruby Weekly Challenge
Ruby Monthly Challenge
Ruby Yearly Challenge
Java Daily Challenge
Java Weekly Challenge
Java Monthly Challenge
Java Yearly Challenge

As the program is recursive, you can have as many brace blocks as you want. But the output will be very long if you go overboard.

$ raku brace-expansion-turbo-ng "{Perl,Ruby,Java} {Daily,Weekly,Monthly,Yearly} \
  {Challenge,Chore}."
Perl Daily Challenge.
Perl Daily Chore.
Perl Weekly Challenge.
Perl Weekly Chore.
Perl Monthly Challenge.
Perl Monthly Chore.
Perl Yearly Challenge.
Perl Yearly Chore.
Ruby Daily Challenge.
Ruby Daily Chore.
Ruby Weekly Challenge.
Ruby Weekly Chore.
Ruby Monthly Challenge.
Ruby Monthly Chore.
Ruby Yearly Challenge.
Ruby Yearly Chore.
Java Daily Challenge.
Java Daily Chore.
Java Weekly Challenge.
Java Weekly Chore.
Java Monthly Challenge.
Java Monthly Chore.
Java Yearly Challenge.
Java Yearly Chore.
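The recursion can also be written as a sub that returns the expanded lines instead of printing them, which makes it easier to reuse. This is only a sketch. The «expand» sub and its variable names are made up for the illustration, and it uses the same non-greedy regex as «brace-expansion-turbo-ng»:

```raku
# A sketch, not one of the article's programs: «expand» is a hypothetical
# helper that returns the lines instead of printing them.
sub expand (Str $text)
{
  if $text ~~ /^ (.*?) \{ (.*?) \} (.*)/
  {
    # Copy the captures, as the recursive calls below run their own matches.
    my ($before, $list, $after) = ~$0, ~$1, ~$2;

    # Substitute each alternative, then expand whatever is left to the right.
    return $list.split(",").map({ |expand("$before$_$after") }).List;
  }

  return ($text,);   # No complete brace block left.
}

.say for expand('{Perl,Ruby} {Daily,Weekly} Challenge');
```

With the shortened input shown here, the lines come out with the first brace block varying slowest: «Perl Daily Challenge», «Perl Weekly Challenge», «Ruby Daily Challenge» and «Ruby Weekly Challenge».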

Challenge #29.2: Calling C

Write a script to demonstrate calling a C function. It could be any user defined or standard C function.

I have only tried this on Linux. The code should run on Mac, but I have no idea if it will work on Windows.

This is quite easy(ish) in Raku. We use the «NativeCall» interface, without the need for external wrappers (as required in Perl 5).

See docs.raku.org/language/nativecall for more information about «NativeCall».

I have chosen to call the C function «toupper» to convert a text string to upper case.

The «toupper» function is defined in «ctype.h», but the code resides in the «libc» library. (And finding that out required research. Also known as Google.) All libraries have names starting with «lib», so we specify it simply as «c».

«toupper» doesn't work on strings, but on single characters. So I'll have to write a wrapper function iterating over the individual characters. A character in C is a byte (8 bit). The corresponding Raku type is «uint8» (unsigned int 8-bit).

File: c-toupper (partial)

use NativeCall;

sub toupper(uint8) returns uint8 is native('c', v6) { * }

Note the «v6» part, which specifies which version of the library to use. The program will not work without it. (If the «v6» part doesn't work, you can run «locate 'libc.'» to see what matches. The filename is «libc.so.6» for version 6.) Note the missing quotes on «v6», as it is a Version object.

See docs.raku.org/type/Version for more information about Version objects.

It is also possible to specify the library file directly:

sub toupper(uint8) returns uint8 is native('libc.so.6') { * }

The starred block at the end tells the compiler that the body is a placeholder, as the actual code resides elsewhere. In this case in the external (native) library.

File: c-toupper (partial)

sub to-upper ($string)
{
  return $string.comb.map({ toupper($_.ord).chr }).join;
    ###### 1 ####### # 2 ######################## # 3 #
}

sub MAIN ($string)
{
  say "Before: $string";

  my $new = to-upper($string);

  say "After:  $new";
}

[1] Split the string into each character,

[2] run the «map» on each of the characters,

[3] and join them together again to a string.

The «map» code takes the character, which is a Raku string (with length 1), and converts it to a number (the ASCII value, or more precisely the Unicode code point) with «ord», as the C function expects an 8-bit number. When we get it back, it is still a number, so we convert it back to a character with «chr».

Running it:

$ raku c-toupper "Hello World!"
Before: Hello World!
After:  HELLO WORLD!

Utf-8 characters work (as in «doesn't crash the program»), but only ascii letters (a-z) are converted to upper case:

$ raku c-toupper "Hello øæåß"
Before: Hello øæåß
After:  HELLO øæåß

$ raku c-toupper "Hello ♣"
Before: Hello ♣
After:  HELLO ♣

The ♣ character is encoded as 3 bytes in utf-8 (0xE2 0x99 0xA3), but the program works even if I have told it to send 8-bit values. I don't know why.
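One way to sidestep the question is to hand «toupper» only values that fit in a byte. C declares the function as «int toupper(int)» and only defines it for arguments that fit in an unsigned char (or are EOF). The sketch below assumes glibc on Linux, as above, uses «int32» instead of «uint8» to mirror the C declaration, and introduces a hypothetical wrapper called «to-upper-bytes»:

```raku
use NativeCall;

# Assumption: glibc on Linux, as in the article. C declares the function as
# «int toupper(int)», so «int32» is used here instead of «uint8».
sub toupper(int32) returns int32 is native('c', v6) { * }

# Hypothetical wrapper: only code points that fit in a byte are sent to C.
# Everything else is passed through untouched.
sub to-upper-bytes (Str $string)
{
  return $string.comb.map({
    my $cp = .ord;
    $cp < 256 ?? toupper($cp).chr !! $_
  }).join;
}

say to-upper-bytes("Hello ♣");
```

This does not convert more letters than before. It only makes sure that nothing outside the range that C defines is sent across the boundary.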

The Raku functions «uc» (upper case), «lc» (lower case) (and some more) handle everything that is set up as a letter in utf-8:

$ raku
> "Hello øæåß".uc
HELLO ØÆÅSS

Note that «ß» is converted to «SS» (2 characters) in upper case, and converting it back to lower case gives «ss» (so that it doesn't round trip):

> "Hello øæåß".uc.lc
hello øæåss

Custom C Code

We can write the missing C function that converts a whole string to uppercase, and call that from our program:

File: upper_string.c

#include <ctype.h>

char *upper_string(char s[])
{
  int c = 0;
  static char string[1000];
 
  while (s[c] != '\0')
  {
    string[c] = toupper(s[c]);
    c++;
  }
  string[c] = '\0';
  return string;
}

It sets up a new string buffer with length 1000, which we hope is long enough. I should have added a test for this limit in the loop, but didn't bother, as this isn't a C best practice article.

Compilation involves two steps, where the second one gives the shared library:

$ gcc -O2 -fPIC -c upper_string.c -o upper_string.o
$ gcc -O2 -fPIC -shared upper_string.o -o upper_string.so

I got the gcc commands from this Introduction to Perl 6 Native Call from 2016.

This is a version of the Raku program using this new function:

File: c-toupper2

use NativeCall;

sub upper_string(Str) returns Str is native('./upper_string.so') { * }

sub MAIN ($string)
{
  say "Before: $string";

  my $new = upper_string($string);

  say "After:  $new";
}

C doesn't have strings, but char pointers (char *). NativeCall knows that, and does the translation so that we can use «Str» in the code.

The library is now in the current directory, and not somewhere in the library path (as before).

Running it gives the same result as before, as we still use «toupper» to do the actual job:

$ raku c-toupper2 "Hello øæåß"
Before: Hello øæåß
After:  HELLO øæåß

$ raku c-toupper2 "Hello ♣"
Before: Hello ♣
After:  HELLO ♣

And that's it.
