This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.
This is my response to the Perl Weekly Challenge #029.
Perl Daily Challenge
Perl Weekly Challenge
Perl Monthly Challenge
Perl Yearly Challenge
Using a regular expression seems like the right thing to do:
File: brace-expansion
sub MAIN ($string) # [1]
{
if $string ~~ /(.*) \{ (.*) \} (.*)/ # [2] [3]
{
say "$0$_$2" for $1.Str.split(",") # [3]
# 3c ####### # 3b ##### # 3a #####
}
else # [4]
{
say $string; # [4]
}
}
[1] I use «MAIN» to get the argument.
[2] The regular expression is a three-way split: anything before a «{», anything between that «{» and a «}», and finally anything after that «}».
[3] If the regex matched, split the middle part (between the «{» and the «}») on comma (3a), iterate over the partial strings (3b), and print the brace expanded texts (3c).
[4] If the regex didn't match, just print the string.
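The three-way split can be tried on its own. This is a minimal sketch that returns the three captures instead of printing them. The sub name «split-braces» is my own invention for the illustration. Note the single quotes around the input: inside a double-quoted Raku string a «{ }» block is run as code, which is not a problem when the text arrives from the command line.

```raku
# Sketch only: «split-braces» is a hypothetical helper, not part of the program above.
sub split-braces (Str $string)
{
  # Same three-way split as in [2]: before, inside and after the braces.
  if $string ~~ / (.*) '{' (.*) '}' (.*) /
  {
    return (~$0, ~$1, ~$2);
  }
  return ();
}

my ($before, $inside, $after) = split-braces('Perl {Daily,Weekly} Challenge');

# Prints «Perl Daily Challenge» and «Perl Weekly Challenge» on separate lines.
say $inside.split(',').map({ "$before$_$after" }).join("\n");
```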
Running it:
$ raku brace-expansion "Perl {Daily,Weekly,Monthly,Yearly} Challenge"
Perl Daily Challenge
Perl Weekly Challenge
Perl Monthly Challenge
Perl Yearly Challenge
With a missing brace we get the text back unchanged:
$ raku brace-expansion "Perl {Daily,Weekly,Monthly,Yearly Challenge"
Perl {Daily,Weekly,Monthly,Yearly Challenge
Multiple brace arguments don't work:
$ raku brace-expansion "{Perl, Ruby, Java} {Daily,Weekly,Monthly,Yearly} Challenge"
{Perl, Ruby, Java} Daily Challenge
{Perl, Ruby, Java} Weekly Challenge
{Perl, Ruby, Java} Monthly Challenge
{Perl, Ruby, Java} Yearly Challenge
The last brace block is expanded as I have used greedy regex matching («.*»). Non-greedy matching («.*?») will expand the first brace block, which is just as wrong. I'll get back to this problem later.
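The difference between the two is easy to see when we only look at the middle capture. A small sketch, with a shortened input string of my own choosing:

```raku
my $text = '{Perl,Ruby} {Daily,Weekly} Challenge';

# Greedy: the first (.*) takes as much as possible, so only the LAST block is left.
my $greedy = $text ~~ / (.*) '{' (.*) '}' (.*) / ?? ~$1 !! '';

# Non-greedy: the first (.*?) takes as little as possible, so we stop at the FIRST block.
my $frugal = $text ~~ / (.*?) '{' (.*?) '}' (.*) / ?? ~$1 !! '';

say "greedy: $greedy";   # greedy: Daily,Weekly
say "frugal: $frugal";   # frugal: Perl,Ruby
```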
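We can also use multiple dispatch, with a «where» clause on the first «MAIN» candidate instead of the «if»:
File: brace-expansion-multi
multi MAIN ($string where $string ~~ /(.*) \{ (.*) \} (.*)/)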
{
say "$0$_$2" for $1.Str.split(",")
}
multi MAIN ($string)
{
say $string;
}
It behaves just as the previous version, and it is a matter of taste which version to use.
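The same dispatch mechanism works for ordinary subs. Here is a minimal sketch, where «describe» is a made-up name; the candidate with the «where» clause is tried first, and the unconstrained one acts as the fallback:

```raku
# Sketch only: «describe» is a hypothetical sub, used here instead of «MAIN».
multi describe (Str $string where / '{' .* '}' /) { "brace block found" }
multi describe (Str $string)                      { "plain text" }

say describe('Perl {Daily,Weekly} Challenge');   # brace block found
say describe('Perl Weekly Challenge');           # plain text
```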
We can do it as a one-liner (shown with linebreaks to help readability):
File: brace-expansion-oneliner
@*ARGS && @*ARGS[0] ~~ /(.*) \{ (.*) \} (.*)/
?? $1.Str.split(",").map({ say "$0$_$2" })
!! say @*ARGS[0];
We fetch the input string from the «@*ARGS» array directly, instead of using a «MAIN» wrapper which would have given a nice variable name. A «for» loop doesn't look good in a one-liner, so I have used «map» instead.
If we call it without an argument, we get a (confusing) result. The original program handled it automatically:
$ raku brace-expansion-oneliner
(Any)
$ raku brace-expansion
Usage:
brace-expansion <string>
Adding a pair of parens makes the «(Any)» go away, but still no error message:
File: brace-expansion-oneliner-fixedish
@*ARGS && (@*ARGS[0] ~~ /(.*) \{ (.*) \} (.*)/
?? $1.Str.split(",").map({ say "$0$_$2" })
!! say @*ARGS[0]);
$ raku brace-expansion-oneliner-fixedish
That is probably ok.
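The parens matter because «&&» binds tighter than «?? !!». A small sketch of the two parses, where the empty «@args» array stands in for an empty «@*ARGS» and the string comparison stands in for the regex match:

```raku
my @args;   # stands in for an empty @*ARGS

# Without parens this parses as (@args && ...) ?? ... !! ...,
# so the «else» branch runs even though there is no argument.
my $without-parens = @args && @args[0] eq 'x' ?? 'then' !! 'else';

# With parens the whole conditional is skipped when @args is empty,
# and the result is just the (empty) array itself.
my $with-parens = @args && (@args[0] eq 'x' ?? 'then' !! 'else');

say $without-parens;      # else
say $with-parens.elems;   # 0
```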
Supporting multiple brace blocks turned out to be rather easy, with one minor (but important) change from «brace-expansion-multi»:
File: brace-expansion-turbo
multi MAIN ($string where $string ~~ /(.*) \{ (.*) \} (.*)/)
{
MAIN("$0$_$2") for $1.Str.split(",");
}
multi MAIN ($string)
{
say $string;
}
«MAIN» is a magical procedure, but we can call it ourselves. As done here, recursively. The first call will expand the last brace block to individual lines, and it calls itself to go on. If the expanded lines have more brace blocks, the next one (from the right) will be expanded. And so on, until there are no more brace blocks. Then the second «multi MAIN» kicks in, doing the output.
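The recursion can also be written as a sub that returns the lines instead of printing them, which makes it easier to test. This is only a sketch: «expand» is a name I made up, and it uses a greedy pattern so that the last brace block is expanded first, as described above.

```raku
# Sketch only: «expand» is a hypothetical sub that returns the lines instead of printing them.
sub expand (Str $string)
{
  # No brace block left: the recursion stops with a one-element list.
  return ($string,) unless $string ~~ / (.*) '{' (.*) '}' (.*) /;

  # Copy the captures before recursing.
  my ($before, $inside, $after) = ~$0, ~$1, ~$2;

  # Expand the last block, then recurse on each result; «|» flattens the inner lists.
  return $inside.split(',').map({ |expand("$before$_$after") }).List;
}

.say for expand('{Perl,Ruby} {Daily,Weekly} Challenge');
```

This prints «Perl Daily Challenge», «Ruby Daily Challenge», «Perl Weekly Challenge» and «Ruby Weekly Challenge», in that order.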
Running it:
$ raku brace-expansion-turbo "{Perl,Ruby,Java} {Daily,Weekly,Monthly,Yearly} \
Challenge"
Perl Daily Challenge
Ruby Daily Challenge
Java Daily Challenge
Perl Weekly Challenge
Ruby Weekly Challenge
Java Weekly Challenge
Perl Monthly Challenge
Ruby Monthly Challenge
Java Monthly Challenge
Perl Yearly Challenge
Ruby Yearly Challenge
Java Yearly Challenge
It works quite well, but the sorting order could be better. We can fix that by using non-greedy matching:
File: brace-expansion-turbo-ng
multi MAIN ($string where $string ~~ /^(.*?) \{ (.*?) \} (.*)/)
{
MAIN("$0$_$2") for $1.Str.split(",");
}
multi MAIN ($string)
{
say $string;
}
Non-greedy matches for the first and second part ensure that we expand the first brace block only. Note the initial «^» to ensure that we match from the beginning of the string. This is actually not required here, but it doesn't hurt to be wary when dealing with non-greedy matching.
$ raku brace-expansion-turbo-ng "{Perl,Ruby,Java} {Daily,Weekly,Monthly,\
Yearly} Challenge"
Perl Daily Challenge
Perl Weekly Challenge
Perl Monthly Challenge
Perl Yearly Challenge
Ruby Daily Challenge
Ruby Weekly Challenge
Ruby Monthly Challenge
Ruby Yearly Challenge
Java Daily Challenge
Java Weekly Challenge
Java Monthly Challenge
Java Yearly Challenge
As the program is recursive, you can have as many brace blocks as you want. But the output will be very long if you go overboard.
$ raku brace-expansion-turbo-ng "{Perl,Ruby,Java} {Daily,Weekly,Monthly,Yearly} \
{Challenge,Chore}."
Perl Daily Challenge.
Perl Daily Chore.
Perl Weekly Challenge.
Perl Weekly Chore.
Perl Monthly Challenge.
Perl Monthly Chore.
Perl Yearly Challenge.
Perl Yearly Chore.
Ruby Daily Challenge.
Ruby Daily Chore.
Ruby Weekly Challenge.
Ruby Weekly Chore.
Ruby Monthly Challenge.
Ruby Monthly Chore.
Ruby Yearly Challenge.
Ruby Yearly Chore.
Java Daily Challenge.
Java Daily Chore.
Java Weekly Challenge.
Java Weekly Chore.
Java Monthly Challenge.
Java Monthly Chore.
Java Yearly Challenge.
Java Yearly Chore.
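The number of lines is the product of the number of alternatives in each brace block, here 3 x 4 x 2 = 24. A small sketch to compute it up front; the name «expansion-count» is hypothetical:

```raku
# Sketch only: «expansion-count» is a hypothetical helper.
sub expansion-count (Str $string)
{
  # Find every brace block, count the comma separated alternatives in each,
  # and multiply the counts. An input without brace blocks gives 1.
  my @blocks = $string.match(/ '{' (.*?) '}' /, :g);
  return [*] @blocks.map({ $_[0].Str.split(',').elems });
}

say expansion-count('{Perl,Ruby,Java} {Daily,Weekly,Monthly,Yearly} {Challenge,Chore}.');   # 24
```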
I have only tried this on Linux. The code should run on Mac, but I have no idea if it will work on Windows.
This is quite easy(ish) in Raku. We use the «NativeCall» interface, without the need for external wrappers (as required in Perl 5).
See docs.raku.org/language/nativecall for more information about «NativeCall».
I have chosen to call the C function «toupper» to convert a text string to upper case.
The «toupper» function is defined in «ctype.h», but the code resides in the «libc» library. (And finding that out required research. Also known as Google.) All libraries have names starting with «lib», so we specify it simply as «c».
«toupper» doesn't work on strings, but on single characters. So I'll have to write a wrapper function iterating over the individual characters. A character in C is a byte (8 bit). The corresponding Raku type is «uint8» (unsigned int 8-bit).
File: c-toupper (partial)
use NativeCall;
sub toupper(uint8) returns uint8 is native('c', v6) { * }
Note the «v6» part, which specifies which version of the library to use. The program will not work without it. (If the «v6» part doesn't work, you can run «locate 'libc.'» to see what matches. The filename is «libc.so.6» for version 6.) Note the missing quotes on «v6», as it is a version object.
See docs.raku.org/type/Version for more information about Version objects.
It is also possible to specify the library file directly:
sub toupper(uint8) returns uint8 is native('libc.so.6') { * }
The starred block at the end tells the compiler that the body is a placeholder, as the actual code resides elsewhere. In this case in the external (native) library.
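The C prototype is «int toupper(int c)», so another way to declare it is with «int32», and the «is symbol» trait lets us pick a different name on the Raku side. This is a sketch under the same assumption as above (glibc on Linux, «libc.so.6»); the name «c-toupper» is my own:

```raku
use NativeCall;

# Sketch only: assumes glibc on Linux; «c-toupper» is a hypothetical Raku-side name.
# «is symbol» names the C function to call, so the Raku sub can be called something else.
sub c-toupper(int32) returns int32 is native('c', v6) is symbol('toupper') { * }

say c-toupper('a'.ord).chr;   # A
```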
File: c-toupper (partial)
sub to-upper ($string)
{
return $string.comb.map({ toupper($_.ord).chr }).join;
###### 1 ####### # 2 ######################## # 3 #
}
sub MAIN ($string)
{
say "Before: $string";
my $new = to-upper($string);
say "After: $new";
}
[1] Split the string into each character,
[2] run the «map» on each of the characters,
[3] and join them together again to a string.
The «map» code takes the character, which is a Raku string (with length 1), and converts it to a number with «ord», as the C function expects an 8-bit number. The number is the Unicode code point, which is the same as the ascii value for ascii characters. When we get it back, it is still a number, so we convert it back to a character with «chr».
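The «comb», «ord», «chr» and «join» pipeline can be tried without any native code. In this sketch the hypothetical «ascii-upper» sub stands in for the C function, and only handles the ascii letters a-z:

```raku
# Sketch only: «ascii-upper» is a pure Raku stand-in for the native «toupper».
sub ascii-upper (Int $code)
{
  # The code points 97..122 are a-z; the upper case letters are 32 below them.
  return 97 <= $code <= 122 ?? $code - 32 !! $code;
}

sub to-upper-sketch (Str $string)
{
  # Split into characters, convert each one via its code point, and join again.
  return $string.comb.map({ ascii-upper(.ord).chr }).join;
}

say to-upper-sketch('Hello World!');   # HELLO WORLD!
say to-upper-sketch('Hello øæåß');     # HELLO øæåß
```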
Running it:
$ raku c-toupper "Hello World!"
Before: Hello World!
After: HELLO WORLD!
Utf-8 characters work (as in «doesn't crash the program»), but only ascii letters (a-z) are converted to upper case:
$ raku c-toupper "Hello øæåß"
Before: Hello øæåß
After: HELLO øæåß
$ raku c-toupper "Hello ♣"
Before: Hello ♣
After: HELLO ♣
The ♣ character is encoded as 3 bytes in utf-8 (0xE2 0x99 0xA3), but the program works even if I have told it to send 8-bit values. I don't know why.
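It may help to look at what «ord» actually gives us. It returns the Unicode code point as one number, and not the utf-8 bytes. This sketch does not explain what NativeCall does with a value that doesn't fit in 8 bits, it only shows the two representations side by side:

```raku
my $club = '♣';

say $club.ord.base(16);                    # 2663
say $club.encode('utf8').list.join(' ');   # 226 153 163
say $club.encode('utf8').elems;            # 3
```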
The Raku functions «uc» (upper case), «lc» (lower case) (and some more) handle everything that is set up as a letter in utf-8:
$ raku
> "Hello øæåß".uc
HELLO ØÆÅSS
Note that «ß» is converted to «SS» (2 characters) in upper case, and converting it back to lower case gives «ss» (so that it doesn't round trip):
> "Hello øæåß".uc.lc
hello øæåss
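If the goal is to compare strings regardless of case, Raku also has «fc» (fold case), which is made for that. A short sketch:

```raku
say 'ß'.fc;                                # ss
say 'Hello øæåß'.fc eq 'HELLO ØÆÅSS'.fc;   # True
```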
File: upper_string.c
#include <ctype.h>
char *upper_string(char s[])
{
int c = 0;
static char string[1000];
while (s[c] != '\0')
{
string[c] = toupper((unsigned char) s[c]); /* toupper needs a non-negative value */
c++;
}
string[c] = '\0';
return string;
}
The «upper_string» function sets up a new string buffer with length 1000, to ensure that it is long enough. Or rather, we hope so. I should have added a test for this limit in the loop, but didn't bother, as this isn't a C best practice article.
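A cheap alternative is to check the length on the Raku side before calling the C function. This is only a sketch; «fits-buffer» and «BUFFER-SIZE» are names I made up, and the value 1000 must be kept in sync with the C code by hand:

```raku
# Sketch only: «fits-buffer» is a hypothetical helper; 1000 mirrors the C buffer size.
constant BUFFER-SIZE = 1000;

sub fits-buffer (Str $string)
{
  # The C loop copies one byte per «char» and then adds a terminating '\0',
  # so at most BUFFER-SIZE - 1 bytes of utf-8 fit.
  return $string.encode('utf8').bytes < BUFFER-SIZE;
}

say fits-buffer('Hello øæåß');   # True
say fits-buffer('x' x 1000);     # False
```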
Compilation involves two steps, where the second one gives the shared library:
$ gcc -O2 -fPIC -c upper_string.c -o upper_string.o
$ gcc -O2 -fPIC -shared upper_string.o -o upper_string.so
I got the gcc commands from this Introduction to Perl 6 Native Call from 2016.
This is a version of the Raku program using this new function:
File: c-toupper2
use NativeCall;
sub upper_string(Str) returns Str is native('./upper_string.so') { * }
sub MAIN ($string)
{
say "Before: $string";
my $new = upper_string($string);
say "After: $new";
}
C doesn't have strings, but char pointers («char *»). «NativeCall» knows that, and does the translation so that we can use «Str» in the code.
The library is now in the current directory, and not somewhere in the library path (as before).
Running it gives the same result as before, as we still use «toupper» to do the actual job:
$ raku c-toupper2 "Hello øæåß"
Before: Hello øæåß
After: HELLO øæåß
$ raku c-toupper2 "Hello ♣"
Before: Hello ♣
After: HELLO ♣
And that's it.