Out of Order
with Raku

by Arne Sommer

Out of Order with Raku

[249] Published 13. August 2023.

This is my response to The Weekly Challenge #229.

Challenge #229.1: Lexicographic Order

You are given an array of strings.

Write a script to delete element which is not lexicographically sorted (forwards or backwards) and return the count of deletions.

Example 1:
Input: @str = ("abc", "bce", "cae")
Output: 1

In the given array "cae" is the only element which is not
lexicographically sorted.
Example 2:
Input: @str = ("yxz", "cba", "mon")
Output: 2

In the given array "yxz" and "mon" are not lexicographically sorted.

The only sane way of doing this is to use the Unicode sorting order (similar to Isolatin and even Ascii).

File: lexi-order
#! /usr/bin/env raku

unit sub MAIN (*@str where @str.elems > 0, :v(:$verbose)); # [1]

sub is-lexi-sorted ($string)                               # [2]
  return True if [<=] $string.ords;                        # [3]
  return         [>=] $string.ords;                        # [4]

say ": OK: { @str.grep(*.&is-lexi-sorted).map('"' ~ * ~ '"').join(", ") }"
  if $verbose;                                             # [7]

my @not-ok = @str.grep( ! *.&is-lexi-sorted );             # [5]

say ": Not OK: { @not-ok.map('"' ~ * ~ '"').join(", ") }" if $verbose;

say @not-ok.elems;                                         # [6]

[1] Ensure at least one string.

[2] The procedure deciding if the given string is lexicographically sorted.

[3] We use the Reduction Metaoperator [] together with the «numerically less than or equal» operator <= to tell us if the assertion (less than or equal) applies to every neighbourly values in the list of Unicode codepoints. The codepoints are supplied by ords, which works on strings, returning the values for each character in that string. We return True if this assertion holds.

See docs.raku.org/language/operators#Reduction_metaoperators for more information about the Reduction Metaoperator [].

See docs.raku.org/routine/ords for more information about ords.

[4] The «greather than or equal» assertion. This returns true or False.

[5] Get the offending strings (i.e. those not lexicographically sorted), by negating the procedure in [2].

[6] Print the number of offending strings.

[7] Note that this grep expression gives us the explicit result; a list where we have deleted the offending strings. This list is not actually used anywhere, except by verbose mode, but here it is.

Running it:

$ ./lexi-order "abc" "bce" "cae"

$ ./lexi-order "yxz" "cba" "mon"

Looking good.

With verbose mode:

$ ./lexi-order -v "abc" "bce" "cae"
: OK: "abc", "bce"
: Not OK: "cae"

$ ./lexi-order -v "yxz" "cba" "mon"
: OK: "cba"
: Not OK: "yxz", "mon"

So far, so good.

But what about upper case letters, or even non-letters, in the strings (in addition to the prescribed lower case letters). This is a case (pun intended) of sorting order blues...

$ ./lexi-order -v "abc" "Abc" "aBc" "abC"
: OK: "abc", "Abc"
: Not OK: "aBc", "abC"

In Computer Science the uppercase letters precede the lower case variants, as shown here. In the real world (i.e. a dictionary) they are considered identical.

The challenge specifies strings and not words, so it is reasonable to postulate that the result we got above is indeed correct. Especially if we add non-letters to the mix:

$ ./lexi-order -v "abc12" "abc21" "12abc" "21abc"
: OK: "12abc"
: Not OK: "abc12", "abc21", "21abc"

A librarian will almost certainly disagree, though. Here is a librarian-friendly version of the program:

File: lexi-order-librarian
#! /usr/bin/env raku

unit sub MAIN (*@str where @str.elems > 0
                 && all(@str) ~~ /^<[a..z A..Z]>+$/,  # [1]

sub is-lexi-sorted ($string)
  return True if [<=] $string.lc.ords;                # [2]
  return         [>=] $string.lc.ords;                # [2]

say ": OK: { @str.grep(*.&is-lexi-sorted).map('"' ~ * ~ '"').join(", ") }"
  if $verbose;

my @not-ok = @str.grep( ! *.&is-lexi-sorted );

say ": Not OK: { @not-ok.map('"' ~ * ~ '"').join(", ") }" if $verbose;

say @not-ok.elems;

[1] Ensure lower and uppercase Ascii letters (A to Z) only in the strings.

[2] Coerce the string to lowercase before getting the codepoints.

Running it gives the expected result:

$ ./lexi-order-librarian -v "abc" "Abc" "aBc" "abC"
: OK: "abc", "Abc", "aBc", "abC"
: Not OK: 

Note that non-ascii letters (as well as non-letters) are illegal, courtesy of the explicit regex in [1]. That is intentional, as non-ascii characters come with a code point that is somewhat random. If we were to do this correctly, we would have to take the current language into account - as the sorting order is context sensitive. Let us not descend into that rabbit hole...

The notion of "current language" is also a problem, as it must take the input strings into account, and not rely on the computer running the program (the language locale setting).

Also note that the Germans would have a field day of my use of .lc, as the lowercase "ß" is translated to "SS" in uppercase. This does not round trip:

> say "straße".uc;   # -> STRASSE
> say "STRASSE".lc;  # -> strasse

That was awfully close to the rabbit hole, so let us stop there...

Challenge #229.2: Two out of Three

You are given three array of integers.

Write a script to return all the elements that are present in at least 2 out of 3 given arrays.

Example 1:
Input: @array1 = (1, 1, 2, 4)
       @array2 = (2, 4)
       @array3 = (4)
Ouput: (2, 4)
Example 2:
Input: @array1 = (4, 1)
       @array2 = (2, 4)
       @array3 = (1, 2)
Ouput: (1, 2, 4)

File: two-out-of-tree
#! /usr/bin/env raku

unit sub MAIN ($one, $two, $three, :v(:$verbose));                     # [1]

my @array1 = $one.words;                                               # [2]
my @array2 = $two.words;                                               # [2a]
my @array3 = $three.words;                                             # [2b]
my $bag    = (@array1.unique, @array2.unique, @array3.unique).Bag;     # [3]
my $ok     = $bag.grep( *.value >= 2 );                                # [4]

say ":All: { $bag.raku }" if $verbose;
say ":Ok: { $ok.raku }"   if $verbose;

say $ok.elems;                                                         # [5]

[1] Specify the three strings with quotes (thus ensuring that the spaces work as value separators, within the strings).

[2] Use words to get the list of integers.

[3] This is very easy, really. Get rid of duplicates (if any) in each array (with unique), add the duplicate-free arrays together (here with the comma operator), turn the result (which is an array with three elements, each one an array) into a Bag, a hash like structure where the values in the list turns out as keys, and the values are the number of times they occured.

See docs.raku.org/type/Bag for more information about the Bag type.

[4] Then we use grep to get those values (the value method gives us the frequency, and the key method would give the original value) that occured two or more times.

[5] Print the number of matches, i.e. the number of Pair objects (key, value) in the Bag.

Note the missing check on the input values for integerness. One could be forgiven for trying something like this, preferable between lines [2b] and [3]:

die "array1 must contain integers only" unless all(@array1) ~~ Int;
die "array2 must contain integers only" unless all(@array2) ~~ Int;
die "array3 must contain integers only" unless all(@array3) ~~ Int;

But this will not work, as words returns strings. Use something like this instead, if you absolutely feel the need:

die "array1 must contain integers only" unless all(@array1) ~~ /^<[0..9]>+$/;

Beware that it does not allow for negative values, and nonsense values as «000» and «007» are allowed.

The challenge does indeed specify integers, but it really does not matter. As long as we use one or more spaces between the values to make words happy.

Running it:

$ ./two-out-of-tree "1 1 2 4" "2 4" "4"

$ ./two-out-of-tree "1 1 2 -4" "2 -4" "4"

$ ./two-out-of-tree "a b c" "d e a" "b"

Looking good.

With verbose mode:

$ ./two-out-of-tree -v "1 1 2 4" "2 4" "4"
:All: ("1"=>1,"4"=>3,"2"=>2).Bag
:Ok: $(("4" => 3, "2" => 2).Seq)

$ ./two-out-of-tree -v "1 1 2 -4" "2 -4" "4"
:All: ("-4"=>2,"4"=>1,"2"=>2,"1"=>1).Bag
:Ok: $(("-4" => 2, "2" => 2).Seq)

$ ./two-out-of-tree -v "a b c" "d e a" "b"
:All: ("c"=>1,"b"=>2,"d"=>1,"e"=>1,"a"=>2).Bag
:Ok: $((:b(2), :a(2)).Seq)

Negative values does not quite work out:

$ ./two-out-of-tree "-1 2 3" "4 5 6" "1 001"
  ./two-out-of-tree [-v|--verbose[=Any]] <one> <two> <three>

$ ./two-out-of-tree "1 -2 3" "4 5 6" "1 001"

So words may not be that great a choice. Let us do it the hard way:

File: two-out-of-tree-split (partial)
my @array1 = $one.split(/\s+/);
my @array2 = $two.split(/\s+/);
my @array3 = $three.split(/\s+/);

This does not help, giving the exact same result as above. So it is the shell that is the problem; not the program. (Using single quotes instead of double quotes does not help.)

The solution: Do not specify a negative value as the first value in the first string. (This is ok as the order does not matter, but it will cause problems if we were to use only negative values.)

Oh, well. Another rabbit hole...

And that's it.