This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.

This is my response to the Perl Weekly Challenge #022.

Write a script to print first **10 Sexy Prime Pairs**. Sexy primes are prime numbers
that differ from each other by 6. For example, the numbers 5 and 11 are both sexy primes,
because **11 - 5 = 6**. The term “**sexy prime**” is a pun stemming from the Latin
word for six: sex. For more information, please checkout wiki
page.

We need primes, and this is the easiest way of obtaining them, as an ordered sequence:

my $primes := (1 .. Inf).grep(*.is-prime); # [1]

[1] Using the `is-prime`

method to sort out the
primes.

See
docs.raku.org/routine/is-prime for more information about `is-prime`

.

If you have followed the Perl Weekly Challenge (and especially my solutions) for some time, this is familiar. If not take a look at one of the following articles:

- Perfect Indentation with Raku (Challenge 8)
- The Euclid Path with Raku (Challenge 12)
- Prime Vigenere and Raku (Challenge 15)
- Amicable Split with Raku (Challenge #20)

Now the actual challenge, which is quite easy given the Prime Sequence above:

File: sexy-prime-pairsmy $primes := (1 .. Inf).grep(*.is-prime);
my $sexy-primes := $primes.grep({ ($_ + 6).is-prime }); # [1]
say "{ $_ +1 }: ($sexy-primes[$_],{ $sexy-primes[$_] +6 })" for ^10; # [2]

[1] We start with the primes, and use «grep» to select only those where the value + 6
*also is a prime*.

[1] The sequence gives the first number in the pairs, so we have to do the rest. manually.

Running it gives the same result as the wiki page:

$ raku sexy-prime-pairs
1: (5,11)
2: (7,13)
3: (11,17)
4: (13,19)
5: (17,23)
6: (23,29)
7: (31,37)
8: (37,43)
9: (41,47)
10: (47,53)

It is possible to combine the two sequences:

File: sexy-prime-pairs2my $sexy-primes := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime });
say "{ $_ + 1 }: ($sexy-primes[$_],{ $sexy-primes[$_] + 6 })" for ^10;

The result is more compact, but not quite as readable. The output is the same.

unit sub MAIN (Int $count = 10);
my $sexy-primes := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime });
say "{ $_ + 1 }: ($sexy-primes[$_],{ $sexy-primes[$_] + 6 })" for ^$count;

It is easy to get the seqence to return both values in the pairs:

File: sexy-prime-pairs4```
unit sub MAIN (Int $count = 10);
my $sexy-prime-pairs := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime }) \
.map({( $_, $_ + 6 )});
say "{ $_ + 1 }: $sexy-prime-pairs[$_]" for ^$count;
```

All it took was a `map`

that returns the values as a list.

See
docs.raku.org/routine/map for more information about `map`

.

The output isn't exactly the same, though the values are:

1: 5 11
2: 7 13
3: 11 17
4: 13 19
5: 17 23
6: 23 29
7: 31 37
8: 37 43
9: 41 47
10: 47 53

It is easy to fix the output:

File: sexy-prime-pairs5 (changes only)```
say "{ $_ + 1 }: ({ $sexy-prime-pairs[$_].join(",") })" for ^$count;
```

The wikipedia article also describes «Sexy prime triplets», which it defines as «*triplets
of primes (p, p + 6, p + 12) such that p + 18 is composite*. A Composite number is defined
like this: «*A composite number n is a positive integer n>1 which is not prime* (by
mathworld.wolfram.com).

unit sub MAIN (Int $count = 10);
my $sexy-prime-triplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime \
&& ($_ + 12).is-prime && ! ($_ + 18).is-prime });
say "{ $_ + 1 }: ($sexy-prime-triplets[$_],{ $sexy-prime-triplets[$_] + 6 }, \
{ $sexy-prime-triplets[$_] + 12 })" for ^$count;

Running it gives the same result as the wiki page:

$ raku sexy-prime-triplets
1: (7,13,19)
2: (17,23,29)
3: (31,37,43)
4: (47,53,59)
5: (67,73,79)
6: (97,103,109)
7: (101,107,113)
8: (151,157,163)
9: (167,173,179)
10: (227,233,239)

We should perhaps fix the sequence here as well:

File: sexy-prime-triplets2unit sub MAIN (Int $count = 10);
my $sexy-prime-triplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime &&
($_ + 12).is-prime && ! ($_ + 18).is-prime }).map({( $_, $_ + 6, $_ + 12 )});
say "{ $_ + 1 }: ({ $sexy-prime-triplets[$_].join(",") })" for ^$count;

unit sub MAIN (Int $count = 10);
my $sexy-prime-quadruplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime
&& ($_ + 12).is-prime && ($_ + 18).is-prime });
say "{ $_ + 1 }: ($sexy-prime-quadruplets[$_], \
{ $sexy-prime-quadruplets[$_] + 6 }, \
{ $sexy-prime-quadruplets[$_] + 12 }, \
{ $sexy-prime-quadruplets[$_] + 18 })"
for ^$count;

Running it gives the same result as the wiki page:

1: (5,11,17,23)
2: (11,17,23,29)
3: (41,47,53,59)
4: (61,67,73,79)
5: (251,257,263,269)
6: (601,607,613,619)
7: (641,647,653,659)
8: (1091,1097,1103,1109)
9: (1481,1487,1493,1499)
10: (1601,1607,1613,1619)

Here it is with the sequence fix applied:

File: sexy-prime-quadruplets2unit sub MAIN (Int $count = 10);
my $sexy-prime-quadruplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime
&& ($_ + 12).is-prime && ($_ + 18).is-prime }).map({( $_, $_ + 6, $_ + 12,
$_ + 18)});
say "{ $_ + 1 }: ({ $sexy-prime-quadruplets[$_].join(",") })" for ^$count;

`(5,11,17,23,29)`

.
File: sexy-prime-ququintuplets
say "(5,11,17,23,29)";

The reason is quite clever: By adding 6 to any number, in a loop 5 times, one of the values must end with 5 or 0. (As we add 5 + 1, and do it 5 times.) A value ending with the digit 0 is not a prime (as it is either zero or divisible by e.g. 10), and if it ends with 5 it is divisible by 5 - unless it is the number 5 itself, which is a prime. So 5 must be in the ququintuplet, and it has to be the first value.

Write a script to implement **Lempel–Ziv–Welch (LZW)** compression algorithm. The script
should have method to encode/decode algorithm. The wiki
page explains
the compression algorithm very nicely.

There is two ways to do this:

- Hard code the legal alphabet in the algorithms (both the encoder and decoder)
- Deduce the alphabet from the input string, and find a way to forward it to the decoder

I have written «encode» and «decode» procedures, but will show the program MAIN wrappers first:

File: lzw (partial)multi sub MAIN (Str $string) # [1]
{
my ($alphabet, $encoded) = encode($string); # [2]
say "'" ~ $alphabet ~ "' $encoded" ; # [2]
}
multi sub MAIN (Str $alphabet, Str $binary) # [3]
{
say decode($alphabet, $binary); # [4]
}

[1] If we specify a single string as argument to the program, it encodes it.

[2] The encoder returns the alphabet and the encoded string (as a binary number).

[3] If we specify a string and a binary number as argument to the program, it decodes it.

[4] The decoder returns the original string, if we pass it the alphabet and the binary number (with any leading zeroes) we got from the encoder.

The encoder looks like this:

File: lzw (partial)sub encode (Str $string) # [1]
{
my @alphabet = ('#', $string.comb).flat.unique; # [2]
my @dict = @alphabet; # [3]
my %dict = @alphabet.kv.reverse; # [4]
my @chars = $string.comb; # [5]
my $binary = ""; # [6]
while @chars # [7]
{
my $char = @chars.shift; # [8]
last if $char eq "#"; # [9]
$char ~= @chars.shift while @chars.elems && %dict{$char ~ @chars[0]}; # [10]
my $bits = @dict.elems.base(2).chars; # [11]
my $fmt = '%0' ~ $bits ~ 'b'; # [12]
my $add = %dict{$char}.fmt($fmt); # [13]
$binary ~= $add; # [14]
if @chars.elems && @chars[0] ne "#" # [15]
{
@dict.push($char ~ @chars[0]); # [16]
%dict{$char ~ @chars[0]} = @dict.end; # [17]
}
}
return @alphabet.join, $binary; # [18]
}

[1] Starting with the input string.

[2] The alphabet (as an array) is the unique characters in the input string.
Ensure that the "#" character (the *stop character*) is the first
element in the array.

[3] A copy, where we add the *multi letter* sequences later on.

[4] A reverse lookup. Use the letter (or multi letter string) as lookup, and get the index - which is the value to emit. «kv» on an array gives the keys and values as pairs. Applying «reverse» on it swaps the keys and values, so the resulting hash has the old values as keys, and the keys as values.

[5] The input string as an array of single character, as it is more work to use substrings.

[6] The output, as a sequence of binary digits.

[7] As long as we have any more characters in the input;

[8] • get the first one.

[9] • exit the loop if the character is "#" (the *stop character*).

[10] Add the next character in the string, as long as the resulting string is present in the dictionary. (Note that «$char» starts with a single character, but can hold several as a result of this while loop.)

[11] Get the number of bits (binary digits) needed to encode the current character,

[12] and use that value as argument to the «fmt» method to get the correct number of digits, with zeros added at the beginning if required.

[13] Get the decimal value from the dictionary, convert it to binary,

[14] and add it to the end of the binary number (or rather, string).

[15] If we have more elements in the input (and it isn't the stop character),

[16] add the string with the first unused character to the dictionary,

[17] and the reverse lookup.

[18] Return the alphabet and the binary string.

Note that this algorithm works with or without using
the *stop character* to end the string.

It is tempting to convert the binary number to decimal and output that one - as it is shorter. But that would play havoc with the decoder, as any leading zeroes in the encoded number would be missing (as leading zeroes are not really a thing in numbers). And they are vital for us, as we count digits from the left. Missing leading zeroes will break the decoder. (Flipping the number before converting it to decimal in the encoder, and again after converting to binary in the decoder would fix the leading zero problem, but would introduce a trailing zero problem instead.) So if we were to convert this binary number in any way, it would have to keep any leading zeroes. One way could be to grab 8 bits at a time, and convert them to an 8-bit number (in the range 0-255).

The decoder is shorter:

File: lzw (partial)sub decode (Str $alphabet, Str $binary is copy) # [1]
{
my @alphabet = $alphabet.comb; # [2]
@alphabet.unshift('#') unless @alphabet[0] eq "#"; # [3]
my $prev; # [4]
my $output; # [5]
while $binary.chars # [6]
{
my $size = (@alphabet.elems +1).base(2).chars; # [7]
my $current = $binary.substr(0, $size); # [8]
$binary.substr-rw(0, $size) = ""; # [9]
my $value = @alphabet[$current.parse-base(2)]; # [10]
@alphabet.push($prev ~ $value) if $prev.defined; # [11]
$output ~= $value; # [12]
$prev = $value; # [13]
}
return $output; # [14]
}

[1] We start with the alphabet, and the binary string.

[2] Get the alphabet as an array.

[3] Add «#» as the first element in the alphabet, if not there already. (The encoder places it here for us, so this isn't really necessary.)

[4] The value in the previous iteration, initially empty.

[5] The output string.

[6] While we have binary characters left in the input;

[7] • get the number of binary digits for each character (or character string). This is the number of elemens in the alphabet +1 (to compensate for the fact that we do the computation before adding the value to the alphabet).

[8] • get the correct number of binary digits,

[9] • and remove them from the input.

[10] • convert the binary number to decimal, and get the character (or string) by looking it up in the alphabet (with this number as index).

[11] If we have a previous value (one that is in the alphabet), add the next one.

[12] Add the new value (one or more characters) to the end of the result.

[13] Set up the previous value to hold the current value; ready for the next iteration.

[14] Return the string.

Running it to see that it round trips:

$ raku lzw aslakaslakQQl
'#aslkQ' 0010100011000101000110100001000101010100011
$ raku lzw '#aslkQ' 0010100011000101000110100001000101010100011
aslakaslakQ

Note that you have to quote the alphabet string,
so that the shell doesn't choke on the `#`

character. The encoder prints
quotes around it to be nice.

Running the encoder with the verbose flag:

$ raku lzw-verbose --verbose aslakaslakQQl
Dict: # <=> 0 (alphabet)
Dict: a <=> 1 (alphabet)
Dict: s <=> 2 (alphabet)
Dict: l <=> 3 (alphabet)
Dict: k <=> 4 (alphabet)
Dict: Q <=> 5 (alphabet)
Out: a -> 1 -> 001
Dict: as <=> 6
Out: s -> 2 -> 010
Dict: sl <=> 7
Out: l -> 3 -> 0011
Dict: la <=> 8
Out: a -> 1 -> 0001
Dict: ak <=> 9
Out: k -> 4 -> 0100
Dict: ka <=> 10
Out: as -> 6 -> 0110
Dict: asl <=> 11
Out: la -> 8 -> 1000
Dict: lak <=> 12
Out: k -> 4 -> 0100
Dict: kQ <=> 13
Out: Q -> 5 -> 0101
Dict: QQ <=> 14
Out: Q -> 5 -> 0101
Dict: Ql <=> 15
Out: l -> 3 -> 00011
'#aslkQ' 0010100011000101000110100001000101010100011

We can see that 3 bits is enough to encode the initial alphabet. When «sl» has been added it switches to 4 bits, and after «Ql» we get 5 bits.

Running the decoder with the verbose flag:

$ raku lzw-verbose --verbose "#aslkQ" 0010100011000101000110100001000101010100011
Dict: # <=> 0 (alphabet)
Dict: a <=> 1 (alphabet)
Dict: s <=> 2 (alphabet)
Dict: l <=> 3 (alphabet)
Dict: k <=> 4 (alphabet)
Dict: Q <=> 5 (alphabet)
In: 001 -> 1 -> a
Dict: as <=> 6
In: 010 -> 2 -> s
Dict: sl <=> 7
In: 0011 -> 3 -> l
Dict: la <=> 8
In: 0001 -> 1 -> a
Dict: ak <=> 9
In: 0100 -> 4 -> k
Dict: kas <=> 10
In: 0110 -> 6 -> as
Dict: asla <=> 11
In: 1000 -> 8 -> la
Dict: lak <=> 12
In: 0100 -> 4 -> k
Dict: kQ <=> 13
In: 0101 -> 5 -> Q
Dict: QQ <=> 14
In: 0101 -> 5 -> Q
Dict: Ql <=> 15
In: 00011 -> 3 -> l
aslakaslakQQl

my @alphabet = ("#", "A" ... "Z", "a" ... "z", 0 .. 9).flat;

Then remove the «$alphabet» variables in the MAIN wrappers:

multi sub MAIN (Str $string, :$verbose)
{
say encode($string, :$verbose);
}
multi sub MAIN Str $binary, :$verbose)
{
say decode($binary, :$verbose);
}

Then remove the first line in «encode» (the declaration of «@alphabet») and change the last line to this:

return $binary;

«decode» is a little harder. Change the top (above the «if $verbose» line) to:

sub decode (Str $binary is copy, :$verbose)
{
my @dict = @alphabet

Then rename «@alphabet» to «@dict» in the rest of the procedure.

And that's it.