This is my response to the Perl Weekly Challenge #099.
$S
and a pattern $P
.
Input: $S = "abcde" $P = "a*e"
Output: 1
Example 2:
Input: $S = "abcde" $P = "a*d"
Output: 0
Example 3:
Input: $S = "abcde" $P = "?b*d"
Output: 0
Example 4:
Input: $S = "abcde" $P = "a*c?e"
Output: 1
The «?» and «*» characters are Shell Metacharacters (see e.g. flylib.com/books/en/4.356.1.126/1/) used by shells (e.g. bash) and other programs (e.g. grep, sed and awk) in a Unix (and Unix-like) environment. Raku and Perl does not support them, so we have to translate them, like this:
Shell Metacharacter | Raku/Perl Regex |
---|---|
* | .* |
? | . |
The challenge does not say what to do about other special characters (recognised as regexes by Raku and Perl), so I'll ignore that for now.
File: pattern-match
#! /usr/bin/env raku
unit sub MAIN (Str $S where $S.chars > 0, # [1]
Str $P is copy where $P.chars > 0, # [1a]
:v(:$verbose));
$P.=trans( [ '*', '?' ] => [ '.*' , '.' ]); # [2]
say ": Regex: $P " if $verbose;
say $S ~~ /^ <$P> $/ ?? 1 !! 0 # [3]
[1] Ensure that both strings have at last one character.
[2] Use trans
to translate the substrings
on the left side of the arrow to the ones on the right side. Note that
trans
does not change the string itself, but returns the modified
version. So we use .=
to assign the value back to the variable.
Also note the is copy
in [1a] so that we can change the value.
[3] Apply the regex. Note the <
and >
around
the variable telling Raku to treat regex metacharacters in the variable as
such. The anchors ensure that we match the whole string (beginning:
^
, end: $
). Using it without them would treat the
variable as literal text.
See
docs.raku.org/routine/trans
for more information about trans
.
Running it:
$ ./pattern-match abcde "a*e"
1
$ ./pattern-match abcde "a*d"
0
$ ./pattern-match abcde "?b*d"
0
$ ./pattern-match -v abcde "a*c?e"
1
With verbose mode, to show the underlying regex:
$ ./pattern-match -v abcde "a*e"
: Regex: a.*e
1
$ ./pattern-match -v abcde "a*d"
: Regex: a.*d
0
$ ./pattern-match -v abcde "?b*d"
: Regex: .b.*d
0
$ ./pattern-match -v abcde "a*c?e"
: Regex: a.*c.e
1
Looing good.
#! /usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Getopt::Long;
my $verbose = 0;
GetOptions("verbose" => \$verbose);
my $S = shift(@ARGV);
my $P = shift(@ARGV);
die '"$S" must have length' unless length $S;
die '"$P" must have length' unless length $P;
$P =~ s/\*/.*/g; # [1]
$P =~ s/\?/./g; # [1]
say ": Perl Regex: $P " if $verbose;
say $S =~ /^$P$/ ? 1 : 0; # [2]
[1] Perl does not have a trans
routine, but we can use s///
.
Twice, once for each character sequence.
[2] The varible is interpolated, and treated as a regex if it contains any (as it
does). If we had written it like this in Raku, any metacharcters would have been qouted.
We can achieve the in Perl with quotemeta
. But we do want to treat them
as metacharacters, so this does not apply here.
Running it gives the same result as the Raku version:
$ ./pattern-match-perl -v abcde "a*e"
: Perl Regex: a.*e
1
$ ./pattern-match-perl -v abcde "a*d"
: Perl Regex: a.*d
0
$ ./pattern-match-perl -v abcde "?b*d"
: Perl Regex: .b.*d
0
$ ./pattern-match-perl -v abcde "a*c?e"
: Perl Regex: a.*c.e
1
$S
and $T
.
$T
without changing the position of characters.
Input: $S = 'littleit', $T = 'lit'
Output: 4
1: [lit] tleit
2: [li] t [t] leit
3: [li] ttlei [t]
4: litt [l] e [it]
Example 2:
Input: $S = 'london', $T = 'lon'
Output: 3
1: [lon] don
2: [lo] ndo [n]
3: [l] ond [on]
Let us construct a bit mask (a binary number) that tells if the corresponding character should be included in the result. This is a brute force approach, but that should be ok here.
Here is an illustration of the concept, for the third match of the second example:
The character is included if the corresponding bitmap value is 1
.
In this case we have six letters, so the task is to construct the binary numbers
from 0
(or really 000000
) to 111111
.
The upper limit is easy(ish), with the String Repetition Operator
x
:
> say '1' x 'london'.chars; # -> 111111
> say '1' x 'littleit'.chars; # -> 11111111
See
docs.raku.org/routine/x
for more information about x
.
Binary numbers are tedious to work with, so we switch back to decimal
(with parse-base(2)
) for the loop, and convert the loop value back (with
fmt
, so that we can zero-pad the value).
> my $length = 'london'.chars;
6
> my $binary = '1' x $length;
111111
> my $max = $binary.parse-base(2);
63
> say $_.fmt('%0' ~ $length ~ 'b') for 0 .. $max;
000000
000001
000010
000011
...
111100
111101
111110
111111
See
docs.raku.org/routine/parse-base
for more information about parse-base
.
See
docs.raku.org/routine/fmt
for more information about fmt
.
The program is almost trivial now:
File: unique-subsequence
#! /usr/bin/env raku
unit sub MAIN (Str $S where $S.chars > 0, # [1]
Str $T where $T.chars > 0, # [1a]
:v(:$verbose));
my $S-length = $S.chars; # [2]
my $T-length = $T.chars; # [2a]
my $binary = '1' x $S-length; # [3]
my $max = $binary.parse-base(2); # [3a]
my $matches = 0; # [4]
for 1 .. $max -> $current # [5]
{
my $mask = $current.fmt('%0' ~ $S-length ~ 'b'); # [6]
if $mask.comb.sum != $T-length # [7]
{
say ": Skipped binary mask '{ $mask }' - wrong number of 1s" if $verbose;
next;
}
my $candidate = (^$S-length).map({ $mask.substr($_, 1) eq '1'
## 8 ###### # 8a ####### # 8b ##########################
?? $S.substr($_,1) !! '' }).join;
# 8c ############# # 8d ### 8e ##
if $candidate eq $T # [9]
{
$matches++; # [9a]
say ": + Match found with binary mask '{ $mask }'." if $verbose;
}
else
{
say ": Considering binary mask '{ $mask}' - no match" if $verbose;
}
}
say $matches;
[1] Ensure that both string have at last one character.
[2] The lengths of the two strings.
[3] The binary flag, with the highest value, and the decimal counterpart [3a].
[4] The number of matches.
[5] Iterating from 1 to the decimal equivalence of the maximum binary value.
[6] Convert the decimal value to binary. The format expression
'%0' ~ $S-length ~ 'b'
ensures that the string has at least
$S-length
characters, and the leading zero indicates that any missing
length should be compensated by leading zeroes. The 'b'
part tells
us to print the binary version of the number.
[7] This shortcut speeds up the program quite a bit, but isn't stricly necessary.
It skips the rest of the current iteration if the number of 1
s in the binary
mask is different from the length of $T, as we cannot get a match in that case.
[8] The candidate string given the current binary mask. We start with all the indices
of the string $S [8a], and use map
to get at the corresponding (= with
the same index) binary digit in the current bitmask [8b]. If the digit is 1
,
we keep the character form $S [8c]. If not, we ignore it (by returning an empty string)
[8d]. Dinally, we take that list of characters (including empty strings) and join them
together to a single string [8e].
[9] Have we found $T? If so increase the counter [9a].
Running it:
$ ./unique-subsequence littleit lit
5
$ ./unique-subsequence london lon
3
The first one differs from the given result in the challenge. So what is going on? Verbose mode to the rescue. I have filtered out the "Match found" lines, as the rest really is noise:
$ ./unique-subsequence -v littleit lit | grep +
: + Match found with binary mask '00001011'. # 4
: + Match found with binary mask '10000011'. # ?
: + Match found with binary mask '11000001'. # 3
: + Match found with binary mask '11010000'. # 2
: + Match found with binary mask '11100000'. # 1
$ ./unique-subsequence -v london lon | grep +
: + Match found with binary mask '100011'. # 3
: + Match found with binary mask '110001'. # 2
: + Match found with binary mask '111000'. # 1
The numbers at the end are the corresponding entries in the challenge. The one marked with «?» is new. It clerly is a valid combination, so the challenge got it wrong. The answer is 5 for the first example. (The second example is correct, though.)
Note that the challenge has been updated with the fifth match. I checked after writing this section.
#! /usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Getopt::Long;
use List::Util qw(sum);
my $verbose = 0;
GetOptions("verbose" => \$verbose);
my $S = shift(@ARGV);
my $T = shift(@ARGV);
die '"$S" must have length' unless length $S;
die '"$T" must have length' unless length $T;
my $S_length = length $S;
my $T_length = length $T;
my $binary = '1' x $S_length;
my $max = oct('0b' . $binary);
my $matches = 0;
for my $current (1 .. $max)
{
my $mask = sprintf("%0" . $S_length ."b", $current);
if (sum(split(//, $mask)) != $T_length)
{
say ": Skipped binary mask '{ $mask }' - wrong number of 1s" if $verbose;
next;
}
my $candidate = join("", map { substr($mask, $_, 1) eq '1'
? substr($S, $_,1) : '' } (0 .. $S_length -1));
if ($candidate eq $T)
{
$matches++;
say ": + Match found with binary mask '$mask'." if $verbose;
}
else
{
say ": Considering binary mask ' $mask' - no match" if $verbose;
}
}
say $matches;
Running it gives the same result as the Raku version:
$ ./unique-subsequence-perl -v littleit lit | grep +
: + Match found with binary mask '00001011'.
: + Match found with binary mask '10000011'.
: + Match found with binary mask '11000001'.
: + Match found with binary mask '11010000'.
: + Match found with binary mask '11100000'.
$ ./unique-subsequence-perl -v london lon | grep +
: + Match found with binary mask '100011'.
: + Match found with binary mask '110001'.
: + Match found with binary mask '111000'.
And that's it.