This is my response to the Perl Weekly Challenge #081.
$A
and $B
.
$A
and $B
.
$S
is called base string if repeated
concatenation of the substring results in the string.
Input:
$A = "abcdabcd"
$B = "abcdabcdabcdabcd"
Output:
("abcd", "abcdabcd")
Example 2
Input:
$A = "aaa"
$B = "aa"
Output:
("a")
I'll dive straight in:
File: common-base-string
unit sub MAIN (Str $A where $A.chars > 0, Str $B where $B.chars > 0, # [1]
:v(:$verbose));
die "Different characters in A and B"
unless $A.comb.unique.sort.join eq $B.comb.unique.sort.join; # [2]
my $a = $A.chars; # [3]
my $b = $B.chars; # [3a]
my $unique = $A.comb.unique.elems; # [4]
my @cbs; # [5]
for $unique .. min($a, $b) -> $length # [6]
{
my $c = $A.substr(0, $length); # [7]
say ": Length: $length -> $c" if $verbose;
@cbs.push: $c if $c x ($a / $length) eq $A && $c x ($b / $length) eq $B;
} # [8]
say '(' ~ @cbs.map({ '"' ~ $_ ~ '"' }).join(", ") ~ ')' if @cbs; # [9]
[1] The where
clauses are there to prevent empty strings.
[2] Get a list of unique charcters in each input string, sorted, and joined back into a string. They must be equal.
[3] The number of characters in the input strings.
[4] The number of unique characters.
[5] We are going to store the result here.
[6] The length of the common base string can be as low as 1, but no larger than the smallest of length of the two input strings. It cannot be as low as 1 if there are more than one unique character in the strings, so we start at the nymber of unique characters. This is possibly too small (e.g. «AAB» and «AABAABAAB» gives 2, but 3 is the correct lower limit), but it is better than starting at 1. Iterate over the values.
[7] • Get the specified number of characters from one of the strings (from the start).
[8] • Add the substring to the result if it is a common base
string of both input strings. Note tha usage of the string repetition
operator x
.
[9] Print the result, on the requested form. Note that nothing is printed if
there is no result. The map
places the values in(side) double
quotes, and the join
adds the commas between them.
See
docs.raku.org/routine/x for more
information about the string repetition operator x
.
Running it:
$ ./common-base-string abcdabcd abcdabcdabcdabcd
("abcd", "abcdabcd")
$ ./common-base-string aaa aa
("a")
$ ./common-base-string abababab abababab
("ab", "abab", "abababab")
Looking good.
We can sipmlify (and dumbify) the
program by removing the $unique
smartness. Simply replace the
variable with the value 1
in [6] (and delete line [4]).
input
.
West Side Story
The award-winning adaptation of the classic romantic tragedy "Romeo and
Juliet". The feuding families become two warring New York City gangs,
the white Jets led by Riff and the Latino Sharks, led by Bernardo. Their
hatred escalates to a point where neither can coexist with any form of
understanding. But when Riff's best friend (and former Jet) Tony and
Bernardo's younger sister Maria meet at a dance, no one can do anything
to stop their love. Maria and Tony begin meeting in secret, planning to
run away. Then the Sharks and Jets plan a rumble under the
highway--whoever wins gains control of the streets. Maria sends Tony to
stop it, hoping it can end the violence. It goes terribly wrong, and
before the lovers know what's happened, tragedy strikes and doesn't
stop until the climactic and heartbreaking ending.
NOTE
. " ( ) , 's --
OUTPUT
1 But City It Jet Juliet Latino New Romeo Side Story Their Then West York
adaptation any anything at award-winning away become before begin best
classic climactic coexist control dance do doesn't end ending escalates
families feuding form former friend gains gangs goes happened hatred
heartbreaking highway hoping in know love lovers meet meeting neither no
one plan planning point romantic rumble run secret sends sister streets
strikes terribly their two under understanding until violence warring
what when where white whoever wins with wrong younger
2 Bernardo Jets Riff Sharks The by it led tragedy
3 Maria Tony a can of stop
4 to
9 and the
We can use words
to split the text into words, after
reading in the file with IO.slurp
, like this:
> "input.txt".IO.slurp.words.raku
("West", "Side", "Story", "The", "award-winning", ..., "\"Romeo", "and", \
"Juliet\".", ..., "(and", ..., "secret,", ..., "highway--whoever", ..., \
"ending.").Seq
Using slurp
as a method works fine, but we must access it through
an IO
object. We can also use it as a function, without the
IO
object, like this:
slurp "input.txt"
But precedence rules will cause problem, so we have to use parens if we want to stack on more method calls:
(slurp "input.txt").words.raku
I like the parens-free version better.
See
docs.raku.org/routine/words for
more information about words
.
See
docs.raku.org/routine/slurp for
more information about words
.
Note that words
will only
split on whitespace (the \s
regex). So we must take care of
the punctuation characters et al manually. (I have used the raku
method
to make it easier to see the word boundaries.) The output has been abridged.
Fixing the input before calling words
is probably the best
approach.
#! /usr/bin/env raku
unit sub MAIN ($file where $file.IO.e && $file.IO.r = "input.txt"); # [1]
my $content = $file.IO.slurp # [2]
.trans(/<[."(),]>/ => ' ') # [3]
.subst("'s", " ", :global) # [4]
.subst("--", " ", :global); # [5]
my %freq = $content.words.Bag; # [6]
my @freq; # [7]
for %freq.keys.sort -> $word # [8]
{
@freq[%freq{$word}] ~= $word ~ " " # [8a]
}
for @freq.keys -> $freq # [9]
{
say "$freq " ~ @freq[$freq] if @freq[$freq]; # [9a]
}
[1] I have chosen to give the input file a «.txt» filename extension, and made it possible for the user to specify another file.
[2] Read the entire file in one go.
[3] Use the trans
method to get rid of the single character
sequences that we should ignore. We replace the offending charcters with a space, so that we
do not mess up the word boundaries by accident. (There are no probles with the given text,
but it is nevertheless a good idea to take extra care here.)
[4] We can use the subst
(substitute) method to replace
strings. Here we get rid of 's
. Note the use of the :global
flag to ensure that all occurences will be replaced; three in this file. The default is the
first one only.
[5] As above, but for --
. Here it is actually important to replace it with
a space, so that we end up with two words.
[6] We turn the modified string into a list of words, and then coerce that
list into a Bag (with Bag
). A Bag is specalised version of a hash, where the
keys are the values in the list we apply it on here, and the values are the frequency they
occur. Ecactly what we want, albeit on a form not quite suitable for the output we want.
[7] So we are going to build up the lists for the different word lengths here. I could have added the words as a list (giving a two-dimentional array), but it is easier to work with strings as that is what we are going to print anyway.
[8] Iterate over the words, in lexicographically sorted order, and add them to the correct entry accoring to the word length [8a].
[9] Iterate over the array of word lengths, and print the words if any [9a]. (If we
insert a value at e.g @freq[9]
, all the entries below 9 will magically pop
into existence (with an undefined value). So we have to skip the unused ones.
See
docs.raku.org/routine/trans for
more information about trans
.
See
docs.raku.org/routine/subst for
more information about subst
.
See
docs.raku.org/routine/Bag for
more information about the Bag
method and
docs.raku.org/type/Bag
more information about the Bag
type.
Running it:
./frequency-sort
1 But City It Jet Juliet Latino New Romeo Side Story Their Then West York
adaptation any anything at award-winning away become before begin best
classic climactic coexist control dance do doesn't end ending escalates
families feuding form former friend gains gangs goes happened hatred
heartbreaking highway hoping in know love lovers meet meeting neither no
one plan planning point romantic rumble run secret sends sister streets
strikes terribly their two under understanding until violence warring
what when where white whoever wins with wrong younger
2 Bernardo Jets Riff Sharks The by it led tragedy
3 Maria Tony a can of stop
4 to
9 and the
Looking good.
We can replace the two for
loops with more compact code.
Still for
loops, but postfix versions:
@freq[%freq{$_}] ~= $_ ~ " " for %freq.keys.sort;
say "$_ { @freq[$_] }" for @freq.keys.grep({ @freq[$_] });
It is possible to use map
instead:
#! /usr/bin/env raku
unit sub MAIN ($file where $file.IO.e && $file.IO.r = "input.txt");
my %freq = $file.IO.slurp
.trans(/<[."(),]>/ => ' ')
.subst("'s", " ", :global)
.subst("--", " ", :global)
.words.Bag;
my @freq;
%freq.keys.sort.map({ @freq[%freq{$_}] ~= $_ ~ " " });
@freq.keys.grep({ @freq[$_] }).map({ say "$_ { @freq[$_] }" });
I combined the two first my
lines as well. The third one (now the second one)
really stands out, but there is no easy way of getting rid of that one.
map
is basically a for
loop in disguise, and in my view a little harder to read. But that really is a matter
of taste, and exposure.
And that's it.