This is my response to The Weekly Challenge #370.
Input: $paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
@banned = ("hit")
Output: "ball"
After removing punctuation and converting to lowercase, the word "hit"
appears 3 times, and "ball" appears 2 times.
Since "hit" is on the banned list, we ignore it.
Example 2:
Input: $paragraph = "Apple? apple! Apple, pear, orange, pear, apple, orange."
@banned = ("apple", "pear")
Output: "orange"
"apple" appears 4 times.
"pear" appears 2 times.
"orange" appears 2 times.
"apple" and "pear" are both banned.
Even though "orange" has the same frequency as "pear", it is the only
non-banned word with the highest frequency.
Example 3:
Input: $paragraph = "A. a, a! A. B. b. b."
@banned = ("b")
Output: "a"
"a" appears 4 times.
"b" appears 3 times.
The input has mixed casing and heavy punctuation.
The normalised, "a" is the clear winner, since "b" is banned, "a" is
the only choice.
Example 4:
Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
@banned = ("ball")
Output: "apple"
Here the punctuation acts as a delimiter.
"ball" appears 3 times.
"apple" appears 2 times.
"banana" appears 1 time.
Example 5:
Input: $paragraph = "The dog chased the cat, but the dog was faster than the cat."
@banned = ("the", "dog")
Output: "cat"
"the" appears 4 times.
"dog" appears 2 times.
"cat" appears 2 times.
"chased", "but", "was", "faster", "than" appear 1 time each.
"the" is the most frequent but is banned.
"dog" is the next most frequent but is also banned.
The next most frequent non-banned word is "cat".
#! /usr/bin/env raku
unit sub MAIN ($paragraph where $paragraph.chars > 0, # [1]
*@banned where @banned.elems > 0, # [2]
:a(:$all), # [3]
:v(:$verbose) = $all); # {3a]
my $words = $paragraph.split(/<[\s\,\.\!]>/).grep({ .chars > 0 })>>.lc.Bag;
# {4]
my %banned = @banned.map: * => 1; # [5]
my $match; # [6]
for $words.sort({ $^b.value <=> $^a.value }) -> $word # [7]
{
say ": word: '{ $word.key }' count: { $words{$word.key} } \
{ %banned{$word.key} ?? "- banned" !! "- ok" }" if $verbose;
next if %banned{$word.key}; # [8]
$match = $word.key unless $match; # [9]
last unless $all; # [3b]
}
say $match; # [10]
[1] A paragraph, with at least one character.
[2] A slurp array of banned words, with at last one of them.
[3] «All» mode enables «verbose» mode as well. It is used to print all the words, even after a match has been found.
[4]
We cannot do the split with words, as that one retains punctuation
characters. So a custom regex it is. We get rid of empty matches with grep (as
we will get that by two punctuation characters after each other). Then we turn the liberated
words into lowercase with lc and finally turn the whole mess into a Bag, a hash
like structure that counts the occurence of the (key)words.
See docs.raku.org/routine/words for more information about words.
See docs.raku.org/routine/Bag for more information about Bag.
[5] Set up a hash with the banned words, for easy lookup.
[6] The match, if any, will end up here.
[7] Iterate over the Bag content, sorted by the occurence and the highest one first (as we have swapped the order of the sort placeholders). This gives us the most frequent word first.
[8] Skip the word if it is banned.
[9] We have a match (if not set already).
[10] Print the result.
Running it:
$ ./popular-word "Bob hit a ball, the hit BALL flew far after it was hit." \
"hit"
ball
$ ./popular-word "Apple? apple! Apple, pear, orange, pear, apple, orange." \
"apple" "pear"
orange
$ ./popular-word "A. a, a! A. B. b. b." b
a
$ ./popular-word 'Ball.ball,ball:apple!apple.banana' ball
apple
$ ./popular-word "The dog chased the cat, but the dog was faster than the \
cat." the dog
cat
Looking good.
With verbose mode:
$ ./popular-word -a "Bob hit a ball, the hit BALL flew far after it was \
hit." "hit"
: word: 'hit' count: 3 - banned
: word: 'ball' count: 2 - ok
: word: 'far' count: 1 - ok
: word: 'the' count: 1 - ok
: word: 'after' count: 1 - ok
: word: 'flew' count: 1 - ok
: word: 'bob' count: 1 - ok
: word: 'was' count: 1 - ok
: word: 'a' count: 1 - ok
: word: 'it' count: 1 - ok
ball
$ ./popular-word -a "Apple? apple! Apple, pear, orange, pear, apple, \
orange." "apple" "pear"
: word: 'apple' count: 3 - banned
: word: 'pear' count: 2 - banned
: word: 'orange' count: 2 - ok
: word: 'apple?' count: 1 - ok
orange
$ ./popular-word -a "A. a, a! A. B. b. b." b
: word: 'a' count: 4 - ok
: word: 'b' count: 3 - banned
a
$ ./popular-word -a 'Ball.ball,ball:apple!apple.banana' ball
: word: 'ball' count: 2 - banned
: word: 'banana' count: 1 - ok
: word: 'ball:apple' count: 1 - ok
: word: 'apple' count: 1 - ok
banana
$ ./popular-word -a "The dog chased the cat, but the dog was faster than \
the cat." the dog
: word: 'the' count: 4 - banned
: word: 'dog' count: 2 - banned
: word: 'cat' count: 2 - ok
: word: 'faster' count: 1 - ok
: word: 'chased' count: 1 - ok
: word: 'than' count: 1 - ok
: word: 'but' count: 1 - ok
: word: 'was' count: 1 - ok
cat
Input: $str1 = "abc", $str2 = "acb"
Output: true
"abc"
split: ["a", "bc"]
split: ["a", ["b", "c"]]
swap: ["a", ["c", "b"]]
concatenate: "acb"
Example 2:
Input: $str1 = "abcd", $str2 = "cdba"
Output: true
"abcd"
split: ["ab", "cd"]
swap: ["cd", "ab"]
split: ["cd", ["a", "b"]]
swap: ["cd", ["b", "a"]]
concatenate: "cdba"
Example 3:
Input: $str1 = "hello", $str2 = "hiiii"
Output: false
A fundamental rule of scrambled strings is that they must be anagrams.
Example 4:
Input: $str1 = "ateer", $str2 = "eater"
Output: true
"ateer"
split: ["ate", "er"]
split: [["at", "e"], "er"]
swap: [["e", "at"], "er"]
concatenate: "eater"
Example 5:
Input: $str1 = "abcd", $str2 = "bdac"
Output: false
Recursion is the thing here, as hinbted at in the challenge text. The problem would be how to return the values, and we could do that by keeping track of the unchanged parts, both to the left and right of the current swap substring. But that requires adding a pre and a post variable (in addition to the actual string to scramble)- and that is not elegant.
The actual returns (so to speak) of the values are a good match for
gather/take, and I let each level take care of concatenating the
results from further calls, thus avoiding passing along left and right substrings.
#! /usr/bin/env raku
unit sub MAIN ($str1, # [1]
$str2 where $str1.chars == $str2.chars > 0, # [2]
:v(:$verbose));
my $anagrams = $str1.comb.sort.join eq $str2.comb.sort.join; # [3]
if $anagrams # [4]
{
my $scrambled := gather # [5]
{
scramble($str1);
}
for $scrambled -> $candidate # [8]
{
say ": Candidate: $candidate" if $verbose;
if $candidate eq $str2 # [7]
{
say True; # [7a]
exit; # [7b]
}
}
}
elsif $verbose # [4b]
{
say ": Not anagrams";
}
say False; # [8]
sub scramble ($str) # [9]
{
take $str; # [10]
return if $str.chars == 1; # [11]
for 1 .. $str.chars -1 -> $i # [12]
{
my @s1 = gather { scramble($str.substr(0, $i)); } # [13]
my @s2 = gather { scramble($str.substr($i)); } # [14]
for @s1 -> $s1 # [15]
{
take $s1 ~ $str.substr($i); # [16]
take $str.substr($i) ~ $s1; # [17]
for @s2 -> $s2 # [18]
{
take $s1 ~ $s2; # [19]
take $s2 ~ $s1; # [20]
}
}
for @s2 -> $s2 # [21]
{
take $str.substr(0, $i) ~ $s2; # [22]
take $s2 ~ $str.substr(0, $i); # [23]
}
}
}
[1] The first string, without explicit constraints. See [2].
[2] The second string, with the same size as the first one, and that size must be larger than 0.
[3] The anagram check, as hinted at in the third example, is easiest done by converting both strings to a canonical form and comparing those.
See my United States of Anagrams with Raku article series for more information about anagrams.
[4] Do we have a pair of anagrams?
[5] We use gather to collect the scrambled candidates,
lazily, by calling «scramble».
See my Raku Gather, I Take article or docs.raku.org/language/control#gather/take for more information about gather/take.
[6] Iterate over the scrambled candidates.
[7] Do we have a match? If so, report success [7a] and exit [7b].
[8] No match? We have failed. Say so.
[9] The recursive procedure.
[10] Return the string itself.
[11] We are done (recursive wise) if we have a single character.
[12] We are splitting the string in two, and we iterate over the number of characters to include in the first (or left) one here.
[13] Scramble the left part,
[14] and the right part. Both non-lazily (as they are placed in an array)
[15] Iterate over the left part.
[16] Return the iterated left part combined with the right part.
[17] As above, with the order swapped.
[18] Iterate over the right part as well.
[19] Return the iterated left part, combined with the iterated right part.
[20] As above, with the order swapped.
[21] Iterate over the right part.
[22] Return the left part, combined with the iterated right part.
[23] As above, with the order swapped.
Running it:
$ ./scramble-string abc abc
True
$ ./scramble-string abcd cdba
True
$ ./scramble-string hello hiiii
False
$ ./scramble-string ateer eater
True
$ ./scramble-string abcd bdac
False
Looking good.
With verbose mode:
$ ./scramble-string -v abc abc
: Candidate: abc
True
$ ./scramble-string -v abcd cdba
: Candidate: abcd
: Candidate: abcd
: Candidate: bcda
: Candidate: abcd
: Candidate: bcda
: Candidate: abcd
: Candidate: bcda
: Candidate: acdb
: Candidate: cdba
True
$ ./scramble-string -v hello hiiii
: Not anagrams
False
$ ./scramble-string -v ateer eater | wc
3983 11947 75663
$ ./scramble-string -v abcd bdac | wc
620 1858 11148
The 4th and 5th examples sprout a lot of verbose output, so I have reduced the output to a line count instead (the first number).
And that's it.