Popular Scramble with Raku

#370.1 Popular Word

You are given a string paragraph and an array of banned words.

Write a script to return the most popular word that is not banned. It is guaranteed that there is at least one word that is not banned, and that the answer is unique. The words in paragraph are case-insensitive and the answer should be in lowercase. Words cannot contain punctuation symbols.

Example 1:

Input: $paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
       @banned = ("hit")
Output: "ball"

After removing punctuation and converting to lowercase, the word "hit"
appears 3 times, and "ball" appears 2 times.

Since "hit" is on the banned list, we ignore it.

Example 2:

Input: $paragraph = "Apple? apple! Apple, pear, orange, pear, apple, orange."
       @banned = ("apple", "pear")
Output: "orange"

"apple"  appears 4 times.
"pear"   appears 2 times.
"orange" appears 2 times.

"apple" and "pear" are both banned.

Even though "orange" has the same frequency as "pear", it is the only
non-banned word with the highest frequency.

Example 3:

Input: $paragraph = "A. a, a! A. B. b. b."
       @banned = ("b")
Output: "a"

"a" appears 4 times.
"b" appears 3 times.

The input has mixed casing and heavy punctuation.

Once normalised, "a" is the clear winner. And since "b" is banned, "a"
is the only choice anyway.

Example 4:

Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
       @banned = ("ball")
Output: "apple"

Here the punctuation acts as a delimiter.
"ball"   appears 3 times.
"apple"  appears 2 times.
"banana" appears 1 time.

Example 5:

Input: $paragraph = "The dog chased the cat, but the dog was faster than the cat."
       @banned = ("the", "dog")
Output: "cat"

"the" appears 4 times.
"dog" appears 2 times.
"cat" appears 2 times.

"chased", "but", "was", "faster", "than" appear 1 time each.
"the" is the most frequent but is banned.
"dog" is the next most frequent but is also banned.
The next most frequent non-banned word is "cat".

[3] «All» mode enables «verbose» mode as well. It is used to print all the words, even after a match has been found.

[4] We cannot do the split with words, as that retains punctuation characters. So a custom regex it is. We get rid of empty matches with grep (we get those when two punctuation characters follow each other). Then we turn the liberated words into lowercase with lc, and finally turn the whole mess into a Bag, a hash-like structure that counts the occurrences of the (key)words.
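A minimal sketch of the counting step, assuming that a word is a run of ASCII letters (that regex is my assumption, not necessarily the one in the program). The helper name word-bag is hypothetical. Splitting on runs of non-letters (note the +) means that empty strings can only turn up at the very start or end, and grep removes those.

```raku
# Hypothetical helper: count the lowercased words of a paragraph.
# Assumption: a word is a run of ASCII letters; everything else is a delimiter.
sub word-bag (Str $paragraph --> Bag) {
    $paragraph.split(/<-[a..zA..Z]>+/)   # split on runs of non-letters
              .grep(*.chars)             # drop empty strings left by a leading or trailing delimiter
              .map(*.lc)                 # the words are case-insensitive
              .Bag;                      # word => number of occurrences
}

my $bag = word-bag("Ball.ball,ball:apple!apple.banana");
say $bag<ball>;     # 3
say $bag<apple>;    # 2
say $bag<banana>;   # 1
```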

[7] Iterate over the Bag content, sorted by the number of occurrences with the highest one first (as we have swapped the order of the sort placeholders). This gives us the most frequent word first.
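Putting the counting and the sorted iteration together gives a complete, if minimal, sketch. The sub name popular-word, the use of first instead of an explicit loop, and the word regex (runs of ASCII letters) are my assumptions for illustration, not the program described here.

```raku
# Hypothetical sketch: the most frequent word that is not banned.
sub popular-word (Str $paragraph, @banned) {
    my $banned = @banned.map(*.lc).Set;

    # Count the lowercased words; runs of non-letters are the delimiters
    my $bag = $paragraph.split(/<-[a..zA..Z]>+/).grep(*.chars).map(*.lc).Bag;

    # Highest count first: $^b before $^a reverses the default ascending order
    $bag.pairs
        .sort({ $^b.value <=> $^a.value })
        .first({ !$banned{.key} })   # the first word that is not in the banned Set
        .key;
}

say popular-word("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]);               # ball
say popular-word("The dog chased the cat, but the dog was faster than the cat.", ["the", "dog"]);   # cat
```

Ties among equally frequent words are not broken here, which is safe only because the challenge guarantees a unique answer.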

#370.2 Scramble String

You are given two strings A and B of the same length.

Write a script to return true if string B is a scramble of string A otherwise return false.

String B is a scramble of string A if A can be transformed into B by a single (recursive) scramble operation.

A scramble operation is:

- If the string consists of only one character, return the string
- Divide the string X into two non-empty parts
- Optionally, exchange the order of those parts
- Optionally, scramble each of those parts
- Concatenate the scrambled parts to return a single string
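The definition translates fairly directly into a recursive yes/no check. This is a different technique from generating every scramble and looking for a match; it is shown only as a sketch, and the name is-scramble is hypothetical. For every split point it tries the two parts in their original order and swapped.

```raku
# Hypothetical helper: True if $t can be produced from $s by the scramble operation.
sub is-scramble (Str $s, Str $t --> Bool) {
    return False unless $s.chars == $t.chars;
    return True  if $s eq $t;                  # swapping and scrambling are optional
    # Scrambles are always anagrams; this also settles the one-character case
    return False unless $s.comb.sort.join eq $t.comb.sort.join;

    my $n = $s.chars;
    for 1 ..^ $n -> $i {                       # $i characters go in the left part
        # Parts kept in order: left matches the start of $t, right the end
        return True if is-scramble($s.substr(0, $i), $t.substr(0, $i))
                    && is-scramble($s.substr($i),    $t.substr($i));
        # Parts swapped: the left part of $s ends up at the end of $t
        return True if is-scramble($s.substr(0, $i), $t.substr($n - $i))
                    && is-scramble($s.substr($i),    $t.substr(0, $n - $i));
    }
    False;
}

say is-scramble("abcd", "cdba");   # True
say is-scramble("abcd", "bdac");   # False
```

Without memoisation this is exponential in the worst case, which is fine for strings the size of the examples.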

Example 1:

Input: $str1 = "abc", $str2 = "acb"
Output: true

"abc"
split: ["a", "bc"]
split: ["a", ["b", "c"]]
swap: ["a", ["c", "b"]]
concatenate: "acb"

Example 2:

Input: $str1 = "abcd", $str2 = "cdba"
Output: true

"abcd"
split: ["ab", "cd"]
swap: ["cd", "ab"]
split: ["cd", ["a", "b"]]
swap: ["cd", ["b", "a"]]
concatenate: "cdba"

Example 3:

Input: $str1 = "hello", $str2 = "hiiii"
Output: false

A fundamental rule of scrambled strings is that they must be anagrams.

Example 4:

Input: $str1 = "ateer", $str2 = "eater"
Output: true

"ateer"
split: ["ate", "er"]
split: [["at", "e"], "er"]
swap: [["e", "at"], "er"]
concatenate: "eater"

Example 5:

Input: $str1 = "abcd", $str2 = "bdac"
Output: false

Recursion is the thing here, as hinted at in the challenge text. The problem is how to return the values. We could do that by keeping track of the unchanged parts, both to the left and to the right of the current swap substring, but that requires adding a pre and a post variable (in addition to the actual string to scramble), and that is not elegant.

The actual returns (so to speak) of the values are a good match for gather/take, and I let each level take care of concatenating the results from further calls, thus avoiding passing along left and right substrings.
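A minimal sketch of the gather/take idea, with a hypothetical sub name scrambles; it is not the program itself. Each level takes the concatenations of the results from the deeper calls, in both orders, so no pre and post substrings are passed along.

```raku
# Hypothetical sketch: lazily generate every scramble of $s (duplicates included).
sub scrambles (Str $s) {
    gather {
        if $s.chars <= 1 {
            take $s;                          # a single character is its own scramble
        }
        else {
            for 1 ..^ $s.chars -> $i {        # $i characters in the left part
                my $left  = $s.substr(0, $i);
                my $right = $s.substr($i);
                for scrambles($left) -> $l {
                    for scrambles($right) -> $r {
                        take $l ~ $r;         # parts in the original order
                        take $r ~ $l;         # parts swapped
                    }
                }
            }
        }
    }
}

say scrambles("abcd").first(* eq "cdba").defined;   # True
say scrambles("abcd").first(* eq "bdac").defined;   # False
```

Since gather is lazy, first stops pulling candidates as soon as it finds a match; only a negative answer forces the whole sequence to be generated.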

[2] The second string must have the same size as the first one, and that size must be larger than 0.

[3] The anagram check, as hinted at in the third example, is easiest done by converting both strings to a canonical form and comparing those.
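One possible canonical form is the characters in sorted order. The helper name is-anagram is hypothetical; this is a sketch, not necessarily the form the program uses.

```raku
# Hypothetical helper: two strings are anagrams if their sorted characters are equal.
sub is-anagram (Str $s, Str $t --> Bool) {
    $s.comb.sort.join eq $t.comb.sort.join;
}

say is-anagram("hello", "hiiii");   # False
say is-anagram("ateer", "eater");   # True
```

The check is necessary but not sufficient: "abcd" and "bdac" are anagrams, yet the fifth example says that "bdac" is not a scramble of "abcd".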

[5] We use gather to collect the scrambled candidates, lazily, by calling «scramble».

[12] We are splitting the string in two, and we iterate over the number of characters to include in the first (or left) one here.
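To illustrate what that iteration covers, here is a small sketch that lists every way to cut a string into two non-empty parts, indexed by the length of the left part. The helper name splits is hypothetical.

```raku
# Hypothetical helper: all (left, right) pairs with non-empty parts.
sub splits (Str $s) {
    (1 ..^ $s.chars).map(-> $i { ($s.substr(0, $i), $s.substr($i)) });
}

for splits("abcd") -> ($left, $right) {
    say "$left | $right";   # a | bcd, then ab | cd, then abc | d
}
```

A string of length n has n - 1 such splits, and a single character has none, which is why that case ends the recursion.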

The 4th and 5th examples produce a lot of verbose output, so I have reduced the output to a line count instead (the first number).
