Doubled Chessboard with Raku

#376.1 Chessboard Squares

You are given two coordinates of squares on an 8x8 chessboard.

Write a script to find whether the two given coordinates have the same colour.

8 W B W B W B W B
7 B W B W B W B W
6 W B W B W B W B
5 B W B W B W B W
4 W B W B W B W B
3 B W B W B W B W
2 W B W B W B W B
1 B W B W B W B W
  a b c d e f g h

Example 1:

Input: $c1 = "a7", $c2 = "f4"
Output: true

Example 2:

Input: $c1 = "c1", $c2 = "e8"
Output: false

Example 3:

Input: $c1 = "b5", $c2 = "h2"
Output: false

Example 4:

Input: $c1 = "f3", $c2 = "h1"
Output: true

Example 5:

Input: $c1 = "a1", $c2 = "g8"
Output: false

[3] A custom type (with subset) to ensure a legal position on the board; a letter a-h followed by a digit 1-8.

[18] Reduce the column letter to an integer starting at 0, by subtracting 'a'.ord from the code point of the letter itself. This avoids assuming that the character set is ASCII (and Unicode) compatible. It should not really matter for the result, as an odd offset would make both colours wrong in the same way, and the comparison would still come out right.

[20] Modulo 2 (in [18]) tells us whether the number is even or odd. Then return the corresponding colour.
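The steps above can be sketched roughly as follows. This is a minimal illustration and not the program the annotations refer to: the names Square, colour and same-colour are assumptions, and the parity is taken from the sum of column and row rather than computed separately.

```raku
# A legal square: a letter a-h followed by a digit 1-8.
subset Square of Str where /^ <[a..h]> <[1..8]> $/;

# Colour of a square; a1 is 'B', matching the diagram in the task.
sub colour (Square $sq) {
    my ($col, $row) = $sq.comb;
    my $sum = ($col.ord - 'a'.ord) + $row;   # a1 gives 0 + 1 = 1 (odd)
    return $sum %% 2 ?? 'W' !! 'B';
}

# True if both squares have the same colour.
sub same-colour (Square $c1, Square $c2) {
    return colour($c1) eq colour($c2);
}

say same-colour('a7', 'f4');   # True
say same-colour('c1', 'e8');   # False
```

Passing something like 'i9' fails the type check on the Square parameter, so no separate validation is needed.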

#376.2 Doubled Words

You are given a string (which may contain embedded newlines) which is taken from a page on a website. The string will not contain brackets qw{ [ ] }.

Write a script that will find doubled words (such as “this this”) and highlight (wrap in brackets) each doubled word.

The script should:

Work across lines, even finding situations where a word at the end of one line is repeated at the beginning of the next.
Find doubled words despite capitalization differences, such as with 'The the...', as well as allow differing amounts of whitespace (spaces, tabs, newlines, and the like) to lie between the words.
Find doubled words even when separated by HTML tags. For example, to make a word bold: '...it is <B>very</B> very important...'.
Only show lines containing doubled words.

Adapted from Mastering Regular Expressions, Third Edition by Jeffrey E. F. Friedl

Example 1:

Input: $str = "you're given the job of checking the pages on a\nweb server for doubled words (such as 'this this'), a common problem\nwith documents subject to heavy editing."
Output: "web server for doubled words (such as '[this] [this]'), a common problem"

Example 2:

Input: $str = "Find doubled words despite capitalization differences, such as with 'The\nthe...', as well as allow differing amounts of whitespace (spaces,\ntabs, newlines, and the like) to lie between the words."
Output: "Find doubled words despite capitalization differences, such as with '[The]\n[the]...', as well as allow differing amounts of whitespace (spaces,"

Example 3:

Input: $str = "to make a word bold: '...it is <B>very</B> very important...'."
Output: "to make a word bold: '...it is <B>[very]</B> [very] important...'."

Example 4:

Input: $str = "Perl officially stands for Practical Extraction and Report Language, except when it doesn't."
Output: ""

Example 5:

Input: $str = "There's more than one one way to do it.\nEasy things should be easy and hard things should be possible."
Output: "There's more than [one] [one] way to do it."

The «look ahead» implied by the first bullet point («Work across lines») is problematic. It is better to do a «look behind», i.e. to change the previous line when we get to the repeated word on the next line.

The trick is to store each line in a parsed format that makes it possible to change the doubled status afterwards (i.e. when we parse the next line). That parsed format is actually a class, and it takes care of punctuation, html markup, case insensitivity, and constructing the printable output.

[5] The class used to store each chunk, a word with punctuation and/or html tag(s).

[10] Is the current word doubled? The default is "no", and note the is rw trait, so that we can change the value after object construction.

[12] This method returns a case insensitive version of the word, for easier comparison. I have chosen lowercase (lc), but note that upper and lowercasing do not always round trip.
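A well-known case where case conversion does not round trip is the German sharp s. A minimal illustration, not taken from the program itself:

```raku
say 'ß'.uc;              # SS
say 'ß'.uc.lc;           # ss
say 'ß'.uc.lc eq 'ß';    # False
```

For a doubled word check this is harmless as long as both words are converted in the same direction before they are compared.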

[22] This method is used when we stringify the object. It will mark a doubled word in brackets, and reattach the original punctuation and/or html tags.
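A stripped-down sketch of such a class could look like this. The class name Chunk and the attribute and method names (prefix, word, suffix, doubled, key) are illustrative assumptions, not the names used in the actual program.

```raku
# Hypothetical chunk class: a word plus whatever surrounds it.
class Chunk {
    has Str  $.prefix  = '';            # punctuation/tags before the word
    has Str  $.word    is required;     # the word itself
    has Str  $.suffix  = '';            # punctuation/tags after the word
    has Bool $.doubled is rw = False;   # can be changed after construction

    # Case insensitive key used for comparison.
    method key { $!word.lc }

    # Stringification: bracket the word if doubled, then reattach the rest.
    method Str {
        my $w = $!doubled ?? '[' ~ $!word ~ ']' !! $!word;
        return $!prefix ~ $w ~ $!suffix;
    }
}

my $first  = Chunk.new(prefix => "'", word => 'The');
my $second = Chunk.new(word => 'the', suffix => "...',");

if $first.key eq $second.key {
    $first.doubled  = True;
    $second.doubled = True;
}

say "$first $second";   # '[The] [the]...',
```

Because doubled is rw, an object that has already been stored (for instance from the previous line) can still be marked later.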

[34] The string "\n" on the command line is not a newline, but the two characters \ and n. So we replace it with an actual newline using the substitution operator s///.
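As a small sketch of that substitution (the variable name $str and the sample text are assumptions), a single-quoted string stands in for what arrives from the command line:

```raku
my $str = 'one\ntwo';      # single quotes: a backslash and an n, no newline
say $str.lines.elems;      # 1

# Match a literal backslash followed by n, and replace with a real newline.
$str ~~ s:g/ \\ n /\n/;
say $str.lines.elems;      # 2
```

The :g adverb makes the substitution global, so every occurrence in the string is replaced.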

[38] The last input token, used on a new line to check what word we ended the previous one with.

[46] Split the line into words (with words), or chunks as I have called them - as they can contain more than just the word.

[50] Iterate over the chunks. Note the is copy trait, as we update the value itself (in e.g. [57]).

[79] If we have a previous word (in @a), and that word is the same as the current one, mark the previous one [81] - and the current one [82] - as doubled.

[88] If the last line token variable has been set, we have a previous line. If that token variable is the same as the first word on this line, we mark the first word on this line [93], and the last word on the last line in the output buffer [94] as doubled.
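The look-behind idea can be shown in a much simplified form. The sketch below is an assumption-laden illustration and not the annotated program: it handles whitespace-separated words only (no punctuation or HTML), stores each word as a two-element array instead of an object, and uses one variable ($last) for both the same-line and the previous-line case. The name mark-doubled is made up for the example.

```raku
sub mark-doubled (Str $text) {
    my @rows;    # one array of [word, doubled] pairs per input line
    my $last;    # most recent pair seen, possibly on the previous line

    for $text.lines -> $line {
        my @row;
        for $line.words -> $word {
            my $pair = [$word, False];
            if $last.defined && $last[0].lc eq $word.lc {
                $last[1] = True;    # look behind, also into the previous line
                $pair[1] = True;
            }
            @row.push: $pair;
            $last = $pair;
        }
        @rows.push: @row;
    }

    # Only keep lines with at least one doubled word.
    my @out;
    for @rows -> @row {
        next unless @row.grep(-> $p { $p[1] }).elems;
        @out.push: @row.map(-> $p { $p[1] ?? '[' ~ $p[0] ~ ']' !! $p[0] }).join(' ');
    }
    return @out;
}

.say for mark-doubled("one two\nTwo three\nfour five");
# one [two]
# [Two] three
```

Since the pairs are kept until all lines have been read, marking a word on an earlier line is just an update to an already stored element.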

[106] Print the row, but only if it has one or more doubled words. The Str method (in [22]) takes care of the stringification.

I have added the newlines (and indentation) in the mushroom-entries to make them more readable. The program outputs everything on one line.

It is possible to construct examples that are not handled correctly, as the first two while loops should have been merged, as should the third and fourth.
