The Nine Billion Names of God
Base

by Arne Sommer

The Nine Billion Names of God with Raku - Part 2: Base

[300.2] Published 4. August 2024.

[ Index | Introduction | Base | Recursion | Loops | Summary | RakuConf ]

Counting the Bases

Let us start with some examples.

If we have an alphabet with 1 character (e.g. *), we get these words:

Example 1:
*, **, **, ***, ***, ...

Feel free to call this the «Password alphabet».

If we have an alphabet with 2 characters (e.g. A and B), we get these words (shown here with linebreaks to emphasize the patterns):

Example 2a:
A, B
AA, AB, BA, BB
AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
...

Raku has built in support for alphabets of different lengths, with the base and the reverse parse-base methods to convert to and fro.

Well, not alphabets as such, but number systems. The devil is in the details, so let us investigate.

The supported bases in base and parse-base go from 2 (binary; digits 0 and 1) all the way up to 36 (digits 0-9,a-z). The most famous one after decimal (base 10) and the already mentioned binary (base 2) is hexadecimal (base 16; digits 0-9,a-f).

Let us try to get Raku to generate the words from example 2a:

> say (0..10).map(*.base(2)).join(", ")
0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010

See docs.raku.org/routine/base for more information about base.

Then we can rewrite example 2a, using 0 and 1 instead of A and B, to better match the Raku output:

Example 2b:
0, 1
00, 01, 10, 11
000, 001, 010, 011, 100, 101, 110, 111
...

The problem is zero, as «0», «00» and «000» are different words but not different numeric values.

If we skip the problematic zero, and use 1 and 2 instead of A and B, we get a much better result (i.e. legal numeric values):

Example 2c:
1, 2
11, 12, 21, 22
111, 112, 121, 122, 211, 212, 221, 222
...

The code to generate these words is more complicated:

> say (0..30).map(*.base(3)).grep( !*.contains('0')).join(", ")
1, 2, 11, 12, 21, 22, 111, 112, 121, 122, 211, 212, 221, 222

We use a base of one higher than the target, and get rid of values containing any zeroes.

The initial zero in the range can be removed, as zero will always be zero regardless of which base we convert it to.

Let us wrap this up in a program:

File: generate-words-sort-of
#! /usr/bin/env raku

unit sub MAIN (UInt :b(:$alphabet) where 0 < $alphabet <= 35 = 0,  # [1]
    	       UInt :c(:$chars)    where $chars > 0  = 4);         # [2]

my $values = gather                                                # [3]
{
  for 1 .. Inf -> $decimal                                         # [4]
  {
    my $in-base = $decimal.base($alphabet +1);                     # [5]

    last if $in-base.chars > $chars;                               # [6]
    next if $in-base.contains('0');                                # [7]

    take $in-base;                                                 # [8]
  }
}

say $values.join(", ");                                            # [9]

[1] The number of characters in the alphabet. Legal values are 1 .. 35. The default value of zero will teminate the program, so do specify a value.

[2] The maximum number of characters in the words, with an arbitrary 4 as the default.

[3] I think that using gather/take (see [8]) to collect the words is a good idea here.

See my Raku Gather, I Take article or docs.raku.org/language/control#gather/take for more information about gather/take.

[4] Iterate over all the positive integers, ad infinitum.

[5] Convert the current integer to the alphabet version. Note the base+1, as we need that to compensate for the zeroes we er going to get rid of.

[6] We are finished if we have exceeded the word length limit.

[7] Skip values containing any zeroes.

See docs.raku.org/routine/contains for more information about contains.

[8] Return (so to speak) the current word.

[9] Pretty print all the words.

Running it:

$ ./generate-words-sort-of -b=1 -c=3
1, 11, 111

$ ./generate-words-sort-of -b=2 -c=3
1, 2, 11, 12, 21, 22, 111, 112, 121, 122, 211, 212, 221, 222

$ ./generate-words-sort-of -b=2 -c=4
1, 2, 11, 12, 21, 22, 111, 112, 121, 122, 211, 212, 221, 222, \
1111, 1112, 1121, 1122, 1211, 1212, 1221, 1222, \
2111, 2112, 2121, 2122, 2211, 2212, 2221, 2222

Looking good.

Note that I have not tackled the «no more than three identical letters in succession» rule yet. We will do just that now.

Four Letters in Succession

This time an explicit loop is a better choice than the gather/take approach used in the prior program.

File: generate-words-base
#! /usr/bin/env raku

unit sub MAIN (UInt :a(:$alphabet) where 1 <= $alphabet <= 35 = 0,  # [1]
               UInt :c(:$chars) = 0,                                # [2]
               UInt :l(:$limit) = 0,                                # [3]
                    :v(:$verbose));

die "Set one of chars and limit" if $chars == $limit == 0;          # [4]

my $count      = 0;                                                 # [5]
my $last-word  = "";                                                # [6]
my $stopped-by = "";                                                # [7]

for 1 .. Inf -> $decimal
{
  { $stopped-by = 'limit'; last } if $limit && $count > $limit -1;  # [8]

  my $in-base = $decimal.base($alphabet +1);

  { $stopped-by = 'chars'; last } if $chars && $in-base.chars > $chars;
                                                                    # [9]

  next if $in-base.contains('0');
  next if $in-base ~~ /(.) {} :my $c=$0; <?after $c ** 4> /;        # [10]

  $count++;                                                         # [11]
  $last-word = $in-base;                                            # [12]
  say ": Value: $in-base (#$count)" if $verbose;
}

say $limit                                                          # [13]
  ?? "$count words (last '$last-word' with length { $last-word.chars } \
        stopped by $stopped-by target: { (1000000 * $count/$limit).Int / 10000 }%"
  !! "$count words (last '$last-word' with length { $last-word.chars } \
        stopped by $stopped-by)";

[1] The length of the alphabet (the number of letters).

[2] The maximum number of characters to allow in the words.

[3] The maximum number of (legal) words to compute.

The name «chars» could also have been «length», and «limit» could have been «count», so feel free to mix up «c» and «l».

[4] Ensure that the limits in either [2] or [3] (or both) is set.

[5] Number of words computed so far.

[6] The last word computed so far.

[7] Why have we stopped? (I.e. which rule is responsible.)

[8] Stop if we have a word count limit, and have reached it.

[9] Stop if we have a word length limit, and have reached it.

[10] The «no more than three identical letters in succession» rule, or «not four identical letters in succession» as the code means.

This fancy regexp is based on code in this stackoverflow article: Raku regex: How to use capturing group inside lookbehinds.

[11] Count the word.

[12] Save the word.

[13] Print the number of words, as well as the last word.

$ ./generate-words-base -a=2 -l=10 -v
: Value: 1 (#1)
: Value: 2 (#2)
: Value: 11 (#3)
: Value: 12 (#4)
: Value: 21 (#5)
: Value: 22 (#6)
: Value: 111 (#7)
: Value: 112 (#8)
: Value: 121 (#9)
: Value: 122 (#10)
10 words (last '122' with length 3 stopped by limit)

We can use both limits (l and c); the first one to be reached will terminate the program:

$ ./generate-words-base -a=2 -l=10 -c=2 -v
: Value: 1 (#1)
: Value: 2 (#2)
: Value: 11 (#3)
: Value: 12 (#4)
: Value: 21 (#5)
: Value: 22 (#6)
6 words (last '22' with length 2 by chars)

Let us test the «no more than three identical letters in succession» rule. This one will try to generate words with length 1-20, but most of them are illegal as we have a single letter alphabet.

$ ./generate-words-base -a=1 -c=20 -v
: Value: 1 (#1)
: Value: 11 (#2)
: Value: 111 (#3)
3 words (last '111' with length 3 stopped by chars)

Another one:

$ ./generate-words-base -a=2 -c=4 -v
: Value: 1 (#1)
: Value: 2 (#2)
: Value: 11 (#3)
: Value: 12 (#4)
: Value: 21 (#5)
: Value: 22 (#6)
: Value: 111 (#7)
: Value: 112 (#8)
: Value: 121 (#9)
: Value: 122 (#10)
: Value: 211 (#11)
: Value: 212 (#12)
: Value: 221 (#13)
: Value: 222 (#14)
: Value: 1112 (#15)
: Value: 1121 (#16)
: Value: 1122 (#17)
: Value: 1211 (#18)
: Value: 1212 (#19)
: Value: 1221 (#20)
: Value: 1222 (#21)
: Value: 2111 (#22)
: Value: 2112 (#23)
: Value: 2121 (#24)
: Value: 2122 (#25)
: Value: 2211 (#26)
: Value: 2212 (#27)
: Value: 2221 (#28)
28 words (last '2221' with length 4 stopped by chars)

Let us sit back and take stock. We now actually have a program that can solve the task for us. Let us have a go at it:

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=1
3 words (last '111' with length 3 stopped by chars target: 0%

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=2
650 words (last '222122212' with length 9 stopped by chars target: 0%

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=3
25203 words (last '333233323' with length 9 stopped by chars target: 0.0002%

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=4
325092 words (last '444344434' with length 9 stopped by chars target: 0.0036%

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=5
2347955 words (last '555455545' with length 9 stopped by chars target: 0.026%

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=6
11813838 words (last '666566656' with length 9 stopped by chars target: 0.1312%

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=7
46374195 words (last '777677767' with length 9 stopped by chars target: 0.5152%

$ ./generate-words-base -c=9 -l=9_000_000_000 -a=8
151820168 words (last '888788878' with length 9 stopped by chars target: 1.6868%

And so on. Each command takes longer and longer time to execute.

How long you may ask. I'll get back to that in the summary.

We'll have a look at a much more elegant solution in the next part, using recursion.

[ Index | Introduction | Base | Recursion | Loops | Summary | RakuConf ]