Smallest Cache with Raku

by Arne Sommer

Smallest Cache with Raku

[60] Published 1. March 2020.

This is my response to the Perl Weekly Challenge #49.

Challenge #49.1: Smallest Multiple

Write a script to accept a positive number as command line argument and print the smallest multiple of the given number consists of digits 0 and 1.

For example:

For given number 55, the smallest multiple is 110 consisting of digits 0 and 1.

Brute Force

Let us try a brute force approach, where we start with the given number trying multiples of it until we get a number consisting of zeroes and ones only.

This is not the best way of doing this, so scroll down to the «Binary First» section if you don't want the scenic detour.

File: smallmull-add
subset binary of Int where { /^<[ 0 1 ]>+$/ }; # [1]

unit sub MAIN (Int $number, :$verbose);        # [2]

my $current = $number;                         # [3]

loop                                           # [4]
{
  $current += $number;                         # [5]
  say ": Checking $current" if $verbose;
  if $current ~~ binary                        # [6]
  {
    say "Match: $current";                     # [6a]
    exit;                                      # [6b]
  }
}

[1] We need a way of deciding if the number consists of only zeroes and ones. A custom type, declared with «subset», is one way of doing it. The innermost part (<[ 0 1 ]>) is a Character Group with the digits 0 and 1 only. The + says that we require it 1 or more times, the inital ^ is an anchor to the start of the string, and $ is an anchor to the end of the string. So the pattern must contain these two digits only, at least once, and not anything else.

[2] I use the magical MAIN procedure to catch the arguments. Activate verbose mode with «--verbose» on the command line.

[3] The current value, starting with the user specified value.

[4] In an eternal loop,

[5] • Add the original value to the current one (in effect increasing the multiple by one, but addition should be cheaper (faster) than mutiplication).

[6] • Do we have a match; testing with the Smartmatch operator ~~? If yes, say so [6a] and exit [6b]. (Note that we could have used last instead of exit here.)

Running it:

$ raku smallmull-add 1
Match: 10

$ raku smallmull-add 2
Match: 10

$ raku smallmull-add 3
Match: 111

$ raku smallmull-add 4
Match: 100

$ raku smallmull-add 5
Match: 10

$ raku smallmull-add 6
Match: 1110

$ raku smallmull-add 7
Match: 1001

$ raku smallmull-add 8
Match: 1000

$ raku smallmull-add 9
Match: 111111111

The last one used about 28 minutes on my pc. That is not very impressive.

We can have a go at some additional values, with verbose mode:

$ raku /smallmull-add --verbose 10
: Checking 20
: Checking 30
: Checking 40
: Checking 50
: Checking 60
: Checking 70
: Checking 80
: Checking 90
: Checking 100
Match: 100

$ raku smallmull-add --verbose 11
: Checking 22
: Checking 33
: Checking 44
: Checking 55
: Checking 66
: Checking 77
: Checking 88
: Checking 99
: Checking 110
Match: 110

Note that most multiples of 9 (e.g. 9, 18, 27) are hard to compute.

We can speed it up considerably by skipping sequences that cannot match. E.g. if we get to "1020" or "1021" we skip to "1100" (or rather 1100, rounded down to the nearest multiple of the given number).

Feel free to scroll down to the «Binary First» section if you don't want the scary part of the scenic detour.

File: smallmull-add-smart
subset binary of Int where { /^<[ 0 1 ]>+$/ };

unit sub MAIN (Int $number, :$verbose);

my $current = $number;

loop
{
  $current += $number;
  say ": Checking $current" if $verbose;
  if $current ~~ binary
  {
    say "Match: $current";
    exit;
  }

  next if $current.chars < 2;

  my @digits = $current.comb.reverse;           # [1]

  for 0 .. @digits.end -> $index                # [2]
  {
    if @digits[$index] > 1                      # [2a]
    {
      @digits[$index +1]++;                     # [1a]
      @digits[0 .. $index] = (0 xx $index +1);  # [2b]
    }
  }

  my $new = @digits.reverse.join;               # [1]
  $current = $new - ( $new % $number );         # [3]
}

The code is complicated, but it seems to works. (Unless we do comprehensive testing.)

[1] The double reverse are there so that we can get a new digit added to the number without problems [1a]. If the last one is at index 6, accessing a non-exising one at index 7 gives something that is coerced to zero when we add 1 to it. Adding new digits is normally done up front, so without the reverse the first digit would be at index 0. Accessing a new digit at index -1 would fail.

[2] Iterate over the digits (with the least significant one first). If the digit is higher than 1 [2a], increase the next (higer) digit with 1 [1a] and reset the remaining digits (from the current one to the end) to 0 [2b].

[3] Round the resulting number down to the nearest multiple of the original input.

$ raku smallmull-add-smart 9
Match: 111111111

This one used about 38 seconds. That is a very impressive speed up, but still not good enough.

Running it with verbose mode shows that it skips a lot of numbers:

$ raku smallmull-add-smart --verbose 9
: Checking 18
: Checking 108
: Checking 117
: Checking 1008
: Checking 1017
: Checking 1107
: Checking 1116
: Checking 10008
...

Binary First

A far better approach is starting with the binary numbers, checking if they (taken as a decimal value) are divisible by the given number:

File: smallmull
unit sub MAIN (Int $number, :$verbose);

for 1 .. Inf -> $binary                           # [1]
{
  my $current = $binary.base(2);                  # [2]
  say ": Checking $current" if $verbose;

  if ($current > $number && $current %% $number)  # [3]
  {
    say "Match: $current";
    exit;
  }
}

[1] An infinite loop. We start with 1 (and not $number, as they are uncomparable).

[2] Convert the decimal number to a binary number.

[3] Check if the new number is divisible (with the %% Divisibility Operator). The first check is there to prevent the number itself from triggering a match.

Running it on the problematic number 9:

$ raku smallmull 9
Match: 111111111

This one used about 0.3 seconds on my pc. That is good.

It is tedious to run it manually for several values, so I wrote a version that can compute several values:

File: smallmull-upto
unit sub MAIN (Int $number, :$verbose, :$upto);  # [1]

$upto
  ?? ( show($_) for 2 .. $number )               # [2a]
  !! show($number);                              # [2b]

sub show ($number)
{
  for 1 .. Inf -> $binary
  {
    my $current = $binary.base(2);
    say ": Checking $current" if $verbose;

    if ($current > $number && $current %% $number)
    {
      say $upto                             # [3]
        ?? "$number Match: $current"        # [3a]
        !! "Match: $current"; 
      last;                                 # [4]
    }
  }
}

[1] Use the «--upto» named argument to activate «upto» mode.

[2] If «uptpo» mode, show the match for all values up to the specified number [2a]. If not, use the specified number only [2b].

[3] If «upto», show the number itself as well [3a].

[4] I had to use last instead of exit, so that the program doesn't terminate after the first iteration.

Running it:

$ raku smallmull-upto 9
Match: 111111111

$ raku smallmull-upto --upto 100
2 Match: 10
3 Match: 111
4 Match: 100
5 Match: 10
6 Match: 1110
7 Match: 1001
8 Match: 1000
9 Match: 111111111
10 Match: 100
11 Match: 110
12 Match: 11100
13 Match: 1001
14 Match: 10010
15 Match: 1110
16 Match: 10000
17 Match: 11101
18 Match: 1111111110
19 Match: 11001
20 Match: 100
21 Match: 10101
22 Match: 110
23 Match: 110101
24 Match: 111000
25 Match: 100
26 Match: 10010
27 Match: 1101111111
28 Match: 100100
29 Match: 1101101
30 Match: 1110
31 Match: 111011
32 Match: 100000
33 Match: 111111
34 Match: 111010
35 Match: 10010
36 Match: 11111111100
37 Match: 111
38 Match: 110010
39 Match: 10101
40 Match: 1000
41 Match: 11111
42 Match: 101010
43 Match: 1101101
44 Match: 1100
45 Match: 1111111110
46 Match: 1101010
47 Match: 10011
48 Match: 1110000
49 Match: 1100001
50 Match: 100
51 Match: 100011
52 Match: 100100
53 Match: 100011
54 Match: 11011111110
55 Match: 110
56 Match: 1001000
57 Match: 11001
58 Match: 11011010
59 Match: 11011111
60 Match: 11100
61 Match: 100101
62 Match: 1110110
63 Match: 1111011111
64 Match: 1000000
65 Match: 10010
66 Match: 1111110
67 Match: 1101011
68 Match: 1110100
69 Match: 10000101
70 Match: 10010
71 Match: 10011
72 Match: 111111111000
73 Match: 10001
74 Match: 1110
75 Match: 11100
76 Match: 1100100
77 Match: 1001
78 Match: 101010
79 Match: 10010011
80 Match: 10000
81 Match: 1111111101
82 Match: 111110
83 Match: 101011
84 Match: 1010100
85 Match: 111010
86 Match: 11011010
87 Match: 11010111
88 Match: 11000
89 Match: 11010101
90 Match: 1111111110
91 Match: 1001
92 Match: 11010100
93 Match: 10000011
94 Match: 100110
95 Match: 110010
96 Match: 11100000
97 Match: 11100001
98 Match: 11000010
99 Match: 111111111111111111
100 Match: 1000

This one used about 4.2 seconds. Pretty good.

Note how the multiples of 9 (9, 18, 27, 36, 54, 63, 72, 81, 90 and 99) stand out, as I stated earlier (without proof).

Why Binary is Faster

Let us consider 9, which gives us 111111111 as answer.
  • The initial program «smallmull-add» (where we added the number itself ad infinitum) has 12.345.679 (111111111/9) binary checks.
  • The amended program «smallmull-add-smart» comes out at only 255 checks (run the program in verbose mode and pipe it through «wc»), but has a lot of overhead.
  • The binary program «smallmull» has only 511 computations ("111111111".parse-base(2)), a reduction of about 99.996% compared to the initial one. That is impressive. Note that the program is faster (and easier on a code reviewer) than the second one, even if it has doubled the number of tests, as it doesn't have all the number testing and skipping.

Note that «smallmull-add-smart» works with 9, 18 and 27. But it does not work with e.g. 4. The program has half the checks than the binary program, and obviously misses the answer along the way. That is a serious problem. Also note that it doesn't work with 36, so it doesn't even get the multiples of 9 right. The binary version works, so I will not investigate this further. (Feel free to see if you can sort out the problem, though. You'll basically just end up with convoluted (or fancy) code to replicate the binary numbers.)

Challenge #49.2: LRU Cache

Write a script to demonstrate LRU Cache feature. It should support operations get and set. Accept the capacity of the LRU Cache as command line argument.

Definition of LRU: An access to an item is defined as a get or a set operati on of the item. “Least recently used” item is the one with the oldest access time.

For example:
capacity = 3
set(1, 3)
set(2, 5)
set(3, 7)

Cache at this point:
[Least recently used] 1,2,3 [most recently used]

get(2)      # returns 5

Cache looks like now:
[Least recently used] 1,3,2 [most recently used]

get(1)      # returns 3

Cache looks like now:
[Least recently used] 3,2,1 [most recently used]

get(4)      # returns -1

Cache unchanged:
[Least recently used] 3,2,1 [most recently used]

set(4, 9)

Cache is full, so pushes out key = 3:
[Least recently used] 2,1,4 [most recently used]

get(3)      # returns -1

I found this amazing post talking about LRU Cache.

This cache is a good candidate for a class. I'll start with the program, and present the class afterwards:

File: lru-cache
use lib 'lib';                                              # [1]
use Cache::LRU;                                             # [1a]

unit sub MAIN (Int $limit = 3, :$verbose);                  # [2]

say ": Cache size: $limit" if $verbose;

my $lru = Cache::LRU.new(:$limit, :$verbose);               # [3]

loop                                                        # [4]
{
  given prompt "Command (h for help): "                     # [5]
  {
     when /^c[apacity]?$/ { say $lru.get-limit; }           # [6]
     when /^c[apacity]? \s+ [\= \s+]? (\d+) $/
       { $lru.set-limit($0.Int); }                          # [7]
     when /^s[et]?\((\d+) \, \s* (\d+) \) $/
       { $lru.set($0.Int, $1.Int); }                        # [8]
     when /^s[et]? \s+ (\d+) \, \s* (\d+) $/
       { $lru.set($0.Int, $1.Int); }                        # [8a]
     when /^g[et]?\((\d+) \)$/ { say $lru.get($0.Int); }    # [9]
     when /^g[et]? \s+ (\d+)$/ { say $lru.get($0.Int); }    # [9a]
     when 'd'|'dump' { $lru.dump  if $lru.head; }           # [10]
     when 'data'     { $lru.dump2 if $lru.head; }           # [11]
     when 'h'|'help' { help; }                              # [12]
     when 'quit'|'q' { exit; }                              # [13]
     default { say "- Illegal command. Use 'h' for help"; } # [14]
  }
}

sub help
{
  ;         # [15]
}

[1] This line makes it possible to place the module in the specified location, without installing it. I have chosen a realistic name for the module [1a].

[2] The cache size defaults to 3, if not specified on the command line.

[3] Set up the Cache object. Note the colon syntax for passing on named arguments with the same variable name. (:$verbose is short for verbose => $verbose.)

[4] An eternal loop. See [13] for the exit strategy.

[5] I could have used if/elsif/else, but given/when gives more compact code. prompt displays the specified text, and waits for user input.

[6] The command «capacity» (and the short form «c») shows the capacity (or size) of the buffer.

[7] Specify an integer value after «capacity» to change the capacity (or size) of the buffer. The «=» part is optional, so all of «capacity = 12», «capacity 12», «c = 12» and «c 12» work. (The newline here and in [9] and [10] are only there because of the article with.)

[8] Type «set(1, 11)» to add the value «11» to the cach entry with id «1». These forms are also accepted: «set(1,11)», «s(1, 11)», «set(1,11)», «s(1,11)», «set 1, 11», «s 1, 11», «set 1,11» and «s 1,11».

[9] Type «get(1)» to retrieve the value with id «1» in the cache. The «get 1» form is also accepted.

[10] Type «dump» or «d» to see the cache entries, with the most recently used first.

[11] Type «data» to show the internal Raku data structure, starting with the most recently used.

[12] Type «help» or «h» to see the help message. Note that it is placed in the program (a procedure) and not in the class (a method), as the commands are part of the program.

[13] Type «quit» or «q» to exit the program.

[14] If the tests above fails. default is the gather/take version of the else in if/elsif blocks.

[15] I have to confess that I haven't bothered writing this one...

The Module

I have made a module of the entire cache infrastructure, to encapsulate the supporting data structures (head and tail pointers, the key lookup hash).

The elements themselves have a pointer to the higher and lower cache entries, so require a class of their own. I have placed this second class inside the first one, to hide it - and prevent the program from beeing able to use it directly. It can only be accessed via the «Cache::LRU» module.

File: lib/Cache/LRU.rakumod (partial)
class Cache::LRU               # [1]
{
  has $.limit  is rw = 3;      # [2]
  has $.verbose      = False;  # [3]
  has %.lookup is rw = ();     # [4]
  has $.head   is rw = Any;    # [5]
  has $.tail   is rw = Any;    # [6]

  class LRU-elem               # [7]
  {
    has $.id;                  # [8]
    has $.payload;             # [9]
    has $.higher is rw;        # [10]
    has $.lower  is rw;        # [11]
  }

[1] The cache module, which is all the user should care about.

[2] The cache capacity (or maximum size). This value can be changed dynamically on a cache in use.

[3] We can enable verbose mode on cache creation. (The missing is rw means that it cannot be changed later on.) Note that I haven't really added verbose output.

[4] The hash lookup, where we get the cache object for the given key.

[5] The head of the cache; the most recently used item.

[6] The tail of the cache; the least recently used item.

[7] The class for the individual cache elemts.

[8] • an ID.

[9] • the data (payload).

[10] • a pointer to the higher element.

[11] • a pointer to the lower element.

Note the is rw on the $.higher and $.lower pointers, as they will change values when we access the cache. Also note the missing is rw on $.payload, as I have chosen to make the value read only. If we change the value of an existing cache entry, the module will delete the existing object and then create a new one.

The code handling the size of the cache:

File: lib/Cache/LRU.rakumod (partial)
  method get-limit                       # [12]
  {
    return self.limit;                   # [12]
  }
    
  method set-limit (Int $new-limit)      # [13]
  {
    return if $new-limit < 1;            # [14]
    return if $new-limit == self.limit;  # [15]

    self.limit = $new-limit;             # [16]

    self.shrink while self.lookup.keys.elems > self.limit;  # [17]

    say ": New Cache Capacity: { self.limit }" if self.verbose;
  }

  method shrink                        # [18]
  {
    self.lookup{self.tail.id} :delete; # [19]
    self.tail = self.tail.higher;      # [20]
    self.tail.lower = Any;             # [21]
  }

[12] Get the current limit of the cache. Note that I haven't added a method to get the current size (the number of elements in the cache).

[13] Set a new limit (as an Int) for the cache;

[14] • Do nothing if the new size is less than 2. This stops negative numbers, 0 and 1.

[15] • Do nothing if the new size if the same as the current one.

[16] Set the new limit.

[17] If the new limit is smaller than the actual number of items in the cache, remove the last one one at a time until we reach the limit.

[18] This method removes tha last entry in the cache;

[19] • Remove the last entry (the tail) from the hash.

[20] • Set the tail pointer to the second last entry.

[21] • Remove the pointer from the new tail element. The result is that the old tail has a link count of 0, and is lost.

Then an internal method to extract (and remove) a given item from the cache. The element is not destroyed, so it can be inserted again, e.g. as a new head. The method is available for users of the module through the «delete» wrapper, as that one takes care of the hash entry (ensuring data consistency).

File: lib/Cache/LRU.rakumod (partial)
  method !extract ($elem)                       # [22]
  {
    $elem.higher                                # [23]
      ?? ( $elem.higher.lower = $elem.lower )   # [23a]
      !! ( self.head = $elem.lower );           # [23b]

    $elem.lower                                 # [24]
      ?? ( $elem.lower.higher = $elem.higher )  # [24a]
      !! ( self.tail = $elem.higher );          # [24b]

    $elem.higher = $elem.lower = Any;           # [25]
    return $elem;                               # [26]
  }

  method delete ($elem)                         # [27]
  {
    self.lookup{$elem.id}:delete;               # [27a]
    return self!extract($elem);                 # [27b]
  }

[22] This is a private method (specified by the leading !), only available form inside the module itself. It extracts an element from the cache, and takes care of all the pointers (except the «lookup» hash).

[23] Does this element have a higher one? If yes, bypass the current one. If not, set the lower one as the new head.

[24] Does this element have a lower one? If yes, bypass the current one. If not, set the higher one as the new tail.

[25] Remove the pointers from the extracted element.

[26] Return it.

[27] Extract an element [27a], and remove it from the hash [27b].

The method that gets a value:

File: lib/Cache/LRU.rakumod (partial)
  method get (Int $id)                                  # [28]
  {
    say ": get $id " if self.verbose;

    my $current = self.lookup{$id};                     # [29]

    return -1 unless $current;                          # [30]

    return $current.payload if $current === self.head;  # [31]

    self!extract($current);                             # [32]
  
    self.head.higher = $current;                        # [33]
    $current.lower   = self.head;                       # [34]
    self.head        = $current;                        # [35]

    return $current.payload;                            # [36]
  }

[28] Note the Int restriction on the ID. That complies with the challenge, but may be too restrictive.

[29] Get the element, given the ID.

[30] Return -1 if it doesn't exist.

[31] Return the value (the payload) if the element already is the head.

[32] Extract the element.

[33-35] Insert the «$current» element as the new head, before the current values:

  • [33] Set the current head's head to point to this one.
  • [34] Set this element's lower to point to the current head.
  • [35] Set the head to point to the current one.

[36] Return the value (the payload).

File: lib/Cache/LRU.rakumod (partial)
  method set (Int $id, $value)                               # [37]
  {
    say ": set $id @ $value" if self.verbose;

    my $old-head = self.head;                                # [38]
  
    self.head = LRU-elem.new(id => $id, payload => $value);  # [39]

    self.head.lower  = $old-head;                            # [40]
    $old-head.higher = self.head if $old-head;               # [41]
  
    if self.lookup{$id}                                      # [42]
    {
      self!extract(self.lookup{$id});                        # [43]
    }
    else                                                     # [44]
    {
      self.tail = self.head unless self.tail;                # [45]
      self.shrink if self.lookup.keys == self.limit;         # [46]
    }
    self.lookup{$id} = self.head;                            # [47]
  }
}

[37] Note the Int restriction on the ID.

[38] Take note of the current head.

[39] Create a new object, and set it as the new head.

[40] Set the new object's lower to point to the old head.

[41] Set the old head's upper to pint to the new head.

[42] If the object exist already;

[43] • remove that one.

[44] If not (i.e. it is a new ID);

[45] This one takes care of the very first element added to an empty cache, as the single element is both the head and tail.

[46] Remove the tail element if we have one element too many in the cache. (We haven't added the new one to the «lookup» hash yet, so we check for the limit.)

[47] Set up the «lookup» entry.

And finally the methods showing the data structure:

File: lib/Cache/LRU.rakumod (partial)
  method dump ($elem = self.head)             # [48]
  {
    print $elem.id ~ " -> " ~ $elem.payload;  # [49]
    print " (head)" if $elem === self.head;   # [49a]
    print " (tail)" if $elem === self.tail;   # [49b]
    say "";
  
    self.dump($elem.lower) if $elem.lower;    # [50]
  }

  method dump2 ($elem = self.head)            # [51]
  {
    say $elem;                                # [52]
  }

[48] A recursive method, that starts with the head and prints the ID and value (payload) for each cache entry all the way to the tail.

[49] The output. Prefix with a note if ut is the head or tail (or both).

[50] The recursive part.

[51] Dump the data structure as a Raku data structure.

[52] This one does the job.

Running it:

$ raku lru-cache
Command (h for help): s 1,11
Command (h for help): d
1 -> 11 (head) (tail)
Command (h for help): s 2,22
Command (h for help): d
2 -> 22 (head)
1 -> 11 (tail)
Command (h for help): s 3,33
Command (h for help): d
3 -> 33 (head)
2 -> 22
1 -> 11 (tail)
Command (h for help): s 4,44
Command (h for help): d
4 -> 44 (head)
3 -> 33
2 -> 22 (tail)
Command (h for help): s 4,444
Command (h for help): d
4 -> 444 (head)
3 -> 33
2 -> 22 (tail)
Command (h for help): s 3,333
Command (h for help): d
3 -> 333 (head)
4 -> 444
2 -> 22 (tail)
Command (h for help): s 2,222
Command (h for help): d
2 -> 222 (head)
3 -> 333
4 -> 444 (tail)
Command (h for help): c
3
Command (h for help): c 2
Command (h for help): d
2 -> 222 (head)
3 -> 333 (tail)
Command (h for help): c 3
Command (h for help): s 9,999            
Command (h for help): d
9 -> 999 (head)
2 -> 222
3 -> 333 (tail)
Command (h for help): data 
(my \Cache::LRU::LRU-elem_94053912773264 = Cache::LRU::LRU-elem.new
  (id => 9,
   payload => 999,
   higher => Any,
   lower =>
     (my \Cache::LRU::LRU-elem_94053912773200 = Cache::LRU::LRU-elem.new
        (id => 2,
         payload => 222,
         higher => Cache::LRU::LRU-elem_94053912773264,
         lower => Cache::LRU::LRU-elem.new
           (id => 3,
            payload => 333,
            higher => Cache::LRU::LRU-elem_94053912773200,
            lower => Any
           )
        )
     )
  )
)
Command (h for help): q

I have inserted a liberal amount of newlines and indentation on the last one.

Running it with the input given in the challenge (using the exact verbose syntax):

$ raku lru-cache
Command (h for help): capacity 3
Command (h for help): set(1, 3)
Command (h for help): set(2, 5)            
Command (h for help): set(3, 7)
Command (h for help): d
3 -> 7 (head)
2 -> 5
1 -> 3 (tail)
Command (h for help): get(2)
5
Command (h for help): d
2 -> 5 (head)
3 -> 7
1 -> 3 (tail)
Command (h for help): get(1)
3
Command (h for help): d
1 -> 3 (head)
2 -> 5
3 -> 7 (tail)
Command (h for help): get(4)
-1
Command (h for help): d
1 -> 3 (head)
2 -> 5
3 -> 7 (tail)
Command (h for help): set(4,9)
Command (h for help): d
4 -> 9 (head)
1 -> 3
2 -> 5 (tail)
Command (h for help): get(3)
-1
Command (h for help): q

And that's it.