Fooled by a Sequence,
Twice

by Arne Sommer

Fooled by a Sequence, Twice

[173] Published 12. March 2022.

The First Time

In my Just the Fact with Raku article I made the following statement:

Grep will coerce whatever it is used on to a list.

That is plain and simple wrong!

This was an unfortunate result of late night programming, where I ended up with a sequence that (seemingly) did not produce any values. Then I had the wrong idea that the problem could be in the grep method, that it somehow expanded the sequence to the full list. The input sequence was infinite, so that would explain the lack of result. So I looked up the documentation, and lo and behold it supported my suspicion:

The Danger of Reading Documentation

The official documentation for grep looks like this (from the top), as of 9th March 2022:

«Any» is at the top of the class hierarchy, so everything specified here is available in more specific classes - unless overridden. The «Any» version of grep is explained as:

Coerces the invocant to a list by applying its .list method and uses List.grep on it.

The documentation has versions of grep for the classes (or types) «Any», «Supply», «RaceSeq», «List» and «HyperSeq». The «Seq» class (which is behind a Sequence) is not listed here. The documentation (see link below) has a type graph that shows that it does inherit the «Any» version. And that one coerces the data to a list, which is a non-lazy data structure. Right? (Spoiler alert: This is wrong.)

See docs.raku.org/routine/grep for more information about grep.

See docs.raku.org/type/Seq for more information about the Seq type.

So I wrote the infamous quote, rewrote the program using map instead, and went to bed. The next day I realised that the problem was elsewhere, and explained that. But I did not back up and reread the previuos paragraph, as I should. So the false statement remained there.

The real problem was that the sequence only had four values (1, 2, 145, 40585), so asking for 10 caused the program to go on indefinitely in search of the fifth value.

This is a simpler example of this problem:

> say (1..Inf).grep( 11 < * < 14)[0];  # -> 12
> say (1..Inf).grep( 11 < * < 14)[1];  # -> 13
> say (1..Inf).grep( 11 < * < 14)[2];  # hangs

This seemingly infinite sequence has a finite number of elements (the two numbers 12 and 13). Defining such a data structure as a sequence is a bad idea.

The opening statement «I ended up with a sequence that did not produce any values» is wrongish. I tried to access the ten first elements, and it got the four values. The program will not do anything until it has all ten - which will not happen.


The Second Time

The very next week I was confronted with yet another sequence when writing my Padovan is Missing with Raku and Perl article, remembered my gripe with grep, and used map again - and added a note about why. Wrongly, yet again.

Then I got an email from Bruce Gray, disagreeing with my assumption about grep. Reading the documentation for the «Seq» type uncovered this sentence:

A high-level construct to generate a Seq is gather/take, as well as many built-in methods like map and grep, low-level constructors to create a Seq from an iterator or from looping constructs are available too.
This supported my assumption that the coercion to a list did not happen. I added a note in the two articles, stating that there were an error in the documentation. (Spoiler alert: This is also wrong.)

I investigated further when I had time. That uncovered several false assumptions on my behalf. The first one was the initial culprit, that grep was non-lazy. This clearly is wrong, as the following sequence of prime numbers will show:

my $primes := (1 ... Inf).grep( *.is-prime );

An example:

say $primes[^10];  # -> (2 3 5 7 11 13 17 19 23 29)

This is sequence #033 from my Centenary Sequences with Raku Part 4 - Primes and Fibonacci article, so I should know better.

The second assumption was that lists are non-lazy. They are not:

> say (1..Inf).list.WHAT;     # -> (List)
> say (1..Inf).list.is-lazy;  # -> True

So the documentation is correct.

Excercise

Figure out the differences between these infinite data structures:

> say (1..Inf).WHAT;        # -> (Range)
> say (1...Inf).WHAT;       # -> (Seq)
> say (1..Inf).list.WHAT;   # -> (List)
> say (1...Inf).list.WHAT;  # -> (List)
Tip: Look up the documentation, if you are brave enough...

There Will Not be a Third Time...

- as that would form a sequence.

Famous last words notwithstanding...