Introduction to simple iterators in Lua

Let us begin by considering what a simple iterator is. In Lua, a simple iterator is a function. Each time you call the function, you get back the next value in the sequence. In the following example, `f` is (assumed to be) a simple iterator:

``````local f = ("cat"):gmatch(".")
print(f()) --> c
print(f()) --> a
print(f()) --> t
``````

[Technical note: In the reference implementation of Lua, `string.gmatch` returns a simple iterator, but it is questionable whether or not this is mandated by the language. In particular, LuaJIT returns a full iterator rather than a simple iterator; see later]

As Lua functions can return multiple values, it seems logical to allow iterators to produce multiple values per call, and this is indeed allowed. In the following example, `f` is (assumed to be) a simple iterator which produces two values per call:

``````local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
print(f()) --> T, he
print(f()) --> C, ute
print(f()) --> C, at
``````

To signal that it has finished producing values, an iterator returns `nil`. Well, nearly; an iterator can signal completion by returning no values, by returning a single `nil`, or by returning multiple values with the first value being `nil`. With this in mind, the previous example can be rewritten to use a loop rather than an explicit number of calls to `f`:

``````local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
while true do
local first, rest = f()
if first == nil then
break
end
print(first, rest)
end
``````

This is rather verbose syntax, and thankfully the generic `for` loop can be used to achieve the same effect:

``````local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
for first, rest in f do
print(first, rest)
end
``````

In the previous examples, `string.gmatch` is termed a simple iterator factory, as it returns a simple iterator (`f`). Equipped with this knowledge of how simple iterators work, we can write a simple iterator factory which emulates the numeric `for` loop:

``````function range(init, limit, step)
step = step or 1
return function()
local value = init
init = init + step
if limit * step >= value * step then
return value
end
end
end
``````

This can then be used like so:

``````for n in range(3, 0, -1) do print(n) end --> 3; 2; 1; 0
``````

Introduction to full iterators in Lua

Looking at `range` in slightly more detail, we see that every time it is called, it returns a function, and that the function has three upvalues (`init`, `limit`, and `step`). From an esoteric efficiency point of view, it would be nice to have less upvalues (or ideally no upvalues at all). Full iterators are one way of achieving this. Whereas a simple iterator is just a function (`f`), a full iterator is a three-tuple (`f,s,v`). As with a simple iterator, `f` is still a function (or something with a `__call` metamethod). As for the two new bits, `s` is an opaque value which is passed as the first parameter to each call of `f`, and `v` is an opaque value which is passed as the second parameter to the first call of `f`. Subsequent calls of `f` pass the first return value from the previous call of `f` as the the second parameter.

Written out formally like that, full iterators sound a little bit odd, but once you've wrapped your head around it, they are really quite reasonable. As an example, let us rewrite `range` to be a full iterator factory rather than a simple iterator factory. For this, we'll take `s` to be `limit`, and `v` to be `init - step`, like so:

``````function range(init, limit, step)
step = step or 1
return function(lim, value)
value = value + step
if lim * step >= value * step then
return value
end
end, limit, init - step
end
``````

As the generic `for` loop is designed to work with full iterators, this version of `range` can be used just like the previous version:

``````for n in range(3, 0, -1) do print(n) end --> 3; 2; 1; 0
``````

Alternatively, to appreciate what is going on, we could use a verbose loop instead:

``````local f, s, v = range(3, 0, -1)
while true do
v = f(s, v)
if v == nil then
break
end
print(v)
end
``````

Looking at this version of `range`, the returned function only has one upvalue (`step`). In real terms, this means that 32 less bytes of space are allocated on the heap as compared to the three upvalue version. Admittedly this isn't much, and it hardly justifies the language support for full iterators. Rather more convincing are the `pairs` and `ipairs` full iterator factories from the Lua standard library. The full iterator machinery is precisely what these two functions need in order for their iterator functions to have no upvalues at all, meaning that you can iterate through a table without having to allocate anything on the heap, as in the following example:

``````local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in pairs(t) do
print(ingredient .." is ".. taste)
end
--> Lemon is sour
--> Cake is nice
``````

Mapping Lua iterators

Switching from Lua to Haskell for just a minute, recall the type signature of the `map` function:

``````map :: (a -> b) -> [a] -> [b]
``````

This takes a function as its first parameter, and applies this function to every element of a list in order to generate a new list. Due to Haskell's lazy evaluation behaviour, it isn't unreasonable to equate Haskell lists with iterators in other languages. With that in mind, a first attempt at a `map` function for Lua iterators might be:

``````function map(mapf, f, ...)
return function(...)
return mapf(f(...))
end, ...
end
``````

For all intents and purposes, this is just dressed up function composition. For some cases, like the following, it even works:

``````local f = map(string.upper, ("cat"):gmatch("."))
print(f()) --> C
print(f()) --> A
print(f()) --> T
``````

Unfortunately, the following example doesn't work quite as well:

``````local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
return a:lower(), b:upper()
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> invalid key to 'next'
``````

If you've understood how full iterators work, then you should be able to appreciate why this example fails after the first iteration: the map function is changing the results from the iterator function, and the first of these changed results is being passed back to the iterator function, which confuses it. To solve this issue, we need a slightly more complicated construction:

``````function map(mapf, f, s, v)
local function domap(...)
v = ...
if v ~= nil then
return mapf(...)
end
end
return function()
return domap(f(s, v))
end
end
``````

With this construction, we get the intended behaviour:

``````local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
return a:lower(), b:upper()
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> cake is NICE
``````

ConcatMapping Lua iterators

Jumping back to Haskell again, consider the slightly more complex `concatMap` function:

``````concatMap :: (a -> [b]) -> [a] -> [b]
``````

This allows the iterator function to produce zero or more results for each value of input, for example:

``````module Main where
import Data.Char
main = putStr \$
concatMap (\c -> if isUpper c then [c] else [c,c]) "Cat" -- > "Caatt"
``````

We could perform a direct conversion of this to Lua, and have the map function return an iterator, but unlike Haskell's syntactic support for constructing lists, Lua has none for constructing iterators, and so the result would feel rather clunky. The central issue is deciding how to return multiple values, or infact multiple tuples, from the map function. In Haskell it is natural to do this with lists, whereas in Lua we can achieve it with coroutines. Consider the following even more complex form of the `map` function in Lua:

``````function map(mapf, f, s, v)
local done
local function maybeyield(...)
if ... ~= nil then
coroutine.yield(...)
end
end
local function domap(...)
v = ...
if v ~= nil then
return maybeyield(mapf(...))
else
done = true
end
end
return coroutine.wrap(function()
repeat
domap(f(s, v))
until done
end)
end
``````

This version of `map` still permits the previous example:

``````local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
return a:lower(), b:upper()
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> cake is NICE
``````

Furthermore, it supports the generation of multiple tuples for each input by calling `coroutine.yield` for each additional tuple:

``````local t = {
Lemon = "sour",
Cake = "nice",
}
for ingredient, taste in map(function(a, b)
coroutine.yield(a:lower(), b:upper())
return a, "very ".. b
end, pairs(t)) do
print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> Lemon is very sour
--> cake is NICE
--> Cake is very nice
``````

Alternatively, if the map function doesn't return, then it doesn't generate anything at all:

``````for n in map(function(n)
if n ~= 0 then
return 1 / n
end
end, range(-2, 2)) do
print(n)
end
--> -0.5; -1; 1; 0.5
``````