Introduction to simple iterators in Lua

Let us begin by considering what a simple iterator is. In Lua, a simple iterator is a function. Each time you call the function, you get back the next value in the sequence. In the following example, f is (assumed to be) a simple iterator:

local f = ("cat"):gmatch(".")
print(f()) --> c
print(f()) --> a
print(f()) --> t

[Technical note: In the reference implementation of Lua, string.gmatch returns a simple iterator, but it is questionable whether or not this is mandated by the language. In particular, LuaJIT returns a full iterator rather than a simple iterator; see later]

As Lua functions can return multiple values, it seems logical to allow iterators to produce multiple values per call, and this is indeed allowed. In the following example, f is (assumed to be) a simple iterator which produces two values per call:

local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
print(f()) --> T, he
print(f()) --> C, ute
print(f()) --> C, at

To signal that it has finished producing values, an iterator returns nil. Well, nearly; an iterator can signal completion by returning no values, by returning a single nil, or by returning multiple values with the first value being nil. With this in mind, the previous example can be rewritten to use a loop rather than an explicit number of calls to f:

local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
while true do
  local first, rest = f()
  if first == nil then
    break
  end
  print(first, rest)
end

This is rather verbose syntax, and thankfully the generic for loop can be used to achieve the same effect:

local f = ("The Cute Cat"):gmatch("(%u)(%l*)")
for first, rest in f do
  print(first, rest)
end

In the previous examples, string.gmatch is termed a simple iterator factory, as it returns a simple iterator (f). Equipped with this knowledge of how simple iterators work, we can write a simple iterator factory which emulates the numeric for loop:

function range(init, limit, step)
  step = step or 1
  return function()
    local value = init
    init = init + step
    if limit * step >= value * step then
      return value
    end
  end
end

This can then be used like so:

for n in range(3, 0, -1) do print(n) end --> 3; 2; 1; 0

Introduction to full iterators in Lua

Looking at range in slightly more detail, we see that every time it is called, it returns a function, and that the function has three upvalues (init, limit, and step). From an esoteric efficiency point of view, it would be nice to have less upvalues (or ideally no upvalues at all). Full iterators are one way of achieving this. Whereas a simple iterator is just a function (f), a full iterator is a three-tuple (f,s,v). As with a simple iterator, f is still a function (or something with a __call metamethod). As for the two new bits, s is an opaque value which is passed as the first parameter to each call of f, and v is an opaque value which is passed as the second parameter to the first call of f. Subsequent calls of f pass the first return value from the previous call of f as the the second parameter.

Written out formally like that, full iterators sound a little bit odd, but once you've wrapped your head around it, they are really quite reasonable. As an example, let us rewrite range to be a full iterator factory rather than a simple iterator factory. For this, we'll take s to be limit, and v to be init - step, like so:

function range(init, limit, step)
  step = step or 1
  return function(lim, value)
    value = value + step
    if lim * step >= value * step then
      return value
    end
  end, limit, init - step
end

As the generic for loop is designed to work with full iterators, this version of range can be used just like the previous version:

for n in range(3, 0, -1) do print(n) end --> 3; 2; 1; 0

Alternatively, to appreciate what is going on, we could use a verbose loop instead:

local f, s, v = range(3, 0, -1)
while true do
  v = f(s, v)
  if v == nil then
    break
  end
  print(v)
end

Looking at this version of range, the returned function only has one upvalue (step). In real terms, this means that 32 less bytes of space are allocated on the heap as compared to the three upvalue version. Admittedly this isn't much, and it hardly justifies the language support for full iterators. Rather more convincing are the pairs and ipairs full iterator factories from the Lua standard library. The full iterator machinery is precisely what these two functions need in order for their iterator functions to have no upvalues at all, meaning that you can iterate through a table without having to allocate anything on the heap, as in the following example:

local t = {
  Lemon = "sour",
  Cake = "nice",
}
for ingredient, taste in pairs(t) do
  print(ingredient .." is ".. taste)
end
--> Lemon is sour
--> Cake is nice

Mapping Lua iterators

Switching from Lua to Haskell for just a minute, recall the type signature of the map function:

map :: (a -> b) -> [a] -> [b]

This takes a function as its first parameter, and applies this function to every element of a list in order to generate a new list. Due to Haskell's lazy evaluation behaviour, it isn't unreasonable to equate Haskell lists with iterators in other languages. With that in mind, a first attempt at a map function for Lua iterators might be:

function map(mapf, f, ...)
  return function(...)
    return mapf(f(...))
  end, ...
end

For all intents and purposes, this is just dressed up function composition. For some cases, like the following, it even works:

local f = map(string.upper, ("cat"):gmatch("."))
print(f()) --> C
print(f()) --> A
print(f()) --> T

Unfortunately, the following example doesn't work quite as well:

local t = {
  Lemon = "sour",
  Cake = "nice",
}
for ingredient, taste in map(function(a, b)
  return a:lower(), b:upper()
end, pairs(t)) do
  print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> invalid key to 'next'

If you've understood how full iterators work, then you should be able to appreciate why this example fails after the first iteration: the map function is changing the results from the iterator function, and the first of these changed results is being passed back to the iterator function, which confuses it. To solve this issue, we need a slightly more complicated construction:

function map(mapf, f, s, v)
  local function domap(...)
    v = ...
    if v ~= nil then
      return mapf(...)
    end
  end
  return function()
    return domap(f(s, v))
  end
end

With this construction, we get the intended behaviour:

local t = {
  Lemon = "sour",
  Cake = "nice",
}
for ingredient, taste in map(function(a, b)
  return a:lower(), b:upper()
end, pairs(t)) do
  print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> cake is NICE

ConcatMapping Lua iterators

Jumping back to Haskell again, consider the slightly more complex concatMap function:

concatMap :: (a -> [b]) -> [a] -> [b]

This allows the iterator function to produce zero or more results for each value of input, for example:

module Main where
import Data.Char
main = putStr $
  concatMap (\c -> if isUpper c then [c] else [c,c]) "Cat" -- > "Caatt"

We could perform a direct conversion of this to Lua, and have the map function return an iterator, but unlike Haskell's syntactic support for constructing lists, Lua has none for constructing iterators, and so the result would feel rather clunky. The central issue is deciding how to return multiple values, or infact multiple tuples, from the map function. In Haskell it is natural to do this with lists, whereas in Lua we can achieve it with coroutines. Consider the following even more complex form of the map function in Lua:

function map(mapf, f, s, v)
  local done
  local function maybeyield(...)
    if ... ~= nil then
      coroutine.yield(...)
    end
  end
  local function domap(...)
    v = ...
    if v ~= nil then
      return maybeyield(mapf(...))
    else
      done = true
    end
  end
  return coroutine.wrap(function()
    repeat
      domap(f(s, v))
    until done
  end)
end

This version of map still permits the previous example:

local t = {
  Lemon = "sour",
  Cake = "nice",
}
for ingredient, taste in map(function(a, b)
  return a:lower(), b:upper()
end, pairs(t)) do
  print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> cake is NICE

Furthermore, it supports the generation of multiple tuples for each input by calling coroutine.yield for each additional tuple:

local t = {
  Lemon = "sour",
  Cake = "nice",
}
for ingredient, taste in map(function(a, b)
  coroutine.yield(a:lower(), b:upper())
  return a, "very ".. b
end, pairs(t)) do
  print(ingredient .." is ".. taste)
end
--> lemon is SOUR
--> Lemon is very sour
--> cake is NICE
--> Cake is very nice

Alternatively, if the map function doesn't return, then it doesn't generate anything at all:

for n in map(function(n)
  if n ~= 0 then
    return 1 / n
  end
end, range(-2, 2)) do
  print(n)
end
--> -0.5; -1; 1; 0.5