Note: All source code in this post is from Lua 5.1, and hence is available under the MIT license (see the full copyright notice at the bottom of lua.h).

I was recently surprised by how aggressively the x64 C++ compiler that ships with Visual Studio 2008 (with all optimisations enabled) inlines function calls without being asked. Let us begin with a simple example:

void luaZ_init (lua_State *L, ZIO *z, lua_Reader reader,
                void *data) {
  z->L = L;
  z->reader = reader;
  z->data = data;
  z->n = 0;
  z->p = NULL;
}

As I would have expected, this function is usually inlined even though it is not marked as an inline function. It just doesn't make sense to go through all the call overhead for setting a few fields. Next up, a slightly less trivial function:

int luaD_protectedparser (lua_State *L, ZIO *z, const char *name) {
  struct SParser p;
  int status;
  p.z = z; p.name = name;
  luaZ_initbuffer(L, &p.buff);
  status = luaD_pcall(L, f_parser, &p, savestack(L, L->top),
                      L->errfunc);
  luaZ_freebuffer(L, &p.buff);
  return status;
}

The luaZ_initbuffer and luaZ_freebuffer functions are of roughly the same complexity as luaZ_init, and so get inlined into luaD_protectedparser. Given this, I would not have expected luaD_protectedparser itself to be inlined, but it is. Now we look at a function which calls both luaZ_init and luaD_protectedparser:

LUA_API int lua_load (lua_State *L, lua_Reader reader, void *data,
                      const char *chunkname) {
  ZIO z;
  int status;
  lua_lock(L);
  if (!chunkname) chunkname = "?";
  luaZ_init(L, &z, reader, data);
  status = luaD_protectedparser(L, &z, chunkname);
  lua_unlock(L);
  return status;
}

Because the LUA_API macro marks the function as exported, a non-inlined version of it is always emitted for use by code outside of the compilation unit. As the calls to luaZ_init and luaD_protectedparser are both inlined, lua_load takes up a fair number of bytes when compiled. Given this, along with the fact that a non-inlined version has to be compiled anyway, I had believed that calls to lua_load from the same compilation unit would not get inlined. I was wrong. The final example for this post is a function which looks relatively small:

LUALIB_API int luaL_loadbuffer (lua_State *L, const char *buff,
                                size_t size, const char *name) {
  LoadS ls;
  ls.s = buff;
  ls.size = size;
  return lua_load(L, getS, &ls, name);
}

Much to my surprise, the call to lua_load in the above gets inlined, and then the calls which lua_load makes get inlined as well. What looks like a very small function therefore compiles to a large number of bytes, rather than to a handful of bytes which set up a LoadS and call lua_load.
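
To make the effect concrete, here is a toy sketch of the same shape of call chain, followed by a hand-flattened version of the outermost function. None of these names come from Lua (Reader, reader_init, reader_sum, checksum, checksum3 and checksum3_flattened are all hypothetical), and the flattened function is only my guess at roughly what an optimiser might effectively produce, not actual compiler output:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a small state struct such as ZIO. */
typedef struct Reader {
  const char *data;
  size_t n;
} Reader;

/* Trivial initialiser, comparable in size to luaZ_init. */
static void reader_init (Reader *r, const char *data, size_t n) {
  r->data = data;
  r->n = n;
}

/* The function doing the real work: sums the bytes in the reader. */
static size_t reader_sum (const Reader *r) {
  size_t i, total = 0;
  for (i = 0; i < r->n; i++)
    total += (unsigned char)r->data[i];
  return total;
}

/* Middle layer: in the source this is just two calls. */
size_t checksum (const char *data, size_t n) {
  Reader r;
  assert(data != NULL);
  reader_init(&r, data, n);
  return reader_sum(&r);
}

/* Outer layer: looks tiny in the source. */
size_t checksum3 (const char *s) {
  return checksum(s, 3);
}

/* Hand-written equivalent of checksum3 with every call expanded in
   place. The loop body of reader_sum now lives inside this function. */
size_t checksum3_flattened (const char *s) {
  Reader r;
  size_t i, total = 0;
  assert(s != NULL);
  r.data = s;
  r.n = 3;
  for (i = 0; i < r.n; i++)
    total += (unsigned char)r.data[i];
  return total;
}
```

Both checksum3 and checksum3_flattened compute the same result. The difference is that the flattened form carries the whole loop itself, which is the same way luaL_loadbuffer ends up carrying the bodies of the functions below it.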

Moral of the story: If you're relying on calls not being inlined, then you might want to adjust your mental model of how large a function can be and still get inlined.
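
If a call really must stay a call, one option is to say so explicitly rather than rely on the function's size. The following is a minimal sketch, assuming a compiler that supports either the Microsoft __declspec(noinline) extension or the GCC/Clang noinline attribute. The NOINLINE macro and the sum_bytes and sum_string functions are hypothetical names made up for illustration and are not part of Lua:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* NOINLINE is a hypothetical helper macro. It expands to the
   compiler-specific "do not inline this function" annotation, or to
   nothing on compilers where no such annotation is known. */
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE
#endif

/* The larger function which should be kept out of line. */
NOINLINE size_t sum_bytes (const char *buf, size_t size) {
  size_t i, total = 0;
  assert(buf != NULL);
  for (i = 0; i < size; i++)
    total += (unsigned char)buf[i];
  return total;
}

/* Small wrapper: with the annotation above, this should remain a
   short function which calls sum_bytes rather than absorbing it. */
size_t sum_string (const char *s) {
  return sum_bytes(s, strlen(s));
}
```

The annotation changes only how the code is generated, not what it computes, so the way to confirm that it had the desired effect is to look at the disassembly of sum_string.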