In one of the projects I'm currently working on, Lua is used a lot as a data description language. As a result, there are many occasions where I'd like to use a switch statement on a Lua string. One example of this is the following function:

Art* MakeArt(lua_State* L)
{
  lua_getfield(L, -1, "type");
  switch(lua_tostring(L, -1))
  {
  case "Graphic":   return new GraphicArt(L);
  case "Rectangle": return new RectangleArt(L);
  case "Text":      return new TextArt(L);
  case "Line":      return new LineArt(L);
  case "Swf":       return new SwfArt(L);
  }
  return new UnknownArt(L);
}

To my mind, this code perfectly expresses the idea of performing different actions based on different strings from Lua. The only problem is that this code won't compile: C++ case labels must be integral constant expressions, and a string literal is not one. A common translation into code which does actually work is the following:

Art* MakeArt(lua_State* L)
{
  lua_getfield(L, -1, "type");
  if(const char* s = lua_tostring(L, -1))
  {
    if(!strcmp(s, "Graphic"))   return new GraphicArt(L);
    if(!strcmp(s, "Rectangle")) return new RectangleArt(L);
    if(!strcmp(s, "Text"))      return new TextArt(L);
    if(!strcmp(s, "Line"))      return new LineArt(L);
    if(!strcmp(s, "Swf"))       return new SwfArt(L);
  }
  return new UnknownArt(L);
}

This translation isn't perfect, as a string like "Line\0Stuff" will result in a LineArt rather than an UnknownArt, but usually this is acceptable. A further refinement might also make use of the string length:

Art* MakeArt(lua_State* L)
{
  lua_getfield(L, -1, "type");
  size_t len;
  if(const char* s = lua_tolstring(L, -1, &len))
  {
    switch(len)
    {
    case 7:
      if(!memcmp(s, "Graphic", 7)) return new GraphicArt(L);
      break;
    case 9:
      if(!memcmp(s, "Rectangle", 9)) return new RectangleArt(L);
      break;
    case 4:
      if(!memcmp(s, "Text", 4)) return new TextArt(L);
      if(!memcmp(s, "Line", 4)) return new LineArt(L);
      break;
    case 3:
      if(!memcmp(s, "Swf", 3)) return new SwfArt(L);
      break;
    }
  }
  return new UnknownArt(L);
}
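One maintenance burden that remains in this version is the hand-written lengths. The following is a hypothetical sketch: EqualsLiteral and DemonstrateEmbeddedNul are illustrative names, not part of Lua or the original project. A small template can take the length from the literal's array type, and a quick check confirms that it rejects the embedded-NUL string that fools strcmp:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original project): compares a
// (pointer, length) pair, as returned by lua_tolstring, against a string
// literal. The literal's length comes from its array type, so no length
// constant has to be written or kept in sync by hand. N counts the '\0'.
template <std::size_t N>
bool EqualsLiteral(const char* s, std::size_t len, const char (&lit)[N]) {
  return len == N - 1 && std::memcmp(s, lit, N - 1) == 0;
}

// Returns true if strcmp is fooled by an embedded NUL that the
// length-aware comparison correctly rejects.
bool DemonstrateEmbeddedNul() {
  const char data[] = "Line\0Stuff";          // 10 bytes plus the final '\0'
  const std::size_t len = sizeof(data) - 1;   // 10
  bool strcmpMatches = std::strcmp(data, "Line") == 0;    // true: stops at '\0'
  bool lengthMatches = EqualsLiteral(data, len, "Line");  // false: 10 != 4
  return strcmpMatches && !lengthMatches;
}
```

Within MakeArt this would read if(EqualsLiteral(s, len, "Rectangle")) return new RectangleArt(L);. The case labels could similarly be written as sizeof("Rectangle") - 1, because sizeof applied to a string literal is a constant expression.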

This refinement is technically correct, and probably faster, since it dispatches on length first and uses memcmp rather than strcmp. However, compared to the very first (non-functional) code fragment, it is harder to read and harder to maintain. This point is important to bear in mind, but things will get worse still before they get better.

It just so happens that in the project I'm working on, the Lua library is statically linked. This means I can take advantage of Lua's implementation details, safe in the knowledge that the implementation isn't going to change. The detail I'd like to take advantage of is that Lua calculates a hash value for each string and stores it in a header immediately before the string contents. The pointer returned by lua_tostring points at those contents, so the header can be reached by stepping back one TString from it. The header also stores the length, which is equally useful. This train of thought leads to the following code fragment:

Art* MakeArt(lua_State* L)
{
  lua_getfield(L, -1, "type");
  if(auto ts = reinterpret_cast<const TString*>(lua_tostring(L, -1)))
  {
    switch(ts[-1].tsv.len)
    {
    case 7:
      if(ts[-1].tsv.hash == 1408413413ULL) return new GraphicArt(L);
      break;
    case 9:
      if(ts[-1].tsv.hash == 3769334599ULL) return new RectangleArt(L);
      break;
    case 4:
      switch(ts[-1].tsv.hash)
      {
      case 7903477ULL: return new TextArt(L);
      case 7864041ULL: return new LineArt(L);
      }
      break;
    case 3:
      if(ts[-1].tsv.hash == 205275ULL) return new SwfArt(L);
      break;
    }
  }
  return new UnknownArt(L);
}

As with the previous version based on strcmp, this version isn't perfect. For strings longer than four characters, other strings with the same length and hash value are guaranteed to exist, because there are more such strings than there are 32-bit hash values. An unexpected input could therefore be mistaken for one of the cases, but this is acceptable to me. On the positive side, this version should be extremely quick, as switching over a hash should be faster than sequential memcmps. Unfortunately, not only is this version unreadable and unmaintainable, it is also effectively unwritable by hand, since every constant is a hash value that has to be computed first.
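The hash constants are the part a machine has to supply. The values used above are consistent with the string hash computed by luaS_newlstr in Lua 5.1. The sketch below assumes a Lua 5.1 build with a 32-bit unsigned int, and Lua51StringHash is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch of the hash Lua 5.1 computes in luaS_newlstr when a string is
// interned (assumptions: Lua 5.1 and a 32-bit unsigned int).
std::uint32_t Lua51StringHash(const std::string& str) {
  const std::size_t len = str.size();
  std::uint32_t h = static_cast<std::uint32_t>(len);  // seed: the length
  // Strings of 32 or more bytes are sampled: only every step-th byte,
  // walking backwards from the end, contributes to the hash.
  const std::size_t step = (len >> 5) + 1;
  for (std::size_t i = len; i >= step; i -= step)
    h ^= (h << 5) + (h >> 2) + static_cast<unsigned char>(str[i - 1]);
  return h;
}
```

With this function, "Swf" hashes to 205275 and "Text" hashes to 7903477, which match the constants above. For strings of 32 bytes or more, only some of the characters are hashed, so collisions between long strings are easier to come by.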

At this point, enter code generation. It is an entirely mechanical process to translate the original (non-functional) code into hash-based dispatch code, and furthermore the hash calculations are best done by a machine. Therefore I've written a preprocessing script for C++ source files which performs this translation. The input looks almost like the original (non-functional) code, with the addition of some markers:

Art* MakeArt(lua_State* L)
{
  lua_getfield(L, -1, "type");
#ifdef LUA_STRING_SWITCH
  switch(lua_tostring(L, -1))
  {
  case "Graphic":   return new GraphicArt(L);
  case "Rectangle": return new RectangleArt(L);
  case "Text":      return new TextArt(L);
  case "Line":      return new LineArt(L);
  case "Swf":       return new SwfArt(L);
  }
#endif
  return new UnknownArt(L);
}

The script goes through the file, finds each #ifdef LUA_STRING_SWITCH, throws away any existing #else block, and then writes an #else block based on the contents of the switch, resulting in something like:

Art* MakeArt(lua_State* L)
{
  lua_getfield(L, -1, "type");
#ifdef LUA_STRING_SWITCH
  switch(lua_tostring(L, -1))
  {
  case "Graphic":   return new GraphicArt(L);
  case "Rectangle": return new RectangleArt(L);
  case "Text":      return new TextArt(L);
  case "Line":      return new LineArt(L);
  case "Swf":       return new SwfArt(L);
  }
#else
  if(auto ts = reinterpret_cast<const TString*>(lua_tostring(L, -1)))
  {
    switch(ts[-1].tsv.len)
    {
    case 7:
      if(ts[-1].tsv.hash == 1408413413ULL) return new GraphicArt(L);
      break;
    case 9:
      if(ts[-1].tsv.hash == 3769334599ULL) return new RectangleArt(L);
      break;
    case 4:
      switch(ts[-1].tsv.hash)
      {
      case 7903477ULL: return new TextArt(L);
      case 7864041ULL: return new LineArt(L);
      }
      break;
    case 3:
      if(ts[-1].tsv.hash == 205275ULL) return new SwfArt(L);
      break;
    }
  }
#endif
  return new UnknownArt(L);
}
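The script itself is not shown here, but its core step turns a list of case literals and statements into length-then-hash dispatch, and that step can be sketched as follows. This is a hypothetical illustration. StringCase and GenerateHashDispatch are invented names, and parsing of the #ifdef block is left out. The hash assumes Lua 5.1, as in luaS_newlstr, and each statement is assumed to end in a return, as in MakeArt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Lua 5.1 string hash (as computed in luaS_newlstr), assuming 32-bit unsigned.
std::uint32_t Lua51StringHash(const std::string& s) {
  std::uint32_t h = static_cast<std::uint32_t>(s.size());  // seeded with length
  const std::size_t step = (s.size() >> 5) + 1;            // samples long strings
  for (std::size_t i = s.size(); i >= step; i -= step)
    h ^= (h << 5) + (h >> 2) + static_cast<unsigned char>(s[i - 1]);
  return h;
}

// Hypothetical: one case "literal": statement pair taken from the
// LUA_STRING_SWITCH block.
struct StringCase {
  std::string literal;    // e.g. Graphic (without the quotes)
  std::string statement;  // e.g. return new GraphicArt(L);
};

// Hypothetical generator for the #else block: dispatch on the length stored
// in the TString header, then on the hash. subject is the switch
// expression, e.g. "lua_tostring(L, -1)".
std::string GenerateHashDispatch(const std::string& subject,
                                 const std::vector<StringCase>& cases) {
  // Group the cases by length; std::map keeps the output order deterministic.
  std::map<std::size_t, std::vector<const StringCase*>> byLength;
  for (const StringCase& c : cases) byLength[c.literal.size()].push_back(&c);

  std::ostringstream out;
  out << "  if(auto ts = reinterpret_cast<const TString*>(" << subject << "))\n"
      << "  {\n"
      << "    switch(ts[-1].tsv.len)\n"
      << "    {\n";
  for (const auto& group : byLength) {
    out << "    case " << group.first << ":\n"
        << "      switch(ts[-1].tsv.hash)\n"
        << "      {\n";
    // Each statement is assumed to end in a return, so no break is emitted.
    for (const StringCase* c : group.second)
      out << "      case " << Lua51StringHash(c->literal) << "U: "
          << c->statement << "\n";
    out << "      }\n"
        << "      break;\n";
  }
  out << "    }\n"
      << "  }\n";
  return out.str();
}
```

For simplicity the sketch always emits a nested switch, whereas the output above uses an if for single-entry groups. If two literals of the same length ever shared a hash, the generated code would contain duplicate case labels and fail to compile. Such a collision would therefore be reported rather than silently accepted.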

At compile time, LUA_STRING_SWITCH isn't defined, and so the hash-based dispatch gets used. At edit time, I use the folding features of an IDE to hide the nasty hash-based dispatch, so all I see is:

Art* MakeArt(lua_State* L)
{
  lua_getfield(L, -1, "type");
#ifdef LUA_STRING_SWITCH
  switch(lua_tostring(L, -1))
  {
  case "Graphic":   return new GraphicArt(L);
  case "Rectangle": return new RectangleArt(L);
  case "Text":      return new TextArt(L);
  case "Line":      return new LineArt(L);
  case "Swf":       return new SwfArt(L);
  }
#else /* Hidden Preprocessor Block */
#endif
  return new UnknownArt(L);
}

This gives the best of both worlds. I can write pseudo-C++ which expresses my ideas very clearly, and then it can be mechanically translated and compiled down to something very efficient.