Algorithmic std::string creation

Suppose we want to create a C++ std::string object, and we know how long the resulting string will be, and we know how we'll generate the content of the string, but we do not have a copy of said content in memory. As a concrete example, consider taking a block of memory, and returning a string whose content is the hexadecimal representation of that memory block. The result should satisfy the following tests:

std::string hexify(const char*, size_t);

assertEqual("baadf00d", hexify("\xBA\xAD\xF0\x0D", 4));
assertEqual("baad"    , hexify("\xBA\xAD\xF0\x0D", 2));

A naïve approach might be something along the following lines:

#include <sstream>
#include <iomanip>

std::string hexifyChar(int c)
{
  std::stringstream ss;
  ss << std::hex << std::setw(2) << std::setfill('0') << c;
  return ss.str();
}

std::string hexify(const char* base, size_t len)
{
  std::stringstream ss;
  for(size_t i = 0; i < len; ++i)
    ss << hexifyChar(base[i]);
  return ss.str();
}

This is fairly readable code, and if speed wasn't a concern, could well be a viable solution. On the other hand, if speed is desired, then this code isn't particularly good, at least not without a clairvoyant optimising compiler. Within both functions, the rather heavyweight std::stringstream tool is used, which may be slow due to be extremely flexible, and also due to not knowing how long its result will be. Given that we do not need most of the flexibility of std::stringstream, and we know how long the result will be (2 for hexifyChar, and 2 * len for hexify), we should expect to be able to do better.

As we want to construct a std::string, let us look at its constructors, and see which ones might work for us:

  1. Default constructor: string ();
  2. Copy constructor: string (const string& str);
  3. Substring constructor: string (const string& str, size_t pos, size_t n = npos);
  4. Existing content constructor: string (const char * s, size_t n);
  5. C string constructor: string (const char * s);
  6. Repetition constructor: string (size_t n, char c);
  7. Iterator constructor: template<class Iterator> string (Iterator begin, Iterator end);

None of these look particularly like what we want, but the last option might be viable if we can wrap a hexifying algorithm inside an iterator. If said iterator is infact a random access iterator, then the constructor will be able to obtain the length of the result before it starts fetching the individual characters.

Writing an STL-compatible iterator usually takes a lot of code, but we can use boost::iterator_facade to do most of the work for us, giving us the following code:

struct HexifyIterator
  : boost::iterator_facade<HexifyIterator, const char,
                           boost::random_access_traversal_tag>
{
  HexifyIterator()
    : ptr_()
    , nibble_() {}
  HexifyIterator(pointer ptr)
    : ptr_(reinterpret_cast<const unsigned char*>(ptr))
    , nibble_(1) {}
  HexifyIterator(const HexifyIterator& other)
    : iterator_facade(other)
    , ptr_(other.ptr_)
    , nibble_(other.nibble_) {}

private:
  friend class boost::iterator_core_access;

  const unsigned char* ptr_;
  difference_type nibble_;

  void increment() { advance(1); }
  void decrement() { advance(-1); }
  void advance(difference_type n)
  {
    nibble_ -= n;
    ptr_ -= nibble_ / 2;
    nibble_ %= 2;
    if(nibble_ < 0)
    {
      ++ptr_;
      nibble_ += 2;
    }
  }

  difference_type distance_to(const HexifyIterator& other) const
  { return (other.ptr_ - ptr_) * 2 + (nibble_ - other.nibble_); }

  bool equal(const HexifyIterator& other) const
  { return ptr_ == other.ptr_ && nibble_ == other.nibble_; }

  reference dereference() const
  { return "0123456789abcdef"[(*ptr_ >> (4 * nibble_)) & 0xF]; }
};

std::string hexify(const char* base, size_t len)
{
  return std::string(HexifyIterator(base), HexifyIterator(base + len));
}

There is still rather more code here than I would like, but in terms of speed, my quick and crude test put it at 72 times faster than the naïve approach.

A look at Lua 5.2 (beta rc2)

Just over a year ago, I looked at Lua 5.2 (work 3). Lua evolves slowly, and 5.2 still isn't quite finished, though perhaps it will be by the time the Lua Workshop 2011 rolls round in September (at which I hope to give a talk). In the mean time, Lua 5.2 (beta rc2) has been released, and this time I've identified the changes since 5.1 by annotating the new reference manual with them: An Annotated Lua 5.2 (beta rc2) Reference Manual.

Presenting the changes inline with the reference manual, as compared to just listing them, provides more context for the changes, as well as making things easier to consume by reducing the density of new information. If you have any good ideas for other ways of presenting this change information, then get in touch.

Callbacks with the LuaJIT FFI

The foreign function interface (FFI) present in the latest beta releases of LuaJIT is really nice for when you need to do things outside the Lua world. At runtime you pass it some C function definitions, and then everything you've defined becomes callable, and these calls are subsequently JIT compiled in the most efficient way possible. For example, we can define and then call some Windows API functions:

ffi = require "ffi"
ffi.cdef [[
  typedef void* HWND;
  HWND FindWindowA(const char* lpClassName, const char* lpWindowName);
  int GetWindowTextA(HWND hWnd, char* lpString, int nMaxCount); ]]
len = 300
buffer = ffi.new("char[?]", len)
window = ffi.C.FindWindowA("Notepad++", nil)
len = ffi.C.GetWindowTextA(window, buffer, len)
print(ffi.string(buffer, len)) --> C:\Lua\ffi_example.lua - Notepad++

This is fine and dandy for calling C from Lua, but things get rather more complicated with callbacks from C back to Lua. For example, the Windows EnumChildWindows function accepts a callback, which gets calls for every child of the given window. LuaJIT will happily accept and understand the definition of this function:

ffi.cdef [[
  typedef void* HWND;
  typedef bool (*WNDENUMPROC)(HWND, long);
  bool EnumChildWindows(HWND hWndParent, WNDENUMPROC lpEnumFunc, long lParam); ]]

You quickly run into a problem if you try to call it though, as you realise that the LuaJIT FFI currently lacks support for turning Lua functions into something which can called from C. At this point, most people would acknowledge that the FFI isn't yet complete, and then go to write their own C glue around EnumChildWindows using the traditional (slow) Lua C API. On the other hand, if you're feeling foolhardy, then you can fight the FFI to get callbacks working, and do so without resorting to any external C code. Naturally, this is what we'll do.

Our strategy will be to perform some control flow contortions so that when EnumChildWindows calls the callback, it infact returns to Lua, then Lua calls back to resume the enumeration. If we could write it in Lua, then it might look something like:

EnumChildWindows = coroutine.wrap(function()
  while true do
    ffi.C.EnumChildWindows(coroutine.yield(), function(hWnd)
      coroutine.yield(hWnd)
    end, nil)
  end
end)

Naturally we cannot write this in Lua, but we can write it in machine code, and we can then use the FFI to load and execute machine code. The coroutine trickery will be done by the Windows fiber API, as fibers are fairly similar to coroutines.

To start with, ConvertThreadToFiber can be called to convert the currently running thread into a fiber and return the handle to the fiber. Though if the thread is already a fiber then we run into a problem, as GetCurrentFiber is a macro rather than a function, and hence is not callable by the FFI. For now we'll ignore this issue, but it will be addressed later. Next we can call VirtualAlloc to allocate some executable memory, use ffi.copy to copy some machine code into said executable memory, then call the equivalent of coroutine.wrap, which is CreateFiber. In code, this looks like:

ffi.cdef [[
  void* ConvertThreadToFiber(void* lpParameter);
  typedef void (*LPFIBER_START_ROUTINE)(void*);
  void* CreateFiber(size_t dwStackSize, LPFIBER_START_ROUTINE lpStartAddress, void* lpParameter);
  void* VirtualAlloc(void* lpAddress, size_t dwSize, uint32_t flAllocationType, uint32_t flProtect); ]]
our_fiber = ffi.C.ConvertThreadToFiber(nil)
machine_code = "TODO"
procs = ffi.C.VirtualAlloc(nil, #machine_code + 1, 0x3000, 0x40)
ffi.copy(procs, machine_code)
contortion_fiber = ffi.C.CreateFiber(1024, ffi.cast("LPFIBER_START_ROUTINE", procs), nil)

The next task is to replace the TODO with the machine code equivalent of the following pseudo-C:

for(;;) {
  EnumChildWindows(coroutine.yield(), EnumerationProcedure, 0);
}

BOOL EnumerationProcedure(HWND hWnd, void* lpParam) {
  coroutine.yield(hWnd);
  return TRUE;
}

First of all we need to make the pseudo-C slightly more C-like. In particular, the above still uses a hypothetical coroutine.yield. The fiber API presents a SwitchToFiber function, which differs from coroutine.yield in that it doesn't support parameters or return values, and it requires telling which fiber to switch to. We thus end up with something like:

void* our_fiber; // The result of ConvertThreadToFiber.
void* transfer_slot[2]; // To yield a value, put the value in [0] and a non-NULL value in [1].
                        // To yield nothing, put anything in [0] and NULL in [1].
for(;;) {
  EnumChildWindows(transfer_slot[0], enum_proc, 0);
  transfer_slot[1] = NULL;
  SwitchToFiber(our_fiber);
}

BOOL EnumerationProcedure(HWND hWnd, void* lpParam) {
  transfer_slot[0] = hWnd;
  SwitchToFiber(our_fiber);
  return TRUE;
}

Next we need to convert this down to assembly code, firstly for x86:

fiber_proc:
push 0
push enum_proc
mov eax, dword ptr [transfer_slot]
push eax
call EnumChildWindows
mov dword ptr [transfer_slot + 4], 0
push our_fiber
call SwitchToFiber
jmp fiber_proc

enum_proc:
mov eax, dword ptr [esp+4]
mov dword ptr [transfer_slot + 4], eax
push our_fiber
call SwitchToFiber
mov eax, 1
retn 8

And secondly for x64:

fiber_proc:
sub rsp, 28h
after_prologue:
mov rcx, qword ptr [rip->transfer_slot]
lea rdx, qword ptr [rip->enum_proc]
call qword ptr [rip->EnumChildWindows]
mov qword ptr [rip->transfer_slot + 8], 0
mov rcx, qword ptr [rip->our_fiber]
call qword ptr [rip->SwitchToFiber]
jmp after_prologue

enum_proc:
sub rsp, 28h
mov qword ptr [rip->transfer_slot], rcx
mov rcx, qword ptr [rip->our_fiber]
call qword ptr [rip->SwitchToFiber]
mov rax, 1
add rsp, 28h
ret

transfer_slot:    dq
                  dq
EnumChildWindows: dq
our_fiber:        dq
SwitchToFiber:    dq

At this point, we return to our earlier problem of GetCurrentFiber being a macro, and note that it boils down to the following assembly code, firstly for x86:

mov eax, dword ptr fs:[10h]
ret

And similarly for x64:

mov rax, qword ptr gs:[20h]
ret

Now we can convert the assembly down to machine code, and put everything together:

local ffi = require "ffi"
-- The definitions we want to use.
ffi.cdef [[
  typedef void* HWND;
  typedef bool (*WNDENUMPROC)(HWND, long);
  bool EnumChildWindows(HWND hWndParent, WNDENUMPROC lpEnumFunc, long lParam);
  int GetWindowTextA(HWND hWnd, char* lpString, int nMaxCount); ]]
-- Extra definitions we need for performing contortions with fibers.
ffi.cdef [[
  void* ConvertThreadToFiber(void* lpParameter);
  void SwitchToFiber(void* lpFiber);
  typedef void (*LPFIBER_START_ROUTINE)(void*);
  void* CreateFiber(size_t dwStackSize, LPFIBER_START_ROUTINE lpStartAddress, void* lpParameter);
  uint32_t GetLastError(void);
  void* VirtualAlloc(void* lpAddress, size_t dwSize, uint32_t flAllocationType, uint32_t flProtect);
  bool RtlAddFunctionTable(void* FunctionTable, uint32_t EntryCount, void* BaseAddress); ]]

local EnumChildWindows
do
  local GetLastError = ffi.C.GetLastError
  local contortion_fiber
  local procs
  local transfer_slot
  local init_callbacks
  if ffi.arch == "x86" then
    init_callbacks = function()
      -- Ensure that the thread is a fiber, converting if required.
      local our_fiber = ffi.C.ConvertThreadToFiber(nil)
      if our_fiber == nil and GetLastError() ~= 1280 then
        error("Unable to convert thread to fiber")
      end
      transfer_slot = ffi.new("void*[2]")
      -- fiber_proc: for(;;) {
      --               EnumChildWindows(transfer_slot[0], enum_proc, 0);
      --               transfer_slot[1] = 0; // to mark end of iteration
      --               SwitchToFiber(our_fiber);
      --             }
      local asm = "\x6A\x00" -- push 0
               .. "\x68????" -- push ????
               .. "\xA1????\x50" -- mov eax, dword ptr [????], push eax
               .. "\xE8????" -- call ????
               .. "\xC7\x05????\x00\x00\x00\x00" -- mov dword ptr [????], 0
               .. "\x68????" -- push ????
               .. "\xE8????" -- call ????
               .. "\xEB\xD8" -- jmp $-40
      -- enum_proc: transfer_slot[0] = *(esp+4); // the HWND
      --            SwitchToFiber(our_fiber);
      --            return TRUE;
               .. "\x8B\x44\x24\x04" -- mov eax, dword ptr [esp+4]
               .. "\x3E\xA3????" -- mov dword ptr [????], eax
               .. "\x68????" -- push ????
               .. "\xE8????" -- call ????
               .. "\x33\xC0\x40" -- mov eax, 1
               .. "\xC2\x08" -- retn 8 (*)
      procs = ffi.C.VirtualAlloc(nil, #asm + 1, 0x3000, 0x40)
      if our_fiber == nil then
        -- GetCurrentFiber()
        ffi.copy(procs, "\x64\xA1\x10\x00\x00\x00\xC3") -- return __readfsdword(0x10)
        our_fiber = ffi.cast("void*(*)(void)", procs)()
      end
      ffi.copy(procs, asm)
      local function fixup(offset, ptr, isrelative)
        local dst = ffi.cast("char*", procs) + offset
        ptr = ffi.cast("char*", ptr)
        if isrelative then
          ptr = ffi.cast("char*", ptr - (dst + 4))
        end
        ffi.cast("char**", dst)[0] = ptr
      end
      fixup( 3, ffi.cast("char*", procs) + 40)
      fixup( 8, transfer_slot)
      fixup(14, ffi.C.EnumChildWindows, true)
      fixup(20, transfer_slot + 1)
      fixup(29, our_fiber)
      fixup(34, ffi.C.SwitchToFiber, true)
      fixup(46, transfer_slot)
      fixup(51, our_fiber)
      fixup(56, ffi.C.SwitchToFiber, true)
      contortion_fiber = ffi.C.CreateFiber(1024, ffi.cast("LPFIBER_START_ROUTINE", procs), nil)
      init_callbacks = function() end
    end
  elseif ffi.arch == "x64" then
    init_callbacks = function()
      -- Ensure that the thread is a fiber, converting if required.
      local our_fiber = ffi.C.ConvertThreadToFiber(nil)
      if our_fiber == nil and GetLastError() ~= 1280 then
        error("Unable to convert thread to fiber")
      end
      -- fiber_proc: for(;;) {
      --               EnumChildWindows(transfer_slot[0], enum_proc, 0);
      --               transfer_slot[1] = 0; // to mark end of iteration
      --               SwitchToFiber(our_fiber);
      --             }
      local asm = "\x48\x83\xEC\x28" -- sub rsp, 28h
               .. "\x48\x8B\x0D\x75\x00\x00\x00" -- mov rcx, [rip->transfer_slot_0]
               .. "\x48\x8D\x15\x26\x00\x00\x00" -- lea rdx, [rip->enum_proc]
               .. "\x48\xFF\x15\x77\x00\x00\x00" -- call [rip->EnumChildWindows]
               .. "\x48\xC7\x05\x64\x00\x00\x00\x00\x00\x00\x00" -- mov [rip->transfer_slot_1], 0
               .. "\x48\x8B\x0D\x6D\x00\x00\x00" -- mov rcx, [rip->our_fiber]
               .. "\x48\xFF\x15\x6E\x00\x00\x00" -- call [rip->SwitchToFiber]
               .. "\xEB\xD9" -- jmp $-48
               .. "\x90\x90\x90\x90" -- pad 8
      -- enum_proc: transfer_slot[0] = rcx; // the HWND
      --            SwitchToFiber(our_fiber);
      --            return TRUE;
               .. "\x48\x83\xEC\x28" -- sub rsp, 28h
               .. "\x48\x89\x0D\x3D\x00\x00\x00" -- mov [rip->transfer_slot_0], rcx
               .. "\x48\x8B\x0D\x4E\x00\x00\x00" -- mov rcx, [rip->our_fiber]
               .. "\x48\xFF\x15\x4F\x00\x00\x00" -- call [rip->SwitchToFiber]
               .. "\x48\xC7\xC0\x01\x00\x00\x00" -- mov rax, 1
               .. "\x48\x83\xC4\x28" -- add rsp, 28h
               .. "\xC3" -- ret
               .. "\x90\x90\x90" -- pad 8
      -- unwind data
               .. "\0\0\0\0\52\0\0\0\120\0\0\0"
               .. "\56\0\0\0\93\0\0\0\120\0\0\0"
               .. "\1\4\1\0\4\66"
               -- pad 8
      -- mutable data
               -- transfer_slot_0
               -- transfer_slot_1
               -- EnumChildWindows
               -- our_fiber
               -- SwitchToFiber
      procs = ffi.C.VirtualAlloc(nil, #asm + 42, 0x103000, 0x40)
      if our_fiber == nil then
        -- GetCurrentFiber()
        ffi.copy(procs, "\x65\x48\x8B\x04\x25\x20\x00\x00\x00\xC3") -- return __readgsqword(0x20)
        our_fiber = ffi.cast("void*(*)(void)", procs)()
      end
      ffi.copy(procs, asm)
      transfer_slot = ffi.cast("void**", ffi.cast("char*", procs) + 128)
      transfer_slot[2] = ffi.cast("void*", ffi.C.EnumChildWindows)
      transfer_slot[3] = ffi.cast("void*", our_fiber)
      transfer_slot[4] = ffi.cast("void*", ffi.C.SwitchToFiber)
      ffi.C.RtlAddFunctionTable(ffi.cast("void*", ffi.cast("char*", procs) + 96), 2, procs)
      contortion_fiber = ffi.C.CreateFiber(1024, ffi.cast("LPFIBER_START_ROUTINE", procs), nil)
      init_callbacks = function() end
    end
  else
    error("Only x86 and x64 are supported")
  end
  EnumChildWindows = function(wnd)
    init_callbacks()
    transfer_slot[0] = wnd
    transfer_slot[1] = ffi.cast("void*", 1)
    local results = {}
    while true do
      ffi.C.SwitchToFiber(contortion_fiber)
      if transfer_slot[1] == nil then
        return results
      else
        results[#results + 1] = transfer_slot[0]
      end
    end
  end
end

With this mass of heavy complicated machinery in place, we can finally perform our original goal of enumerating windows:

local buffer  = ffi.new("char[?]", 300)
for _, window in ipairs(EnumChildWindows(nil)) do
  local len = ffi.C.GetWindowTextA(window, buffer, 300)
  if len ~= 0 then
    print(ffi.string(buffer, len))
  end
end

I freely admit that this solution isn't at all elegant, but it does show that callbacks are possible with the current LuaJIT FFI, without the need of resorting to additional C libraries.

The VPS shuffle

I rent two virtual private servers, of which one is in the UK, and the other is in the USA. Recently I concluded that I was paying too much for my UK VPS, so I switched providers, which meant a change of location from London to Manchester, along with a change of IP address. During the migration, I decided that I didn't particularly like the authoritative DNS server I was running (PowerDNS), and so I decided to write my own authoritative DNS server. This caused the migration process to be rather longer than I expected; it changed from a few hours to about a week, but the end result was much more knowledge of the DNS protocol, a new side project, and a DNS sever which I'm happy with. Also, in the past few hours, my USA VPS migrated from Atlanta to Dallas due to my hosting provider wanting to phase out their presence in Atlanta, and this migration also carried with it a change of IP address. This migration was fairly quick and painless due to most of it being an automated process on the part of my hosting provider. The fact that the migration was quick was also helped by the fact that the VPS was already running my self written DNS server, and thus didn't prompt me into a new project.

Perhaps at some point in the future I'll finish extending this DNS server code, and actually publish it.

Brainfuck in Haskell, again

As often happens when I decide to revise Haksell, I've written a Brainfuck interpreter. This time around, there is better error handling, an optimisation stage, and a main procedure.

import IO
import System.Environment

-- Parser
type Cell = Int
type LineNum = Int
data Token = Leftward | Rightward | Plus Cell | In | Out | Loop [Token]
type Parsed = ([Token], String, LineNum)

parseChars :: String -> LineNum -> Parsed
parseChars [] l = ([], [], l)
parseChars (c:s) l = parseChar c s l
parseChar :: Char -> String -> LineNum -> Parsed
parseChar '<' = parseChar' Leftward
parseChar '>' = parseChar' Rightward
parseChar '+' = parseChar' (Plus 1)
parseChar '-' = parseChar' (Plus (-1))
parseChar ',' = parseChar' In
parseChar '.' = parseChar' Out
parseChar '[' =
  \s l -> let (ts, s', l') = parseChars s l in
    if null s' then
      error("Unmatched [ on line " ++ show l)
    else
      parseChar' (Loop ts) (tail s') l'
parseChar ']' = \s l -> ([], ']':s, l)
parseChar '\n' = \s l -> parseChars s (l + 1)
parseChar _ = parseChars
parseChar' :: Token -> String -> LineNum -> Parsed
parseChar' t s l = (t:ts, s', l') where (ts, s', l') = parseChars s l
parse :: String -> [Token]
parse s =
  if null s' then
    ts
  else
    error ("Unmatched ] on line " ++ show l)
  where (ts, s', l) = parseChars s 1

-- Optimiser
optimise :: [Token] -> [Token]
optimise ((Plus n):(Plus m):ts) = optimise ((Plus (n+m)):ts)
optimise ((Loop t):(Loop _):ts) = optimise ((Loop t):ts)
optimise (Leftward:Rightward:ts) = optimise ts
optimise (Rightward:Leftward:ts) = optimise ts
optimise ((Plus 0):ts) = optimise ts
optimise ((Loop t):ts) = (Loop (optimise t)):(optimise ts)
optimise (t:ts) = t:(optimise ts)
optimise [] = []

-- Evaluator
type Tape = ([Cell], Cell, [Cell])

eval1 :: Token -> Tape -> IO Tape
eval1 In s@(ls, _, rs) =
  getChar >>= \c ->
    if c == '\r' then
      eval1 In s
    else
      return (ls, fromEnum c, rs)
eval1 Out s@(_, c, _) = (putChar (toEnum c) >> return s)
eval1 t@(Loop ts) s@(_, c, _) =
  if c == 0 then
    return s
  else
    (eval' s ts) >>= (eval1 t)
eval1 Leftward (l:ls, c, rs) = return (ls, l, c:rs)
eval1 Rightward (ls, c, r:rs) = return (c:ls, r, rs)
eval1 (Plus n) (ls, c, rs) = return (ls, (c + n) `mod` 256, rs)
eval' :: Tape -> [Token] -> IO Tape
eval' e = foldl (\s t -> s >>= eval1 t) (return e)
eval :: [Token] -> IO ()
eval ts = eval' (repeat 0, 0, repeat 0) ts >> return ()

-- Glue
main :: IO ()
main = getArgs
   >>= (\ss -> if null ss then getContents else readFile (head ss))
   >>= eval . optimise . parse
page: 12 13 14 15 16