One non-obvious corner of using DynASM to generate x64 machine code is how to make calls to existing C functions, such as printf. x64 makes this difficult because the address space is 64 bits wide, while most instructions can only carry 32-bit immediate values or 32-bit offsets.
One of the few instructions which accept a 64-bit immediate is for moving a constant into a register - in the Intel and AMD manuals, this is the mov r64, imm64 instruction. DynASM can emit this instruction, but using the mnemonic mov64 rather than mov. This leads to the following pattern for calling functions:
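As a hand-written sketch of what that pattern boils down to (a mov64 of the function's address into a scratch register, followed by an indirect call through that register), the following C function writes the corresponding bytes by hand. The choice of rax as the scratch register and the name emit_call_abs64 are assumptions made for illustration; this is not DynASM output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written sketch (not DynASM output): writes the 12 bytes of
 *   mov rax, imm64   ; 48 B8 <8-byte little-endian immediate>
 *   call rax         ; FF D0
 * into buf and returns the number of bytes written. */
size_t emit_call_abs64(uint8_t *buf, size_t cap, uint64_t target)
{
    assert(buf != NULL && cap >= 12);
    buf[0] = 0x48;               /* REX.W prefix: 64-bit operand size */
    buf[1] = 0xB8;               /* mov rax, imm64 (opcode B8 + register 0) */
    for (int i = 0; i < 8; i++)  /* immediate, least significant byte first */
        buf[2 + i] = (uint8_t)(target >> (8 * i));
    buf[10] = 0xFF;              /* call r/m64 */
    buf[11] = 0xD0;              /* ModRM 11 010 000: /2, register rax */
    return 12;
}
```

The cost of this pattern is visible in the byte count: 12 bytes and a clobbered register for every call site.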
An alternative pattern is to use the same approach as shared libraries: import tables. The basic idea is that the import table contains a list of function pointers, and calls are done using the call [rip + imm32] instruction, where imm32 is the distance between the instruction pointer and the function pointer. DynASM will emit this instruction in x64 mode when given assembler of the form call qword [label], so the only tricky bit left is to stick the appropriate function pointer at the appropriate label. One pattern for doing this is the following:
The .section code, imports directive tells DynASM that we want to append instructions to two buffers rather than one, and that said buffers should be called .code and .imports. For each buffer, it registers a directive of the same name which sets the current output buffer to that one.
The .macro call_extern, target directive registers a one-argument macro called call_extern. Internally it uses token-pasting (the syntax for which is .., and has the same effect as the C preprocessor's ##) to create a label, uses .dword directives to write the function pointer, and uses .imports and .code to switch between output buffers.
Once the macro is registered, calls are done using call_extern.
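To make the "distance between the instruction pointer and the function pointer" concrete, here is a minimal C sketch that encodes call qword [rip + disp32] by hand. It assumes the caller knows the final addresses of both the instruction and the import slot; the function name emit_call_rip_slot is hypothetical, and DynASM performs the equivalent calculation itself when it resolves the label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: encode  call qword [rip + disp32]  (FF 15 <disp32>) for an
 * instruction located at insn_addr, loading its target from slot_addr.
 * rip-relative displacements are measured from the END of the
 * instruction, i.e. from insn_addr + 6.
 * Returns 1 on success, 0 if the slot is outside the signed 32-bit range. */
int emit_call_rip_slot(uint8_t *buf, uint64_t insn_addr, uint64_t slot_addr)
{
    assert(buf != NULL);
    int64_t disp = (int64_t)(slot_addr - (insn_addr + 6));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;
    buf[0] = 0xFF;                       /* call r/m64 */
    buf[1] = 0x15;                       /* ModRM 00 010 101: /2, rip + disp32 */
    uint32_t d = (uint32_t)(int32_t)disp;
    for (int i = 0; i < 4; i++)          /* displacement, little-endian */
        buf[2 + i] = (uint8_t)(d >> (8 * i));
    return 1;
}
```

The range check is the design constraint to keep in mind: the imports buffer has to end up within 2 GiB of the code that calls through it, which holds naturally when both sections are encoded into one contiguous allocation.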
Directly continuing from the first DynASM example, one obvious optimisation would be to write the remaining loop of run_job in assembly, thereby avoiding a function call on every iteration. This idea leads to the following version of transcode.dasc:
The interesting components of these changes are the jumps and the labels. Once you know that the -> prefix is DynASM's notation for so-called global labels, the syntax becomes the same as in any other assembler: labels are introduced by suffixing them with a colon, and are jumped to by being used as an operand to a jump instruction. As well as global labels, DynASM also supports so-called local labels. The defining difference between the two is that an assembly fragment containing a global label can only be emitted once, whereas local labels can be emitted an unlimited number of times. As a consequence, when jumping to a local label, you need to specify whether to jump backwards to the nearest previous emission of that label, or forwards to the next subsequent emission of that label. As global labels can only be emitted once, no such specification is needed.
| Label type | Syntax | Usage | Available names | Maximum emissions | Retrievable in C |
|---|---|---|---|---|---|
| Global | ->name: | jmp ->name | Any C identifier | 1 | Yes |
| Local | name: | jmp >name (forward) or jmp <name (backward) | Integers between 1 and 9 | ∞ | No |
| PC | =>expr: | jmp =>expr | Any C expression | N/A | No |
With labels explained, the remaining curiosity is the .globals directive: its effect is to emit a C enumeration with the names of all global labels. For this example, it causes the following to be written in transcode.h:
Between dasm_init and dasm_setup, we need to do the following:
After dasm_encode, the absolute address of ->loop_test: will be stored in global_labels[GLOB_loop_test], and likewise the absolute address of ->loop_body: will be stored in global_labels[GLOB_loop_body].
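The way the enumeration and the global_labels array fit together can be sketched without linking against DynASM. In the C below, the enum is an assumed rendering of what .globals writes for a GLOB_ prefix (the entry order may differ), and fill_global_labels is a hypothetical stand-in for the label-resolving part of dasm_encode. In real code the array would instead be registered with dasm_setupglobal and filled in by DynASM.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of the enumeration written by .globals with a GLOB_
 * prefix; the order of the entries may differ in real output. */
enum {
    GLOB_loop_test,
    GLOB_loop_body,
    GLOB__MAX
};

/* Hypothetical stand-in for what happens at encode time: each global
 * label's offset within the encoded code becomes an absolute address. */
void fill_global_labels(void **labels, unsigned char *code_base,
                        const size_t *offsets, size_t count)
{
    assert(labels != NULL && code_base != NULL && offsets != NULL);
    for (size_t i = 0; i < count; i++)
        labels[i] = code_base + offsets[i];
}
```

Because the array is indexed by enumeration value rather than by order of definition in the assembly, C code can look up a label by name regardless of where it sits in the generated code.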
For completeness, the final C code is as follows:
As an example of where you might want to use DynASM, let us consider the problem of transforming an array of binary structures into an array of slightly different binary structures. For the sake of concreteness, let us assume that such transformation jobs are described by the following C structures:
If num_input_records is really large and the transcoding needs to be done as fast as mechanically possible, then one idea might be to unroll the inner loop of run_job at runtime using DynASM. The idea is that the resulting code will look something like the following:
To write make_transcoder, we need something to feed into DynASM. The following code is such an input, which we'll assume is in a file called transcode.dasc:
This file is turned into transcode.h using the following command line:
The output, transcode.h, should look something like the following:
Finally, make_transcoder:
DynASM advertises itself as a dynamic assembler for code generation engines. I can think of several interpretations of what a dynamic assembler might be, not all of which are compatible with each other. As such, it is worth beginning my series about DynASM with a description of what it is and what it isn't.
The envisioned usage pattern is to have fragments of assembly code which are syntactically complete, except possibly for the values of some constants (i.e. the instructions, addressing modes, and registers are all fixed). A decision is made at runtime as to how many copies to make of each fragment, what the value of each constant should be in each fragment, and in what order to emit the fragments.
The DynASM site states that DynASM takes mixed C/Assembler source as input, and produces plain C code output. While this is true, it is also easy to misinterpret: the input to DynASM is a C file whose intended purpose is to emit machine code - the assembly portion of the input is the code to emit rather than the code to run in-line with the C portion. As an example, the intent of the following code is that when write_return_n is called, the machine code for return n; is emitted:
If DynASM were implemented by building up a string of assembly code and then passing the result to a stand-alone assembler, then the result of passing the above code through DynASM might be:
In reality, DynASM builds up a string of machine code rather than a string of assembly code, meaning that the actual output is somewhat closer to the following:
With this example in mind, DynASM can be described as a text-processing tool which takes lines starting with a vertical bar, interprets them as assembly code, and replaces them with C code which writes out the corresponding machine code. This description fails to mention a bunch of really nice features, but it gives the general idea.
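To illustrate the "writes out the corresponding machine code" half of that description, here is a hand-written C sketch of a function that emits the bytes for return n;. It assumes the fragment is mov eax, n followed by ret, which is one natural x86 encoding of the idea; the name write_return_n_bytes is hypothetical, and real DynASM output goes through its own action list and dasm_put rather than storing bytes directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written sketch (not DynASM output): writes the machine code for
 *   mov eax, n   ; B8 <4-byte little-endian immediate>
 *   ret          ; C3
 * into buf and returns the number of bytes written (always 6). */
size_t write_return_n_bytes(uint8_t *buf, size_t cap, int32_t n)
{
    assert(buf != NULL && cap >= 6);
    uint32_t imm = (uint32_t)n;
    buf[0] = 0xB8;                   /* mov eax, imm32 */
    for (int i = 0; i < 4; i++)      /* the only part that varies per call */
        buf[1 + i] = (uint8_t)(imm >> (8 * i));
    buf[5] = 0xC3;                   /* ret */
    return 6;
}
```

Only the four immediate bytes depend on n; the opcode bytes are fixed when the code is written. That split between a fixed template and runtime constants is the usage pattern described earlier.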
Direct3D 10.1 is interoperable with the Windows GDI, but there is very little explicit documentation on the matter. MSDN has a diagram with Direct2D at the center of the interoperability web, but we have to rely on a DirectX blog post for the complete interoperability diagram. In particular, note that the latter diagram has a path from Direct3D10.1 to GDI via DXGI 1.1.
MSDN gives the impression that this interoperability is simple: when calling CreateTexture2D, there is a nice flag called D3D10_RESOURCE_MISC_GDI_COMPATIBLE, the documentation for which says that after enabling the flag, the resulting texture can be cast to an IDXGISurface1 and have GetDC called on it. A similar flag called DXGI_SWAP_CHAIN_FLAG_GDI_COMPATIBLE exists when creating swap chains rather than textures. Unfortunately, if you try to naively use one of these flags, resource creation might well fail with E_INVALIDARG. The reason for this failure is stated on GetDC, but really needs to be more prominent:
The format for the surface or swap chain must be DXGI_FORMAT_B8G8R8A8_UNORM_SRGB or DXGI_FORMAT_B8G8R8A8_UNORM.
If this constraint isn't satisfied, then it isn't GetDC which will fail, but the resource creation itself.
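One way to avoid the surprise is to check the format before asking for the GDI-compatible flag. The C sketch below is a portable illustration only: the enum is a local mirror of three DXGI_FORMAT values (real code would include dxgiformat.h and use the DXGI_FORMAT_ names directly), and format_allows_gdi_flag is a hypothetical helper, not part of any Microsoft API.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of three DXGI_FORMAT values so that the sketch compiles
 * without the Windows SDK. Real code would include dxgiformat.h. */
enum sketch_dxgi_format {
    SKETCH_FORMAT_R8G8B8A8_UNORM      = 28,
    SKETCH_FORMAT_B8G8R8A8_UNORM      = 87,
    SKETCH_FORMAT_B8G8R8A8_UNORM_SRGB = 91
};

/* Hypothetical pre-flight check: returns true only for the two formats
 * that the GetDC documentation lists as acceptable. */
bool format_allows_gdi_flag(int format)
{
    assert(format >= 0);
    return format == SKETCH_FORMAT_B8G8R8A8_UNORM
        || format == SKETCH_FORMAT_B8G8R8A8_UNORM_SRGB;
}
```

A caller could use such a check to decide whether to set the GDI-compatible flag at all, rather than discovering the constraint through E_INVALIDARG at creation time.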