sga4to5
The Dawn of War II Beta is upon us, which brings with it a series of new file formats compared to Dawn of War 1 and Company of Heroes. The most important of these is the SGA format, which is the archive format used by Relic games for game assets. Dawn of War 1 used version 2 SGA files, Company of Heroes used version 4 and Dawn of War II uses version 5 (for completeness, I think Impossible Creatures used version 1, some unreleased project or The Outfit used version 3, and Company of Heroes Online used version 4.1).
I've already released SgaReader2 for looking inside version 5 (and 4.1, 4 and 2) SGA archives. In order to actually mod Dawn of War II properly, we need to be able to make new version 5 archives rather than just read existing ones, which is where sga4to5 comes in.
sga4to5 is a small (8 kilobyte) application which converts a version 4 SGA archive into a version 5 archive.
Download: sga4to5.exe
Combine this tool with something capable of making version 4 SGA archives (e.g. Mod Studio with a CoH mod loaded, or CoH's archive.exe) and you can create new version 5 SGA archives and start modding Dawn of War II. Note that unlike most of my tools, sga4to5 is a command line application, so a quick guide to its command line parameters is in order:
Usage: sga4to5.exe -i[in[put]] file -o[ut[put]] file [-name newname] [-q[uiet]] [-v[erbose]]
  -i, -in or -input specify the input file (version 4.0 / CoH SGA)
  -o, -out or -output specify the output file (version 5.0 / DoW2 SGA)
  -name changes the name in the file header of the output
  -q or -quiet reduces the amount written to the console
  -v or -verbose causes more than usual to be written to the console
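As a rough illustration of how the aliases above could be matched, here is a minimal portable sketch. The Options struct, the helper names and the ASCII-only case-insensitive comparison are all assumptions made for illustration, not the tool's actual code, and the real program may treat missing or unknown arguments differently.

```cpp
#include <cassert>  // only for the sanity check in this sketch

// Hypothetical container for the parsed flags listed in the usage text.
struct Options {
    const wchar_t* input = nullptr;   // -i / -in / -input
    const wchar_t* output = nullptr;  // -o / -out / -output
    const wchar_t* name = nullptr;    // -name (optional)
    bool quiet = false;               // -q / -quiet
    bool verbose = false;             // -v / -verbose
};

// ASCII-only lower-casing (assumption: flags are matched case-insensitively).
static wchar_t lower(wchar_t c) {
    return (c >= L'A' && c <= L'Z') ? static_cast<wchar_t>(c - L'A' + L'a') : c;
}

// True when both strings are equal, ignoring ASCII case.
static bool same_flag(const wchar_t* a, const wchar_t* b) {
    while (*a != L'\0' && lower(*a) == lower(*b)) { ++a; ++b; }
    return *a == L'\0' && *b == L'\0';
}

static bool one_of(const wchar_t* arg, const wchar_t* a,
                   const wchar_t* b, const wchar_t* c) {
    return same_flag(arg, a) || same_flag(arg, b) || same_flag(arg, c);
}

// Returns false on an unknown flag, a flag missing its value,
// or when either the input or the output file was not given.
bool parse_options(int argc, const wchar_t* const* argv, Options& opts) {
    assert(argv != nullptr);
    for (int i = 1; i < argc; ++i) {
        const wchar_t* arg = argv[i];
        const bool has_value = (i + 1 < argc);
        if (one_of(arg, L"-i", L"-in", L"-input")) {
            if (!has_value) return false;
            opts.input = argv[++i];
        } else if (one_of(arg, L"-o", L"-out", L"-output")) {
            if (!has_value) return false;
            opts.output = argv[++i];
        } else if (same_flag(arg, L"-name")) {
            if (!has_value) return false;
            opts.name = argv[++i];
        } else if (same_flag(arg, L"-q") || same_flag(arg, L"-quiet")) {
            opts.quiet = true;
        } else if (same_flag(arg, L"-v") || same_flag(arg, L"-verbose")) {
            opts.verbose = true;
        } else {
            return false;  // unrecognised argument
        }
    }
    return opts.input != nullptr && opts.output != nullptr;
}
```

The parser stores pointers into argv rather than copying strings, which keeps it free of any allocation.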
Sample usage:
sga4to5.exe -i "E:\Valve\Steam\SteamApps\common\warhammer 40,000 dawn of war ii - beta\MahArchives\v4Attrib.sga" -o "E:\Valve\Steam\SteamApps\common\warhammer 40,000 dawn of war ii - beta\MahArchives\GameAttrib.sga" -v -name "Attributes"
Output from above command:
-- Corsix's SGA v4 to v5 Convertor --
Opened input file and output file
Input archive details:
  name: Made with Corsix's Rainman
  version: 4.0
  data header offset: 184
  data header size: 237960
  data offset: 238144
  data length: 13494961
  content MD5: 0F6155399F5200D5F1FB5F890052542F
  header  MD5: 961A968EDB37216BE0D88DB5EDE95D48
Copying file data, this may take a while...
Output archive details:
  name: Attributes
  version: 5.0
  data header offset: 13495157
  data header size: 237958
  data offset: 196
  data length: 13494961
  content MD5: C1322F3DE54A35A93D14F6A5DEF04ACD
  header  MD5: 4A6A47C63F46490A75800869CF715F75
Done
The actual conversion details are not too interesting; change the version number, add an extra field to the file header, relocate the data header to the end of the file, shrink the TOC records by 2 bytes each, update various offset and length fields, and recalculate the checksums. There are a few interesting implementation details like the initialisation vectors used for checksums and the precise definition of what data is checksummed, but the process is fairly simple conceptually.
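The offset bookkeeping can be checked against the sample output above with a little arithmetic. The sketch below is only that: the Layout struct, the 32-bit field widths and the fixed 196-byte version 5 file header are assumptions read off the sample output, and the TOC record count of 1 is inferred from the data header shrinking by exactly 2 bytes (237960 to 237958).

```cpp
#include <cassert>  // only for the sanity check in this sketch
#include <cstdint>

// Hypothetical view of the four layout fields printed by the tool.
struct Layout {
    std::uint32_t dataHeaderOffset;
    std::uint32_t dataHeaderSize;
    std::uint32_t dataOffset;
    std::uint32_t dataLength;
};

// Assumption: file data starts right after the v5 file header,
// which the sample output places at byte 196.
constexpr std::uint32_t kV5DataOffset = 196;

// Derive the v5 layout from the v4 one: the file data moves to the front,
// the data header moves to the end, and each TOC record loses 2 bytes.
Layout convert_layout(const Layout& v4, std::uint32_t tocCount) {
    assert(v4.dataHeaderSize >= 2 * tocCount);
    Layout v5;
    v5.dataOffset = kV5DataOffset;
    v5.dataLength = v4.dataLength;                         // file data is copied as-is
    v5.dataHeaderOffset = v5.dataOffset + v5.dataLength;   // header now follows the data
    v5.dataHeaderSize = v4.dataHeaderSize - 2 * tocCount;  // 2 bytes saved per TOC record
    return v5;
}
```

Feeding in the input details from the sample (184, 237960, 238144, 13494961) with one TOC record gives a data header offset of 196 + 13494961 = 13495157 and a data header size of 237958, matching the output details printed by the tool.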
More interesting (to me) is the process of squishing the program down to a mere 8 kilobytes. By default, a C++ program compiled using Visual Studio will either dynamically link to msvcr90.dll / msvcp90.dll (Microsoft's C and C++ runtime libraries) or include the required parts of the CRT within the executable itself. Using the former method, sga4to5.exe came out to around 18 kilobytes, but it referenced a several-hundred-kilobyte DLL, and using the latter method, it came to around 70 kilobytes. For a small program, the several-hundred-kilobyte runtime DLL is very excessive, and the 70 kilobyte amalgamation is still nearly 10 times larger than it needs to be. To get any smaller, the C runtime has to be removed, which is a non-trivial process:
- CRT file access functions like fopen, fread, fwrite, fclose, etc. need to be replaced with win32 file access functions like CreateFile, ReadFile, WriteFile, CloseHandle, etc.
- CRT memory allocators like new[] and delete[] need to be replaced with win32 memory allocators like VirtualAlloc and VirtualFree (note that VirtualAlloc has a minimum granularity of one memory page (~4 KB), so HeapCreate and HeapAlloc may be better choices for some applications).
- CRT console I/O functions like [w]printf need to be rewritten to use win32's WriteConsole[W]. Win32 doesn't have a direct analogue to [w]printf, so you need to write your own implementation which does enough for your needs (%s, %u and %lu were the only wprintf escapes I needed, and are simple to reimplement) and then writes string buffers using WriteConsole[W].
- [w]main()'s argc and argv parameters are provided by the CRT, so they need to be obtained via calls to CommandLineToArgv[W] and GetCommandLine[W] instead.
- CRT string functions like wcs(i)cmp need to be replaced with win32's CompareStringEx (with the NORM_IGNORECASE flag).
- CRT memory functions like memcpy and memset need to be reimplemented. This is slightly difficult as the win32 functions like CopyMemory and ZeroMemory just alias to the CRT functions, and Microsoft's C++ compiler tries to be helpful and efficient by replacing code which looks like memcpy and memset with calls to memcpy and memset. In my implementation of memset, I had to insert __asm nop into the loop body to confuse the compiler enough so that it didn't optimise it away to a memset call.
- The C++ compiler has to be instructed not to use buffer security checks (as they call CRT code), not to use C++ exceptions (as they require CRT code) and not to enable run-time-type-information (RTTI) (as again, it requires some CRT code).
- The C++ linker has to be instructed to ignore the CRT library, not to embed an XP manifest (as it'll reference the CRT, and add half a kilobyte to the file size), and to use [w]main as the entry point rather than the CRT's own startup function ([w]mainCRTStartup).
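Two of the replacements in the list above are small enough to sketch in portable C++. This is not the code from sga4to5: mini_format, put_unsigned and fill_bytes are hypothetical names, the formatter writes into a caller-supplied buffer (a real build would then hand that buffer to WriteConsoleW), and the fill loop swaps the __asm nop trick for a volatile pointer, which is one common portable way to keep a compiler from turning the loop back into a memset call.

```cpp
#include <cassert>  // only for the sanity check; a CRT-free build would drop it
#include <cstdarg>  // va_list is a compiler facility, not CRT library code
#include <cstddef>

// Append the decimal digits of value at out[pos], never writing past cap - 1.
static std::size_t put_unsigned(wchar_t* out, std::size_t pos, std::size_t cap,
                                unsigned long value) {
    wchar_t digits[24];  // enough for a 64-bit unsigned long
    std::size_t n = 0;
    do {
        digits[n++] = static_cast<wchar_t>(L'0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (n > 0 && pos + 1 < cap) out[pos++] = digits[--n];  // most significant first
    return pos;
}

// Hypothetical formatter handling only %s (wide string), %u and %lu.
// cap counts characters including the terminator; output is truncated to fit.
// Returns the number of characters written, excluding the terminator.
std::size_t mini_format(wchar_t* out, std::size_t cap, const wchar_t* fmt, ...) {
    assert(out != nullptr && cap > 0 && fmt != nullptr);
    va_list args;
    va_start(args, fmt);
    std::size_t pos = 0;
    for (; *fmt != L'\0'; ++fmt) {
        if (*fmt != L'%') {
            if (pos + 1 < cap) out[pos++] = *fmt;
        } else if (fmt[1] == L's') {
            const wchar_t* s = va_arg(args, const wchar_t*);
            while (*s != L'\0' && pos + 1 < cap) out[pos++] = *s++;
            ++fmt;
        } else if (fmt[1] == L'u') {
            pos = put_unsigned(out, pos, cap, va_arg(args, unsigned int));
            ++fmt;
        } else if (fmt[1] == L'l' && fmt[2] == L'u') {
            pos = put_unsigned(out, pos, cap, va_arg(args, unsigned long));
            fmt += 2;
        } else if (pos + 1 < cap) {
            out[pos++] = L'%';  // unsupported escape: copy the '%' through unchanged
        }
    }
    out[pos] = L'\0';
    va_end(args);
    return pos;
}

// Byte-wise fill. Each store goes through a volatile pointer, so the compiler
// has to emit the stores as written instead of substituting a memset call.
void* fill_bytes(void* dest, unsigned char value, std::size_t count) {
    volatile unsigned char* p = static_cast<volatile unsigned char*>(dest);
    for (std::size_t i = 0; i < count; ++i) p[i] = value;
    return dest;
}
```

The volatile stores cost some speed compared with an optimised memset, which is a reasonable trade for a tool of this size.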
Note that the above list covers only the steps that I had to take to make sga4to5.exe CRT-less; other applications may depend on the CRT in ways which are harder to remove (YMMV). After performing the above steps, sga4to5.exe came out to 14 kilobytes and referenced only two DLLs, both of which are core Windows DLLs: Kernel32.dll (for CloseHandle, CompareStringEx, CreateFileW, GetCommandLineW, GetStdHandle, ReadFile, SetFilePointer, VirtualAlloc, VirtualFree, WriteConsoleW and WriteFile) and Shell32.dll (for CommandLineToArgvW). The final step was to pass the executable through UPX, which squished it down a bit more to the final size of 8 kilobytes.