The Dawn of War II Beta is upon us, which brings with it a series of new file formats compared to Dawn of War 1 and Company of Heroes. The most important of these is the SGA format, which is the archive format used by Relic games for game assets. Dawn of War 1 used version 2 SGA files, Company of Heroes used version 4 and Dawn of War II uses version 5 (for completeness, I think Impossible Creatures used version 1, some unreleased project or The Outfit used version 3, and Company of Heroes Online used version 4.1).
I've already released SgaReader2 for looking inside version 5 (and 4.1, 4 and 2) SGA archives. In order to actually mod Dawn of War II properly, we need to be able to make new version 5 archives rather than just read existing ones, which is where sga4to5 comes in.
sga4to5 is a small (8 kilobyte) application which converts a version 4 SGA archive into a version 5 archive.
Combine this tool with something capable of making version 4 SGA archives (e.g. Mod Studio with a CoH mod loaded, or CoH's archive.exe) and you can create new version 5 SGA archives and start modding Dawn of War II. Note that unlike most of my tools, sga4to5 is a command line application, so a quick guide to its command line parameters is in order:
Usage: sga4to5.exe -i[in[put]] file -o[ut[put]] file [-name newname] [-q[uiet]] [-v[erbose]]
  -i, -in or -input specify the input file (version 4.0 / CoH SGA)
  -o, -out or -output specify the output file (version 5.0 / DoW2 SGA)
  -name changes the name in the file header of the output
  -q or -quiet reduces the amount written to the console
  -v or -verbose causes more than usual to be written to the console

Sample usage:
sga4to5.exe -i "E:\Valve\Steam\SteamApps\common\warhammer 40,000 dawn of war ii - beta\MahArchives\v4Attrib.sga" -o "E:\Valve\Steam\SteamApps\common\warhammer 40,000 dawn of war ii - beta\MahArchives\GameAttrib.sga" -v -name "Attributes"

Output from above command:
-- Corsix's SGA v4 to v5 Convertor --
Opened input file and output file
Input archive details:
  name: Made with Corsix's Rainman
  version: 4.0
  data header offset: 184
  data header size: 237960
  data offset: 238144
  data length: 13494961
  content MD5: 0F6155399F5200D5F1FB5F890052542F
  header MD5: 961A968EDB37216BE0D88DB5EDE95D48
Copying file data, this may take a while...
Output archive details:
  name: Attributes
  version: 5.0
  data header offset: 13495157
  data header size: 237958
  data offset: 196
  data length: 13494961
  content MD5: C1322F3DE54A35A93D14F6A5DEF04ACD
  header MD5: 4A6A47C63F46490A75800869CF715F75
Done
The actual conversion details are not too interesting; change the version number, add an extra field to the file header, relocate the data header to the end of the file, shrink the TOC records by 2 bytes each, update various offset and length fields, and recalculate the checksums. There are a few interesting implementation details like the initialisation vectors used for checksums and the precise definition of what data is checksummed, but the process is fairly simple conceptually.
More interesting (to me) is the process of squishing the program down to a mere 8 kilobytes. By default, a C++ program compiled using Visual Studio will either dynamically link to msvcr[t|p]90.dll (Microsoft's C/C++ runtime library) or include the required parts of the CRT within the executable itself. Using the former method, sga4to5.exe came out to around 18 kilobytes, but it referenced a several-hundred-kilobyte DLL, and using the latter method, it came to around 70 kilobytes. For a small program, the several-hundred-kilobyte runtime DLL is very excessive, and the 70 kilobyte amalgamation is still 10 times larger than it needs to be. To get any smaller, the C runtime has to be removed, which is a non-trivial process:
- CRT file access functions like fclose, etc. need to be replaced with the equivalent win32 file access functions.
- CRT memory allocators like delete need to be replaced with win32 memory allocators like VirtualAlloc (though VirtualAlloc has a minimum granularity of one memory page (~4 KB), so alternatives such as HeapAlloc may be better choices for some applications).
- CRT console I/O functions like [w]printf need to be rewritten to use win32's WriteConsole[W]. Win32 doesn't have a direct analogue to [w]printf, so you need to write your own implementation which does enough for your needs (escapes such as %lu were the only wprintf escapes I needed, and are simple to reimplement) and then writes string buffers using WriteConsole[W].
- The argv parameters are provided by the CRT, so they need to be obtained via calls to win32 functions such as CommandLineToArgvW.
- CRT string functions like wcs(i)cmp need to be replaced with their win32 equivalents.
- CRT memory functions like memset need to be reimplemented. This is slightly difficult as the win32 functions like ZeroMemory just alias to the CRT functions, and Microsoft's C++ compiler tries to be helpful and efficient by replacing code which looks like memset with calls to memset. In my implementation of memset, I had to insert __asm nop into the loop body to confuse the compiler enough so that it didn't optimise it away to a call to memset.
- The C++ compiler has to be instructed not to use buffer security checks (as they call CRT code), not to use C++ exceptions (as they require CRT code) and not to enable run-time-type-information (RTTI) (as again, it requires some CRT code).
- The C++ linker has to be instructed to ignore the CRT library, not to embed an XP manifest (as it'll reference the CRT, and add half a kilobyte to the file size), and to use [w]main as the entry point rather than the CRT's own startup routine.
Note that the above list is the steps that I had to take to make sga4to5.exe CRT-less; other applications may have other requirements upon the CRT which are harder to remove (YMMV). After performing the above steps, sga4to5.exe came out to 14 kilobytes and referenced only two DLLs, both of which are core Windows DLLs (one of them being needed for CommandLineToArgvW). The final step was to pass the executable through UPX, which squished it down a bit more to the final size of 8 kilobytes.