Tuesday, May 11, 2010

Somebody Save me

Saving. Hooboy.
This is gonna suck.

To save something, everything important must be written to a file. Every letter in a file takes up a byte of hard drive space.

First, the arithmetic.
To save a 100x100 Floor, made up of Tiles that can be saved in ten bytes, requires 10,000 tiles * 10 bytes/tile = 100 KB.
To save a 10-floor Dungeon takes 100KB/floor * 10 floors = 1MB
To save a 20-dungeon world takes 20MB.
20MB is a quite-large filesize for a single savegame. Especially because that's discounting needing to save Characters and Items too. That's also assuming I limit the average floor to 100x100 (I won't). That's also assuming dungeons will be only 10 floors on average (they won't). That's also assuming the entire world will only have 20 or so (it won't).

So, compression is necessary (admittedly, 10 bytes for a tile is ridiculous, but an exaggeration that makes my point).

Stuff written to a file must be done one item at a time. Naively, this entails putting an int in the output, followed by a space to delimit it from the next item. Problem: generally this converts things to ASCII, making things even worse ('118' would take 3 bytes, for instance).

How I'm going about minimizing savegame space:
Convert small ints to chars, first of all. This limits the values to 0-255, but allows for saving in a single character. The occasional value that requires more than that will be stored with the number of bytes needed, i.e. a 16bit integer would be stored in two characters.
Let things be reverse-engineerable. For instance, there's no need to save the x,y coordinates of each tile if I save them all in order, so I save two bytes (or four if I allow for bigger than 256x256 floors). The biggest problem is the sheer number of highly-uninteresting wall tiles. Currently I'd estimate that at least half (if not 2/3rds) of floor tiles wind up being walls. Making the common case the most optimal: let wall tiles be represented in the absolute minimum amount of space possible; a single byte saying "this is a wall, nothing interesting to see here, move along". I say a byte because the minimum amount I can write to a file is a single byte. BUT, I can do all kinds of processing to that byte before writing it, giving me access to the individual bits. Thus, I can compress a wall tile into a single bit
So, to save a floor, first I write the height/width of the floor (to know when to stop reading). Then, I write eight tiles worth of data at a time, i.e. a single byte. To make that byte, I grab eight tiles (in order), and for each wall tile write a 0 in the corresponding bit. If it's a floor tile, I write a 1. After that byte is figured out, write it to the file. Then, for all the floor tiles, I encode the type of tile it is (regular, stairs up/down, door, etc.), if an enemy is present, and if there are any items in a single byte. I do this by letting the first six bits be the tile type (plenty of types; 2^6 =64 different encodings), the next bit be if there is an enemy and the one after be if there are any items. Then I write the enemy data if there is one, and the item data if there are any. After I do that for each non-wall tile, continue writing 8 tiles at a time in this manner until done.
In total, this means my wall tiles are 1 bit, and my floor tiles are 9 bits, plus any enemies or items.

A character's stat increases on leveling up is entirely deterministic. This was chosen to avoid the frustrating feeling of "dammit if only this stupid game would give me a good character for once..." that someone (probably me) is liable to come across at some point. That said, it means oodles of fun fuzzy feelings for saving. Instead of having to save current HP, max HP, current MP, max MP, attack, defense, all nine base stats, plus whatever else I add (none of these would be single-byte values, by the way), I can just save a race and a list of class-level pairs. For enemies, this will be limited to two or three at max. This is because fighting an Orc Fighter-Wizard-Priest-Shaman-Bard would strain credibility (and readability). Most enemies will have a single class, if that (not liable to find any Giant Bat Berserkers), and I'm not likely to have more than 128 classes, so I can encode the number of classes into each class that I save: the first bit will be 1 if there are more classes after this one, and the next 7 bits will encode the class to read, as the index of the Race in the global vector. The next byte will be the number of levels the character has in it (possibly the next two bytes, depending on how far I let enemies go. The player will be given two bytes for sure; I don't need to worry as much about size there because there is only one player ever). This means that to save an enemy, I need: two bytes for currentHP, two bytes for currentMP, 1 byte for race, two-six bytes for class/levels, one byte for number of items enemy is carrying, and then all the items the enemy is carrying. So, the average enemy with no items will take up 8 bytes on disk.

Haven't really worked out how I'm gonna save items yet. That won't be too bad, as I can probably get everything down to just a few bytes on average, two bytes best-case. Each modifier takes up a single byte, and all items have at least one (the material). If I go over 128 adjectives (which I doubtlessly will) and 256 base types (maybe?), double that.

Now, let's look at the improved arithmetic:
Currently, I have a 100x100 floor with 40 monsters on it and 50 items. Let's say 50% of the floor is wall tiles (it's probably more like 70%, but this makes the math nicer). So, the average tile size is (1bit*50%+9bits*50%) = 5bits. The enemies have one class and don't carry items, so they each take up 8 bytes. The items have two modifiers (plus material) each on average, so they take up 4 bytes each.
In total then, we have: 
10,000 tiles*5bits/tile + 40monsters*8bytes/monster + 50items*4bytes/item
= 50,000bits/(8bits/byte) + 320bytes + 200 bytes
= 6,750bytes+520bytes = 7.27KB
Over thirteen times better, without oversimplifications.
Clearly, the thing needing the most work is the tiles, but 9bits/tile in the worst case is QUITE GOOD, so I'll be hard-pressed to do better than that (without simplifying tiles; I can probably do 5bits worst-case if I limit myself to floor/stairsup/stairsdown/door, but that'd be shortsighted).

THEN I'll compress everything with a zipping algorithm, thereby squeezing everything even further. Not gonna do that in early versions of the save algorithm, for headache-related reasons.
To load, I do all of that backwards. Less of a pain in the ass than it sounds.

2 comments:

  1. Hm... over thirteen times, indeed. Any consideration of saving which tile that fireball is on, or other such effects?

    ReplyDelete
  2. Similar to how enemies and on-floor items are saved. Stuff gets saved in recursive order of encapsulation (or something).

    ReplyDelete