Serialization
Skir defines a standard for serializing data types to JSON and binary, and for deserializing them back. The generated data classes implement this standard so that data structures defined in your schema are encoded and decoded consistently across all languages.
Serialization formats
When serializing a data structure, you can choose one of three formats:
| Format | Persistable | Space efficiency | Readability | Notes |
|---|---|---|---|---|
| JSON (Dense) | Yes: safe | High | Low | Default choice. Safe for persistence and offers a good balance between performance and debuggability. |
| JSON (Readable) | No: unsafe | Low | High | Good for debugging. Do not use for persistence: schema evolution (e.g. renaming fields) will break compatibility with old data. |
| Binary | Yes: safe | Very High | None | Most compact, fastest in languages like C++. |
JSON, dense flavor
This is the serialization format you should choose in most cases.
Structs are serialized as JSON arrays, where each field's number in the struct definition is its index in the array. Enum constants are serialized as numbers.
```
struct User {
  user_id: int32;
  removed;
  name: string;
  rest_day: Weekday;
  pets: [Pet];
  nickname: string;
}

const JOHN_DOE: User = {
  user_id = 400,
  name = "John Doe",
  rest_day = "SUNDAY",
  pets = [
    { name = "Fluffy" },
    { name = "Fido" },
  ],
  nickname = "",
}
```

The dense JSON representation of JOHN_DOE is:
```
[400,0,"John Doe",7,[["Fluffy"],["Fido"]]]
```

A couple of observations:
- Removed fields are replaced with zeros.
- Trailing fields with default values (nickname in this example) are omitted.
This format is not very readable, but it's compact and it allows you to rename fields in your struct definition without breaking backward compatibility.
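The two rules above (removed slots become 0, trailing defaults are dropped) can be sketched in a few lines of Python. This is a minimal illustration, not Skir's generated code: the field tables, the `encode_dense_struct` helper, and the `WEEKDAY` mapping are hypothetical, and the assumption that an enum's default encodes as 0 is not stated in this document.

```python
import json


def encode_dense_struct(fields, values):
    """Encode a struct as a dense JSON array (hypothetical helper).

    fields: list indexed by field number; each entry is (name, default),
            or None for a removed field.
    values: dict mapping field name to an already-encoded dense value.
    """
    encoded, defaults = [], []
    for field in fields:
        if field is None:
            encoded.append(0)   # removed fields are always written as 0
            defaults.append(0)
        else:
            name, default = field
            encoded.append(values.get(name, default))
            defaults.append(default)
    # Omit trailing slots whose value equals that slot's default.
    while encoded and encoded[-1] == defaults[len(encoded) - 1]:
        encoded.pop()
    return encoded


WEEKDAY = {"SUNDAY": 7}  # hypothetical: only SUNDAY -> 7 is known from the example
PET_FIELDS = [("name", "")]
USER_FIELDS = [
    ("user_id", 0),
    None,               # field number 1 was removed
    ("name", ""),
    ("rest_day", 0),    # assumption: the enum's default encodes as 0
    ("pets", []),
    ("nickname", ""),
]

john_doe = encode_dense_struct(USER_FIELDS, {
    "user_id": 400,
    "name": "John Doe",
    "rest_day": WEEKDAY["SUNDAY"],
    "pets": [encode_dense_struct(PET_FIELDS, {"name": n}) for n in ("Fluffy", "Fido")],
    "nickname": "",
})
print(json.dumps(john_doe, separators=(",", ":")))
# [400,0,"John Doe",7,[["Fluffy"],["Fido"]]]
```

Because only positions are written, renaming user_id in the schema would not change a single byte of this output.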
Encoding rules
| Type | Encoded as | Examples |
|---|---|---|
| bool | 1 for true, 0 for false | 1 |
| int32 | A JSON number | 1234 |
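| int64, hash64 | A JSON number if its absolute value is at most 9007199254740991 (2^53 - 1), otherwise a JSON string | 1234, "9007199254740992" |
| float32, float64 | A JSON number, or a JSON string for values JSON cannot represent, such as infinity | 1.23, "Infinity" |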
| timestamp | A JSON number representing milliseconds since the Unix epoch | 1672531200000 |
| string | A JSON string | "Hello" |
| bytes | A Base64 string | "SGVsbG8=" |
| T? | null if the value is missing, otherwise the serialized value. | null, 123 |
| [T] | A JSON array | [1, 2, 3] |
| struct | A JSON array. The array index corresponds to the field number. Removed fields are represented as 0. Trailing default values are omitted. | [400, 0, "John"] |
| enum | The number of a constant; a two-element array [number, value] for a variant that holds a value | 1, [2, "value"] |
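A few of the scalar rules in this table can be sketched as plain Python functions. The function names are hypothetical, and two details are assumptions inferred from the examples rather than stated rules: the int64/hash64 cutoff is taken to be JavaScript's safe-integer limit (the first string example, 9007199254740992, is 2^53), and the strings "-Infinity" and "NaN" are assumed by analogy with "Infinity".

```python
import base64
import math
from datetime import datetime, timezone

MAX_SAFE_INTEGER = 2**53 - 1  # assumption: number/string cutoff for int64 and hash64


def dense_bool(b: bool) -> int:
    return 1 if b else 0


def dense_int64(v: int):
    # Assumption: values a JavaScript number can hold exactly stay JSON numbers.
    return v if -MAX_SAFE_INTEGER <= v <= MAX_SAFE_INTEGER else str(v)


def dense_float(x: float):
    # JSON has no literal for infinity or NaN, so those become strings.
    # Only "Infinity" appears in the table; the other spellings are assumed.
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    return x


def dense_timestamp(dt: datetime) -> int:
    # Milliseconds since the Unix epoch.
    return round(dt.timestamp() * 1000)


def dense_bytes(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


print(dense_int64(1234), dense_int64(2**53))            # 1234 9007199254740992
print(dense_timestamp(datetime(2023, 1, 1, tzinfo=timezone.utc)))  # 1672531200000
print(dense_bytes(b"Hello"))                            # SGVsbG8=
```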
JSON, readable flavor
Structs are serialized as JSON objects, and enum constants are serialized as strings.
The readable JSON representation of JOHN_DOE is:
```
{
  "user_id": 400,
  "name": "John Doe",
  "rest_day": "SUNDAY",
  "pets": [
    { "name": "Fluffy" },
    { "name": "Fido" }
  ]
}
```

This format is more verbose but easier to read. Do not use it for persistence: Skir allows fields to be renamed in record definitions, and data stored under an old field name would no longer match the schema. In other words, never store readable JSON on disk or in a database.
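For comparison with the dense encoding, here is a minimal Python sketch that produces the readable form of JOHN_DOE: fields are keyed by name, removed fields simply have no entry, and values equal to their default are omitted. The `readable_struct` helper and field tables are hypothetical; the enum's default is left as a `None` placeholder because its readable name is not given in this document.

```python
import json


def readable_struct(fields, values):
    """Encode a struct as a readable JSON object (hypothetical helper).

    fields: (name, default) pairs for the struct's live fields, in order.
    values: dict mapping field name to an already-encoded readable value.
    Fields whose value equals the default are omitted.
    """
    return {
        name: values[name]
        for name, default in fields
        if name in values and values[name] != default
    }


PET_FIELDS = [("name", "")]
USER_FIELDS = [
    ("user_id", 0),
    ("name", ""),
    ("rest_day", None),  # placeholder default; the enum's default name is not shown here
    ("pets", []),
    ("nickname", ""),
]

john_doe = readable_struct(USER_FIELDS, {
    "user_id": 400,
    "name": "John Doe",
    "rest_day": "SUNDAY",  # enum constants are written by name
    "pets": [readable_struct(PET_FIELDS, {"name": n}) for n in ("Fluffy", "Fido")],
    "nickname": "",        # equal to the default, so it is omitted
})
print(json.dumps(john_doe, indent=2))
```

The field names in the output are exactly what makes this flavor fragile: rename user_id in the schema and previously written objects no longer line up.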
Encoding rules
| Type | Encoded as | Examples |
|---|---|---|
| bool | true or false | true |
| int32 | A JSON number | 1234 |
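| int64, hash64 | A JSON number if its absolute value is at most 9007199254740991 (2^53 - 1), otherwise a JSON string | 1234, "9007199254740992" |
| float32, float64 | A JSON number, or a JSON string for values JSON cannot represent, such as infinity | 1.23, "Infinity" |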
| timestamp | An object with unix_millis and formatted fields | { "unix_millis": 1672531200000, "formatted": "2023-01-01T00:00:00Z" } |
| string | A JSON string | "Hello" |
| bytes | The string "hex:" followed by the hexadecimal representation | "hex:48656c6c6f" |
| T? | null if the value is missing, otherwise the serialized value. | null, 123 |
| [T] | A JSON array | [1, 2, 3] |
| struct | A JSON object containing field names and values. Default values are omitted. | { "name": "John", "age": 30 } |
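| enum | The name of a constant as a string; an object with "kind" and "value" for a variant that holds a value | "RED", { "kind": "rgb", "value": "ff0000" } |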
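The readable encodings for bytes, timestamps and enums can be reproduced with the Python standard library. The helper names are hypothetical. The "formatted" string uses whole seconds to match the table's example; how fractional seconds are rendered is not specified here and is left out.

```python
from datetime import datetime, timezone

_NO_VALUE = object()  # sentinel so a variant can hold any value, including None


def readable_bytes(data: bytes) -> str:
    # "hex:" followed by lowercase hexadecimal digits.
    return "hex:" + data.hex()


def readable_timestamp(unix_millis: int) -> dict:
    dt = datetime.fromtimestamp(unix_millis / 1000, tz=timezone.utc)
    # Assumption: whole-second ISO 8601 in UTC, as in the table's example.
    return {"unix_millis": unix_millis, "formatted": dt.strftime("%Y-%m-%dT%H:%M:%SZ")}


def readable_enum(name: str, value=_NO_VALUE):
    # Constants are written by name; value-holding variants as {"kind", "value"}.
    return name if value is _NO_VALUE else {"kind": name, "value": value}


print(readable_bytes(b"Hello"))            # hex:48656c6c6f
print(readable_timestamp(1672531200000))
print(readable_enum("rgb", "ff0000"))
```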
Binary format
This format is a bit more compact than JSON, and serialization/deserialization can be faster in languages like C++. Only prefer this format over JSON when the small performance gain is likely to matter, which should be rare.
Encoding rules
All numeric values are encoded using little-endian byte order.
| Type | Encoded as | Examples |
|---|---|---|
| bool | 1 for true, 0 for false | 0x01, 0x00 |
| int32 | | 10 -> 0x0a; 255 -> 0xe8 0xff 0x00; -1 -> 0xeb 0xff |
| int64 | | |
| hash64 | | |
| float32 | 0x00 for 0.0; otherwise 0xf0 followed by the 4-byte IEEE 754 value | 0.0 -> 0x00; 1.5 -> 0xf0 0x00 0x00 0xc0 0x3f |
| float64 | | 0.0 -> 0x00 |
| timestamp | | |
| string | | "Hi" -> 0xf3 0x02 0x48 0x69 |
| bytes | | |
| T? | 0xff if the value is missing, otherwise the encoded value | null -> 0xff; val -> val_bytes |
| [T] | | [1, 2] -> 0xf8 ... ... |
| struct | Same encoding as an array. The array index corresponds to the field number. Removed fields are represented as 0. Trailing default values are omitted. | |
| enum | | |
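The byte patterns in this table can be reproduced with Python's `struct` module. This is a partial sketch, not the full binary specification: it only covers the cases the examples illustrate. The int32 ranges are inferred from those examples (0xe8 acting as a prefix implies single-byte values stop at 231; the 0xeb range of -256 to -1 is an assumption), and other ranges raise `NotImplementedError`.

```python
import struct


def encode_int32(value: int) -> bytes:
    """Partial sketch covering only the int32 patterns shown in the table."""
    if 0 <= value <= 231:
        return bytes([value])                     # small values: one byte
    if 232 <= value <= 65535:
        return b"\xe8" + struct.pack("<H", value)  # prefix + little-endian uint16
    if -256 <= value <= -1:
        return b"\xeb" + bytes([value + 256])      # assumed range for the 0xeb prefix
    raise NotImplementedError("range not illustrated in the table")


def encode_float32(value: float) -> bytes:
    # 0.0 takes a single byte; other values are 0xf0 + little-endian IEEE 754.
    if value == 0.0:
        return b"\x00"
    return b"\xf0" + struct.pack("<f", value)


def encode_optional(value, encode) -> bytes:
    # 0xff marks a missing value; otherwise the value is encoded as usual.
    return b"\xff" if value is None else encode(value)


for v in (10, 255, -1):
    print(v, encode_int32(v).hex(" "))
print(encode_float32(1.5).hex(" "))             # f0 00 00 c0 3f
print(encode_optional(None, encode_int32).hex())  # ff
```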
Deserialization
JSON flavors
When Skir deserializes JSON, it accepts both the dense and the readable flavor. You do not need to specify which flavor is being used.
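Telling the flavors apart is possible because they use different JSON types: a dense struct is an array, a readable struct is an object (and likewise enum constants are numbers versus strings). A minimal Python sketch of that dispatch for a hypothetical two-field struct follows; it illustrates the idea and is not Skir's actual decoder.

```python
import json

# Hypothetical field table: (field number, field name, default value).
FIELDS = [(0, "user_id", 0), (1, "name", "")]


def decode_struct(json_value):
    """Decode either flavor: a JSON array is dense, a JSON object is readable."""
    result = {name: default for _, name, default in FIELDS}
    if isinstance(json_value, list):
        # Dense: look fields up by position; trailing omitted fields keep defaults.
        for number, name, _ in FIELDS:
            if number < len(json_value):
                result[name] = json_value[number]
    elif isinstance(json_value, dict):
        # Readable: look fields up by name; omitted fields keep defaults.
        for _, name, _ in FIELDS:
            if name in json_value:
                result[name] = json_value[name]
    else:
        raise ValueError("expected a JSON array or object")
    return result


dense = decode_struct(json.loads('[400,"John Doe"]'))
readable = decode_struct(json.loads('{"user_id": 400, "name": "John Doe"}'))
print(dense == readable)  # True
```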
Handling of zeros
Both the dense JSON and binary formats use zeros to represent removed fields to save space. To preserve forward compatibility, zero is treated as a valid input for any type, even non-numerical ones.
Every type decodes a zero (the integer 0) as its default value, with one exception: an optional type T? decodes 0 as the default value of the underlying type T, not as null. For example, string decodes 0 as "", an array decodes 0 as [], and string? also decodes 0 as "" rather than null.
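The zero rule, including the optional-type exception, fits in a small Python function. The type names and `DEFAULT_FACTORIES` table are hypothetical, and decoding of non-zero values is reduced to returning them unchanged, since only the zero and null handling is being illustrated.

```python
# Hypothetical type table: each entry builds a fresh default value.
DEFAULT_FACTORIES = {"int32": int, "bool": bool, "string": str, "bytes": bytes, "array": list}


def decode_dense(type_name: str, value, optional: bool = False):
    """Apply the zero rule to a dense JSON value of type T or T?."""
    if optional and value is None:
        return None                             # null means "missing" only for T?
    if type(value) is int and value == 0:
        # 0 decodes as the default of T, for both T and T?.
        return DEFAULT_FACTORIES[type_name]()
    return value  # placeholder: real per-type decoding is not shown


print(repr(decode_dense("string", 0)))                  # ''
print(repr(decode_dense("string", 0, optional=True)))   # ''
print(repr(decode_dense("string", None, optional=True)))  # None
```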