Serialization

Skir defines a standard for serializing and deserializing data types to JSON and binary. The generated data classes implement this standard to ensure that data structures defined in your schema can be encoded and decoded consistently across all languages.

Serialization formats

When serializing a data structure, you can choose one of 3 formats:

FormatPersistableSpace efficiencyReadabilityNotes
JSON (Dense)Yes: safeHighLowDefault choice. Safe for persistence and offers a good balance between performance and debuggability.
JSON (Readable)No: unsafeLowHighGood for debugging. Do not use for persistence: schema evolution (e.g. renaming fields) will break compatibility with old data.
BinaryYes: safeVery HighNoneMost compact, fastest in languages like C++.

JSON, dense flavor

This is the serialization format you should choose in most cases.

Structs are serialized as JSON arrays, where the field numbers in the index definition match the indexes in the array. Enum constants are serialized as numbers.

Skir
struct User {
  user_id: int32;
  removed;
  name: string;
  rest_day: Weekday;
  pets: [Pet];
  nickname: string;
}

const JOHN_DOE: User = {
  user_id = 400,
  name = "John Doe",
  rest_day = "SUNDAY",
  pets = [
    { name = "Fluffy" },
    { name = "Fido" },
  ],
  nickname = "",
}

The dense JSON representation of JOHN_DOE is:

json
[400,0,"John Doe",7,[["Fluffy"],["Fido"]]]

A couple observations:

  • Removed fields are replaced with zeros
  • Trailing fields with default values (nickname in this example) are omitted

This format is not very readable, but it's compact and it allows you to rename fields in your struct definition without breaking backward compatibility.

Encoding rules

TypeEncoded asExamples
bool1 for true, 0 for false
1
int32A JSON number1234
int64
hash64
  • If the value is within the safe integer range for JavaScript (±9,007,199,254,740,991), it is serialized as a JSON number.
  • Otherwise, it is serialized as a string.
1234
"9007199254740992"
float32
float64
  • Finite numbers are serialized as JSON numbers.
  • NaN, Infinity, and -Infinity are serialized as strings.
1.23
"Infinity"
timestampA JSON number representing milliseconds since the Unix epoch1672531200000
stringA JSON string"Hello"
bytesA Base64 string"SGVsbG8="
T?null if the value is missing, otherwise the serialized value.
null
123
[T]A JSON array[1, 2, 3]
structA JSON array. The array index corresponds to the field number. Removed fields are represented as 0. Trailing default values are omitted.[400, 0, "John"]
enum
  • Constant variants are serialized as integers.
  • Wrapper variants are serialized as arrays with two elements: the variant number, the value.
1
[2, "value"]

JSON, readable flavor

Structs are serialized as JSON objects, and enum constants are serialized as strings.

The readable JSON representation of JOHN_DOE is:

json
{
  "user_id": 400,
  "name": "John Doe",
  "rest_day": "SUNDAY",
  "pets": [
    { "name": "Fluffy" },
    { "name": "Fido" }
  ]
}

This format is more verbose and readable, but it should not be used if you need persistence, because Skir allows fields to be renamed in record definitions. In other words, never store a readable JSON on disk or in a database.

Encoding rules

TypeEncoded asExamples
booltrue or false
true
int32A JSON number1234
int64
hash64
  • If the value is within the safe integer range for JavaScript (±9,007,199,254,740,991), it is serialized as a JSON number.
  • Otherwise, it is serialized as a string.
1234
"9007199254740992"
float32
float64
  • Finite numbers are serialized as JSON numbers.
  • NaN, Infinity, and -Infinity are serialized as strings.
1.23
"Infinity"
timestampAn object with unix_millis and formatted fields{ "unix_millis": 1672531200000, "formatted": "2023-01-01T00:00:00Z" }
stringA JSON string"Hello"
bytesThe string "hex:" followed by the hexadecimal representation"hex:48656c6c6f"
T?null if the value is missing, otherwise the serialized value.
null
123
[T]A JSON array[1, 2, 3]
structA JSON object containing field names and values. Default values are omitted.{ "name": "John", "age": 30 }
enum
  • Constant variants are serialized as strings.
  • Wrapper variants are serialized as objects with kind and value fields.
"RED"
{ "kind": "rgb", "value": "ff0000" }

Binary format

This format is a bit more compact than JSON, and serialization/deserialization can be faster in languages like C++. Only prefer this format over JSON when the small performance gain is likely to matter, which should be rare.

Encoding rules

All numeric values are encoded using little-endian byte order.

TypeEncoded asExamples
bool1 for true, 0 for false
0x01
0x00
int32
  • 0-231: single byte val
  • 232-65535: 0xe8 followed by uint16(val)
  • ≥ 65536: 0xe9 followed by uint32(val)
  • -256 to -1: 0xeb followed by uint8(val + 256)
  • -65536 to -257: 0xec followed by uint16(val + 65536)
  • ≤ -65537: 0xed followed by int32(val)
10 -> 0x0a
255 -> 0xe8 0xff 0x00
-1 -> 0xeb 0xff
int64
  • If the value fits in a 32-bit signed integer, uses the int32 encoding.
  • Otherwise: marker 0xee followed by 8 bytes (int64).
hash64
  • If the value fits in a 32-bit unsigned integer, uses the int32 encoding.
  • Otherwise: marker 0xea followed by 8 bytes (uint64).
float32
  • 0 is encoded as a single byte 0x00.
  • Otherwise: marker 0xf0 followed by 4 bytes (IEEE 754, little endian).
0.0 -> 0x00
1.5 -> 0xf0 00 00 c0 3f
float64
  • 0 is encoded as a single byte 0x00.
  • Otherwise: marker 0xf1 followed by 8 bytes (IEEE 754, little endian).
0.0 -> 0x00
timestamp
  • 0 (Epoch) is encoded as a single byte 0x00.
  • Otherwise: marker 0xef followed by 8 bytes (int64 millis).
string
  • Empty string: 0xf2.
  • Non-empty: marker 0xf3, followed by length (encoded as a number), followed by UTF-8 bytes.
"Hi" -> 0xf3 0x02 0x48 0x69
bytes
  • Empty: 0xf4.
  • Non-empty: marker 0xf5, followed by length (encoded as a number), followed by raw bytes.
T?
  • null is encoded as 0xff.
  • Otherwise, the value is encoded directly.
null -> 0xff
val -> val_bytes
[T]
  • Length 0-3: markers 0xf6-0xf9.
  • Length > 3: marker 0xfa followed by length (encoded as a number).
  • Then items are written sequentially.
[1, 2] -> 0xf8 ... ...
structSame encoding as an array. The array index corresponds to the field number. Removed fields are represented as 0. Trailing default values are omitted.
enum
  • Constant variant: encoded as a number (the variant number).
  • Wrapper variant: markers 0xfb-0xfe (for variant numbers 1-4) or 0xf8 followed by the variant number. Then the value.

Deserialization

JSON flavors

When Skir deserializes JSON, it knows how to handle both dense and readable flavor. You do not need to specify which flavor is being used.

Handling of zeros

Both the dense JSON and binary formats use zeros to represent removed fields to save space. To preserve forward compatibility, zero is treated as a valid input for any type, even non-numerical ones.

With the exception of optional types (T?), all types will decode a zero value (integer 0) as the default value for that type. For example, a string decodes 0 as "", and an array decodes 0 as []. For optional types, 0 is decoded as the default value of the underlying type (e.g. string? decodes 0 as "", not null).