On Encodings

Status: Proposal

A perhaps curious feature of the Willow data model is that its specifications rarely talk about encodings. We11Let’s be honest: Aljoscha strongly believe that specifications should concern themselves with purely logical data types as long as possible, treating encodings as a minor and ultimately interchangeable detail. When specifications define concepts in terms of their encodings, results usually end up miserably underspecified (see JSON) or full of incidental complexity (see XML).

Nevertheless, protocols that deal with persistent storage and network transmissions eventually have to serialise data. In this document, we give both some generic definitions around arbitrary encodings, and some specific encodings that recur throughout the Willow family of specifications.

Encodings, In Detail

The Willow protocols are generic over user-supplied parameters. Invariably, some values of the user-supplied types need to be encoded, so there also have to be user-supplied definitions of how to encode things. But not every function that turns values into byte strings defines an encoding.

Hence, we formally define what we mean by encodings. We first give a succinct, rather mathematical definition, followed by a more accessible English explanation.

An encoding function encode_s for a set S is a function from S to the set of bytestrings, such that there exists a function decode_s such that:

In plain language:

Specific Encodings

In this section, we propose some specific encodings for several datatypes of Willow.

When designing encodings, there rarely exists a single best solution. Different scenarios might warrant different trade-offs between encoding and decoding time, encoding size, or simplicity. Furthermore, additional context can often enable more efficient encodings than general-purpose solutions. For example, a database that groups Entries by namespace_id would not need to encode the namespace_id of every individual Entry.

For these reasons, we encourage you to take the following suggested encodings with a grain of salt, and to look for contextual information that you could leverage to obtain even more efficient encodings for your specific use cases.

To encode small numbers with fewer bytes than large numbers, we define for any U64 n the value compact_width(n) as

Data Model Encodings

When encoding Paths, we want to use fixed-width unsigned integers of minimal width. Hence, we define:

path_length_power is the least natural number such that We can represent the length of any Component in path_length_power bytes. UPathLengthPower denotes the type of numbers between zero (inclusive) and (exclusive).

path_count_power is the least natural number such that We can represent the number of Components of any Path in path_count_power bytes. UPathCountPower denotes the type of numbers between zero (inclusive) and (exclusive).

encode_path

To encode a Path p, we define encode_path(p) as the concatenation of

encode_entry

To encode an Entry e, let encode_namespace_id be an encoding function for NamespaceId, let encode_subspace_id be an encoding function for SubspaceId, and let encode_payload_digest be an encoding function for PayloadDigest. We then define encode_entry(e) as the concatenation of:

encode_namespace_id(e.namespace_id)
encode_subspace_id(e.subspace_id)
encode_path(e.path)
e.timestamp, encoded as a big-endian U64
e.payload_length, encoded as a big-endian U64
encode_payload_digest(e.payload_digest)

Relativity

When encoding Willow objects, we can often achieve smaller encoding sizes by encoding how some object differs from another. In this section, we define several such relative encodings.

In all subsequent definitions, whenever the value open is part of a numeric computation or comparison, it should be treated as a very large number, say, The definitions all ensure that the resulting values never have to be encoded.

encode_path_relative

To encode a Path p relative to a reference Path ref, we define encode_path_relative(p, ref) as the concatenation of:

The length of the longest common prefix of p and ref, encoded as a UPathCountPower.
encode_path of the Path obtained by removing that longest common prefix from p.

encode_entry_relative_entry

To encode an Entry e relative to a reference Entry ref, we first define time_diff as the absolute value of e.timestamp - ref.timestamp. We then define encode_entry_relative_entry(e, ref) as the concatenation of:

BitBig-endian bitfield
0Encode e.namespace_id?1 iff e.namespace_id != ref.namespace_id
1Encode e.subspace_id?1 iff e.subspace_id != ref.subspace_id
2Add or subtract time_diff from ref.timestamp?1 iff e.timestamp - ref.timestamp > 0
3always 0
4, 52-bit integer n such that 2^n gives compact_width(time_diff)
Bit 4 is 1 iff compact_width(time_diff) is 4 or 8.
Bit 5 is 1 iff compact_width(time_diff) is 2 or 8.
6, 72-bit integer n such that 2^n gives compact_width(e.payload_length)
Bit 6 is 1 iff compact_width(e.payload_length) is 4 or 8.
Bit 7 is 1 iff compact_width(e.payload_length) is 2 or 8.
encode_namespace_id(e.namespace_id), or the empty string, if e.namespace_id == ref.namespace_id
encode_subspace_id(e.subspace_id), or the empty string, if e.subspace_id == ref.subspace_id
encode_path_relative(e.path, ref.path)
time_diff, encoded as an unsigned, big-endian compact_width(time_diff)-byte integer
e.payload_length, encoded as an unsigned, big-endian compact_width(e.payload_length)-byte integer
encode_payload_digest(e.payload_digest)

encode_entry_in_namespace_area

To encode an Entry e that is included in an outer Area out in a namespace of NamespaceId namespace_id, we first define time_diff as the minimum of e.timestamp - out.times.start and out.times.end - e.timestamp. We then define encode_entry_in_namespace_area(e, out, namespace_id) as the concatenation of:

BitBig-endian bitfield
0Encode e.subspace_id?1 iff out.subspace_id == any
1Add time_diff to out.times.start, or subtract from out.times.end?1 iff e.timestamp - out.times.start <= out.times.end - e.timestamp
2, 32-bit integer n such that 2^n gives compact_width(time_diff)
Bit 2 is 1 iff compact_width(time_diff) is 4 or 8.
Bit 3 is 1 iff compact_width(time_diff) is 2 or 8.
4, 52-bit integer n such that 2^n gives compact_width(e.payload_length)
Bit 4 is 1 iff compact_width(e.payload_length) is 4 or 8.
Bit 5 is 1 iff compact_width(e.payload_length) is 2 or 8.
6, 7always 0
encode_subspace_id(e.subspace_id), or the empty string, if out.subspace_id != any
encode_path_relative(e.path, out.path)
time_diff, encoded as an unsigned, big-endian compact_width(time_diff)-byte integer
e.payload_length, encoded as an unsigned, big-endian compact_width(e.payload_length)-byte integer
encode_payload_digest(e.payload_digest)

encode_entry_in_namespace_3drange

To encode an Entry e that is included in a 3dRange out in a namespace of NamespaceId namespace_id, we first define time_diff as the minimum absolute value of e.timestamp - out.times.start and e.timestamp - out.times.end. We then define encode_entry_in_namespace_3drange(e, out, namespace_id) as the concatenation of:

BitBig-endian bitfield
0Encode e.subspace_id?1 iff e.subspace_id == out.subspaces.start
1Encode e.path relative to out.paths.start or to out.paths.end?1 iff the longest common prefix of e.path and out.paths.start is at least as long as the longest common prefix of e.path and out.paths.end
2Combine time_diff with out.times.start, or with out.times.end?1 iff time_diff == abs(e.timestamp - out.times.start)
3Add or subtract time_diff?1 iff bit 2 is 1 and e.timestamp >= out.times.start, or bit 2 is 0 and e.timestamp <= out.times.end.
4, 52-bit integer n such that 2^n gives compact_width(time_diff)
Bit 4 is 1 iff compact_width(time_diff) is 4 or 8.
Bit 5 is 1 iff compact_width(time_diff) is 2 or 8.
6, 72-bit integer n such that 2^n gives compact_width(e.payload_length)
Bit 6 is 1 iff compact_width(e.payload_length) is 4 or 8.
Bit 7 is 1 iff compact_width(e.payload_length) is 2 or 8.
encode_subspace_id(e.subspace_id), or the empty string, if e.subspace_id == out.subspaces.start
encode_path_relative(e.path, out.paths.start) if the longest common prefix of e.path and out.paths.start is at least as long as the longest common prefix of e.path and out.paths.end, otherwise encode_path_relative(e.path, out.paths.end)
time_diff, encoded as an unsigned, big-endian compact_width(time_diff)-byte integer
e.payload_length, encoded as an unsigned, big-endian compact_width(e.payload_length)-byte integer
encode_payload_digest(e.payload_digest)

encode_area_in_area

To encode an Area a that is included in an outer Area out, we first define start_diff as the minimum of a.times.start - out.times.start and out.times.end - a.times.start, and we define end_diff as the minimum of a.times.end - out.times.start and out.times.end - a.times.end. We then define encode_area_in_area(a, out) as the concatenation of:

BitBig-endian bitfield
0Encode a.subspace_id?1 iff a.subspace_id != out.subspace_id
1Encode end value of a.times?1 iff a.times.end == open
2Add start_diff to out.times.start, or subtract from out.times.end?1 iff start_diff == a.times.start - out.times.start
3Add end_diff to out.times.start, or subtract from out.times.end?1 iff end_diff == a.times.end - a.times.start
4, 52-bit integer n such that 2^n gives compact_width(start_diff)
Bit 4 is 1 iff compact_width(start_diff) is 4 or 8.
Bit 5 is 1 iff compact_width(start_diff) is 2 or 8.
6, 72-bit integer n such that 2^n gives compact_width(end_diff)
Bit 6 is 1 iff compact_width(end_diff) is 4 or 8.
Bit 7 is 1 iff compact_width(end_diff) is 2 or 8.
encode_subspace_id(a.subspace_id), or the empty string, if a.subspace_id == out.subspace_id
encode_path_relative(a.path, out.path)
start_diff, encoded as an unsigned, big-endian compact_width(start_diff)-byte integer
end_diff, encoded as an unsigned, big-endian compact_width(end_diff)-byte integer, or the empty string, if a.times.end == open

encode_3drange_relative_3drange

To encode a 3dRange r relative to a reference 3dRange ref, we first define

We then define encode_3drange_relative_3drange(r, ref) as the concatenation of:
BitBig-endian bitfield
0, 1Encode r.subspaces.start?
11 otherwise.
2, 3Encode r.subspaces.end?
00 if r.subspaces.end == open, and else
10 otherwise.
4Encode r.paths.start relative to ref.paths.start or to ref.paths.end?1 iff the longest common prefix of r.paths.start and ref.paths.start is at least as long as the longest common prefix of r.paths.start and ref.paths.end
51 iff r.paths.end == open
6Encode r.paths.end relative to ref.paths.start or to ref.paths.end (if at all)?
0 if r.paths.end == open, otherwise
1 iff the longest common prefix of r.paths.end and ref.paths.start is at least as long as the longest common prefix of r.paths.end and ref.paths.end
71 iff r.times.end == open
8Encode r.times.start relative to ref.times.start or ref.times.end?1 iff start_to_start <= start_to_end
9Add or subtract start_time_diff?1 iff bit 8 is 1 and r.times.start >= ref.times.start, or bit 8 is 0 and r.times.start >= ref.times.end.
10, 112-bit integer n such that 2^n gives compact_width(start_time_diff)
Bit 10 is 1 iff compact_width(start_time_diff) is 4 or 8.
Bit 11 is 1 iff compact_width(start_time_diff) is 2 or 8.
12Encode r.times.end relative to ref.times.start or ref.times.end (if at all)?
0 if r.times.end == open, otherwise
13Add or subtract end_time_diff (if encoding it at all)?
0 if r.times.end == open, otherwise
1 iff bit 12 is 1 and r.times.end >= ref.times.start, or bit 12 is 0 and r.times.end >= ref.times.end.
14, 15ignored, or 2-bit integer n such that 2^n gives compact_width(end_time_diff)
00 if r.times.end == open, otherwise:
Bit 14 is 1 iff compact_width(end_time_diff) is 4 or 8.
Bit 15 is 1 iff compact_width(end_time_diff) is 2 or 8.
encode_subspace_id(r.subspaces.start), or the empty string if r.subspaces.start == ref.subspaces.start or r.subspaces.start == ref.subspaces.end
encode_subspace_id(r.subspaces.end), or the empty string if r.subspaces.end == open, r.subspaces.end == ref.subspaces.start or r.subspaces.end == ref.subspaces.end
encode_path_relative(r.paths.start, ref.paths.start) if the longest common prefix of r.paths.start and ref.paths.start is at least as long as the longest common prefix of r.paths.start and ref.paths.end, otherwise encode_path_relative(r.paths.start, ref.paths.end)
the empty string if r.paths.end == open, otherwise:
encode_path_relative(r.paths.end, ref.paths.start) if the longest common prefix of r.paths.end and ref.paths.start is at least as long as the longest common prefix of r.paths.end and ref.paths.end, otherwise encode_path_relative(r.paths.end, ref.paths.end)
start_time_diff, encoded as an unsigned, big-endian compact_width(start_time_diff)-byte integer
end_time_diff, encoded as an unsigned, big-endian compact_width(end_time_diff)-byte integer, or the empty string, if end_time_diff.times.end == open

Capabilities

Encodings for Meadowcap and McSubspaceCapabilities.

encode_subspace_capability

To encode a McSubspaceCapability c, we define encode_mc_subspace_capability(c) as the concatenation of:

encode_namespace_pk(c.namespace_key)
encode_user_pk(c.user_key)
encode_namespace_sig(c.initial_authorisation)
for each pair (pk, sig) in c.delegations the concatenation of:

encode_mc_capability

To encode a McCapability c whose granted area is know to be included in some22If no smaller containing Area is known from context, use the full area. Area out, we define encode_mc_capability(c) depending on whether c.inner is a CommunalCapability or an OwnedCapability.

If c.inner is a CommunalCapability, then encode_mc_capability(c) is the concatenation of:

byte 0x00, if c.inner.access_mode == read,
byte 0x01 otherwise.
encode_namespace_pk(c.inner.namespace_key)
encode_user_pk(c.inner.user_key)
for each triplet (area, pk, sig) in c.inner.delegations the concatenation of:

If c.inner is an OwnedCapability, then encode_mc_capability(c) is the concatenation of:

byte 0x02, if c.inner.access_mode == read,
byte 0x03 otherwise.
encode_namespace_pk(c.inner.namespace_key)
encode_user_pk(c.inner.user_key)
encode_namespace_sig(c.inner.initial_authorisation)
for each triplet (area, pk, sig) in c.inner.delegations the concatenation of: