Private Area Intersection
In order to synchronise data, peers must inform each other about which data they are interested in. If done openly, this would let peers learn about details such as NamespaceIds, SubspaceIds, or Paths that they have no business knowing about. In this document, we describe a technique that does not leak this information.
Setting and Goals
We consider the setting where two peers wish to synchronise some data that is subject to read access control via capabilities. More precisely, they want to specify pairs of namespaces and AreaOfInterests, and then synchronise the intersections.
The simplemost solution consists in the peers openly exchanging read capabilities and then specifying their AreaOfInterests, which must be fully included in the granted areas of the read capabilities. This works well for managing read access control and determining which Entries to synchronise, but it leaks some potentially sensitive information. Two examples:
First, suppose that Alfie creates an Entry at Path gemma_stinks, and gives a read capability for this Path to Betty. Later, Betty connects to Gemma's machine for syncing, and asks for gemma_stinks in Alfie’s subspace. In sending her read capability, she hands a signed proof to Gemma that Alfie thinks1,21Gemma does not, in fact, stink.2Also, Alfie is really very nice and would never say such a thing outside of thought experiments for demonstrating the dangers of leaking Paths. she stinks. Not good.
Second, suppose a scenario where everyone uses encrypted paths, with individual encryption keys per subspace. Alfie synchronises with Betty, asking her for random-looking Paths of the same structure in ten different subspaces. Betty has the decryption keys for all but one of the subspaces. All the paths she can decrypt happen to decrypt to gemma_stinks. This gives Betty a strong idea about what the tenth person thinks of Gemma, despite the fact that Betty cannot decrypt the Path. Not good.
Ideally, we would like to employ a mechanism where peers cannot learn any information beyond the granted areas of the read capabilities which they hold at the start of the process. Unfortunately, such a mechanism would have to involve privacy-preserving verification of cryptographic signatures, and we are not aware of any suitable cryptographic primitives.
We can, however, design solutions which do not allow peers to learn about the existence of any NamespaceId, SubspaceId, or Path which they did not know about already. If, for example, both peers knew about a certain namespace, they should both get to know that the other peer also knows about that namespace. But for a namespace which only one of the peers knows about, the other peer should not learn its NamespaceId.
Such solutions cannot prevent peers from confirming guesses about data they shouldn't know about. Hence, it is important that NamespaceIds and SubspaceIds are sufficiently long and random-looking. Similarly, encrypting Components with different encryption keys for different subspaces can ensure that Paths are unguessable. Because valid TimestampsFinding efficient encryption schemes and privacy-preserving synchronisation techniques that work for Timestamps is an interesting research endeavour, but out of scope for us. can easily be guessed, we do not try to hide information about them.
We present our solution in three stages. First, we show how to privately test two items for equality, then we generalise to privately intersecting two sets, and then we reduce our problem of intersecting namespaces and AreaOfInterests to that of intersecting sets.
Private Equality Testing
We start by considering private equality testing: two peers — Alfie and Betty — who hold a single item each wish to determine whether they hold the same item, without revealing any information about their item in case of inequality. Before giving the precise mathematical formulation, we describe the solution by way of analogy.
Imagine the items were colours. Assume colours can easily be mixed with other colours, but unmixing a given colour into its components is impossible. The following procedure then solves the problem:
- Alfie and Betty each start with a data colour data_A and data_B respectively.
- Alfie and Betty each randomly select a secret colour secret_A and secret_B respectively.
- They each mix their data colour with their secret colour and send the result to the other person (
mix(data_A, secret_A)
andmix(data_B, secret_B)
). - Upon receiving a mixture, they mix their own secret into it, remember the result and also send it to the other person (
mix(mix(data_B, secret_B), secret_A)
andmix(mix(data_A, secret_A), secret_B)
).
If both peers receive the same colour they remembered, then they started with the same data colour, and otherwise they did not. Because unmixing colours is impossible and mixing with a randomly chosen secret colour essentially yields a new random-looking colour, the peers cannot learn anything about each other’s colours in case of inequality33Neither can any eavesdropper learn about the data colours. The procedure is highly related to a Diffie–Hellman key exchange for that reason, and we have borrowed the colour metaphor from its wikipedia page..
Note that the colour analogy is not fully accurate: data colours correspond to group members but secret colours correspond to scalars, which are of a different type than group members.Leaving the world of analogy, the actual cryptographic primitives we use are finite cyclic groups — such as the X25519 elliptic curve — equipped with a way of serialising group members for transport and with a way of generating pseudo-random group members from the items to test for equality.
Do not worry if the mathy description here does not fully make sense to you. We give it for completeness’ sake, but you can grasp the underlying concepts without being familiar with groups and cryptography.Let Items be the set of items for which we want to be able to privately test for equality. Let G be a finite cyclic group with a well-known generator g and group operation *, and let item_to_group be a hash function from Items into G.
Now, let Alfie be a peer that holds some item and let Betty be a peer that holds some item Define and
To privately test for equality of and Alfie and Betty each randomly select scalars (natural numbers) and respectively. Alfie then transmits ( times) and Betty transmits
After receiving these messages, Alfie answers with They can compute these because and Betty answers with
If G was chosen so that accidental (or maliciously crafted) collisions are unlikely (or infeasible), then if and only if Because
Private Set Intersection
We can generalise the equality test to computing set intersection by essentially sending the same information but for multiple items at once.This technique for private set intersection is due to Huberman, Franklin, and Hogg. We return to the analogy of colours again, before giving the mathematically precise formulation.
Suppose Alfie and Betty start with sets of data colours. They independently (and arbitrarily) number their data colours as data_A_0, data_A_1, ... and data_B_0, data_B_1, ...` respectively.
Alfie and Betty still choose only a single random secret (secret_A and secret_B respectively), and they send the results of mixing each of their data colours with their secret colour individually ({0: mix(data_A_0, secret_A), mix(data_A_1, secret_A), ...}
and {0: mix(data_B_0, secret_B), mix(data_B_1, secret_B), ...}
).
For each numbered colour mix they receive, they reply by adding their own secret, keeping the numbering identical.
Any colour that occurs both in the final set of colours they sent and in the final set of colours they received corresponds to a shared data colour, and the numbering tells each of them which of the original colours are shared. But for any other colour, they cannot reconstruct the corresponding original data colour of the other peer.
In the formal setting, let Alfie and Betty hold sequences of Items and that hash to sequences of group members and respectively, and let them choose random scalars and again.
Alfie then transmits and Betty transmits
After receiving these messages, Alfie answers with and Betty answers with
For all such that Alfie learns that item is in the intersection, and Betty learns that item is in the intersection.
Dynamic Sets
The algorithm as described so far requires Alfie and Betty to fully know their sets in advance. For the WGPS, we want to allow for dynamically changing sets — both because peers might learn about new namespaces dynamically, and because they might not have enough resources to store group members for the full sets in memory at the same time.
We can overcome this limitation with a small change: rather than sending monolithic messages containing lists of group members, we send individual group members together with small numeric identifiers. These identifiers can be used to map responses to the original group members. In particular, we use resource handles for this purpose.
Private Syncing
We now have the necessary tools to describe how two peers can exchange read capabilities for their sync interests in a privacy-preserving manner. To recapitulate, we consider two peers — Alfie and Betty — who each hold a set of read capabilities. They wish to determine the intersections of their granted areas without leaking any NamespaceIds, SubspaceIds or Paths that are not covered by the other peer’s read capabilities. We now reduce this problem to that of private set intersection.
We have to introduce a bit of terminology first. Trust us that it will be useful, and also trust us that all attempts to avoid these definitions resulted in unreadable messes.A read capability is called complete if the subspace_id of its granted area is any, and it is called selective otherwise.
The fragments of a complete read capability of granted area area and granted namespace namespace are the pairs (namespace, pre)
, such that pre is a prefixThe prefixes of foobar are the empty Path, foo, and foobar itself. of area.path.
The fragments of a selective read capability of granted area area and granted namespace namespace are the pairs (namespace, pre)
and the triplets (namespace, area.subspace_id, pre)
, such that pre is a prefix of area.path. The pairs are called secondary fragments, all other fragments (including those of complete read capabilities) are called primary fragments.
A fragment whose Path is the empty Path is called a least-specific fragment. A fragment whose Path is the path of the granted area of its originating read capability is called a most-specific fragment.
To privately exchange read capabilities, Alfie and Betty perform private set intersection with the sets of fragments of all their read capabilities. Additionally, they transmit for each group member they send whether it corresponds to a primary or secondary fragment. The peers can then detect nonempty intersections between their read capabilities by checking whether their most-specific fragments are in the intersection. More precisely, we need to consider three cases:
If the most-specific fragment of a peer’s complete read capability is in the intersection, then the peer can safely send (and authenticate) the read capability without leaking any information. Together with the read capability, the peer should also transmit the fragment. The other peer can then safely reply with all its read capabilities whose fragments include the transmitted fragment.
The same holds when the primary most-specific fragment of a peer’s selective read capability is in the intersection.
Things are more complicated, however, when the secondary most-specific fragment of a peer’s selective read capability is in the intersection, but the corresponding fragment of the other peer is a primary fragment44If both peers’ fragments were secondary, but their corresponding primary fragments were not in the intersection, then the read capabilities simply would not overlap — the peers would request equal Paths in distinct subspaces.. To better understand this case, consider an example:
Suppose, in some namespace, Alfie is interested in the Entries at arbitrary paths with subspace_id @gemmas_stuff
. Betty, in the same namespace, is interested in the Entries whose path is prefixed by chess, regardless of their subspace_id. Then Alfie’s secondary most-specific fragment is in the intersection, but his primary most-specific fragment is not (and neither is Betty’s most-specific fragment).
It might be tempting for Alfie to transmit his read capability, but unfortunately, Betty might have fabricated her fragments. In this case, Betty would learn about the existance of @gemmas_stuff
, violating our privacy objectives. Alfie could prompt Betty to present her read capability first, instead. But Betty then faces the same problem: Alfie could have fabricated his fragments, and he would illegitimately learn about the chess Path in that case.
To solve this standoff, we employ a second type of unforgeable token, that lets Betty prove that she has access to the full subspace at some Path, without specifying that Path explicitly. Alfie can request this token (by transmitting the least-specific secondary fragment of his read capability), Betty can then prove that she is indeed authorised to know about arbitrary SubspaceIds in this namespace, and Alfie can then send (and authenticate) his read capability, to which Betty replies with her own, proper read capability.
We call these unforgeable tokens subspace capabilities. Whenever a peer is granted a complete read capability of non-empty path, it should also be granted a corresponding subspace capability. Each subspace capability must have a single receiver (a public key of some signature scheme), and a single granted namespace (a NamespaceId). The receiver can authenticate itself by signing a collaboratively selected nonce.
Subspace Capabilities and Meadowcap
We conclude by presenting a datatype that implements subspace capabilities, nicely complementing Meadowcap. Note that in Meadowcap, read capabilities for all subspaces of a namespacecan only exist in owned namespaces.
A capability that certifies read access to arbitrary SubspaceIds at some unspecified Path.struct McSubspaceCapabilityThe namespace for which this grants access.The user to whom this grants access.
Authorisation of the user_key by the namespace_key.
Successive authorisations of new UserPublicKeys.
The receiver of a McSubspaceCapability is the user to whom it grants access. Formally, the receiver is the final UserPublicKey in the delegations, or the user_key if the delegations are empty.
The granted namespace of a McSubspaceCapability is the namespace for which it certifies access to all subspaces. Formally, the granted namespace of a McSubspaceCapability is its namespace_key.
Validity governs how McSubspaceCapabilities can be delegated. We define validity based on the number of delegations.
A McSubspaceCapability with zero delegations is valid if initial_authorisation is a NamespaceSignature issued by the namespace_key over the byte 0x02
, followed by the user_key (encoded via encode_user_pk).
For a McSubspaceCapabilities cap with more than zero delegations, let (new_user, new_signature)
be the final pair of cap.delegations, and let prev_cap be the McSubspaceCapability obtained by removing the last pair from cap.delegations. Denote the receiver of prev_cap as prev_receiver.
Then cap is valid if prev_cap is valid, and new_signature is a UserSignature issued by the prev_receiver over the bytestring handover, which is defined as follows:
- If prev_cap.delegations is empty, then handover is the concatenation of the following bytestrings:
- Otherwise, let prev_signature be the UserSignature in the last pair of prev_cap.delegations. Then handover is the concatenation of the following bytestrings: