Mitigating Privacy Leakage via Data Correlation Attacks using ZKPs
Background and Motivation
As on-chain attestations increase in adoption, the risk of individual users exposing more information than they originally intended becomes greater. The main reason is that users may not appreciate how much information about themselves they are registering on-chain until it is already on an immutable public ledger, at which point it is too late. The other reason is that more information can be derived through correlation and triangulation of data points than is revealed by each individual datum in isolation. There have been a number of studies of such “data correlation attacks” outside of web3 (e.g. see this paper), but this has not been applied in practice to on-chain attestations due to the nascent nature of the space.
Overview
The general idea is to issue an attestation to an Ethereum address in such a way that the attestation subject can prove that they control the key to the address to which the attestation was issued, and that the issued attestation is based on a specific schema id issued from a specific issuer.
This will allow a user to prove that they have a number of attestations of specific types, issued by specific issuers, without revealing which specific attestations they are, or which addresses they have been issued to.
In order to do this, we will compile a ZKP circuit that conforms to a specific standard, as described:
- The circuit will accept as private inputs:
- Attestations array: a 2D array of attestations, where each attestation is itself an array corresponding to the metadata fields of that attestation
- Issuers signatures array: an array of signatures from the issuers of the respective attestations, verified against the address in the attester field of the attestation metadata
- Subject signatures array: an array of signatures corresponding to the addresses the attestations were issued to
- The circuit will accept as public inputs:
- The current date: this can be any arbitrary date, but it is assumed that the date will be verified as part of the zk-proof verification process
- Issuers public keys array: an array of public keys corresponding to the issuers signatures array.
- The circuit will have the following public outputs:
- An array of schema ids corresponding to each attestation
- The circuit will iterate over the attestations array, and on every iteration the circuit will:
- hash the value of each metadata field in the attestation and create a merkle tree
- verify the signature in the corresponding element of the issuers signatures array against the merkle root of the attestation metadata it just created, and the given public key retrieved from the corresponding element of the issuers public keys array
- verify the signature in the corresponding element of the subject signatures array against the merkle root of the attestation metadata and the address in the subject field of the attestation metadata
- verify that the value of the attestedDate field is less than the value of the current date input
- add the schemaId field from the attestation metadata to the public outputs of the circuit
The attestation metadata fields include:
- schemaId
- attester
- attestedDate
- expirationDate
- subject
- attestationData
All inputs are 32 bytes.
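To make the hashing step a bit more concrete, here is a rough off-chain sketch in Python of how a client might compute the value that the issuer and subject sign. The binary tree shape, the zero padding of odd levels, the left-padding to 32 bytes, and the helper names (`to_bytes32`, `merkle_root`, `attestation_root`) are all assumptions made for illustration; the proposal does not fix any of them yet, and a real client would have to match whatever the circuit actually does.

```python
import hashlib

# Field order as listed above; the order itself is an assumption of this sketch.
FIELD_ORDER = ["schemaId", "attester", "attestedDate",
               "expirationDate", "subject", "attestationData"]

def to_bytes32(value) -> bytes:
    """Left-pad an int or bytes value to exactly 32 bytes."""
    if isinstance(value, int):
        return value.to_bytes(32, "big")
    if len(value) > 32:
        raise ValueError("field does not fit in 32 bytes")
    return value.rjust(32, b"\x00")

def merkle_root(leaves) -> bytes:
    """Binary SHA-256 Merkle root; odd levels are padded with 32 zero bytes (assumption)."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(bytes(32))
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def attestation_root(attestation: dict) -> bytes:
    """Hash every metadata field, then combine the hashes into one root."""
    leaves = [hashlib.sha256(to_bytes32(attestation[name])).digest()
              for name in FIELD_ORDER]
    return merkle_root(leaves)

# Hypothetical example values, not taken from any real attestation.
example = {
    "schemaId": bytes.fromhex("11" * 32),
    "attester": bytes.fromhex("22" * 20),
    "attestedDate": 1_700_000_000,
    "expirationDate": 1_800_000_000,
    "subject": bytes.fromhex("33" * 32),
    "attestationData": hashlib.sha256(b"payload").digest(),
}
root = attestation_root(example)
```

The resulting 32-byte `root` is what both signatures would be created over, so changing any single field invalidates them.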
Discussion
We don’t need all the attestation metadata. Fields such as revoked, replacedBy, revocationDate, version, and portal are not strictly necessary for the proof, as these serve no purpose. For example, the revoked field is not useful, because the user can simply use the attestation data of the attestation before the attestation was revoked, and the circuit will have no awareness that there was any subsequent revocation.
This exposes one of the limitations of this mechanism, in that it cannot prove that any attestations have not been revoked.
The attestationData field will need to be a hash of the attestation data, not the actual attestation payload. This hash can be used for specific applications, whereby the hash of the attestation data can be a merkle root of some merklized attestation payload, which can be used to perform selective disclosure. This applies only to the circuit input: the attestation payload can still be anything, but it needs to be hashed before being input to the circuit.
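As a rough illustration of the selective disclosure idea, the Python sketch below merklizes a payload of individual claims, uses the root as the attestationData value, and then proves a single claim against that root without revealing the others. The claim encoding, the duplicate-last-node padding rule, and the function names are assumptions for this sketch only. In practice each leaf would probably also need a per-claim salt, since unsalted hashes of low-entropy claims can be guessed.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every tree level, leaves first; odd levels duplicate their last node (assumption)."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = list(levels[-1])
        if len(cur) % 2:
            cur.append(cur[-1])
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    """Sibling path for the leaf at `index`, as (sibling_hash, sibling_is_on_right) pairs."""
    path = []
    for level in levels[:-1]:
        cur = list(level)
        if len(cur) % 2:
            cur.append(cur[-1])
        sibling = index ^ 1
        path.append((cur[sibling], sibling > index))
        index //= 2
    return path

def verify(root, leaf, path):
    """Recompute the root from one leaf and its sibling path."""
    node = leaf
    for sibling, is_right in path:
        node = h(node + sibling) if is_right else h(sibling + node)
    return node == root

# Hypothetical claims making up the attestation payload.
claims = [b"name:alice", b"country:FR", b"over18:true"]
leaves = [h(c) for c in claims]
levels = build_levels(leaves)
attestation_data = levels[-1][0]      # 32-byte value fed to the circuit

# Disclose only the third claim.
disclosed_ok = verify(attestation_data, leaves[2], prove(levels, 2))
```

A verifier who trusts the attestationData value (because the circuit bound it to the issuer's signature) only needs the disclosed claim and its sibling path.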
The issuer’s signature needs to be stored with the attestation, in order for this mechanism to work. To this end, we propose that the specification will require that the attestation schema contains an issuerSignature field. This will allow a client application to be able to know where to retrieve the signatures needed to create the proof.
This mechanism is predicated upon the requirement that the subject field of the attestation is the raw public key of the subject, NOT an EVM wallet address, as verifying an EVM wallet address will require working with keccak-256 hashes, which circom does not natively support. This can potentially be implemented in a future iteration, once we are confident that it won’t adversely affect performance to an unacceptable extent.
The verifying contract will need to convert each issuer public key to an EVM wallet address and verify that the issuer is a trusted / expected issuer. It will also need to verify the date in the public input against the current date (for example, the block timestamp).
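The checks on the public inputs could look roughly like the Python model below. This is not contract code, just a sketch of the logic: the function name, the allowed clock skew, and the assumption that issuer addresses have already been derived from the public keys are all mine. Address derivation (the last 20 bytes of the keccak-256 hash of the uncompressed public key) is left out because Python's standard library has no keccak-256; `hashlib.sha3_256` uses different padding and gives different results.

```python
import time

def check_public_inputs(proof_date, issuer_addresses, trusted_issuers,
                        now=None, max_skew_seconds=3600):
    """Return True only if the proof's date is recent and every issuer is trusted.

    proof_date        -- the current date public input, as a Unix timestamp
    issuer_addresses  -- addresses already derived from the issuer public keys
    trusted_issuers   -- set of lowercase addresses the verifier accepts
    max_skew_seconds  -- hypothetical tolerance between proof time and check time
    """
    now = int(time.time()) if now is None else now
    if abs(now - proof_date) > max_skew_seconds:
        return False
    return all(addr.lower() in trusted_issuers for addr in issuer_addresses)

# Hypothetical values for illustration.
trusted = {"0x" + "ab" * 20}
accepted = check_public_inputs(1_700_000_000, ["0x" + "AB" * 20], trusted,
                               now=1_700_000_100)
```

On-chain, `now` would come from the block timestamp, and the skew tolerance would be a design decision for each verifying dapp.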
User Flow
From a user’s perspective, what this will look like is an application that will allow the user to search for and select a number of attestations. Once a number of attestations have been selected, the user can create a proof of ownership. The application will request a signature for each selected attestation from the user. Depending on the application’s design, these signatures can be created and cached locally at some prior point. The user can then supply the proof to some on-chain contract of a dapp, or potentially even as another attestation.
Example Circuit
See below for a quick sketch of what a ZKP circuit would look like. This example is written in Circom. Note that this is a very quick sketch and likely contains some errors, but hopefully it’s enough to illustrate the main idea. The code below uses sha256 and ECDSA, which, while technically possible, may result in long proof generation times, and it’s worth benchmarking against a circuit using MiMC or Poseidon and EdDSA.
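pragma circom 2.0.0;

include "hashes/sha256.circom";
include "ecdsa.circom";
include "comparators.circom"; // circomlib, for LessThan

// Sketch only: Sha256() is assumed to be a two-input hash (left, right -> out)
// and EcdsaVerify() is assumed to constrain the signature to be valid.
// Real libraries expose different interfaces, so the wiring will need adapting.
template AttestationVerifier(NUM_ATTESTATIONS, NUM_FIELDS) {
    // Private inputs
    signal input attestations[NUM_ATTESTATIONS][NUM_FIELDS];
    signal input issuerSignatures[NUM_ATTESTATIONS][2];
    signal input subjectSignatures[NUM_ATTESTATIONS][2];
    // Public inputs (marked public on main below)
    signal input currentDate;
    signal input issuerPublicKeys[NUM_ATTESTATIONS][2];
    // Public outputs
    signal output schemaIds[NUM_ATTESTATIONS];

    // Declare component arrays up front so each instance is assigned exactly once
    component fieldHash[NUM_ATTESTATIONS][NUM_FIELDS];
    component rootHash[NUM_ATTESTATIONS][NUM_FIELDS - 1];
    component issuerSigVerify[NUM_ATTESTATIONS];
    component subjectSigVerify[NUM_ATTESTATIONS];
    component dateCheck[NUM_ATTESTATIONS];

    for (var i = 0; i < NUM_ATTESTATIONS; i++) {
        // Hash each metadata field
        for (var j = 0; j < NUM_FIELDS; j++) {
            fieldHash[i][j] = Sha256();
            fieldHash[i][j].left <== attestations[i][j];
            fieldHash[i][j].right <== 0; // Padding with 0 for simplicity
        }

        // Fold the field hashes into a single root;
        // rootHash[i][NUM_FIELDS - 2].out is the final root
        for (var k = 0; k < NUM_FIELDS - 1; k++) {
            rootHash[i][k] = Sha256();
            if (k == 0) {
                rootHash[i][k].left <== fieldHash[i][0].out;
            } else {
                rootHash[i][k].left <== rootHash[i][k - 1].out;
            }
            rootHash[i][k].right <== fieldHash[i][k + 1].out;
        }

        // Verify issuer signature over the root
        issuerSigVerify[i] = EcdsaVerify();
        issuerSigVerify[i].sigR <== issuerSignatures[i][0];
        issuerSigVerify[i].sigS <== issuerSignatures[i][1];
        issuerSigVerify[i].msg <== rootHash[i][NUM_FIELDS - 2].out;
        issuerSigVerify[i].Q[0] <== issuerPublicKeys[i][0];
        issuerSigVerify[i].Q[1] <== issuerPublicKeys[i][1];

        // Verify subject signature over the same root
        subjectSigVerify[i] = EcdsaVerify();
        subjectSigVerify[i].sigR <== subjectSignatures[i][0];
        subjectSigVerify[i].sigS <== subjectSignatures[i][1];
        subjectSigVerify[i].msg <== rootHash[i][NUM_FIELDS - 2].out;
        subjectSigVerify[i].Q[0] <== attestations[i][4]; // subject is the 5th field
        subjectSigVerify[i].Q[1] <== 0; // TODO: Q must be a full curve point; the second coordinate still needs to be derived or supplied

        // Date verification: attestedDate (3rd field) must be before currentDate.
        // LessThan(64) assumes both dates fit in 64 bits.
        dateCheck[i] = LessThan(64);
        dateCheck[i].in[0] <== attestations[i][2];
        dateCheck[i].in[1] <== currentDate;
        dateCheck[i].out === 1;

        // Output schema ID (1st field)
        schemaIds[i] <== attestations[i][0];
    }
}

// Example sizes only: 2 attestations, 6 metadata fields
component main {public [currentDate, issuerPublicKeys]} = AttestationVerifier(2, 6);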
N.B: I would have liked to have fleshed this proposal out a lot more and posted something a bit more developed, but I’m going with pace-over-perfection for this RFC to just get the conversation started, get people’s thoughts, and gauge interest. All feedback is very much appreciated!
Another important note: the proposal above is very much Verax-centric, but it would be MUCH more beneficial to be able to adapt the proposal to a standard that would work with multiple (or any) attestation registries.