RFC for Privacy Preserving On-Chain Attestations

Mitigating Privacy Leakage via Data Correlation Attacks using ZKPs

Background and Motivation

As on-chain attestations increase in adoption, so does the risk that individual users expose more information than they originally intended. The main reason is that users may not appreciate how much information about themselves they are registering on-chain until it is already on an immutable public ledger, at which point it is too late. The other reason is that more information can be derived by correlating and triangulating data points than is revealed by each individual datum in isolation. There have been a number of studies of such “data correlation attacks” outside of web3 (e.g. see this paper), but this work has not yet been applied in practice to on-chain attestations because the space is still nascent.

Overview

The general idea is to issue an attestation to an Ethereum address in such a way that the attestation subject can prove that they control the key to the address to which the attestation was issued, and that the issued attestation is based on a specific schema id issued from a specific issuer.

This will allow a user to prove that they have a number of attestations of specific types, issued by specific issuers, without revealing which specific attestations they are, or which addresses they have been issued to.

In order to do this, we will compile a ZKP circuit that conforms to a specific standard, as described below:

  • The circuit will accept as private inputs:
    • Attestations array: a 2D array of attestations, where each attestation is itself an array corresponding to the metadata fields of that attestation
    • Issuers signatures array: an array of signatures from the issuers of the respective attestations, verified against the address in the attester field of the attestation metadata
    • Subject signatures array: an array of signatures corresponding to the addresses the attestations were issued to
  • The circuit will accept as public inputs:
    • The current date: this can be any arbitrary date, but it is assumed that the date will be verified as part of the zk-proof verification process
    • Issuers public keys array: an array of public keys corresponding to the issuers signatures array.
  • The circuit will have the following public outputs:
    • An array of schema ids corresponding to each attestation
  • The circuit will iterate over the attestations array, and on every iteration the circuit will:
    • hash the value of each metadata field in the attestation and create a merkle tree
    • verify the signature in the corresponding element of the issuers signatures array against the merkle root of the attestation metadata it just created, and the public key retrieved from the corresponding element of the issuers public keys array
    • verify the signature in the corresponding element of the subject signatures array against the merkle root of the attestation data and the address in the subject field of the attestation metadata
    • verify that the value of the attestedDate field is less than the value of the current date input
    • add the schemaId field from the attestation metadata to the public outputs of the circuit
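
The per-attestation steps above can be modelled off-circuit. The Python sketch below is only a reference model under stated assumptions: every field is already a 32-byte value, SHA-256 is the hash, and the "merkle tree" is folded left to right (one possible tree shape, not a fixed part of the proposal). Signature checks are left out because they need an ECDSA library. The names `attestation_root`, `check_attestation` and `word` are hypothetical.

```python
import hashlib

FIELDS = ["schemaId", "attester", "attestedDate",
          "expirationDate", "subject", "attestationData"]


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def attestation_root(fields: list[bytes]) -> bytes:
    """Hash every 32-byte field, then fold the leaf hashes left to right."""
    if len(fields) < 2 or any(len(f) != 32 for f in fields):
        raise ValueError("expected at least two 32-byte fields")
    leaves = [sha256(f) for f in fields]
    root = leaves[0]
    for leaf in leaves[1:]:
        root = sha256(root + leaf)
    return root


def check_attestation(fields: list[bytes], current_date: int) -> bytes:
    """Mirror the date check, then return the schema id the circuit would output."""
    attested_date = int.from_bytes(fields[FIELDS.index("attestedDate")], "big")
    if not attested_date < current_date:
        raise ValueError("attestedDate must be earlier than the current date")
    return fields[FIELDS.index("schemaId")]


def word(n: int) -> bytes:
    """Encode an integer as a 32-byte big-endian value (illustrative encoding)."""
    return n.to_bytes(32, "big")


# Made-up example values, in the order given by FIELDS
example = [word(7), word(0xA11CE), word(1_700_000_000),
           word(1_800_000_000), word(0xB0B), word(0)]
root = attestation_root(example)  # the value both issuer and subject would sign
schema_id = check_attestation(example, current_date=1_750_000_000)
```

Changing any single field changes the root, which is what ties both signatures to the full metadata.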

The attestation metadata fields include:

  • schemaId
  • attester
  • attestedDate
  • expirationDate
  • subject
  • attestationData

All inputs are 32 bytes.
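
To illustrate the 32-byte constraint, here is a small sketch of how a client might pack typical values into fixed-width words. The padding choices (big-endian integers, 20-byte addresses left-padded with zeros, payloads replaced by their SHA-256 digest) are assumptions made for illustration and are not defined by this proposal.

```python
import hashlib

WORD = 32  # bytes per circuit input


def encode_uint(value: int) -> bytes:
    """Big-endian, zero-padded; raises OverflowError if the value needs more than 32 bytes."""
    return value.to_bytes(WORD, "big")


def encode_address(hex_address: str) -> bytes:
    """Left-pad a 20-byte address with zeros to fill one word."""
    raw = bytes.fromhex(hex_address.removeprefix("0x"))
    if len(raw) != 20:
        raise ValueError("expected a 20-byte address")
    return raw.rjust(WORD, b"\x00")


def encode_payload(payload: bytes) -> bytes:
    """Arbitrary-length payloads are reduced to one word by hashing."""
    return hashlib.sha256(payload).digest()


attested_date = encode_uint(1_700_000_000)
attester = encode_address("0x" + "11" * 20)        # made-up address
attestation_data = encode_payload(b'{"kyc": true}')  # made-up payload
```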

Discussion

We don’t need all the attestation metadata. Fields such as revoked, replacedBy, revocationDate, version, and portal serve no purpose in the proof. For example, the revoked field is not useful, because the user can simply supply the attestation metadata as it was before the attestation was revoked, and the circuit has no way of knowing that a revocation happened afterwards.

This exposes one of the limitations of this mechanism: it cannot prove that an attestation has not been revoked.

The attestationData field will need to be a hash of the attestation data, not the actual attestation payload. This hash can serve specific applications: for example, it can be the merkle root of a merklized attestation payload, which enables selective disclosure. This only concerns the circuit input; the attestation payload can still be anything, as long as it is hashed before being input to the circuit.
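
As a rough illustration of the selective disclosure idea, the sketch below merklizes a payload made of individual claims, uses the root as the attestationData value, and later discloses one claim with a sibling path. The claim format, the tree layout (SHA-256, odd nodes paired with themselves) and all function names are assumptions for illustration. A production scheme would also want domain separation between leaves and inner nodes.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(claims: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first; an odd node is paired with itself."""
    level = [h(c) for c in claims]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling path for leaf `index`; the flag is True when the sibling is on the right."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling > index))
        index //= 2
    return path


def verify(root: bytes, claim: bytes, path: list[tuple[bytes, bool]]) -> bool:
    node = h(claim)
    for sibling, sibling_is_right in path:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


claims = [b"name=Alice", b"country=FR", b"over18=true"]  # made-up payload
levels = build_levels(claims)
attestation_data = levels[-1][0]  # 32-byte value stored in the attestationData field
proof = prove(levels, 2)          # discloses only the third claim
```

A verifier who learns `attestation_data` and receives `b"over18=true"` plus `proof` can check the claim with `verify` without seeing the other two claims.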

The issuer’s signature needs to be stored with the attestation, in order for this mechanism to work. To this end, we propose that the specification will require that the attestation schema contains an issuerSignature field. This will allow a client application to be able to know where to retrieve the signatures needed to create the proof.

This mechanism is predicated upon the requirement that the subject field of the attestation is the raw public key of the subject, NOT an EVM wallet address, because verifying an EVM wallet address requires working with keccak-256 hashes, which circom does not natively support. Support for wallet addresses can potentially be implemented in a future iteration, once we are confident that it won’t adversely affect performance to an unacceptable extent.

The verifying contract will need to convert each issuer public key to an EVM wallet address and verify that the issuer is a trusted / expected issuer. It will also need to explicitly verify the date in the public input against the current date.
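
The checks on the public signals could look roughly like the Python model below. On-chain this logic would live in the verifying contract, after the zk-proof itself has been verified. Several things here are assumptions: the per-schema allowlist shape, the 15-minute tolerance, and the `to_address` callback, which the caller must supply because Python's standard library has no Keccak-256 (`hashlib.sha3_256` is the standardized SHA-3, which uses different padding).

```python
import time
from typing import Callable, Dict, Optional, Sequence, Set

MAX_DATE_DRIFT = 15 * 60  # seconds; hypothetical tolerance


def accept_public_signals(
    current_date: int,
    issuer_public_keys: Sequence[bytes],
    schema_ids: Sequence[bytes],
    trusted_issuers: Dict[bytes, Set[bytes]],  # schema id -> allowed issuer addresses
    to_address: Callable[[bytes], bytes],      # caller-supplied public key -> address
    now: Optional[int] = None,
) -> bool:
    """Check the date and the issuers exposed by an already-verified proof."""
    now = int(time.time()) if now is None else now
    # The prover picks currentDate, so it must be close to the verifier's clock.
    if abs(now - current_date) > MAX_DATE_DRIFT:
        return False
    if len(issuer_public_keys) != len(schema_ids):
        return False
    # Every (issuer, schema) pair must be on the verifier's allowlist.
    for key, schema in zip(issuer_public_keys, schema_ids):
        if to_address(key) not in trusted_issuers.get(schema, set()):
            return False
    return True
```

For example, with a stand-in `to_address = lambda key: key[-20:]` (NOT real address derivation), a proof whose issuer key maps to an allowlisted address for its schema is accepted, and the same proof is rejected once `now` drifts more than `MAX_DATE_DRIFT` seconds from `current_date`.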

User Flow

From a user’s perspective, this will look like an application that allows the user to search for and select a number of attestations. Once the attestations have been selected, the user can create a proof of ownership. The application will request a signature from the user for each selected attestation. Depending on the application’s design, these signatures can be created and cached locally at some prior point. The user can then supply the proof to an on-chain contract of a dapp, or potentially even submit it as another attestation.
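
A client could assemble the prover inputs along these lines. This is a sketch only: the record shape (`root`, `fields`, `issuerSignature`, `issuerPublicKey`), the dictionary cache and the `sign` callback (standing in for a wallet prompt) are all hypothetical.

```python
from typing import Callable, Dict, List


def build_prover_inputs(
    selected: List[dict],
    sign: Callable[[bytes], bytes],
    cache: Dict[bytes, bytes],
    current_date: int,
) -> dict:
    """Collect circuit inputs, asking the user to sign only roots not yet cached."""
    subject_signatures = []
    for att in selected:
        root = att["root"]
        if root not in cache:
            cache[root] = sign(root)  # would trigger a wallet prompt in a real client
        subject_signatures.append(cache[root])
    return {
        # private inputs
        "attestations": [att["fields"] for att in selected],
        "issuerSignatures": [att["issuerSignature"] for att in selected],
        "subjectSignatures": subject_signatures,
        # public inputs
        "currentDate": current_date,
        "issuerPublicKeys": [att["issuerPublicKey"] for att in selected],
    }


# Made-up records and a fake signer that just tags the root
prompts = []

def fake_sign(root: bytes) -> bytes:
    prompts.append(root)
    return b"sig:" + root

selected = [
    {"root": b"r1", "fields": [b"f1"], "issuerSignature": b"i1", "issuerPublicKey": b"k1"},
    {"root": b"r2", "fields": [b"f2"], "issuerSignature": b"i2", "issuerPublicKey": b"k2"},
]
cache: Dict[bytes, bytes] = {}
inputs = build_prover_inputs(selected, fake_sign, cache, 1_750_000_000)
inputs_again = build_prover_inputs(selected, fake_sign, cache, 1_750_000_100)
```

The second call reuses the cached signatures, so the user is not prompted again.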

Example Circuit

See below for a quick sketch of what a ZKP circuit would look like. This example is written in Circom. Note that this is a very quick sketch and likely contains some errors, but hopefully it’s enough to illustrate the main idea. The code below uses sha256 and ECDSA, which, while technically possible, may result in long proof generation times, so it’s worth benchmarking against a circuit using MiMC or Poseidon and EdDSA.

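pragma circom 2.0.0;

// Include paths are assumptions; adjust them to your project layout.
include "circomlib/circuits/sha256/sha256.circom";
include "circomlib/circuits/bitify.circom";
include "circomlib/circuits/comparators.circom";
include "circom-ecdsa/circuits/ecdsa.circom"; // 0xPARC circom-ecdsa (secp256k1)

// Sketch conventions (assumptions): each 32-byte field is 256 bits, most significant
// bit first; secp256k1 values are 4 little-endian 64-bit limbs; the subject field holds
// the x-coordinate of the subject's public key, with y supplied as a private input.

// Packs 256 big-endian bits into 4 little-endian 64-bit limbs.
template BitsToLimbs() {
    signal input in[256];
    signal output out[4];
    component pack[4];
    for (var l = 0; l < 4; l++) {
        pack[l] = Bits2Num(64);
        for (var b = 0; b < 64; b++) {
            pack[l].in[b] <== in[255 - (64 * l + b)];
        }
        out[l] <== pack[l].out;
    }
}

template AttestationVerifier(NUM_ATTESTATIONS, NUM_FIELDS) {
    // Private inputs
    signal input attestations[NUM_ATTESTATIONS][NUM_FIELDS][256];
    signal input issuerSignatures[NUM_ATTESTATIONS][2][4];  // [r, s]
    signal input subjectSignatures[NUM_ATTESTATIONS][2][4]; // [r, s]
    signal input subjectKeyY[NUM_ATTESTATIONS][4];
    // Public inputs (marked public in `main` below)
    signal input currentDate;
    signal input issuerPublicKeys[NUM_ATTESTATIONS][2][4];  // [x, y]
    // Public outputs: each schema id as [high 128 bits, low 128 bits]
    signal output schemaIds[NUM_ATTESTATIONS][2];

    component leaf[NUM_ATTESTATIONS][NUM_FIELDS];
    component node[NUM_ATTESTATIONS][NUM_FIELDS - 1];
    component rootLimbs[NUM_ATTESTATIONS];
    component subjectKeyX[NUM_ATTESTATIONS];
    component issuerSigVerify[NUM_ATTESTATIONS];
    component subjectSigVerify[NUM_ATTESTATIONS];
    component attestedDate[NUM_ATTESTATIONS];
    component dateCheck[NUM_ATTESTATIONS];
    component schemaHalf[NUM_ATTESTATIONS][2];

    for (var i = 0; i < NUM_ATTESTATIONS; i++) {
        // Hash each metadata field into a leaf
        for (var j = 0; j < NUM_FIELDS; j++) {
            leaf[i][j] = Sha256(256);
            for (var b = 0; b < 256; b++) {
                attestations[i][j][b] * (attestations[i][j][b] - 1) === 0; // must be a bit
                leaf[i][j].in[b] <== attestations[i][j][b];
            }
        }

        // Fold the leaves into a root: node[k] = H(previous || leaf[k + 1])
        for (var k = 0; k < NUM_FIELDS - 1; k++) {
            node[i][k] = Sha256(512);
            for (var b = 0; b < 256; b++) {
                if (k == 0) {
                    node[i][k].in[b] <== leaf[i][0].out[b];
                } else {
                    node[i][k].in[b] <== node[i][k - 1].out[b];
                }
                node[i][k].in[256 + b] <== leaf[i][k + 1].out[b];
            }
        }

        // Convert the root and the subject key (5th field) to limbs
        rootLimbs[i] = BitsToLimbs();
        subjectKeyX[i] = BitsToLimbs();
        for (var b = 0; b < 256; b++) {
            rootLimbs[i].in[b] <== node[i][NUM_FIELDS - 2].out[b];
            subjectKeyX[i].in[b] <== attestations[i][4][b];
        }

        // Verify issuer and subject signatures over the root.
        // NoPubkeyCheck does not validate keys; a real circuit must also check
        // that the subject key is a valid curve point.
        issuerSigVerify[i] = ECDSAVerifyNoPubkeyCheck(64, 4);
        subjectSigVerify[i] = ECDSAVerifyNoPubkeyCheck(64, 4);
        for (var l = 0; l < 4; l++) {
            issuerSigVerify[i].r[l] <== issuerSignatures[i][0][l];
            issuerSigVerify[i].s[l] <== issuerSignatures[i][1][l];
            issuerSigVerify[i].msghash[l] <== rootLimbs[i].out[l];
            issuerSigVerify[i].pubkey[0][l] <== issuerPublicKeys[i][0][l];
            issuerSigVerify[i].pubkey[1][l] <== issuerPublicKeys[i][1][l];
            subjectSigVerify[i].r[l] <== subjectSignatures[i][0][l];
            subjectSigVerify[i].s[l] <== subjectSignatures[i][1][l];
            subjectSigVerify[i].msghash[l] <== rootLimbs[i].out[l];
            subjectSigVerify[i].pubkey[0][l] <== subjectKeyX[i].out[l];
            subjectSigVerify[i].pubkey[1][l] <== subjectKeyY[i][l];
        }
        issuerSigVerify[i].result === 1;
        subjectSigVerify[i].result === 1;

        // Date verification: attestedDate (3rd field) < currentDate, both assumed < 2^64
        attestedDate[i] = Bits2Num(64);
        for (var b = 0; b < 192; b++) {
            attestations[i][2][b] === 0;
        }
        for (var b = 0; b < 64; b++) {
            attestedDate[i].in[b] <== attestations[i][2][255 - b];
        }
        dateCheck[i] = LessThan(64);
        dateCheck[i].in[0] <== attestedDate[i].out;
        dateCheck[i].in[1] <== currentDate;
        dateCheck[i].out === 1;

        // Output schema ID (1st field), split so each half fits in a field element
        schemaHalf[i][0] = Bits2Num(128);
        schemaHalf[i][1] = Bits2Num(128);
        for (var b = 0; b < 128; b++) {
            schemaHalf[i][0].in[b] <== attestations[i][0][127 - b];
            schemaHalf[i][1].in[b] <== attestations[i][0][255 - b];
        }
        schemaIds[i][0] <== schemaHalf[i][0].out;
        schemaIds[i][1] <== schemaHalf[i][1].out;
    }
}

// 2 attestations with the 6 metadata fields listed above (example sizes)
component main {public [currentDate, issuerPublicKeys]} = AttestationVerifier(2, 6);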

N.B.: I would have liked to flesh this proposal out a lot more and post something a bit more developed, but I’m going with pace-over-perfection for this RFC to get the conversation started, hear people’s thoughts, and gauge interest. All feedback is very much appreciated!


Another important note: the proposal above is very much Verax-centric, but it would be MUCH more beneficial to be able to adapt the proposal to a standard that would work with multiple (or any) attestation registries.
