RFC for Privacy Preserving On-Chain Attestations

Mitigating Privacy Leakage via Data Correlation Attacks using ZKPs

Background and Motivation

As on-chain attestations increase in adoption, the risk of individual users exposing more information than they originally intended becomes greater. The main reason is that users may not appreciate how much information about themselves they are registering on-chain until it is already on an immutable public ledger, at which point it is too late. The other reason is that more information can be derived through correlation and triangulation of data points than is revealed by each individual datum in isolation. There have been a number of studies of such “data correlation attacks” outside of web3 (e.g. see this paper), but these have not been applied in practice to on-chain attestations due to the nascent nature of the space.

Overview

The general idea is to issue an attestation to an Ethereum address in such a way that the attestation subject can prove that they control the key to the address to which the attestation was issued, and that the issued attestation is based on a specific schema id issued from a specific issuer.

This will allow a user to prove that they have a number of attestations of specific types, issued by specific issuers, without revealing which specific attestations they are, or which addresses they have been issued to.

In order to do this, we will compile a ZKP circuit that conforms to a specific standard, as described:

  • The circuit will accept as private inputs:
    • Attestations array: a 2D array of attestations, where each attestation is itself an array corresponding to the metadata fields of that attestation
    • Issuers signatures array: an array of signatures from the issuers of the respective attestations, verified against the address in the attester field of the attestation metadata
    • Subject signatures array: an array of signatures corresponding to the addresses the attestations were issued to
  • The circuit will accept as public inputs:
    • The current date: this can be any arbitrary date, but it is assumed that the date will be verified as part of the zk-proof verification process
    • Issuers public keys array: an array of public keys corresponding to the issuers signatures array.
  • The circuit will have the following public outputs:
    • An array of schema ids corresponding to each attestation
  • The circuit will iterate over the attestations array, and on every iteration the circuit will:
    • hash the value of each metadata field in the attestation and create a merkle tree
    • verify the signature in the corresponding element of the issuers signatures array against the merkle root of the attestation metadata it just created, using the public key from the corresponding element of the issuers public keys array
    • verify the signature in the corresponding element of the subject signatures array against the merkle root of the attestation data and the address in the subject field of the attestation metadata
    • verify that the value of the attestedDate field is less than the value of the current date input
    • add the schemaId field from the attestation metadata to the public outputs of the circuit

The attestation metadata fields include:

  • schemaId
  • attester
  • attestedDate
  • expirationDate
  • subject
  • attestationData

All inputs are 32 bytes.
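
To make the hashing step concrete, here is a minimal off-circuit sketch in Python of how a client might compute the root that the issuer and subject sign. It is only an illustration under several assumptions that the proposal does not fix: SHA-256 as the hash, a binary tree where an odd level duplicates its last node, fields encoded as 32-byte big-endian words, and the field order listed above. The helper names (`to_word`, `merkle_root`, `attestation_root`) are hypothetical.

```python
import hashlib

# Field order taken from the metadata list above.
FIELD_NAMES = (
    "schemaId", "attester", "attestedDate",
    "expirationDate", "subject", "attestationData",
)

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def to_word(value: int) -> bytes:
    """Encode a field as a 32-byte big-endian word (assumed encoding)."""
    return value.to_bytes(32, "big")

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash every leaf, then pair nodes upwards until one root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def attestation_root(attestation: dict[str, int]) -> bytes:
    """Root over the six metadata fields; this is the message to be signed."""
    return merkle_root([to_word(attestation[name]) for name in FIELD_NAMES])

if __name__ == "__main__":
    example = {name: index + 1 for index, name in enumerate(FIELD_NAMES)}
    print(attestation_root(example).hex())
```

Whatever construction is chosen, the client, the issuer, and the circuit all have to agree on it exactly, otherwise the signatures will not verify inside the proof.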

Discussion

We don’t need all the attestation metadata. Fields such as revoked, replacedBy, revocationDate, version, and portal are not needed, as they serve no purpose in the proof. For example, the revoked field is not useful, because the user can simply use the attestation data as it was before the attestation was revoked, and the circuit will have no awareness that there was any subsequent revocation.

This exposes one of the limitations of this mechanism, in that it cannot prove that any attestations have not been revoked.

The attestationData field will need to be a hash of the attestation data, not the actual attestation payload. This hash can be used for specific applications, whereby the hash of the attestation data can be a merkle root of some merklized attestation payload, which can be used to perform selective disclosure. This applies only to the circuit input: the attestation payload can still be anything, it just needs to be hashed before being input to the circuit.
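
As a rough illustration of the selective disclosure idea, the Python sketch below merklizes a payload, produces an inclusion proof for one entry, and checks it against the root that would be stored in attestationData. The payload entries, the SHA-256 hash, the duplicate-last-node padding rule, and the function names (`build_levels`, `prove`, `verify`) are all assumptions made for the example, not part of the proposal.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # assumption: duplicate the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling path for leaves[index]; the flag says if the sibling is on the left."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    """Recompute the root from one disclosed leaf and its sibling path."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

if __name__ == "__main__":
    payload = [b"name=alice", b"country=FR", b"age_over_18=true"]
    root = build_levels(payload)[-1][0]   # value that would go in attestationData
    proof = prove(payload, 2)             # disclose only the third entry
    print(verify(root, payload[2], proof))
```

The holder reveals one entry plus its sibling hashes, and the other entries stay hidden behind their hashes. In practice, low-entropy entries would likely need a per-leaf salt so they cannot be guessed from the sibling hashes.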

The issuer’s signature needs to be stored with the attestation, in order for this mechanism to work. To this end, we propose that the specification will require that the attestation schema contains an issuerSignature field. This will allow a client application to be able to know where to retrieve the signatures needed to create the proof.

This mechanism is predicated upon the requirement that the subject field of the attestation is the raw public key of the subject, NOT an EVM wallet address, as verifying an EVM wallet address will require working with keccak-256 hashes, which circom does not natively support. This can potentially be implemented in a future iteration, once we are confident that it won’t adversely affect performance to an unacceptable extent.

The verifying contract will need to convert each issuer public key to an EVM wallet address and verify that the issuer is a trusted / expected issuer. It will also need to explicitly check the date in the public input against the current date.
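
The checks a verifier performs on the public signals, separately from verifying the zk-proof itself, might look roughly like the Python sketch below. It is a model of the logic only, not contract code. The `PublicSignals` shape, the per-schema allow-list, the `max_clock_skew` tolerance, and the injected `to_address` function are all hypothetical. Address derivation is passed in because an EVM address is the last 20 bytes of the Keccak-256 hash of the uncompressed public key, and Python's standard library does not provide Keccak-256 (`hashlib.sha3_256` is a different function).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PublicSignals:
    """Public inputs and outputs of the proof (hypothetical shape)."""
    current_date: int              # unix timestamp supplied by the prover
    issuer_public_keys: list[bytes]
    schema_ids: list[int]

def check_public_signals(
    signals: PublicSignals,
    now: int,
    trusted_issuers: dict[int, set[str]],   # schema id -> trusted issuer addresses
    to_address: Callable[[bytes], str],     # assumed Keccak-based derivation
    max_clock_skew: int = 300,              # seconds; an arbitrary example value
) -> bool:
    # The date is a free public input, so the verifier has to pin it down.
    if abs(now - signals.current_date) > max_clock_skew:
        return False
    if len(signals.issuer_public_keys) != len(signals.schema_ids):
        return False
    # Each attestation must come from an issuer trusted for that schema.
    for schema_id, public_key in zip(signals.schema_ids, signals.issuer_public_keys):
        if to_address(public_key) not in trusted_issuers.get(schema_id, set()):
            return False
    return True
```

Keying the allow-list by schema id reflects the goal of proving attestations "of specific types, issued by specific issuers". A flat set of trusted issuers would also work if any trusted issuer may issue any schema.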

User Flow

From a user’s perspective, this will look like an application that allows the user to search for and select a number of attestations. Once a number of attestations have been selected, the user can create a proof of ownership. The application will request a signature from the user for each selected attestation. Depending on the application’s design, these signatures can be created and cached locally at some prior point. The user can then supply the proof to some on-chain contract of a dapp, or potentially even as another attestation.

Example Circuit

See below for a quick sketch of what a ZKP circuit would look like. This example is written in Circom. Note that this is a very quick sketch and likely contains some errors, but hopefully it’s enough to illustrate the main idea. The code below uses sha256 and ECDSA, which, while technically possible, may result in long proof generation times, and it’s worth benchmarking against a circuit using MiMC or Poseidon and EdDSA.

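pragma circom 2.0.0;

// NOTE: Sha256() (two inputs `left`/`right`, one output `out`) and EcdsaVerify()
// are placeholder templates assumed by this sketch. They do not match the
// interfaces of the circomlib or circom-ecdsa templates and would need adapting.
include "hashes/sha256.circom";
include "ecdsa.circom";
include "circomlib/circuits/comparators.circom"; // LessThan

template AttestationVerifier(NUM_ATTESTATIONS, NUM_FIELDS) {
    // Private inputs
    signal input attestations[NUM_ATTESTATIONS][NUM_FIELDS];
    signal input issuerSignatures[NUM_ATTESTATIONS][2];
    signal input subjectSignatures[NUM_ATTESTATIONS][2];
    // Public inputs (marked public on `main` below)
    signal input currentDate;
    signal input issuerPublicKeys[NUM_ATTESTATIONS][2];
    // Public outputs
    signal output schemaIds[NUM_ATTESTATIONS];

    // Signals and components must be declared at the top level of the
    // template, not inside loops.
    signal root[NUM_ATTESTATIONS];
    component fieldHash[NUM_ATTESTATIONS][NUM_FIELDS];
    component chain[NUM_ATTESTATIONS][NUM_FIELDS - 1];
    component issuerSigVerify[NUM_ATTESTATIONS];
    component subjectSigVerify[NUM_ATTESTATIONS];
    component dateCheck[NUM_ATTESTATIONS];

    for (var i = 0; i < NUM_ATTESTATIONS; i++) {
        // Hash each metadata field
        for (var j = 0; j < NUM_FIELDS; j++) {
            fieldHash[i][j] = Sha256();
            fieldHash[i][j].left <== attestations[i][j];
            fieldHash[i][j].right <== 0; // Padding with 0 for simplicity
        }

        // Fold the field hashes into a single root. For simplicity this is a
        // hash chain H(...H(H(h0, h1), h2)..., hN) rather than a balanced tree.
        for (var k = 0; k < NUM_FIELDS - 1; k++) {
            chain[i][k] = Sha256();
            if (k == 0) {
                chain[i][k].left <== fieldHash[i][0].out;
            } else {
                chain[i][k].left <== chain[i][k - 1].out;
            }
            chain[i][k].right <== fieldHash[i][k + 1].out;
        }
        root[i] <== chain[i][NUM_FIELDS - 2].out;

        // Verify issuer signature over the root
        issuerSigVerify[i] = EcdsaVerify();
        issuerSigVerify[i].sigR <== issuerSignatures[i][0];
        issuerSigVerify[i].sigS <== issuerSignatures[i][1];
        issuerSigVerify[i].msg <== root[i];
        issuerSigVerify[i].Q[0] <== issuerPublicKeys[i][0];
        issuerSigVerify[i].Q[1] <== issuerPublicKeys[i][1];

        // Verify subject signature over the same root
        subjectSigVerify[i] = EcdsaVerify();
        subjectSigVerify[i].sigR <== subjectSignatures[i][0];
        subjectSigVerify[i].sigS <== subjectSignatures[i][1];
        subjectSigVerify[i].msg <== root[i];
        subjectSigVerify[i].Q[0] <== attestations[i][4]; // subject is the 5th field
        // Placeholder: Q must be a full curve point, so the second coordinate
        // still has to be derived or supplied as an extra input.
        subjectSigVerify[i].Q[1] <== 0;

        // Date verification: attestedDate (3rd field) must be before currentDate.
        // `<` on signals adds no constraint, so use a comparator and constrain
        // its output. Assumes both values fit in 64 bits.
        dateCheck[i] = LessThan(64);
        dateCheck[i].in[0] <== attestations[i][2];
        dateCheck[i].in[1] <== currentDate;
        dateCheck[i].out === 1;

        // Output schema ID
        schemaIds[i] <== attestations[i][0]; // schemaId is the 1st field
    }
}

// Example instantiation: 2 attestations (arbitrary), 6 metadata fields each.
component main {public [currentDate, issuerPublicKeys]} = AttestationVerifier(2, 6);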

N.B: I would have liked to have fleshed this proposal out a lot more and posted something a bit more developed, but I’m going with pace-over-perfection for this RFC to just get the conversation started, get people’s thoughts, and gauge interest. All feedback is very much appreciated!

Another important note: the proposal above is very much Verax-centric, but it would be MUCH more beneficial to be able to adapt the proposal to a standard that would work with multiple (or any) attestation registries.

I read through this write-up and have a few questions to better help me understand the goal and what is actually going on here. Some of this is just me restating what I understand from the writeup, so please correct me if I say something wrong.

It seems like your goal here is to generate a proof of ownership over a set of attestations that can be provided to a 3rd party. Essentially, you can prove that you have created some arbitrary attestation(s) that belong to a list of schema IDs without exposing the actual attestations to the 3rd parties themselves, correct? The circuit will validate the attestation information against the attester field, among other checks, to ensure the claim of ownership is valid.

The rest of this message assumes that the previous statements are correctly derived from your writeup.

My concern is this. The attester is the authority generating the attestations. The attester is also the one generating the proof. As a 3rd party, I have to trust the attester is feeding me valid information, since I truly cannot validate the contents or claims of said attestations. Sure, you could argue that the attestations clearly exist and belong to the attester because of the ZKP, but my concern is that these attestations could easily be bogus with garbage attestation data submitted by the attester.

As a malicious attester, I could make any false claim, submit an attestation using this claim with the set schema ID, generate a proof using your circuit, and unless this circuit has very specific instructions to validate all fields of the attestation against the schema requirements, there is absolutely no way a 3rd party could trust the proof because they cannot trust that the proof is verifying a valid attestation object. The 3rd party will just know that I, the attester, created an attestation with some schema ID(s) that the 3rd party is looking for and they will have no way of understanding that the data I put in my attestation(s) is bogus. Your current circuit proof of concept only accepts a hash of the data anyway, so there is no way that you could perform the necessary checks in the circuit to allow 3rd parties to trust the attestations submitted to begin with. Even if you could, it would be extremely difficult to scale, since every schema would realistically need its own circuit for verifying the contents of attestations.

I think that attestations are really only valid sources of truth in two generalized situations: when you trust the attester as an authority or when you can verify the attestation data contents against a claim. So, I guess my real question is the following: how can I, a 3rd party, rely on this system? What kinds of trust assumptions do I need to make in order to place full faith in this system?

Thanks for reviewing and providing feedback, much appreciated!

The attester isn’t the one generating the proof; the owner of the attestations (i.e. the subject) is the one generating the proofs. The attester is the entity that provides the attestations (e.g. it could be GitHub via zkPass).

The proof is intended to be generated by the owner of the attestations, not the attester / issuer of the attestations. This is perhaps a symptom of unclear nomenclature, and might be something that needs to be addressed. Perhaps I can change attester to issuer to make it less confusing.

The proof will require signatures from the holder of the attestation, so a third party cannot create an acceptable proof without the attestation owner’s consent.

The verifying party will need to verify the proof but also the public inputs. The public inputs will include the public keys of the issuers, and the proof will verify that the attestations have the respective valid signatures. Similarly, the proof will verify that the attestations are owned by the person presenting the proof (by checking their signatures against the public keys / addresses the attestations were issued to).

In short: this scheme still requires verifying parties to check the addresses of the issuers (from the public inputs of the proof) to make sure they trust them, which is the same as it is today. The proof will need to be presented by the subject of the attestations, accompanied by a signature (or in a transaction from their wallet).

I think the concerns you raise highlight some shortcomings though:

  1. the nomenclature is confusing: attester should be issuer, and perhaps subject should be holder
  2. the design relies on the fact that attestations on Verax and Sign have an attester field; for EAS/BAS we could potentially use the `from` field for this.

Ah yes, after reading more of Verax’s documentation, I can see one fundamental difference between how attestations are handled between Verax’s platform and Sign’s platform. Verax requires attesters to be registered/whitelisted before issuing attestations, whereas on Sign Protocol, anyone can be an attester (and can subsequently attest their own arbitrary data for almost any schema ID). This difference is where my concern was being raised - an attester/issuer could be the same as the subject/holder/recipient on Sign Protocol.

Sign Protocol treats schemas as a data format and really nothing more, unless custom logic is attached to a schema. People can attach custom logic to schemas via smart contract hooks (which makes a schema more specific to a particular issuer as you can attach whitelists, among other checks) - this addition makes our flow more similar to Verax’s platform where you could limit attesters for a particular schema ID.

This was also something that I wasn’t quite sure about because I am new to circom. I recognized that the output was public, but I didn’t know which input signals you also desired to be public. I did not realize that the attesters’/issuers’ public keys would be sent along with the schema IDs.

So, with this said, as a 3rd party looking to use this, I can expect to receive the following information: the attesters’ public keys, the schema IDs, the proof, and its accompanying signature by the user’s wallet. I can also expect that the proof can only be generated if the user’s wallet owns the provided attestations.

If these are all valid assumptions, I can’t find any flaws or limitations that would modify the trust assumptions already expected in an attestation system like Verax or Sign, so it looks good to me.

That’s good to hear, and thanks again for spending time reviewing it.

Am I correct in thinking that this would work for Sign Protocol by using the attester field in the attestation metadata?

The other thing that was brought up (h/t to Mo from Brevis) is that we should explore some sort of zk-address-binding, because having the user create a separate signature for every individual attestation they want to prove ownership of isn’t a great UX. This is an open sea of exploration for now.

Yep! It would just be extremely important for 3rd parties to verify that they trust the attester. The reason being, anybody could be an attester on Sign instead of having preset attesters like on Verax. We would need to make sure this is communicated/documented well.

This would definitely be an interesting area of research, although I am struggling to think of a secure/anonymous way to do this off the top of my head. Definitely need to follow up on this idea.

That’s a good point, this should be front-and-center for sure!