Workgroup: Privacy Preserving Measurement
Internet-Draft: draft-wang-ppm-differential-privacy-latest
Intended Status: Informational
Expires: 12 December 2024
Authors:
  J. Chen, Apple Inc.
  A. McMillan, Apple Inc.
  C. Patton, Cloudflare
  K. Talwar, Apple Inc.
  S. Wang, Apple Inc.

Differential Privacy Mechanisms for DAP

Abstract

Differential Privacy (DP) is a property of a secure aggregation mechanism that ensures that no single input measurement significantly impacts the distribution of the aggregate result. This is a stronger property than what systems like the Distributed Aggregation Protocol (DAP) are designed to achieve. This document describes a variety of DP mechanisms applicable to DAP, and, for a variety of common use cases, lifts DAP to a protocol that also achieves DP.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://wangshan.github.io/draft-wang-ppm-differential-privacy/draft-wang-ppm-differential-privacy.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-wang-ppm-differential-privacy/.

Discussion of this document takes place on the Privacy Preserving Measurement Working Group mailing list (mailto:ppm@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/ppm/. Subscribe at https://www.ietf.org/mailman/listinfo/ppm/.

Source for this draft and an issue tracker can be found at https://github.com/wangshan/draft-wang-ppm-differential-privacy.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 12 December 2024.

1. Introduction

[TO BE REMOVED BY RFC EDITOR: The source for this draft and the reference implementations of mechanisms and policies can be found at https://github.com/wangshan/draft-wang-ppm-differential-privacy.]

Multi-party computation systems like the Distributed Aggregation Protocol [DAP] enable secure aggregation of measurements generated by individuals without handling the measurements in the clear. This is made possible by using a Verifiable Distributed Aggregation Function [VDAF], the core cryptographic component of DAP. Execution of a VDAF involves: a large set of "Clients" who produce cryptographically protected measurements, called "reports"; a small number of "Aggregators" who consume reports and produce the cryptographically protected aggregate; and a "Collector" who consumes the plaintext aggregate result. Distributing the computation of the aggregate in this manner ensures that, as long as one Aggregator is honest, no attacker can learn an honest Client's measurement.

Depending on the application, protecting the measurements may not be sufficient for privacy, since the aggregate itself can reveal privacy-sensitive information. As an illustrative example, consider using DAP/VDAF to summarize the distribution of the heights of respondents to a survey. If one of the respondents is especially short or tall, then their contribution is likely to skew the summary statistic in a way that reveals their height. Ideally, no individual measurement would have such a significant impact on the aggregate result, but in general such leakage is inevitable for exact aggregates. Adding carefully chosen noise to the aggregate can, however, help hide the contribution of any one respondent.

This intuition can be formalized by the notion of differential privacy [DMNS06]. Differential privacy is a property of an algorithm or protocol that computes some function of a set of measurements. We say the algorithm or protocol is "differentially private", or "DP", if the probability of observing a particular output does not change significantly as a result of removing one of the measurements (or substituting it with another).

VDAFs are not DP on their own, but they can be composed with a variety of mechanisms that endow them with this property. All such mechanisms work by introducing noise into the computation that is carefully calibrated for a number of application-specific parameters, including the structure and number of measurements, the desired aggregation function, and the degree of "utility" required. Intuitively, a high degree of privacy can be achieved by adding a lot of noise; but adding too much noise can reduce the usefulness of the aggregate result.

Noise can be introduced at various steps of the computation, and by various parties. Depending on the mechanism: the Clients might add noise to their own measurements; and the Aggregators might add noise to their aggregate shares (the values they produce for the Collector).

In this document, we shall refer to the composition of DP mechanisms into a scheme that provides (some notion of) DP as a "DP policy". For some policies, noise is added only by the Clients or only by the Aggregators, but for others, both Clients and Aggregators may participate in generating the noise.

The primary goal of this document is to specify how DP policies are implemented in DAP. It does so in the following stages:

  1. Section 3 describes the notion(s) of DP that are compatible with DAP. Security is defined in a few different "trust models" in which we assume that some fraction of the parties execute the protocol honestly. Of course in reality, whether such assumptions hold is usually outside of our control. Thus our goal is to design DP policies that still provide some degree of protection in more pessimistic trust models. (We call this "hedging".)

  2. Section 4 specifies various mechanisms required for building DP systems, including algorithms for sampling from discrete Laplace and Gaussian distributions.

  3. Section 5 defines DP policies that are implemented with DP mechanisms, their composition with VDAFs, and their execution semantics for DAP. Section 5.1 then demonstrates how to execute VDAFs with DP policies.

  4. Section 6 specifies compositions of concrete VDAFs with concrete DP policies for achieving DP for specific DAP use cases.

The following considerations are out-of-scope for this document:

  1. DP policies have a few parameters that need to be tuned in order to meet the privacy/utility tradeoff required by the application. This document provides no guidance for this process.

  2. This document describes a particular class of narrowly-scoped DP policies. Other, more sophisticated policies are possible. [TODO: Add citations. Here we're thinking of things like DPrio, which may be more appropriate to specify as DAP report extensions.]

  3. The mechanisms described in Section 4 are intended for use beyond DAP/VDAF. However, this document does not describe general-purpose DP policies; those described in Section 5 are tailored to specific VDAFs or classes of VDAFs.

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

This document uses Python3 for describing algorithms.

Let exp(EPSILON) denote raising the numeric constant e to the power of EPSILON.

3. Security Goals and Trust Model

3.1. Differential Privacy

DP formalizes a property of any randomized algorithm that takes in a sequence of measurements, aggregates them, and outputs an aggregate result. Intuitively, this property guarantees that no single measurement significantly impacts the value of the aggregate result. This intuition is formalized by [DMNS06] as follows.

  • CP: The following might be too jargony for an informative RFC, but for now I think we're all just trying to agree on the definition. Once we have consensus among ourselves, we can punt this to the appendix and leave a less formal description here.

DP requires specifying a notion of "neighboring" datasets, which determines what information is being hidden. The most common notion for our setting would be the following:

We say that two batches of measurements D1 and D2 are "neighboring" if they are the same length and contain all the same measurements except one (i.e., the symmetric difference between the multisets contains two elements). We denote this definition as "replacement-DP" (or "substitution-DP"). Appendix B.2 discusses other notions of adjacency that may be appropriate in some settings.

Let p(S, D, r) denote the probability that randomized algorithm S, on input of measurements D, outputs aggregate result r.

A randomized algorithm S is called "EPSILON-DP" if for all neighboring D1 and D2 and all possible aggregate results r, the following inequality holds:

p(S, D1, r) <= exp(EPSILON) * p(S, D2, r)

In other words, the probability that neighboring inputs produce a given aggregate result differs by at most a constant factor, exp(EPSILON).

One can think of EPSILON as a measure of how much information about the measurements is leaked by the aggregate result: the smaller the EPSILON, the less information is leaked by S. For most DP applications, EPSILON will be a small constant, e.g. 0.1 or 0.5. See Appendix B for details.

This notion of EPSILON-DP is sometimes referred to as "pure-DP". The following is a relaxation of pure-DP, called "approximate-DP", from [DR14]. A randomized algorithm S is called "(EPSILON, DELTA)-DP" if for all neighboring D1 and D2 and all possible aggregate results r, the following inequality holds:

p(S, D1, r) <= exp(EPSILON) * p(S, D2, r) + DELTA

Compared to pure-DP, approximate-DP loses an additive factor of DELTA in the bound. DELTA can intuitively be understood as the probability that a piece of information is leaked (e.g. a Client measurement is leaked), so DELTA is typically taken to be polynomially small in the batch size or smaller, i.e., some value much smaller than 1 / batch_size. Allowing for a small DELTA can in many cases allow for much smaller noise compared to pure-DP mechanisms. See Section 4 for details.

Other variants of DP are possible; see the literature review in Appendix B for details.

3.2. Sensitivity

Differential privacy noise sometimes needs to be calibrated based on the SENSITIVITY of the aggregation function used to compute the aggregate result over Client measurements. Sensitivity characterizes how much a change in a Client measurement can affect the aggregate result. In this document, we focus on the L1 and L2 sensitivity. We define them as the maximum, over all pairs of neighboring datasets, of a distance between the two aggregate results:

  • L1 Sensitivity: the distance is defined as the sum of the absolute values of differences at all coordinates of the neighboring aggregate results.

  • L2 Sensitivity: the distance is defined as the square root of the sum of squares of the differences at all coordinates of the neighboring aggregate results.
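
As a non-normative illustration (function names are chosen here only for illustration), the sketch below computes these two distances for aggregate results represented as vectors of numbers; the L1 (resp. L2) sensitivity of an aggregation function is then the maximum such distance over all pairs of neighboring batches. For the histogram use case of Section 6.1, for example, replacing one one-hot measurement with another changes the aggregate by 1 at two coordinates, so the L1 sensitivity is 2 and the L2 sensitivity is sqrt(2).

import math

def l1_distance(agg1: list[float], agg2: list[float]) -> float:
    # Sum of the absolute values of the differences at all coordinates.
    return sum(abs(x - y) for (x, y) in zip(agg1, agg2))

def l2_distance(agg1: list[float], agg2: list[float]) -> float:
    # Square root of the sum of squares of the differences.
    return math.sqrt(sum((x - y) ** 2 for (x, y) in zip(agg1, agg2)))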

3.3. Trust Models

When considering whether a given DP policy is sufficient for DAP, it is not enough to consider the mechanisms used in isolation. It is also necessary to consider the process by which the policy is executed. In particular, our threat model for DAP considers an attacker that participates in the upload, aggregation, and collection protocols and may use its resources to attack the underlying cryptographic primitives (VDAF [VDAF], HPKE [RFC9180], and TLS [RFC8446]).

To account for such attackers, our goal for DAP is "computational-DP" as described by [MPRV09] (Definition 4, "SIM-CDP"). This formalizes the amount of information a computationally bounded attacker can glean about the measurements generated by honest Clients over the course of its attack on the protocol. We consider an attacker that controls the network and statically corrupts a subset of the parties.

We say the protocol is pure-DP (resp. approximate-DP) if the view of any computationally bounded attacker can be efficiently simulated by a simulator that itself is pure-DP (or approximate-DP) as defined above. (The simulator takes the measurements as input and outputs a value that should be computationally indistinguishable from the transcript of the protocol's execution in the presence of the attacker.)

Whether this property holds for a given DP policy depends on which parties can be trusted to execute the protocol correctly (i.e., which parties are not corrupted by the attacker). We consider three increasingly pessimistic trust models.

  • KT(issue#28): Here we seem to be assuming corrupted = malicious. Is there any benefit to a more refined distinction (i.e. honest-but-curious vs malicious). I suspect we would always want secure against malicious, but perhaps there are settings where we are OK with security against bad behavior that is not catchable during an audit.

3.3.1. OAMC: One-Aggregator-Most-Clients

Assume that most Clients and one Aggregator are honest and that the other Aggregator (DAP involves just two Aggregators) and the Collector are controlled by the attacker. When all Clients are honest, this corresponds to the same trust model as the base DAP protocol. The degree of privacy provided (i.e., the value of EPSILON for pure-DP) for most protocols in this setting would degrade gracefully as the number of honest Clients decreases.

3.3.2. OAOC: One-Aggregator-One-Client

Assume that at least one Aggregator is honest. The other Aggregator, the Collector, and all but one Client are controlled by the attacker. The goal is to provide protection for the honest Client's measurement.

  • TODO(issue#15) For the simple Aggregator-noise mechanism, if the malicious Aggregator cheats by not adding noise, then the aggregate result is not DP from the point of view of the honest Aggregator, unless the honest Aggregator "forgets" the randomness it used. Describe this "attack" in "Security Considerations" and say why it's irrelevant.

3.3.3. OC: One-Client

Assume that all parties, including all but one Client, both Aggregators, and the Collector are controlled by the attacker. The best the honest Client can hope for is that its measurement has "local-DP". This property is defined the same way as pure- or approximate-DP, but the bound on EPSILON that we would aim to achieve for local-DP would typically be larger than that in a more optimistic trust model.

3.4. Hedging

If a trust model's assumptions turn out to be false in practice, then it is desirable for a DP policy to maintain some degree of privacy in a more pessimistic trust model. For example, a deployment of DAP might provide some mechanism to ensure that all reports that are consumed were generated by trusted Clients (e.g., a Trusted Execution Environment (TEE) at each Client). This gives us confidence that our assumptions in the OAMC trust model hold. But if this mechanism is broken (e.g., a flaw is found in the TEE), then it is desirable if the policy remains DP in the OAOC or OC model, perhaps with a weaker bound.

4. DP Mechanisms

This section describes various mechanisms required for implementing DP policies. The algorithms are designed to securely expand a short, uniform random seed into a sample from the given target distribution.

Each mechanism has internal parameters that determine how much noise will be added to its input data. Note that a mechanism initialized with fixed internal parameters can satisfy multiple combinations of DP parameters, e.g., both (EPSILON, DELTA)-DP and (EPSILON', DELTA')-DP, where EPSILON < EPSILON' and DELTA > DELTA': accepting a larger EPSILON (weaker privacy in that parameter) allows claiming a smaller DELTA, and vice versa.

A concrete DP mechanism is a subclass of the DpMechanism class defined below and implements the following methods:

class DpMechanism:
    # The data type applicable to this `DpMechanism`. The type is the
    # same for the noised data and the un-noised data.
    DataType = None
    # Debiased data type after removing bias added by the noise. For
    # most of the mechanisms, `DebiasedDataType == DataType`.
    DebiasedDataType = None

    def add_noise(self, data: DataType) -> DataType:
        """Add noise to a piece of input data. """
        raise NotImplementedError()

    def sample_noise(self, dimension: int) -> DataType:
        """
        Sample noise with the initialized `DpMechanism`. `dimension`
        is used to determine the length of the output if `DataType` is
        a list.
        """
        raise NotImplementedError()

    def debias(self,
               data: DataType,
               meas_count: int) -> DebiasedDataType:
        """
        Debias the data due to the added noise, based on the number of
        measurements `meas_count`. This doesn't apply to all DP
        mechanisms. Some Client randomization mechanisms need this
        functionality.
        """
        return data

4.1. Discrete Laplace

  • TODO: Specify a Laplace sampler from Algorithm 2 of [CKS20] (#10).
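
In the meantime, the following non-normative sketch illustrates one way to sample from the discrete Laplace distribution with scale t (i.e., Pr[X = k] proportional to exp(-|k|/t)): the difference of two independent geometric random variables has exactly this distribution. The floating-point arithmetic and Python's random module are used here for illustration only and are not suitable for a production sampler, which must avoid the precision and timing issues addressed by [CKS20].

import random

def sample_discrete_laplace(t: float) -> int:
    # The floor of an exponential random variable with rate 1/t is
    # geometrically distributed with success probability 1 - exp(-1/t);
    # the difference of two such independent variables is
    # discrete-Laplace distributed with scale t.
    g1 = int(random.expovariate(1.0 / t))
    g2 = int(random.expovariate(1.0 / t))
    return g1 - g2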

4.2. Discrete Gaussian

  • TODO: Specify a Gaussian sampler from Algorithm 3 of [CKS20] (#10).
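
As with the Laplace sampler, the following non-normative sketch mirrors the structure of Algorithm 3 of [CKS20]: it draws a proposal from a discrete Laplace distribution and accepts it with a probability that reshapes the distribution into a discrete Gaussian with standard deviation parameter SIGMA. It reuses sample_discrete_laplace from the sketch in Section 4.1 and, again, uses floating-point arithmetic for illustration only.

import math
import random

def sample_discrete_gaussian(sigma: float) -> int:
    # Rejection sampling from a discrete Laplace proposal with scale
    # t = floor(sigma) + 1, following Algorithm 3 of [CKS20].
    t = math.floor(sigma) + 1
    while True:
        y = sample_discrete_laplace(t)
        accept_prob = math.exp(
            -((abs(y) - sigma ** 2 / t) ** 2) / (2.0 * sigma ** 2))
        if random.random() < accept_prob:
            return y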

4.3. Symmetric RAPPOR

This section describes Symmetric RAPPOR, a DP mechanism first proposed in [EPK14], and refined in Appendix C.1 of [MJTB_22]. (The specification here reflects the refined version.) It is initialized with a parameter EPSILON_0. It takes in a bit vector and outputs a noisy version with randomly flipped bits.

Symmetric RAPPOR applies "binary randomized response mechanism" at each coordinate. Binary randomized response takes in a single bit x. The bit is flipped to 1 - x with probability 1 / (exp(EPSILON_0) + 1). For example, if EPSILON_0 is configured to be 3, and the input to binary randomized response is a 0, the bit will be flipped to be 1 with probability 1 / (exp(3) + 1), otherwise, it will stay as a 0.

Under the OC trust model, by applying binary randomized response with parameter EPSILON_0 to its measurement, the Client gets EPSILON_0-DP in the deletion model (Definition II.4 of [EFMR_20] and Definition C.1 of [MJTB_22]). A formal definition of deletion-DP is given in Section 4.3.1.

Symmetric RAPPOR generalizes the binary randomized response mechanism by applying it independently at all coordinates of a Client's bit vector. Under the OAMC trust model, Appendix C.1.3 of [MJTB_22] proves that strong (EPSILON, DELTA)-DP can be achieved by aggregating a batch of noisy Client measurements, each of which is a bit vector with exactly one bit set that is noised with symmetric RAPPOR. The final aggregate result needs to be "debiased" to account for the noise added by the Clients. Although the raw aggregate result is a vector of integers, the debiased aggregate result is a vector of floats, because the best estimate of the true sum depends on exp(EPSILON_0) and generally won't be an integer.

Since the noise generated by each Client at each coordinate is independent, as the number of Clients n grows, the noise distribution at each coordinate approaches a Gaussian distribution with mean 0 and standard deviation sqrt(n * exp(EPSILON_0) / (exp(EPSILON_0) - 1)^2), as proved by Theorem C.2 of [MJTB_22].

4.3.1. EPSILON_0-DP in Deletion-DP

  • JC: We only add a definition of deletion-DP here since this is likely the only mechanism that provides deletion-DP in the OC trust model. Putting it in overview might confuse readers early on, because we said only replacement-DP applies to DAP.

Let P1(S, D, E) denote the probability that a randomized algorithm S, on an input measurement D, outputs a noisy measurement E. Let P2(R, E) denote the probability that a noisy output from the reference distribution R is equal to E. A randomized algorithm S is said to be EPSILON_0-DP in the deletion-DP model, if there exists a reference distribution R, such that for all possible Client measurements D, and all possible noisy outputs E, we have:

-EPSILON_0 <= ln(P1(S, D, E) / P2(R, E)) <= EPSILON_0

Intuitively, if we think of the reference distribution R as an average, noisy Client measurement, deletion-DP makes it hard to distinguish the Client measurement D from the measurement from an average Client.

4.3.2. Reference Implementation

Symmetric RAPPOR is specified by the DP mechanism SymmetricRappor below.

  • TODO: We could make the sampler more efficient if we use binomial.

import math
import random

class SymmetricRappor(DpMechanism):
    DataType = list[int]
    # Debiasing produces an array of floats.
    DebiasedDataType = list[float]

    def __init__(self, eps0: float):
        self.eps0 = eps0
        self.p = 1.0 / (math.exp(eps0) + 1.0)

    def add_noise(self, data: DataType) -> DataType:
        # Apply binary randomized response at each coordinate, based
        # on Appendix C.1.1 of [MJTB_22].
        return list(map(
            lambda x: 1 - x if self.coin_flip() else x,
            data
        ))

    def sample_noise(self, dimension: int) -> DataType:
        # Sample binary randomized response at each coordinate on an
        # all zero vector.
        return [int(self.coin_flip()) for coord in range(dimension)]

    def debias(self,
               data: DataType,
               meas_count: int) -> DebiasedDataType:
        # Debias the data based on Appendix C.1.2 of [MJTB_22].
        exp_eps = math.exp(self.eps0)
        return list(map(
            lambda x: (x * (exp_eps + 1) / (exp_eps - 1) -
                       meas_count / (exp_eps - 1)),
            data
        ))

    def coin_flip(self):
        return random.random() < self.p
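
For illustration, the following non-normative snippet applies the mechanism above to a small batch of one-hot measurements and debiases the plaintext sum. (In DAP, the sum is of course computed by the Aggregators via the VDAF rather than in the clear.)

rappor = SymmetricRappor(eps0=5.0)
measurements = [[1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
noisy = [rappor.add_noise(meas) for meas in measurements]
raw_sum = [sum(coords) for coords in zip(*noisy)]
# `estimate` is an unbiased estimate of the true histogram [2, 0, 1, 0].
estimate = rappor.debias(raw_sum, len(measurements))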

5. DP Policies for VDAFs

This section defines a generic interface for DP policies for VDAFs. A DP policy composes the following operations with execution of a VDAF:

  1. An optional Client randomization mechanism that adds noise to Clients' measurements prior to sharding.

  2. An optional Aggregator randomization mechanism that adds noise to an Aggregator's aggregate share prior to unsharding.

  3. An optional debiasing step that removes the bias introduced by the DP mechanisms (i.e., DpMechanism.debias) after unsharding.

The composition of Client and Aggregator randomization mechanisms defines the DP policy for a VDAF, and enforces the DP guarantee. In particular, a concrete DP policy is a subclass of DpPolicy:

class DpPolicy:
    # Client measurement type.
    Measurement = None
    # Aggregate share type, owned by an Aggregator.
    AggShare = None
    # Aggregate result type, unsharded result from all Aggregators.
    AggResult = None
    # Debiased aggregate result type.
    DebiasedAggResult = None

    def add_noise_to_measurement(self,
                                 meas: Measurement,
                                 ) -> Measurement:
        """
        Add noise to measurement, if required by the Client
        randomization mechanism. The default implementation is to do
        nothing.
        """
        return meas

    def add_noise_to_agg_share(self,
                               agg_share: AggShare,
                               ) -> AggShare:
        """
        Add noise to aggregate share, if required by the Aggregator
        randomization mechanism. The default implementation is to do
        nothing.
        """
        return agg_share

    def debias_agg_result(self,
                          agg_result: AggResult,
                          meas_count: int,
                          ) -> DebiasedAggResult:
        """
        Debias aggregate result, if either of the Client or
        Aggregator randomization mechanism requires this operation,
        based on the number of measurements `meas_count`. The default
        implementation is to do nothing.
        """
        return agg_result

5.1. Executing DP Policies with VDAFs

The execution of DpPolicy with a Vdaf can thus be summarized by the following computational steps (these are carried out by DAP in a distributed manner):

def run_vdaf_with_dp_policy(
    dp_policy: DpPolicy,
    Vdaf: Vdaf,
    agg_param: Vdaf.AggParam,
    measurements: list[DpPolicy.Measurement],
):
    """Run the DP policy with VDAF on a list of measurements."""
    nonces = [gen_rand(Vdaf.NONCE_SIZE)
              for _ in range(len(measurements))]
    verify_key = gen_rand(Vdaf.VERIFY_KEY_SIZE)

    out_shares = []
    for (nonce, measurement) in zip(nonces, measurements):
        assert len(nonce) == Vdaf.NONCE_SIZE

        # Each Client adds Client randomization noise to its
        # measurement.
        noisy_measurement = \
            dp_policy.add_noise_to_measurement(measurement)
        # Each Client shards its measurement into input shares.
        rand = gen_rand(Vdaf.RAND_SIZE)
        (public_share, input_shares) = \
            Vdaf.shard(noisy_measurement, nonce, rand)

        # Each Aggregator initializes its preparation state.
        prep_states = []
        outbound = []
        for j in range(Vdaf.SHARES):
            (state, share) = Vdaf.prep_init(verify_key, j,
                                            agg_param,
                                            nonce,
                                            public_share,
                                            input_shares[j])
            prep_states.append(state)
            outbound.append(share)

        # Aggregators recover their output shares.
        for i in range(Vdaf.ROUNDS-1):
            prep_msg = Vdaf.prep_shares_to_prep(agg_param,
                                                outbound)

            outbound = []
            for j in range(Vdaf.SHARES):
                out = Vdaf.prep_next(prep_states[j], prep_msg)
                (prep_states[j], out) = out
                outbound.append(out)

        # The final outputs of the prepare phase are the output shares.
        prep_msg = Vdaf.prep_shares_to_prep(agg_param,
                                            outbound)
        outbound = []
        for j in range(Vdaf.SHARES):
            out_share = Vdaf.prep_next(prep_states[j], prep_msg)
            outbound.append(out_share)

        out_shares.append(outbound)

    num_measurements = len(measurements)
    # Each Aggregator aggregates its output shares into an
    # aggregate share, and adds any Aggregator randomization
    # mechanism to its aggregate share. In a distributed VDAF
    # computation, the aggregate shares are sent over the network.
    agg_shares = []
    for j in range(Vdaf.SHARES):
        out_shares_j = [out[j] for out in out_shares]
        agg_share_j = Vdaf.aggregate(agg_param, out_shares_j)
        # Each Aggregator independently adds noise to its aggregate
        # share.
        noised_agg_share_j = \
            dp_policy.add_noise_to_agg_share(agg_share_j)
        agg_shares.append(noised_agg_share_j)

    # Collector unshards the aggregate.
    agg_result = Vdaf.unshard(agg_param, agg_shares,
                              num_measurements)
    # Debias aggregate result.
    debiased_agg_result = dp_policy.debias_agg_result(
        agg_result, num_measurements
    )
    return debiased_agg_result

5.2. Executing DP Policies in DAP

  • TODO: Specify integration of a DpPolicy into DAP.

6. Use Cases

6.1. Histograms

Many applications require aggregating histograms in which each Client submits a bit vector with exactly one bit set, also known as a "one-hot vector".

We describe two policies that achieve (EPSILON, DELTA)-DP for this use case. The first uses only a Client randomization mechanism and targets the OAMC trust model. The second uses only an Aggregator randomization mechanism and targets the more stringent OAOC trust model. We find that, under various settings of EPSILON and DELTA, both policies provide comparable utility, except that the policy in the OAOC trust model requires all Aggregators to independently add noise, so some utility is lost when more than one Aggregator is honest.

6.1.1. Prio3MultiHotHistogram with Client Randomization

Client randomization allows Clients to protect their privacy by adding noise to their measurements directly, as described in Appendix B.1. Analyses ([FMT20] and [FMT22]) have shown that, in the OAMC trust model, good approximate-DP is obtained by aggregating noisy Client measurements produced with Client randomization. This policy achieves approximate-DP by having each Client apply symmetric RAPPOR to its measurement.

Our target VDAF is Prio3Histogram as specified in [VDAF]. This uses the Histogram circuit to enforce one-hotness of the measurement. Due to the noising mechanism, a less strict circuit is required that tolerates a bounded number of non-zero entries in the vector. We call this Prio3MultiHotHistogram.

  • JC: Specify Prio3MultiHotHistogram. This may end up in the base VDAF draft. In the meantime, there is a reference implementation here: https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/poc/vdaf_prio3.py

The Client randomization we use here is the symmetric RAPPOR mechanism of Section 4.3, which is initialized with an EPSILON_0 parameter. We get (EPSILON, DELTA)-DP in the aggregate result as long as there are at least batch_size honest Clients, each of which applies symmetric RAPPOR to its measurement and contributes the noisy measurement to the batch. The (EPSILON, DELTA)-DP guarantee degrades gracefully as the number of honest Clients decreases, i.e., we can still achieve (EPSILON', DELTA)-DP, where EPSILON' is larger than EPSILON.

  • TODO(junyechen1996): Justify why RR with EPSILON_0 + batch_size can achieve (EPSILON, DELTA)-DP in the aggregate result.

Because applying symmetric RAPPOR to a one-hot Client measurement can cause the noisy measurement to have multiple bits set, we need to check that the noisy measurement has at most m ones, per Section 4.5 of [TWMJ_23], to ensure robustness against malicious Clients who might attempt to bias the final histogram by setting many coordinates to 1.

Assume the length of the Client measurement is d, and there is exactly one bit set. For the d - 1 coordinates with 0s, the probability p_0 of changing a coordinate from 0 to 1 is 1 / (exp(EPSILON_0) + 1) per Section 4.3, so we can model the number of 1s in the noisy measurement as a binomial random variable C with number of trials d - 1, and probability p_0, plus the one bit that is already set. Our goal is to ensure the probability p of 1 + C exceeding m is small enough, i.e., the false positive rate of a noisy measurement from an honest Client having more than m bits is at most p. This is equivalent to finding m and p, such that the cumulative distribution function (CDF) satisfies Pr(C <= m - 1) >= 1 - p.
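
The following non-normative sketch carries out this computation; the function name is chosen here for illustration, and the incremental evaluation of the binomial CDF is adequate for the measurement lengths typical of this use case.

import math

def find_multi_hot_bound(d: int, eps0: float, p: float) -> int:
    # Smallest m such that a length-d, one-hot measurement noised with
    # symmetric RAPPOR (parameter eps0) has more than m ones with
    # probability at most p, i.e., the smallest m with
    # Pr(C <= m - 1) >= 1 - p for C ~ Binomial(d - 1, p_0).
    p0 = 1.0 / (math.exp(eps0) + 1.0)
    term = (1.0 - p0) ** (d - 1)  # Pr(C = 0)
    cdf = term
    k = 0
    while cdf < 1.0 - p and k < d - 1:
        # Pr(C = k + 1) = Pr(C = k) * (d - 1 - k) / (k + 1) * p0 / (1 - p0)
        term *= (d - 1 - k) / (k + 1) * p0 / (1.0 - p0)
        cdf += term
        k += 1
    return k + 1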

Once we find m, we will use it to instantiate Prio3MultiHotHistogram to perform verification and aggregation. The final aggregate result is debiased based on the number of measurements according to Section 4.3, in order to reduce the bias introduced during Client randomization.

class MultiHotHistogramWithClientRandomization(DpPolicy):
    Field = MultiHotHistogram.Field
    Measurement = MultiHotHistogram.Measurement
    AggShare = list[Field]
    AggResult = MultiHotHistogram.AggResult
    DebiasedAggResult = SymmetricRappor.DebiasedDataType

    def __init__(self, eps0: float):
        # TODO(junyechen1996): Justify how eps0 + batch_size can
        # achieve `(EPSILON, DELTA)`-DP.
        self.rappor = SymmetricRappor(eps0)

    def add_noise_to_measurement(self,
                                 meas: Measurement,
                                 ) -> Measurement:
        return self.rappor.add_noise(meas)

    def debias_agg_result(self,
                          agg_result: AggResult,
                          meas_count: int,
                          ) -> DebiasedAggResult:
        return self.rappor.debias(agg_result, meas_count)

6.1.1.1. Utility

As discussed in Section 4.3, as the number of Clients n increases, the noise at each coordinate of the debiased aggregate result approximates a Gaussian distribution with mean 0, and standard deviation sqrt(n * exp(EPSILON_0) / (exp(EPSILON_0) - 1)^2). We will look at the standard deviation generated by symmetric RAPPOR from n Clients, with parameter choices that achieve various combinations of (EPSILON, DELTA)-DP.
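
The standard deviation values in the table below follow directly from this formula; as a non-normative check (the function name here is for illustration only):

import math

def rappor_aggregate_stddev(n: int, eps0: float) -> float:
    # Per-coordinate standard deviation of the noise in the debiased
    # aggregate (Theorem C.2 of [MJTB_22]).
    return math.sqrt(n * math.exp(eps0) / (math.exp(eps0) - 1.0) ** 2)

# For example, rappor_aggregate_stddev(100000, 5.0) is approximately
# 26.13, matching the first row of Table 1.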

Table 1: Utility of pure Client randomization for the histogram use case.

  EPSILON | DELTA | Standard deviation | Internal parameters
  --------+-------+--------------------+-----------------------------
  0.317   | 1e-9  | 26.1337            | n = 100000, EPSILON_0 = 5.0
  0.906   | 1e-9  | 12.2800            | n = 100000, EPSILON_0 = 6.5
  1.528   | 1e-9  |  9.5580            | n = 100000, EPSILON_0 = 7.0

6.1.2. Prio3Histogram with Aggregator Randomization

Aggregator randomization requires Aggregators to add noise to their aggregate shares before outputting them. Under the OAOC trust model, we can achieve good EPSILON-DP or (EPSILON, DELTA)-DP as long as at least one of the Aggregators is honest. The amount of noise needed by the Aggregator randomizer typically depends on the target DP parameters EPSILON and DELTA, and also on the sensitivity (Section 3.2) of the aggregation function.

In this section, we describe how to achieve (EPSILON, DELTA)-DP for Prio3Histogram (Section 7.4.4 of [VDAF]) by asking each Aggregator to independently add discrete Gaussian noise to its aggregate share.

We use the discrete Gaussian mechanism described in Section 4.2, which has a mean of 0, and is initialized with a SIGMA parameter that stands for the standard deviation of the Gaussian distribution. In order to achieve (EPSILON, DELTA)-DP in the OAOC trust model, all Aggregators need to independently add discrete Gaussian noise to all coordinates of their aggregate shares.

Theorem 8 in [BW18] shows how to compute the SIGMA parameter for the continuous Gaussian, based on the given (EPSILON, DELTA)-DP parameters and L2-sensitivity, and Theorem 7 in [CKS20] shows a similar result for the discrete Gaussian. For the current use case, the L2-sensitivity is sqrt(2), because transforming one one-hot vector into another affects two coordinates, e.g., transforming the one-hot vector [1, 0] to [0, 1] yields an L2 distance of sqrt((1 - 0)^2 + (0 - 1)^2) = sqrt(2). Algorithm 1 in [BW18] elaborates on how to compute SIGMA, and we refer to the calculation in the open-source code [AGM].
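
As a non-normative illustration of this calibration (and of what the [AGM] routine computes for the continuous Gaussian), the following sketch binary-searches for the smallest SIGMA satisfying the condition of Theorem 8 in [BW18]; per the KT note below, additional care is needed when applying the result to the discrete Gaussian with small SIGMA. The function names are chosen here for illustration only.

import math

def gaussian_dp_delta(sigma: float, eps: float, sensitivity: float) -> float:
    # Smallest DELTA for which continuous Gaussian noise with standard
    # deviation `sigma` provides (eps, DELTA)-DP at the given
    # L2-sensitivity (Theorem 8 of [BW18]).
    def norm_cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return norm_cdf(a - b) - math.exp(eps) * norm_cdf(-a - b)

def calibrate_sigma(eps: float, delta: float, sensitivity: float) -> float:
    # DELTA decreases as SIGMA grows, so binary-search for the smallest
    # SIGMA whose DELTA is at most the target.
    lo, hi = 1e-6, 1.0
    while gaussian_dp_delta(hi, eps, sensitivity) > delta:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gaussian_dp_delta(mid, eps, sensitivity) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# For example, calibrate_sigma(0.317, 1e-9, math.sqrt(2.0)) should
# roughly reproduce the per-Aggregator value in the "Standard deviation
# (OAOC)" column of Table 2, which was computed with [AGM].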

  • JC: We will need to provide an explanation of the parameter calculation in the draft itself, instead of merely referring to the code.

  • KT: [CKS20] mentions in Theorem 7 about the difference of approximate-DP achieved by discrete [CKS20] and continuous Gaussian [BW18] that: The discrete and continuous Gaussian attain almost identical guarantees for large SIGMA, but the discretization creates a small difference that becomes apparent for small SIGMA. Therefore, if we use Algorithm 1 from [BW18], we will need to be careful when SIGMA is small, i.e., the desired (EPSILON, DELTA)-DP is weak.

  • JC: It's also worth exploring the utility of discrete Gaussian proven by Definition 3 in [CKS20], which defines the concentrated-DP achieved by discrete Gaussian. Concentrated-DP is then converted to approximate-DP, which is our target here. As a first draft, we won't overwhelm readers with other types of DP.

import math

import agm  # Calibration routine from the open-source code in [AGM].

class HistogramWithAggregatorRandomization(DpPolicy):
    Field = Histogram.Field
    # A measurement is an unsigned integer, indicating an index less
    # than `Histogram.length`.
    Measurement = Histogram.Measurement
    AggShare = list[Field]
    AggResult = Histogram.AggResult
    # The final aggregate result should be a vector of signed
    # integers, because discrete Gaussian could produce negative
    # noise that may have a larger absolute value than the count
    # before noise.
    DebiasedAggResult = list[int]

    def __init__(self, epsilon: float, delta: float):
        # TODO(junyechen1996): Consider using fixed precision or large
        # decimal for parameters like `epsilon` and `delta`. (#23)
        # Transforming one one-hot vector into another affects two
        # coordinates, e.g. from [1, 0, 0] to [0, 1, 0], so the
        # L2-sensitivity is `sqrt(1 + 1) = sqrt(2)`.
        dgauss_sigma = agm.calibrateAnalyticGaussianMechanism(
            epsilon, delta, math.sqrt(2.0)
        )
        self.dgauss_mechanism = \
            DiscreteGaussianWithZeroMean(dgauss_sigma)

    def add_noise_to_agg_share(self,
                               agg_share: AggShare,
                               ) -> AggShare:
        """
        Sample discrete Gaussian noise, and merge it with the
        aggregate share.
        """
        noise_vec = self.dgauss_mechanism.sample_noise(len(agg_share))
        result = []
        for (agg_share_i, noise_vec_i) in zip(agg_share, noise_vec):
            if noise_vec_i < 0:
                noise_vec_i = self.Field.MODULUS + noise_vec_i
            result.append(agg_share_i + self.Field(noise_vec_i))
        return result

    def debias_agg_result(self,
                          agg_result: AggResult,
                          meas_count: int,
                          ) -> DebiasedAggResult:
        # TODO(junyechen1996): Interpret large unsigned integers as
        # negative values or 0 properly. For now, directly return it,
        # since we haven't fully implemented discrete Gaussian
        # mechanism (#10).
        return agg_result

6.1.2.1. Utility

We demonstrate the utility of this policy in the table below in terms of the standard deviation SIGMA of the discrete Gaussian needed to achieve various combinations of (EPSILON, DELTA)-DP, based on the open-source code in [AGM].

It's worth noting that if more than one Aggregator is honest, we lose some utility because each Aggregator independently adds noise. The standard deviation of the noise in the aggregate result thus becomes SIGMA * sqrt(c), where c is the number of honest Aggregators. In the table below, the numbers in the "Standard deviation" column assume c = 2, while the numbers in the "Standard deviation (OAOC)" column assume only one Aggregator is honest (c = 1).

Table 2: Utility of pure Aggregator randomization for the histogram use case.

  EPSILON | DELTA | Standard deviation | Standard deviation (OAOC)
  --------+-------+--------------------+--------------------------
  0.317   | 1e-9  | 33.0788            | 23.3903
  0.906   | 1e-9  | 12.0777            |  8.5402
  1.528   | 1e-9  |  7.3403            |  5.1904

7. Security Considerations

TODO Security

8. IANA Considerations

This document has no IANA actions.

9. References

9.1. Normative References

[DAP]
Geoghegan, T., Patton, C., Rescorla, E., and C. A. Wood, "Distributed Aggregation Protocol for Privacy Preserving Measurement", Work in Progress, Internet-Draft, draft-ietf-ppm-dap-07, , <https://datatracker.ietf.org/doc/html/draft-ietf-ppm-dap-07>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC8446]
Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, , <https://www.rfc-editor.org/rfc/rfc8446>.
[RFC9180]
Barnes, R., Bhargavan, K., Lipp, B., and C. Wood, "Hybrid Public Key Encryption", RFC 9180, DOI 10.17487/RFC9180, , <https://www.rfc-editor.org/rfc/rfc9180>.
[VDAF]
Barnes, R., Cook, D., Patton, C., and P. Schoppmann, "Verifiable Distributed Aggregation Functions", Work in Progress, Internet-Draft, draft-irtf-cfrg-vdaf-07, , <https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-vdaf-07>.

9.2. Informative References

[AGM]
"analytic-gaussian-mechanism", n.d., <https://github.com/BorjaBalle/analytic-gaussian-mechanism>.
[BS16]
Bun, M. and T. Steinke, "Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds", , <https://arxiv.org/abs/1605.02065>.
[BW18]
Balle, B. and Y. Wang, "Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising", , <https://arxiv.org/abs/1805.06530>.
[CKS20]
Canonne, C. L., Kamath, G., and T. Steinke, "The Discrete Gaussian for Differential Privacy", , <https://arxiv.org/abs/2004.00010>.
[DMNS06]
Dwork, C., McSherry, F., Nissim, K., and A. Smith, "Calibrating Noise to Sensitivity in Private Data Analysis", , <https://link.springer.com/chapter/10.1007/11681878_14>.
[DR14]
Dwork, C. and A. Roth, "The Algorithmic Foundations of Differential Privacy", , <https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf>.
[EFMR_20]
Erlingsson, Ú., Feldman, V., Mironov, I., Raghunathan, A., Song, S., Talwar, K., and A. Thakurta, "Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation", , <https://arxiv.org/abs/2001.03618>.
[EPK14]
Erlingsson, Ú., Pihur, V., and A. Korolova, "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response", , <https://arxiv.org/abs/1407.6981>.
[FMT20]
Feldman, V., McMillan, A., and K. Talwar, "Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling", , <https://arxiv.org/abs/2012.12803>.
[FMT22]
Feldman, V., McMillan, A., and K. Talwar, "Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy", , <https://arxiv.org/abs/2208.04591>.
[KM11]
Kifer, D. and A. Machanavajjhala, "No free lunch in data privacy", , <https://dl.acm.org/doi/abs/10.1145/1989323.1989345>.
[KOV15]
Kairouz, P., Oh, S., and P. Viswanath, "The Composition Theorem for Differential Privacy", , <http://proceedings.mlr.press/v37/kairouz15.pdf>.
[Mir17]
Mironov, I., "Rényi Differential Privacy", , <https://arxiv.org/abs/1702.07476>.
[MJTB_22]
McMillan, A., Javidbakht, O., Talwar, K., Briggs, E., Chatzidakis, M., Chen, J., Duchi, J., Feldman, V., Goren, Y., Hesse, M., Jina, V., Katti, A., Liu, A., Lyford, C., Meyer, J., Palmer, A., Park, D., Park, W., Parsa, G., Pelzl, P., Rishi, R., Song, C., Wang, S., and S. Zhou, "Private Federated Statistics in an Interactive Setting", , <https://arxiv.org/abs/2211.10082>.
[MPRV09]
Mironov, I., Pandey, O., Reingold, O., and S. Vadhan, "Computational Differential Privacy", n.d., <https://link.springer.com/chapter/10.1007/978-3-642-03356-8_8>.
[TWMJ_23]
Talwar, K., Wang, S., McMillan, A., Jina, V., Feldman, V., Basile, B., Cahill, A., Chan, Y., Chatzidakis, M., Chen, J., Chick, O., Chitnis, M., Ganta, S., Goren, Y., Granqvist, F., Guo, K., Jacobs, F., Javidbakht, O., Liu, A., Low, R., Mascenik, D., Myers, S., Park, D., Park, W., Parsa, G., Pauly, T., Priebe, C., Rishi, R., Rothblum, G., Scaria, M., Song, L., Song, C., Tarbe, K., Vogt, S., Winstrom, L., and S. Zhou, "Samplable Anonymous Aggregation for Private Federated Data Analysis", , <https://arxiv.org/abs/2307.15017>.

Appendix A. Contributors

Pierre Tholoniat
Columbia University
pierre@cs.columbia.edu

Appendix B. Overview of Differential Privacy

Differential privacy is a set of techniques used to protect the privacy of individuals when analyzing users' data. It provides a mathematical framework that ensures the analysis of a dataset does not reveal identifiable information about any specific individual. The advantage of differential privacy is that it provides a strong, quantifiable, and composable privacy guarantee. The main idea of differential privacy is to add carefully calibrated noise to the results, which makes it difficult to determine with high certainty whether a specific individual's data was included in the results or not.

B.1. Differential privacy levels

  • KT: I think we should distinguish between the randomizer and the DP guarantee. So I have attempted to use Client randomizer and Aggregator randomizer to describe the noise addition by those two, and Local DP and Aggregator DP to refer to the privacy guarantee. The distinction is important because Client randomizer + aggregate gives an aggregator DP guarantee.

There are two levels of privacy protection: local differential privacy (local DP) and aggregator differential privacy (aggregator DP).

  • OPEN ISSUE: or call it secure aggregator dp, or central dp.

In the local-DP setting, Clients apply noise to their own measurements. In this way, Clients have some control over protecting the privacy of their own data. Any measurement uploaded by a Client has local DP, so the Client's privacy is protected even if none of the Aggregators is honest (although this protection may be weak). Furthermore, one can analyze the aggregator DP guarantee via privacy amplification by aggregation, assuming each Client has added the required amount of local DP noise and there is at least a minimum batch size of Clients in the aggregation.

In the Aggregator randomization setting, an Aggregator applies noise to the aggregate. This approach relies on the server being secure and trustworthy. Aggregators built using the DAP protocol are well-suited to this setting because DAP ensures that no server can access any individual measurement, only the aggregate.

If no local DP noise is added by the Clients, the noise added to the aggregate alone provides the privacy guarantee for the aggregate.

  • JC: For now, we have been assuming either Client randomization or Aggregator randomization gives the target DP parameters. Theoretically one can combine Aggregator randomization with Client randomization to achieve DP. For example, if the DP guarantee can be achieved with Client randomization from a batch-size number of Clients, and the batch size is not reached when a data collection task expires, each Aggregator can "compensate" for the missing noise by applying the same Client randomizer on behalf of the missing Clients, i.e., the gap between the actual number of Clients and the target batch size.

B.2. How to define neighboring batches

There are primarily two models in the literature for defining two "neighboring batches": deletion (or removal) of one measurement, and replacement (or substitution) of one measurement with another [KM11]. In the DAP setting, the protocol leaks the number of measurements in each batch collected and the appropriate version of deletion-DP considers substitution by a fixed value (e.g. zero). In other words, two batches of measurements D1 and D2 are "neighboring" for deletion-DP if D2 can be obtained from D1 by replacing one measurement by a fixed reference value.

In some cases, a weaker notion of adjacency may be appropriate. For example, we may be interested in hiding single coordinates of the measurement, rather than the whole vector of measurements. In this case, neighboring datasets differ in one coordinate of one measurement.

B.3. Protected entity

  • TODO: Chris P to fill: user or report, given time

B.4. Privacy budget and accounting

There are various types of DP guarantees and budgets that can be enforced. Many applications need to query the Client data multiple times, for example:

  • Federated machine learning applications require multiple aggregates to be computed over the same underlying data, but with different machine learning model parameters.

  • [MJTB_22] describes an interactive approach to building histograms over multiple iterations, and its Section 4.3 describes a way to track the Client-side privacy budget when the Client's data is queried multiple times.

  • TODO: have citations for machine learning

It’s hard for Aggregator(s) to keep track of the privacy budget over time, because different Clients can participate in different data collection tasks, and only Clients know when their data is queried. Therefore, Clients must enforce the privacy budget.

There could be multiple ways to compose DP guarantees, based on different DP composition theorems. In the various example DP guarantees below, we describe the following:

  • A formal definition of the DP guarantee.

  • Composition theorems that apply to the DP guarantee.

B.5. Pure EPSILON-DP, or (EPSILON, DELTA)-approximate DP

Pure EPSILON-DP was first proposed in [DMNS06], and a formal definition of (EPSILON, DELTA)-DP can be found in Definition 2.4 of [DR14].

The EPSILON parameter quantifies the "privacy loss" of observing the outcomes of querying two databases differing by one element. The smaller EPSILON is, the stronger the privacy guarantee is, that is, the outcomes of querying two adjacent databases are more or less the same. The DELTA parameter provides a small probability of the privacy loss exceeding EPSILON.

One can compose multiple (EPSILON, DELTA)-approximate DP guarantees, per Theorem 3.4 of [KOV15]. One can also compose the guarantees in another type of guarantee first, such as Rényi DP (Appendix B.5.1), and then convert the composed guarantee to an approximate DP guarantee.

B.5.1. (ALPHA, TAU)-Rényi DP

A formal definition of Rényi DP can be found in Definitions 3 and 4 of [Mir17].

The intuition behind Rényi-DP is to use TAU parameter to measure the divergence of probability distributions of querying two adjacent databases, given Rényi order parameter ALPHA. The smaller the TAU parameter, the harder it is to distinguish the outputs from querying two adjacent databases, and thus the stronger the privacy guarantee is.

One can compose multiple Rényi DP guarantees based on Proposition 1 of [Mir17]. After composition, one can convert the (ALPHA, TAU)-Rényi DP guarantee to (EPSILON, DELTA)-approximate DP, per Proposition 12 of [CKS20].

B.5.2. Zero Concentrated-DP

A formal definition of zero Concentrated-DP can be found in Definition 1.1 of [BS16].

Zero Concentrated-DP uses different parameters from Rényi-DP, but uses a similar idea to measure the output distribution divergence of querying two adjacent databases.

One can compose multiple zCDP guarantees, per Lemma 1.7 of [BS16].

B.6. Data type and Noise type

A differential privacy guarantee can only be achieved if the correct noise type is applied to the data type.

  • TODO: Junye to fill, mention DAP is expected to ensure the right pair of VDAF and DP mechanism

  • TODO: Chris P: we will mention Prio3SumVec because that's what we use to describe aggregator DP with amplification

Authors' Addresses

Junye Chen
Apple Inc.
Audra McMillan
Apple Inc.
Christopher Patton
Cloudflare
Kunal Talwar
Apple Inc.
Shan Wang
Apple Inc.