watermarking for privacy (with no durian)
In Scalable watermarking for identifying large language model outputs, published by Dathathri, See, Ghaisas, et (many) al. in Nature on 23 October 2024, the authors propose an algorithm to (voluntarily) watermark synthetic texts so that they can be identified as such through a statistical test. Here are a few quotes outlining the authors’ solution.
“LLMs generate text based on preceding context (…) given a sequence of input text x<t = x1, …, xt−1 consisting of t − 1 tokens from a vocabulary V, the LLM computes the probability distribution pLM(⋅∣x<t) of the next token xt given the preceding text x<t. To generate the full response, xt is sampled from pLM(⋅∣x<t), and the process repeats until either a maximum length is reached or an end-token is generated.
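The autoregressive loop quoted above can be sketched as follows; `next_token_distribution` is a hypothetical placeholder standing in for pLM(⋅∣x<t), not part of the paper:

```python
import random

def next_token_distribution(context):
    # Placeholder "LLM": a uniform distribution over a tiny vocabulary.
    # A real model would condition on the preceding tokens in `context`.
    vocab = ["the", "cat", "sat", "<eos>"]
    return {tok: 1.0 / len(vocab) for tok in vocab}

def generate(prompt, max_len=20, rng=None):
    rng = rng or random.Random(0)
    tokens = list(prompt)
    while len(tokens) < max_len:
        dist = next_token_distribution(tokens)
        # Sample x_t from p_LM(.|x_<t) ...
        xt = rng.choices(list(dist), weights=list(dist.values()))[0]
        if xt == "<eos>":       # ... until an end-token is generated
            break
        tokens.append(xt)       # ... or the maximum length is reached
    return tokens
```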
In a watermarking scheme, a sampling algorithm is an algorithm that takes as input a probability distribution p ∈ ΔV and a random seed and returns a token.
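In code, that interface is just a function from a distribution and a seed to a token; plain multinomial sampling is the unwatermarked baseline (a minimal sketch, not from the paper):

```python
import random

def sample_token(p, seed):
    # A "sampling algorithm" in the quoted sense: distribution + seed -> token.
    # Here p is a dict mapping tokens to probabilities; the seed makes the
    # draw reproducible, which watermarking schemes exploit.
    rng = random.Random(seed)
    tokens, weights = zip(*p.items())
    return rng.choices(tokens, weights=weights)[0]
```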
Tournament sampling selects a token from the LLM distribution that is likely to score higher under the random watermarking functions (…) Given the selection of tokens xt based on higher g-values, we expect watermarked text generally to score higher under this score than unwatermarked text (…) [It] requires g-values to decide which tokens win each match in the tournament. Intuitively, we want a function that takes a token x ∈ V, a random seed and the layer number ℓ ∈ {1, …, m}, and outputs a g-value gℓ(x, r) that is a pseudorandom sample from some probability distribution fg (the g-value distribution).”
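A hedged sketch of the tournament idea: draw 2^m candidates from the LLM distribution, then run m knockout layers where the candidate with the higher g-value wins each match. The g-value function below is a toy (a hash-derived Bernoulli keyed on token, seed and layer); the paper's keyed pseudorandom functions differ:

```python
import hashlib
import random

def g_value(token, seed, layer):
    # Toy g_l(x, r): pseudorandom Bernoulli(0.5) sample derived by hashing
    # (token, seed, layer). Stands in for the paper's g-value distribution f_g.
    h = hashlib.sha256(f"{token}|{seed}|{layer}".encode()).digest()
    return h[0] & 1

def tournament_sample(p, seed, m=3, rng=None):
    rng = rng or random.Random(seed)
    tokens, weights = zip(*p.items())
    # Draw 2^m candidate tokens i.i.d. from the LLM distribution p.
    candidates = rng.choices(tokens, weights=weights, k=2 ** m)
    for layer in range(1, m + 1):
        winners = []
        for a, b in zip(candidates[::2], candidates[1::2]):
            ga, gb = g_value(a, seed, layer), g_value(b, seed, layer)
            if ga == gb:                       # tie: break uniformly at random
                winners.append(rng.choice([a, b]))
            else:
                winners.append(a if ga > gb else b)
        candidates = winners
    return candidates[0]                       # tournament winner
```

Because winners are biased toward high g-values, averaging g-values over a text gives the statistical test the authors describe: watermarked text scores higher than unwatermarked text.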
At the Ocean privacy workshop, someone raised the question of whether synthetic data can be trusted, and I remembered this article. I suggested watermarking such synthetic data: providers or agents would (privately) run a disclosed or registered code that embeds a watermark, giving strong support to the claim that the synthetic data was indeed produced that way. I remain uncertain this is realistic.
