Home

Specifications

Schema

Commentary

Mark Wahl


Web Design by
Kristen Lanum

Commentary by Mark Wahl, CISA

Organizing principles for systems:
Attacks on anonymized social networks and fudging oracles (20070616)

In an anonymized social network scenario, the social network is represented as a graph data structure, in which there are nodes (users) and edges (links or communications between users), and might be viewable even by a non-participant of the social network. The nodes have arbitrary identifiers: someone viewing the graph for data analysis can differentiate between two nodes, but the node identifier is distinct from the actual user's name or userid.

The WWW 2007 conference paper "Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography" by Lars Backstrom (Cornell), Cynthia Dwork (Microsoft Research) and Jon Kleinberg (Cornell) discusses two kinds of attacks on an anonymized social network: passive attacks, in which users find themselves in the anonymized network and then discover connections of other users, and active attacks, in which an attacker creates user accounts and uses these accounts to find targeted users.

The researchers used data derived from the LiveJournal web site, with a graph of 4.4 million anonymized nodes and 77 million edges, and found that

"the creation of 7 nodes by an attacker (with degrees comparable to those of typical nodes in the network) can compromise the privacy of roughly 2400 edge relations on average. Moreover, experimental evidence suggests that it may be very difficult to determine whether a social network has been compromised by such an active attack."

and for the passive attack,

"...since we find in practice that the passive attack will succeed for the majority of the population, it says in effect that most people in a large social network have laid the groundwork for a privacy-breaching attack simply through their everyday actions, without even realizing it."

The authors mention in their conclusions the potential role of differential privacy. The paper "differential privacy" by Cynthia Dwork suggests a model in which

"any given disclosure will be, within a small multiplicative factor, just as likely whether or not the individual participates in the database. As a consequence, there is a nominally increased risk to the individual in participating, and only nominal gain to be had by concealing or misrepresenting one’s data. Note that a bad disclosure can still occur, but our guarantee assures the individual that it will not be the presence of her data that causes it, nor could the disclosure be avoided through any action or inaction on the part of the user."

The 2006 Theory of Cryptography conference paper "calibrating noise to sensitivity in private data analysis" by Cynthia Dwork, Frank McSherry (Microsoft Research), Kobbi Nissim (Ben-Gurion University) and Adam Smith (Weizmann Institute) discusses how this could be provided by an identity provider,

"... a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user."

However, the paper "the price of privacy and the limits of LP decoding" by Cynthia Dwork, Frank McSherry and Kunal Talwar (Microsoft Research) suggests that

"... any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private."