The Cathie Marsh Centre for Census and Survey Research

An Investigation of the Disclosure Risk Issues Posed by the GRID


Funder: ESRC E-Science Pilots Project
Researchers: Mark Elliot, Stephen Pickles, Kingsley Purdam and Duncan Smith
Dates: November 2003 to October 2004.

Personal data is increasingly a key tool in service delivery and policy making. As more data is collected and more data sets are compiled, new opportunities will arise to study and explain social issues and to inform policy and service delivery. GRID technology opens up a range of opportunities to enhance existing data sources and data quality for these purposes. Possibilities at present include: linking data and data sets online; distributed data storage and data processing for large-scale data sets and complex analyses; data mining across different data sets; and real-time data updates.

Background

The use of the GRID in handling personal data raises a number of new issues in respect of privacy and disclosure control. Different data sets are likely to have been collected under different terms of use and they are also likely to contain variables that have different levels of sensitivity and different levels of disclosure risk.

Aims and Objectives

To provide an initial assessment of the additional risks of statistical disclosure posed by the GRID.

  1. To examine the personal data currently available over the GRID, and the data projected to become available in the near future, in order to produce a scenario analysis of the disclosure risk arising from such data.
  2. To examine the compatibility of confidentiality agreements across key anonymised data sets that could be used as part of the GRID data resources.
  3. To design methods of disclosure risk assessment which take account of the additional disclosure risks posed by the GRID and GRID-available data.
  4. To implement pilot software demonstrating the disclosure risk assessment methods.
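
As an illustration of the kind of risk the objectives above are concerned with, the following minimal sketch counts how many records are unique on a set of key variables, before and after one extra variable is added (for example, through linkage with another data set). It is not the project's method or its pilot software: the function name unique_fraction, the variable names and the toy records are all assumptions invented for this example, and the share of unique records is only one simple proxy for disclosure risk.

```python
from collections import Counter


def unique_fraction(records, key_vars):
    """Return the fraction of records whose combination of values
    on key_vars occurs exactly once in the data set."""
    keys = [tuple(r[v] for v in key_vars) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Invented toy records; every variable name and value is hypothetical.
records = [
    {"age_band": "30-39", "sex": "F", "region": "NW", "occupation": "nurse"},
    {"age_band": "30-39", "sex": "F", "region": "NW", "occupation": "teacher"},
    {"age_band": "30-39", "sex": "M", "region": "NW", "occupation": "teacher"},
    {"age_band": "40-49", "sex": "M", "region": "SE", "occupation": "driver"},
    {"age_band": "40-49", "sex": "M", "region": "SE", "occupation": "driver"},
]

# Risk proxy using three key variables only.
risk_before = unique_fraction(records, ["age_band", "sex", "region"])

# Risk proxy once a fourth variable (e.g. gained by linkage) is available.
risk_after = unique_fraction(
    records, ["age_band", "sex", "region", "occupation"]
)

print(risk_before, risk_after)
```

In this toy data the share of unique records rises once the fourth variable is included, which is the general pattern that makes linked data a disclosure concern.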

Methods

  • The collection and analysis of confidentiality agreements across a number of case study surveys and pilot data sets. This includes semi-structured interviews with database managers.
  • A review of publicly available data in the UK using online search techniques and form field analysis.
  • An investigation of graphical methods to account for the relationships between distinct databases.
  • An empirically based simulation, leading to the development of pilot software.

Out of this research came CCSR Occasional Paper 25 - An Experiment in Naïve Bayesian Record Linkage.
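
For readers unfamiliar with the term, the general idea of naïve Bayesian record linkage can be sketched as follows. This is a generic textbook-style illustration, not the experiment reported in the paper: the function match_posterior, the field names and all probabilities are assumptions made up for the example. The "naïve" assumption is that agreement on each field is independent of the others given whether the two records truly refer to the same person.

```python
import math


def match_posterior(agreements, m_probs, u_probs, prior_match):
    """Posterior probability that a record pair is a true match.

    agreements  : dict field -> bool (do the two records agree on the field?)
    m_probs     : dict field -> P(agree | pair is a match)      (assumed)
    u_probs     : dict field -> P(agree | pair is a non-match)  (assumed)
    prior_match : prior probability that a random pair is a match
    """
    log_match = math.log(prior_match)
    log_non = math.log(1.0 - prior_match)
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        # Naive Bayes: multiply per-field likelihoods (add their logs).
        log_match += math.log(m if agrees else 1.0 - m)
        log_non += math.log(u if agrees else 1.0 - u)
    # Normalise in a numerically stable way.
    top = max(log_match, log_non)
    p_match = math.exp(log_match - top)
    p_non = math.exp(log_non - top)
    return p_match / (p_match + p_non)


# Hypothetical agreement probabilities for three illustrative fields.
m_probs = {"surname": 0.95, "birth_year": 0.90, "postcode": 0.80}
u_probs = {"surname": 0.01, "birth_year": 0.05, "postcode": 0.02}

all_agree = match_posterior(
    {"surname": True, "birth_year": True, "postcode": True},
    m_probs, u_probs, prior_match=0.01,
)
all_disagree = match_posterior(
    {"surname": False, "birth_year": False, "postcode": False},
    m_probs, u_probs, prior_match=0.01,
)
print(all_agree, all_disagree)
```

Even with a low prior probability of a match, agreement on several fields that rarely agree by chance pushes the posterior close to one, which is why linkage across data sets can re-identify individuals.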

The software used in this project is available for download and reuse: software (SDC.zip) and documentation (SDC_API.zip).

University of Manchester CCSR