An Investigation of the Disclosure Risk Issues Posed by the GRID
Funder: ESRC E-Science Pilots Project
Researchers: Mark Elliot, Stephen Pickles, Kingsley Purdam and Duncan Smith
Dates: November 2003 to October 2004.
Personal data is increasingly becoming a key tool in service delivery and policy making. The expansion of data collected and data sets compiled will provide new opportunities to study and explain social issues, and to inform policy and service delivery. GRID technology opens up a range of opportunities to enhance existing data sources and data quality in support of research, policy and service delivery. Possibilities at present include: linking data and data sets online; distributed data storage and data processing for large-scale data sets and complex analyses; data mining across different data sets; and real-time data updates.
Background
The use of the GRID in handling personal data raises a number of new issues in respect of privacy and disclosure control. Different data sets are likely to have been collected under different terms of use and they are also likely to contain variables that have different levels of sensitivity and different levels of disclosure risk.
Aims and Objectives
To provide an initial assessment of the additional risks of statistical disclosure posed by the GRID.
Methods
Out of this research came CCSR Occasional Paper 25 - An Experiment in naïve Bayesian Record Linkage. The software used in this project is available for download and reuse: software - SDC.zip and documentation - SDC_API.zip.
