Understanding Synthetic Data
Date: 13th October 2010
Duration: 1 day (9:30am to 4:30pm)
Level: Intermediate
Course Fee: £175 (£125 for those from educational institutions)
CCSR offer 5 free places to research staff and students within the Faculty of Humanities.
Course Leader:
Jerry Reiter
Course Requirements: Participants are expected to be familiar with statistical modeling at the level of linear and logistic regression analysis.
Course Summary
This training course covers approaches, based on multiple imputation and known as synthetic data, to protecting the confidentiality of public use microdata. Topics will include:
a) overview of common disclosure control methods
b) motivation for synthetic data
c) types and examples of synthetic data
d) analysis methods for synthetic data
e) generation of synthetic data
f) disclosure risk assessment for synthetic data
g) data utility evaluation of synthetic data
By the end of this course, participants should understand the benefits and limitations of synthetic data approaches; know of approaches for generating, evaluating, and analyzing synthetic data; and have useful references for learning more about the method.
The course emphasizes conceptual understanding and intuition over mathematical derivations. It will include pointers to useful computational tools for implementing synthetic data, but will not include live generation of synthetic data.
