1. Introduction
The 1991 SARs have been widely used for a range of high quality research. The key findings and publications are available on the SARs web site.
Following extensive consultation with users a request was submitted to ONS in September 2001 for three datasets: a 3 per cent Individual SAR, a 1 per cent Household SAR and a 5 per cent Small Area Microdata file (SAM). The latter was particularly requested by geographers who were concerned to obtain more geographical detail at the expense of individual information. The full details of the justification for the request and the specifications can be downloaded from www.ccsr.ac.uk/sars/2001/request/sarequest.pdf.
In brief the request sought to:
However, increased concerns about the confidentiality of microdata have resulted in the Individual and Household files being significantly less detailed than the original request and also less detailed than the 1991 SARs. In response, ONS have established Controlled Access Microdata Samples (CAMS), which are accessible only within a safe setting in the statistical offices of the UK.
Two concerns have had a marked influence on the 2001 Census: under-enumeration, together with the importance of obtaining accurate population estimates, and confidentiality. The former has resulted in the ‘One Number Census’ and the latter has influenced both the outputs available and their timing. Both are discussed in the sections below.
1.1 The One Number Census
The 2001 Census aimed to maximise coverage and to make an accurate estimate of the people missed. The 1991 Census was thought to have had a substantially larger under-count than previous censuses, with about 2 per cent of the population of GB missed entirely and a further 1.6 per cent for whom records were imputed.
The One Number Census was designed to produce figures from the 2001 Census that are adjusted for under-enumeration and are consistent across all forms of output, down to the smallest geographical area. The term ‘One Number Census’ indicates a departure from the 1991 Census, where preliminary figures from the census count were published first and figures adjusted for under-enumeration were published later. The One Number Census approach makes all adjustments as part of the census processing. It therefore results in a database of the complete population of the UK from which all census outputs, including the SARs, are drawn.
The key stages of the One Number Census (ONC) can be summarised as follows:
a) A Census Coverage Survey (CCS), undertaken independently of the Census,
was designed to establish the coverage of the 2001 Census. For the CCS,
the UK was divided into one hundred and twelve areas, each with a population
of about 500,000. These areas are known as design groups and are made
up of whole LADs or groups of smaller LADs. The CCS took place in all
of these design groups.
b) The CCS records are matched with those from the Census using a combination
of automated and clerical matching.
c) Populations for each design group, by age and sex, are estimated using
a combination of standard estimation techniques.
d) Small area estimation techniques are used to estimate Local Authority
District populations by age and sex.
e) Households and individuals estimated to be missed by the Census are
imputed to produce a fully adjusted Census database.
f) All ONC population estimates are quality assured using demographic
analysis and aggregate level administrative data.
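Stages (b) and (c) can be illustrated with a dual-system (capture-recapture) estimator, one standard way of combining a census count with an independent coverage survey. This guide does not say which estimators were used, so the choice of estimator here is an assumption, and the counts are invented purely for illustration.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Lincoln-Petersen estimate of the true population of an area.

    census_count  -- people counted by the census in the area
    survey_count  -- people counted by the independent coverage survey
    matched_count -- people found in both sources after record matching
    """
    if matched_count <= 0:
        raise ValueError("need at least one matched record")
    return census_count * survey_count / matched_count


# Hypothetical counts for one age-sex group in a surveyed area.
census_count = 940
estimate = dual_system_estimate(census_count, survey_count=900, matched_count=864)

# Estimated share of the true population that the census counted.
coverage = census_count / estimate
print(round(estimate, 1), round(coverage, 3))
```

The estimator relies on the survey being independent of the census, which is why stage (a) stresses that the CCS was undertaken independently.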
More detail on the One Number Census is available at www.statistics.gov.uk/nsbase/census2001/pdfs/oncguide.pdf.
2. Individual Licensed SAR
The Individual SAR (Licensed) is safe data that is available to registered users for analysis outside ONS. It is a 3 per cent sample containing 1,843,530 individuals, with information on age, gender, ethnicity, health, employment status, housing, amenities, family type, geography, social class, education, distance to work, workplace, hours worked and migration. The 3 per cent sample is an increase on the 2 per cent sample in 1991.
In addition, the ONS have added occupational coding, not available in the census tables, for individuals aged 16-65 who last worked more than five but less than ten years ago, and for those aged 65-74 who were not working at the time of the census but who had worked in the previous ten years. A full list of variables is available.
The lowest level of geography is the Government Office Region, although Inner and Outer London are separately identified. This represents a significant reduction by comparison with the 1991 SARs, where large Local Authorities (population 120,000 and over) were separately identified. A quick comparison between the 1991 and 2001 SARs can be found in section 5.5 of the user guide.
The data are available online to registered users. There is no charge for academic use. Public sector bodies can obtain the data free of charge and the business sector are charged £1000 per file. The licence will entitle the organisation to receive the data which may then be accessed by ten people, all of whom will have to sign a user undertaking. Details of registration and access are available at www.ccsr.ac.uk/sars/access/.
3. Special Licence Household Sample of Anonymised Records (SL-HSAR)
The Special Licence Household SAR (SL-HSAR) is a 1 per cent sample of households and all those individuals in those households from the 2001 Census. It is a hierarchical file allowing linkages to be made between individuals within families and households. The Special Licence Household SAR contains information on age, gender, ethnicity, marital status, social class, education and employment status. It also includes household level variables, e.g. housing tenure and number of cars. A number of derived variables have been added, for example, the number of full time earners in a household or the age of the youngest dependent child in a household.
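As a rough sketch of how a hierarchical file supports derived variables of this kind, the example below groups individual records by household and computes the number of full-time earners and the age of the youngest dependent child. The field names (`hhid`, `age`, `full_time`, `dependent_child`) and the records are hypothetical; the actual variable names are in the codebook.

```python
from collections import defaultdict

# Hypothetical individual records; field names are illustrative only.
individuals = [
    {"hhid": 1, "age": 41, "full_time": True, "dependent_child": False},
    {"hhid": 1, "age": 39, "full_time": True, "dependent_child": False},
    {"hhid": 1, "age": 7, "full_time": False, "dependent_child": True},
    {"hhid": 1, "age": 3, "full_time": False, "dependent_child": True},
    {"hhid": 2, "age": 68, "full_time": False, "dependent_child": False},
]

# Link individuals to their household through the shared identifier.
households = defaultdict(list)
for person in individuals:
    households[person["hhid"]].append(person)

# Derive one household-level record per household.
derived = {}
for hhid, members in households.items():
    child_ages = [p["age"] for p in members if p["dependent_child"]]
    derived[hhid] = {
        "n_full_time": sum(p["full_time"] for p in members),
        "youngest_dep_child": min(child_ages) if child_ages else None,
    }

print(derived)
```

Households with no dependent children get `None` rather than a numeric placeholder, so they cannot be mistaken for households with a child aged 0.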
The 2001 Special Licence Household SAR is available for England and Wales only. It comprises 225,436 household records and 525,715 individual records. Household records include a small number of empty households. Individual records are available only for households with 11 or fewer residents; for households of 12 or more, only household-level variables are available. To protect confidentiality, age has been grouped into two-year bands and no geographical breakdown is available. A small amount of perturbation has also been applied.
Although the numbers of individuals and households involved are relatively small, large households are not randomly distributed in the population. For example, the loss of this information would disproportionately affect the Pakistani and Bangladeshi ethnic groups and would bias estimates of overcrowding and various forms of deprivation.
This file is available only under an Office for National Statistics (ONS) special licence, via the UK Data Archive. Users have to agree to keep the data under secure conditions and institutions are responsible for ensuring these conditions are met. For more information see the UKDA web page.
The specification and codebook of the SL-HSAR can be found in the SARs web pages.
The sample excludes those in communal establishments. It includes households with dummy forms and also ‘students living-away’ who provide very limited individual information because they should be fully enumerated at their usual term-time residence.
The variable ‘popbase’ allows users to identify these categories and select their required population base. The file was sampled from the ‘one-number census’ database. This includes data which was imputed for non-respondent individuals or households. Imputed cases can be identified using the ‘oncperim’ variable.
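A minimal sketch of selecting a population base with ‘popbase’ and excluding imputed cases with ‘oncperim’ might look like the following. The two variable names come from the file; the category codes, the `pid` identifier and the records are assumptions made for illustration, and the real codes should be taken from the codebook.

```python
# Hypothetical category codes: the real values are listed in the codebook.
STUDENT_LIVING_AWAY = 2   # assumed 'popbase' code for students living away
IMPUTED = 1               # assumed 'oncperim' code for imputed records

# Toy records; 'pid' is an illustrative identifier, not a SARs variable.
records = [
    {"pid": 1, "popbase": 1, "oncperim": 0},
    {"pid": 2, "popbase": 2, "oncperim": 0},
    {"pid": 3, "popbase": 1, "oncperim": 1},
    {"pid": 4, "popbase": 1, "oncperim": 0},
]


def select_base(rows, exclude_students_away=True, exclude_imputed=False):
    """Keep only the rows that belong to the chosen population base."""
    kept = []
    for row in rows:
        if exclude_students_away and row["popbase"] == STUDENT_LIVING_AWAY:
            continue
        if exclude_imputed and row["oncperim"] == IMPUTED:
            continue
        kept.append(row)
    return kept


default_base = select_base(records)
observed_only = select_base(records, exclude_imputed=True)
print([r["pid"] for r in default_base])
print([r["pid"] for r in observed_only])
```

Comparing results with and without imputed cases is one simple way to check how sensitive an analysis is to the One Number Census adjustment.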
4. Small Area Microdata
The Small Area Microdata (SAM) is a 5 per cent sample of individuals from the 2001 Census with local authority as the lowest level of geography. Because of confidentiality concerns there is less individual detail than on the Individual SAR. The strength of the data lies in the more detailed geography. The SAM is for all countries of the UK, with 2.96 million cases. Local Authority is the lowest level of geography for England and Wales, Council Areas for Scotland and Parliamentary Constituencies for Northern Ireland. The Scilly Isles have been merged with Penwith and the City of London with Westminster. For Scotland, Orkney and Shetland are merged into one area. All other areas are identified. The file contains less individual detail than the Individual SAR:
• Age in 13 categories
• Economic activity in 4 categories
• Ethnic group in 13 categories in England and Wales, 8 in Scotland and 2 in Northern Ireland
• NS-SEC in 8 categories
Further details are on the SAM web pages.
5. Controlled Access Microdata Samples
In recognition of the reduction in detail in the Individual SAR, a more detailed dataset is available in a safe setting. The initial release (version 1) provides full ethnic group information (16 categories), age in single years to 95, SOC minor group and 60 categories of industry. This file will also contain details of local authority and the Index of Multiple Deprivation. A codebook and specifications for the individual data are currently available on the CAMS web pages.
A Household Controlled Access Microdata Sample (Household CAMS) is also available and includes data for Scotland and Northern Ireland. A codebook is available.
The files are available for research use only and applications must be made to the Census Research Access Board at the Office for National Statistics. Details of access are available for the Individual CAMS and the Household CAMS.
A quick comparison of the different specifications for all variants of the 2001 SAR files can be found in section 4 of this user guide.
6. CAMS Test File
The CAMS test file is a sub-sample of 298,912 cases from the Individual Controlled Access Microdata Sample from the 2001 Census. Variables have been perturbed to ensure that no sample members can be identified. Perturbation has retained the correct distribution of each variable, but the relationships between variables will not give the expected results. The test file can therefore be used to develop and test syntax for analyses before using the CAMS at ONS. It cannot be used to test exploratory analyses with any reliability, nor to test statistical procedures that depend on the joint distribution of variables. It is not suitable for research purposes and has been provided only as a dummy dataset for preparing syntax in advance of using the CAMS file.
The protection comes from perturbing multiple variables per person with a high probability of change and by providing no indication of whether a variable value has been perturbed.
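The practical consequence can be illustrated with a toy example. The sketch below does not reproduce the perturbation method used for the test file, which is not described here; it substitutes a simple independent shuffle of one column, which is enough to show how the distribution of each variable can be preserved while the relationship between variables is lost. The variables and values are invented.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Toy data in which the two variables are perfectly related.
age_band = ["16-24"] * 50 + ["65-74"] * 50
econ = ["student"] * 50 + ["retired"] * 50

# Stand-in perturbation: shuffle one column independently of the other.
shuffled_econ = econ[:]
rng.shuffle(shuffled_econ)

# The one-way distribution of the shuffled variable is unchanged...
marginal_before = Counter(econ)
marginal_after = Counter(shuffled_econ)

# ...but the two-way table no longer reflects the original relationship.
original_pairs = Counter(zip(age_band, econ))
perturbed_pairs = Counter(zip(age_band, shuffled_econ))

print(marginal_before == marginal_after)
print(original_pairs)
print(perturbed_pairs)
```

Frequencies of single variables from such a file look plausible, whereas cross-tabulations and models do not, which is why the test file is suitable only for checking that syntax runs.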
Availability of the synthetic test dataset
The CAMS test file is available for download under the standard End User Licence as SPSS and Stata files. For further information see the SARs web pages.
Coverage
The CAMS test file only covers:
• England and Wales
• Individuals in private households
and excludes:
• Students living away from their parental home
• Households of size 6 or more
These limitations do not apply to the full CAMS files. Other differences apply in terms of variable availability. Users should consult the codebook for the full CAMS file prior to applying to use the data or preparing syntax.
The following variables are in the full CAMS file but not in the test file
| cestatux | Status in communal establishment (extended) |
| cetypews | Type of Communal Establishment, England, Wales and Scotland |
| cetypn | Type of Communal Establishment, Northern Ireland |
| cobpuk | Country of birth |
| combgn | Community background, religion or religion brought up in |
| distmov0 | Distance of move for migrants - distance in bands |
| dstwrk0 | Distance to Work (Including Study in Scotland) |
| ethnx | Ethnic Group for Northern Ireland |
| ethsx | Ethnic Group for Scotland |
| furn | Accommodation Furnished (Scotland Only) |
| gaelread | Whether reads Gaelic (Scotland only) |
| gaelspk | Whether speaks Gaelic (Scotland only) |
| gaelstnd | Whether understands Gaelic (Scotland only) |
| gaelwrit | Whether Writes Gaelic (Scotland only) |
| irisread | Whether reads Irish (NI only) |
| irisspk | Whether Speaks Irish (NI only) |
| irisstnd | Whether Understands Irish (NI only) |
| iriswrit | Whether Writes Irish (NI only) |
| isco | International standard classification of occupations |
| qualvs | (Scotland only) - Level of Highest Qualifications (Aged 16 to 74) |
| relgn | Religion (Northern Ireland) belongs to/brought up in |
| relgs1 | Religion belongs to (SCOTLAND) |
| relgs2 | Religion brought up in (SCOTLAND) |
| roomsflr | Rooms used by Household on More than 1 Floor (NI only) |
| socunit | Occupations (SOC 2000 unit) |
| tenursn | Tenure of Accommodation (Scotland and Northern Ireland only) |
| urbrurs | Urban/Rural - Scotland |
The following variables are in the test file but not in the full CAMS file
| dcobuk | Derived country of birth |
| ddistmov | Distance moved for migrants-derived banded variable |
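Because the two files differ in these ways, it can help to check the variables used in a piece of syntax against both lists before visiting ONS. The sketch below is one possible way to do so; the variable names are taken from the tables above (only a subset of the full-file-only list is included for brevity), while the function name and the example syntax list are hypothetical.

```python
# Variables listed above as being in the test file but not the full CAMS file.
TEST_ONLY = {"dcobuk", "ddistmov"}

# A subset of the variables listed above as being in the full CAMS file only.
FULL_ONLY = {"cobpuk", "distmov0", "dstwrk0", "isco", "socunit"}


def check_syntax_variables(used):
    """Split the variables a syntax file uses into two problem lists.

    Returns (missing_in_full, missing_in_test): names that will not be
    found in the full CAMS file, and names that cannot be tested on the
    test file.
    """
    used = set(used)
    return sorted(used & TEST_ONLY), sorted(used & FULL_ONLY)


# Hypothetical list of variables referenced by a user's syntax.
missing_in_full, missing_in_test = check_syntax_variables(
    ["dcobuk", "isco", "ddistmov"]
)
print("Replace before using the full file:", missing_in_full)
print("Cannot be tested on the test file:", missing_in_test)
```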
Last updated 25 April 2007