1. Sources of information
Most of the information in this section is taken from the ONS web site
at www.statistics.gov.uk/census2001/methodology.asp.
Fuller information can be found on that site.
More information for Scotland can be found in 'Scotland: Taking Scotland’s
2001 Census' at
www.gro-scotland.gov.uk/grosweb/grosweb.nsf/pages/cencr102.
All this material remains Crown Copyright.
Census 2001 General report for England and Wales:
This reviews the entire Census operation from the early consultation and planning stages, to the production and dissemination of outputs and evaluation. It provides a wealth of detail about how the Census was carried out and what lessons have been learned to take forward in the plans for any future censuses. It is aimed at both the experienced and occasional user of census data, but it is hoped the wider public may also find it useful and informative. For the full report please see the ONS web site.
The Census 2001 Quality report for England and Wales provides information
about all aspects of quality relating to the 2001 Census. It provides
an overview of the quality issues and the studies and analyses that have
been carried out to improve the quality of Census data. The report deals
with the life cycle of the Census project stage by stage, and then provides
measures of each of the attributes of quality as defined by the European
Statistical System. The final part describes the components of quality
of the data for each Census question. In conjunction with the Census 2001
General report for England and Wales, it provides a comprehensive evaluation
of the strengths and weaknesses of the Census operation. For the full
report please see the ONS web site.
2. Measures of quality in the 2001 Census
This section is extracted from the ONS Census 2001 Quality report for
England and Wales. For more information see the ONS web site.
This section discusses the concept of quality that is often described as ‘fitness for purpose’ in terms of user needs. The Office for National Statistics (ONS) has adopted the European Statistical System for the description of quality, which is based on these attributes:
• Relevance
• Accuracy
• Timeliness
• Accessibility and clarity
• Comparability
• Coherence
The section defines each of these attributes and explains how it affects Census quality. For each attribute, it brings together the processes undertaken during the Census that contributed to it.
Relevance:
Relevance reflects the degree to which statistical information meets the
needs and priorities expressed by users. Users need input into the topics,
concepts and definitions underlying data to ensure its relevance. ONS
has maintained close links with users, and undertook extensive user consultations
with the identified communities including Central Government, Local Authorities
(LAs), the health service, business and academics to gain an understanding
of users’ requirements for census information.
Accuracy:
Accuracy is the closeness between an estimated value and the unknown true
population value. There is no single aggregate or overall measure of accuracy.
However, it can be measured or described in terms of error, or the potential
significance of error, introduced through sources such as coverage, response
and processing.
The main errors that impacted on data accuracy were sampling and non-sampling error. As the 2001 Census was a measure of the whole population, there was no sampling error directly associated with it. However, as some under-enumeration occurred, sampling error was introduced by way of the imputation of additional people by the ONC process.
Timeliness:
Timeliness is the length of time between the date of the Census and the
availability of data. The time lag between fieldwork and results should
be minimised to permit the information to be of greatest value.
The 2001 Census was held on 29 April 2001 and the first results were published 17 months later on 30 September 2002. Data releases were staggered so that headline results were released as quickly as possible, with more detailed statistics released later. This allowed time for additional processing where required.
Accessibility and clarity:
Accessibility reflects the availability of information, taking into account
the suitability of the form the information is available in, the media
of dissemination and the availability of metadata. The affordability of
that information to users in relation to its value to them is also important.
Access to the main 2001 Census results is free and use is unrestricted. This is a major change from previous censuses, and reflects wider policies on access to government information.
Comparability:
Statistics are most useful when they enable comparisons across space and
time. The need for comparability with the 1991 Census was a key factor
in the design of the 2001 Census. However, changes in questions, concepts
and definitions between 1991 and 2001 were necessary to take into account
the need for harmonisation with other government surveys, to reflect changing
customer requirements and to take account of new and improved data collection
and processing methodologies.
Key changes between the 1991 and the 2001 Census included:
• Changes in population definitions, eg the enumeration of students
at their term-time residence rather than their vacation address as in
1991.
• Changes in geographic boundaries between 1991 and 2001.
• Changes in the methodologies used. The 2001 Census was the first
to adjust for under enumeration using the One Number Census process.
• Changes to the questions asked in the Census.
Coherence:
Coherence of data and information reflects the degree to which data can
be logically connected across other data sets. Statistics are coherent
if they are based on common definitions, classifications and methodological
standards. The messages that statistics convey to users will clearly relate
to each other and not contradict if the data is coherent.
ONS has been developing a programme of work to join up different statistics, ensuring a coherent, integrated presentation of data to users. Definitions have been harmonised across surveys, but there are differences in approach between, for example, self-completion questionnaires and interview surveys. Thus the Census questions to establish economic activity rates were somewhat more limited than those asked on the Labour Force Survey, where the interviewer can probe more deeply if required.
3. Response to the 2001 census
The proportion of people returning a census form in England and Wales
was 94 per cent. In Northern Ireland it was estimated that 95.2 per cent
of the population responded to the census and 95.3 per cent of the population
in private households.
The total overall response for England and Wales was 98 per cent,
including 4 per cent of the population who were estimated to be resident
in households identified by enumerators and whose records were imputed.
Table 1 shows the components of response at the 1991 and 2001 censuses.
| Table 1: Components of UK Census response and coverage rates for 1991 and 2001 - England and Wales | ||||||
| Component (per cent of population) | England 1991 | England 2001 | Wales 1991 | Wales 2001 | England & Wales 1991 | England & Wales 2001 |
| A People on returned forms: Census Response Rate | 96 | 94 | 97 | 94 | 96 | 94 |
| B Other people in identified households | 2 | 4 | 1 | 4 | 2 | 4 |
| A+B Total overall response | 98 | 98 | 98 | 98 | 98 | 98 |
| C People not included on returned forms and people in wholly missed households | 2 | 2 | 2 | 2 | 2 | 2 |
| Total | 100 | 100 | 100 | 100 | 100 | 100 |
| Proportion of population covered in census results: Census Coverage Rate (1991: A+B; 2001: A+B+C) | 98 | 100 | 98 | 100 | 98 | 100 |
| Note: The 1991 rates shown are subject to slight change, but this does not affect the conclusions to be drawn from this analysis. | ||||||
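The arithmetic behind Table 1 can be sketched as follows. This is only an illustration of how the rows combine, using the England and Wales percentages from the table; the variable and function names are assumptions made for this example.

```python
# England and Wales percentages copied from Table 1.
returned_forms = {"1991": 96, "2001": 94}     # row A
other_identified = {"1991": 2, "2001": 4}     # row B
missed = {"1991": 2, "2001": 2}               # row C


def overall_response(year: str) -> int:
    """Total overall response: rows A + B."""
    return returned_forms[year] + other_identified[year]


def coverage_rate(year: str) -> int:
    """Coverage of the published results: A+B in 1991, A+B+C in 2001."""
    covered = overall_response(year)
    if year == "2001":
        # In 2001 the results were also adjusted for the people in row C.
        covered += missed[year]
    return covered


for year in ("1991", "2001"):
    print(year, overall_response(year), coverage_rate(year))
```

The overall response is the same in both years; the difference in coverage comes entirely from including row C in the 2001 results.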
Response by age and sex
Under-enumeration does not occur uniformly across all age-sex groups.
Response rates were lowest for persons in their twenties, particularly
men. Response levels by age-sex group for England and Wales as a whole
varied from 98 per cent for females aged 75-79 to 87 per cent for males
aged 20-24. A spreadsheet showing detailed rates for 1991 and 2001 is
available from the ONS web site.
Census response declined between 1991 and 2001 for most age-sex groups.
Response rates for large-scale Government surveys also declined during
the 1990s, and response rates for censuses conducted in other countries
have fallen over the past decade. For example, the level of under-enumeration
observed in the censuses of both Australia and New Zealand rose between
1996 and 2001.
A summary of response patterns is given below:
Response by area
Under-enumeration also varied by area, with the lowest response rates in inner city areas, where characteristics known to be related to census non-response, such as multi-occupancy and higher proportions of non-English-speaking residents, are most prevalent.
Table 2 shows response rates for area types. The 2001 census response was lower than in 1991 in all area categories, with broadly similar proportionate drops across all areas except Inner London and Outer London, which show larger decreases in response rates. Inner London had the lowest response rate in 1991 and recorded the largest absolute drop in 2001.
| Table 2 Census Response by area - England & Wales | ||||||
| Area | All people 1991 | All people 2001 | Male 1991 | Male 2001 | Female 1991 | Female 2001 |
| Inner London | 88% | 78% | 86% | 77% | 90% | 79% |
| Outer London | 96% | 90% | 95% | 89% | 96% | 90% |
| Main Metropolitan areas | 94% | 92% | 92% | 91% | 96% | 92% |
| Other metropolitan areas | 97% | 95% | 96% | 95% | 98% | 96% |
| Non-metropolitan cities | 95% | 94% | 93% | 93% | 96% | 94% |
| Other non-metropolitan areas | 97% | 96% | 97% | 96% | 98% | 97% |
| Cardiff, Newport & Swansea | 95% | 93% | 94% | 93% | 97% | 94% |
| Other Welsh areas | 97% | 94% | 97% | 94% | 98% | 95% |
| Total | 96% | 94% | 95% | 93% | 97% | 94% |
| Note: The 1991 rates shown are subject to slight change, but this does not affect the conclusions to be drawn from this analysis. | ||||||
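The comparison drawn from Table 2 can be reproduced with a short calculation. The sketch below uses the 'All people' columns from the table; the dictionary layout and names are assumptions made for this example.

```python
# 'All people' response rates (per cent) from Table 2: (1991, 2001).
response = {
    "Inner London": (88, 78),
    "Outer London": (96, 90),
    "Main Metropolitan areas": (94, 92),
    "Other metropolitan areas": (97, 95),
    "Non-metropolitan cities": (95, 94),
    "Other non-metropolitan areas": (97, 96),
    "Cardiff, Newport & Swansea": (95, 93),
    "Other Welsh areas": (97, 94),
}

# Absolute drop in percentage points between the two censuses.
drops = {area: r1991 - r2001 for area, (r1991, r2001) in response.items()}

lowest_1991 = min(response, key=lambda area: response[area][0])
largest_drop = max(drops, key=drops.get)

for area, drop in sorted(drops.items(), key=lambda item: -item[1]):
    print(f"{area}: {drop} percentage points")
```

Both `lowest_1991` and `largest_drop` come out as Inner London, in line with the text above.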
4. The One Number Census
The 2001 Census aimed to maximise coverage and to make an accurate estimate
of the people missed. The 1991 Census was thought to have had a substantially
larger under-count than in previous censuses with about 2 per cent of
the population of GB missed entirely and a further 1.6 per cent for whom
records were imputed.
The One Number Census was designed to produce figures from the 2001 Census that are adjusted for under-enumeration and which are consistent across all forms of output and at the smallest geographical area. The term ‘One Number Census’ indicates a departure from the 1991 Census where preliminary figures from the census count were published and then later figures, adjusted for under-enumeration, were published. The One Number Census approach makes all adjustments as part of the census processing. Thus the One Number Census results in a database of the complete population for the UK from which all census outputs – including the SARs - are drawn.
Through the One Number Census the final census database should hold 100 per cent of the population. The One Number Census (ONC) aimed to integrate the 2001 Census counts with the estimated level of under-enumeration in the Census - that is the number of households and people not counted. It adjusted the Census database for the estimated undercount so that all statistics sum to ‘One Number’ - the national estimate of the population.
4.1 Step by step guide to the ONC
The One Number Census process involved a number of stages:
One of the key elements of the ONC was an independent follow-up survey, the Census Coverage Survey (CCS). This involved face-to-face interviews with a sample of 320,000 households from every local authority in the UK. By combining the results of the census and the CCS, it was possible in 2001 to estimate the total resident population - the 'one number' - to a high level of precision, plus or minus 0.2 per cent.
4.2 The Census Coverage Survey (CCS)
The CCS was specifically designed to enable census population counts to
be adjusted for under-enumeration at the national, local and small area
level. It consisted of a completely independent and intensive face-to-face
survey of a sample of over 16,000 postcodes containing 320,000 households
drawn from all local authorities in England and Wales. The sample design
took into account the uneven distribution of under-enumeration across
the country by stratifying by a 'Hard to Count' index based upon characteristics
likely to be associated with under-enumeration, such as the number of
multi-occupied addresses.
The CCS was operationally independent from the census enumeration exercise. The CCS sample postcodes were kept confidential, CCS interviewers did not have any sight of the address lists produced in carrying out the census, nor the census forms returned in the area in which they were interviewing. The interviewers focused on making as many calls as necessary to achieve an interview and the timing of these calls was varied to maximise the probability of making contact.
The CCS in England and Wales achieved a response from 91 per cent of the households identified by interviewers. This is a high response rate for such a large-scale voluntary survey when compared to other national surveys. The survey succeeded in meeting its objective of identifying households and persons that had been missed by the 2001 census.
4.3 Quality assurance
All the ONC population estimates were subject to rigorous quality assurance. The population of each local authority by age and sex was considered in a consistent and detailed manner. This involved comparison against diagnostic ranges derived from rolled-forward population estimates and aggregated administrative sources (such as Birth Registration and Pensions data). Where the ONC estimates fell outside the diagnostic ranges, extensive checks of the ONC results were undertaken with respect to sample sizes, outliers and so on, and contingency action was taken if any issues were identified.
The quality assurance process included analysis for each local authority of a number of specific population subgroups known from 1991 to be prone to under-enumeration. These were full-time students, home armed forces, foreign armed forces and their dependants, and prisoners. The estimates for these subgroups were compared with data from other official sources to determine whether the results were plausible.
4.4 Dependence between census and CCS
For the ONC process to produce unbiased estimates of the population it is necessary for the census and Census Coverage Survey to be as independent of each other as possible. Practical arrangements were put in place to achieve this with census and CCS operations being kept entirely separate on the ground. If the two attempts at enumerating the same population are independent, it is possible to not only estimate those missed by either the census or CCS but to also estimate those missed by both - the dual system approach.
Through this approach, independence of process was achieved. However, there is an additional component of dependence which needs to be taken into account. This is dependence caused by the fact that those people who are difficult to count in a census are also difficult to count in a post-enumeration survey such as the CCS. This was expected and a methodology was developed to identify those areas where dependency was marked and to adjust for that dependence. This added an additional 230,000 to the ONC population estimates for England and Wales as a whole.
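The dual system idea can be illustrated with the textbook two-list estimator, which assumes the two counts are fully independent. This is a minimal sketch only: the ONC estimation was more elaborate than this and, as described above, included an adjustment for dependence. The function names and the counts in the example are invented for illustration.

```python
def dual_system_estimate(census_count: int, ccs_count: int, matched: int) -> float:
    """Textbook dual system estimate of the true population in an area.

    census_count: people counted by the census
    ccs_count:    people counted by the coverage survey
    matched:      people counted by both
    Assumes the two counts are independent of each other.
    """
    if matched <= 0:
        raise ValueError("at least one matched person is needed")
    return census_count * ccs_count / matched


def missed_by_both(census_count: int, ccs_count: int, matched: int) -> float:
    """Estimated number of people counted by neither exercise."""
    total = dual_system_estimate(census_count, ccs_count, matched)
    counted_by_either = census_count + ccs_count - matched
    return total - counted_by_either


# Invented counts for a single area.
print(dual_system_estimate(900, 800, 720))  # estimated true population
print(missed_by_both(900, 800, 720))        # estimated missed by both
```

If hard-to-count people are missed by both exercises more often than independence implies, this estimator understates the population, which is why a dependence adjustment was needed.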
4.5 Overcount
Part of the CCS interview was also aimed at identifying any potential overcount in the 2001 census, that is persons incorrectly enumerated as resident at more than one address. Examples include second homes and children from broken homes living a proportion of time with each parent. Analysis of responses to the CCS indicated that the level of overcount in the 2001 census was negligible - less than 0.1 per cent of the population were estimated to have been counted twice.
The One Number Census Guide provides full details
of how the ONC was conducted and is available at www.statistics.gov.uk/census2001/onc.asp.
5. Quality of responses
In the 2001 Census a person was taken to exist if at least two of the
name, date of birth and sex fields were completed. Generally, forms were
accepted that contained a minimum of four items of information: name,
date of birth, sex and marital status. Table 3, extracted from information
on the ONS web site at www.statistics.gov.uk/census2001/proj_qr.asp,
shows item non-response rates to topic areas for England and Wales.
It is evident that item non-response is lowest for age, sex and marital
status and highest for company size and professional qualifications.
| Table 3: Item non-response rates, 2001 Census: England and Wales | |
| Topic | England and Wales |
| Age | 0.5% |
| Sex | 0.4% |
| Marital status | 0.8% |
| Student flag | 1.3% |
| CoB | 2.5% |
| Ethnic | 2.9% |
| Welsh Language | 5.5% |
| Religion | 7.6% |
| Health | 3.1% |
| Carer | 6.1% |
| Long-term illness | 3.9% |
| Address 1 year ago | 4.5% |
| Quals | 6.2% |
| Prof quals | 17.2% |
| Work last week | 2.1% |
| Employment status | 6.6% |
| Company size | 13.9% |
| Occupation * | 3.2% |
| Supervisor | 6.8% |
| Industry * | 7.8% |
| Workplace postcode | 7.8% |
| Method of Travel | 6.3% |
| Hours worked | 8.0% |
| Relationship to Person 1 | 3.5% |
| Accommodation type | 3.0% |
| Self-contained | 3.9% |
| Rooms | 5.4% |
| Bath/shower | 2.5% |
| Lowest floor level | 4.0% |
| Central Heating | 2.2% |
| No. of cars | 2.7% |
| Tenure | 3.4% |
| Landlord | 2.9% |
In 1991 the Census Validation Survey provided evidence of the accuracy, or at least the consistency, with which census questions were answered. The Census Coverage Survey did not fulfil this function and the only information available comes from the Census Test that was conducted in 1997. More details can be found at www.statistics.gov.uk/census2001/proj_qr.asp.
6. Edit and imputation
As part of the planning to take account of under-enumeration through the
One Number Census, much more information was imputed in the 2001 Census
than for 1991. It is of great importance to analysts, particularly those
using microdata, to understand the methods used for editing and imputation.
For the SARs, all imputed records have been flagged so that users can
choose whether or not to include them in any analysis.
The material below is drawn from www.statistics.gov.uk/census2001/editimputevrep.asp.
More details are available at that site.
An Edit and Donor Imputation System (EDIS) was devised for the 2001 census.
Values would be set to missing for imputation if edit could not resolve
an inconsistency. A person was taken to exist if at least two of the name,
date of birth and sex fields were completed.
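The rule for deciding that a person exists can be written as a simple check. This is a sketch under the assumption that an uncompleted field is represented as `None` or an empty string; the function name is hypothetical.

```python
from typing import Optional


def person_exists(name: Optional[str], date_of_birth: Optional[str],
                  sex: Optional[str]) -> bool:
    """True if at least two of name, date of birth and sex were completed."""
    fields = (name, date_of_birth, sex)
    completed = sum(1 for value in fields if value is not None and value.strip())
    return completed >= 2


print(person_exists("A N Other", None, "F"))  # two fields completed
print(person_exists(None, "", "M"))           # only one field completed
```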
For the 2001 Census edit and imputation followed the principles below:
Methodology
EDIS can be sub-divided into five elements.
Multi-tick rules dealt with cases where more than one box was
ticked but only one option was allowed. In some cases there was a rule
for selecting one tick. If more than half the boxes were ticked or a set
of priorities for accepting one tick could not sensibly be made, the answer
was treated as missing and a value was supplied at the imputation stage.
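A minimal sketch of a multi-tick rule of this kind is shown below. The priority order is a hypothetical input, since the actual priorities used for each question are not given here; a return value of `None` stands for 'treated as missing'.

```python
from typing import Optional, Sequence


def resolve_multi_tick(ticked: Sequence[str], n_boxes: int,
                       priority: Optional[Sequence[str]] = None) -> Optional[str]:
    """Return the accepted option, or None if the answer is treated as missing."""
    if len(ticked) == 0:
        return None                # nothing ticked
    if len(ticked) == 1:
        return ticked[0]           # a valid single tick
    if len(ticked) > n_boxes / 2:
        return None                # more than half the boxes ticked
    if priority is None:
        return None                # no sensible priority rule for this question
    for option in priority:        # accept the highest-priority ticked option
        if option in ticked:
            return option
    return None


print(resolve_multi_tick(["a", "b"], n_boxes=5, priority=["b", "a", "c"]))
print(resolve_multi_tick(["a", "b", "c"], n_boxes=5, priority=["b", "a", "c"]))
```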
Range checks were applied to prevent answers being outside an
acceptable range. These were set to missing for subsequent imputation.
Examples were households with 0 or more than 99 rooms, or with more than
20 cars, people with a date of birth before 1891 or after Census Day,
who last worked before 1941 or who worked more than 99 hours per week.
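The range checks listed above can be sketched as a table of rules applied to each record. The field names and record layout are assumptions made for this example, and only the limits mentioned in the text are encoded; out-of-range values are set to `None`, standing for 'missing'.

```python
from datetime import date

CENSUS_DAY = date(2001, 4, 29)

# Each rule returns True when the value is within the acceptable range.
RANGE_RULES = {
    "rooms": lambda v: 1 <= v <= 99,
    "cars": lambda v: v <= 20,
    "date_of_birth": lambda v: date(1891, 1, 1) <= v <= CENSUS_DAY,
    "year_last_worked": lambda v: v >= 1941,
    "hours_worked": lambda v: v <= 99,
}


def apply_range_checks(record: dict) -> dict:
    """Return a copy of the record with out-of-range values set to None."""
    checked = dict(record)
    for field, in_range in RANGE_RULES.items():
        value = checked.get(field)
        if value is not None and not in_range(value):
            checked[field] = None  # left for the imputation stage
    return checked


example = {"rooms": 0, "cars": 2, "date_of_birth": date(1985, 5, 1), "hours_worked": 120}
print(apply_range_checks(example))
```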
Filter rules were applied to resolve some inconsistencies and
to decide which fields should be set to 'No Code Required' where questions
were answered but should not have been. For example, people under 16 or
over 75 were not required to answer any of the employment questions. The
variable Activity Last Week was also derived at this stage.
A set of Edit rules was applied to missing items or responses which appeared
to be in error or inconsistent when compared with other data (such as
married couples of the same sex, a child less than 13 years younger than
its parents, or a married person under 16). These are known as hard checks.
In determining how to resolve such inconsistencies, the Fellegi/Holt principle
of making the minimum number of changes was followed as far as possible.
Thus if a person was under 16, married and had answered employment questions
such as occupation, Age would be set to missing, since the inconsistency
could be resolved with the least change by imputing a value for Age between
16 and 74.
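The minimum-change idea in this example can be illustrated with a small brute-force search: find the smallest set of fields that is involved in every failed check. This is a sketch of the principle only, not the EDIS implementation; the two checks and the field names are hypothetical, and a full Fellegi/Holt system also has to consider implied edits, which are ignored here.

```python
from itertools import combinations

# Hypothetical hard checks: (fields involved, test that is True when the check FAILS).
EDITS = [
    ({"age", "marital_status"},
     lambda r: r["age"] < 16 and r["marital_status"] == "married"),
    ({"age", "occupation"},
     lambda r: r["age"] < 16 and r["occupation"] is not None),
]


def fields_to_blank(record: dict) -> set:
    """Smallest set of fields that touches every failed check."""
    failed = [fields for fields, fails in EDITS if fails(record)]
    if not failed:
        return set()
    candidates = sorted(set().union(*failed))
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if all(fields & set(combo) for fields in failed):
                return set(combo)
    return set(candidates)


record = {"age": 14, "marital_status": "married", "occupation": "clerk"}
print(fields_to_blank(record))  # only Age needs to be set to missing
```

Blanking Age alone resolves both failed checks, whereas keeping Age would require changing both marital status and occupation.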
Edit also identified unlikely, but not impossible responses. In some cases
rules were applied to eliminate these, for example, a purpose-built flat
was considered unlikely to have more than 10 rooms, and for reasons explained
below the value was set to ‘Missing’ for imputation. In others
no further action was taken, e.g. where people under 35 were retired from
paid work. The number of these ‘soft checks’ was reported
but the data were not changed as a result.
All items which were missing after the Edit stage were dealt with by the
Imputation component, which is described below.
Imputation was applied when there was no answer on the Census
form, it failed the multi-tick rules or was invalid, or the filter rules
or Edit marked it for imputation to resolve an inconsistency.
The principle of a Donor Imputation System is to search for a single donor
household to supply all the missing variables in a recipient household.
Exceptions are imputation for postcode of usual address one year ago and
of workplace, which were carried out at a later stage than imputation
for other variables.
The search looked at all records in an Estimation Area, a group of contiguous Local Authority Districts of about 500,000 population. The method searched for a donor using up to five matching variables, which were determined by the fields requiring imputation on the recipient record. Values were copied over from the donor household to fill the missing values on the recipient record. Consistency checks were then applied and the donor rejected if any check failed.
Potential donor households were scored using a second set of matching
variables relating to all people in the household. In addition, potential
donors were penalised if they had been used before as a donor or if any
of their fields had been edited or imputed. A record could not be used
as a donor if any of the fields to be imputed were also missing on the
donor. If potential donors still scored equally, the donor geographically
closest to the recipient was chosen. However, to improve efficiency of
the searching procedure, if a suitable donor was found who lived within
5,000 metres of the recipient, this person was accepted and no further
search took place to find a closer donor.
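The donor selection described above can be sketched as follows. The scoring scheme, the size of the penalties and the `Donor` fields are all assumptions made for illustration; only the overall logic (score, penalise, break ties by distance, accept the first tied donor within 5,000 metres) follows the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Donor:
    donor_id: str
    match_score: int               # hypothetical: matching variables that agree
    times_used: int                # previous uses as a donor
    edited_or_imputed_fields: int  # fields already changed on this record
    distance_m: float              # distance from the recipient in metres


def penalised_score(donor: Donor) -> int:
    """Match score less one point per previous use and per changed field (assumed weights)."""
    return donor.match_score - donor.times_used - donor.edited_or_imputed_fields


def choose_donor(candidates: Iterable[Donor],
                 accept_within_m: float = 5000.0) -> Optional[Donor]:
    candidates = list(candidates)
    if not candidates:
        return None
    best = max(penalised_score(d) for d in candidates)
    tied = [d for d in candidates if penalised_score(d) == best]
    for donor in tied:
        if donor.distance_m <= accept_within_m:
            return donor           # close enough: stop searching
    return min(tied, key=lambda d: d.distance_m)


donors = [Donor("a", 5, 0, 0, 9000.0), Donor("b", 5, 0, 0, 4000.0), Donor("c", 5, 1, 0, 100.0)]
print(choose_donor(donors).donor_id)
```

Donor "c" is nearest but has been used before, so it scores lower; of the two tied donors, "b" is accepted because it lies within 5,000 metres.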
The intention was to use joint imputation where possible, i.e. selecting
a single donor household to impute for all the people with missing values
in a recipient household so as to preserve the joint distributions between
variables. If a suitable donor household could not be found for joint
imputation, separate donors were sought to provide values for each person
in the household requiring imputation, if necessary reducing the number
of matching variables.
A fallback stage was also required as donor imputation failed to work
for a few people. Most of these were imputed by testing possible values
at random until one could be found which met the consistency criteria
(a ‘cold deck’ approach). A small number of households could
still not be completely resolved because of inconsistencies in age and
relationships between people. As a final stage, if all else failed (‘son
of fallback’) those containing up to eight people were completely
replaced by synthetic households drawn at random from a set of the same
household size, and households of nine or more people were corrected clerically.
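The cold deck fallback can be sketched as trying candidate values in random order until one passes the consistency checks. Trying each value at most once, the consistency check and the field names are assumptions made for this example.

```python
import random
from typing import Callable, Iterable, Optional


def cold_deck_impute(record: dict, field: str, possible_values: Iterable,
                     is_consistent: Callable[[dict], bool],
                     rng: Optional[random.Random] = None) -> Optional[dict]:
    """Try possible values in random order; return the first consistent record."""
    rng = rng or random.Random()
    values = list(possible_values)
    rng.shuffle(values)
    for value in values:
        trial = {**record, field: value}
        if is_consistent(trial):
            return trial
    return None  # nothing fits: left for the final fallback stage


def no_married_child(r: dict) -> bool:
    """Hypothetical hard check: a married person cannot be under 16."""
    return not (r["marital_status"] == "married" and r["age"] < 16)


person = {"age": None, "marital_status": "married"}
print(cold_deck_impute(person, "age", range(0, 111), no_married_child, random.Random(0)))
```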
The aim of imputation was to estimate the distribution of missing values
accurately, so as to take account of any differences between the characteristics
of respondents and non-respondents (non-response bias). It was not expected
that the imputed values for every individual would be precisely accurate.
In comparison with 1991, EDIS was more comprehensive. It was applied to
all variables, including qualifications, relationships, occupation, industry,
hours worked, workplace address and means of transport to work, which
were only analysed for a 10 per cent sample of households and communals
in 1991.
There was some manual intervention in the 1991 processing system, such
as clerical checking of missing or inconsistent items which exceeded certain
tolerances. EDIS was almost entirely automatic as clerical intervention
was limited to households of more than eight people which failed the fallback
stage.
Edit
A total of 13.7 million edits were carried out on the data for 11.8m people.
The base population for EDIS was 49.4m people in England and Wales, including
some 0.6m students living away from home during term-time for whom only
a few demographic and relationship questions applied at their home address.
The eight most frequently executed edits accounted for 91 per cent of
the total. These were:
| 4.50m | Professional qualifications set to None where missing but educational qualifications was answered |
| 2.29m | Carer set to No where missing unless Activity Last Week was also missing |
| 1.66m | Workplace size set to 1-9 where person was self-employed |
| 1.08m | Travel to work set to “work mainly at/from home” where workplace address was “mainly work at/from home” |
| 1.03m | Supervisor set to No if missing, unless occupation was also missing |
| 1.01m | Health set to Good if missing, unless Activity Last Week was also missing |
| 0.59m | Professional qualifications set to missing if answered but educational qualifications was missing |
| 0.40m | Missing Country of birth set to that of either siblings, parents or other related people in the household who have the same Country of birth |
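Several of these edits are simple deterministic rules. The sketch below encodes four of them as listed in the table; the field names, the use of `None` for a missing answer and the record layout are assumptions made for this example.

```python
def apply_frequent_edits(record: dict) -> dict:
    """Apply four of the most frequent edits to a copy of the record."""
    r = dict(record)

    # Professional qualifications depend on whether educational
    # qualifications was answered.
    if r.get("prof_quals") is None and r.get("edu_quals") is not None:
        r["prof_quals"] = "None"
    elif r.get("prof_quals") is not None and r.get("edu_quals") is None:
        r["prof_quals"] = None

    # Carer and Health default unless Activity Last Week is also missing.
    if r.get("activity_last_week") is not None:
        if r.get("carer") is None:
            r["carer"] = "No"
        if r.get("health") is None:
            r["health"] = "Good"

    # Supervisor defaults to No unless occupation is also missing.
    if r.get("supervisor") is None and r.get("occupation") is not None:
        r["supervisor"] = "No"

    return r


example = {"edu_quals": "GCSE", "prof_quals": None, "carer": None,
           "health": None, "activity_last_week": "working",
           "supervisor": None, "occupation": None}
print(apply_frequent_edits(example))
```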
Imputation
One or more items needed to be imputed for 13.8m people - that is 28.0
per cent of the population who returned Census forms. Of these, 4.7m were
dealt with by joint imputation. 10.0m were imputed using individual imputation,
including all those in single person households. 9.8m of the individual
imputed cases used a donor household of the same size as the recipient’s
and the remaining 0.2m a household of different size. 0.4m people required
imputation using the cold deck fallback method. Over 1m people had some
items imputed by one method and some by another, hence there is some double-counting.
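The double-counting can be checked from the figures quoted above. The variable names are assumptions made for this example.

```python
# Figures in millions of people, as quoted in the text.
people_with_imputation_m = 13.8
by_method_m = {"joint": 4.7, "individual": 10.0, "cold deck fallback": 0.4}

method_total_m = sum(by_method_m.values())
# People are counted once per method used, so the excess over the number
# of people reflects those imputed by more than one method.
overlap_m = round(method_total_m - people_with_imputation_m, 1)

print(round(method_total_m, 1), overlap_m)
```

The excess of about 1.3m is consistent with the statement that over 1m people had items imputed by more than one method.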
23.4 per cent of the population were used once as donors, 2.1 per cent
twice and 0.1 per cent three or more times. In the SARs the variable EDISDONO
records whether an individual was used as a donor and,
if so, how many times.
For household variables, 2.5m needed imputation, 11 per cent of all households.
0.08m were dealt with by fallback and the remainder by joint imputation.
Almost all the donor households for joint imputation were used once each.
Person variables
| Variable | Total (including imputed), 000s | Non-response, 000s | Imputed, 000s | Non-response, % | Imputed, % |
| Age | 49,359 | 262 | 278 | 0.53 | 0.56 |
| Sex | 49,359 | 199 | 219 | 0.40 | 0.44 |
| Marital status | 49,359 | 372 | 158 | 0.76 | 0.32 |
| Student flag | 49,359 | 622 | 641 | 1.26 | 1.30 |
| Country of birth | 48,848 | 1,211 | 829 | 2.48 | 1.70 |
| Ethnic group | 48,848 | 1,405 | 1,421 | 2.88 | 2.91 |
| Welsh language | 2,754 | 153 | 153 | 5.54 | 5.57 |
| Religion | 48,848 | 3,721 | - | 7.62 | - |
| Health | 48,848 | 1,525 | 531 | 3.12 | 1.09 |
| Carer | 48,848 | 2,967 | 693 | 6.07 | 1.42 |
| Long-term illness | 48,848 | 1,899 | 1,915 | 3.89 | 3.92 |
| Address one year ago | 48,848 | 2,198 | 2,213 | 4.50 | 4.53 |
| Educational qualifications | 35,367 | 2,187 | - | 6.18 | - |
| Professional qualifications | 35,367 | 6,094 | - | 17.23 | - |
| Highest qualification | 35,367 | - | 2,150 | - | 6.09 |
| Working last week | 35,367 | 737 | - | 2.08 | - |
| Activity last week | 35,367 | - | 1,301 | - | 3.69 |
| Employment status | 33,686 | 2,205 | 2,058 | 6.55 | 6.14 |
| Workplace size | 33,686 | 4,689 | 3,067 | 13.92 | 9.15 |
| Supervisor | 33,686 | 2,294 | 1,119 | 6.81 | 3.34 |
| Occupation - currently working | 21,741 | 694 | 759 | 3.19 | 3.48 |
| Occupation - all ever worked | 29,335 | 4,051 | 4,051 | 13.81 | 13.81 |
| Industry - currently working | 21,741 | 1,702 | 1,777 | 7.83 | 8.15 |
| Industry - all ever worked | 29,335 | 5,400 | 5,400 | 18.41 | 18.41 |
| Workplace address | 22,396 | 1,744 | 1,426 | 7.79 | 6.42 |
| Method of travel | 22,533 | 1,410 | 1,127 | 6.26 | 5.07 |
| Hours worked | 22,533 | 1,804 | 1,506 | 8.00 | 6.77 |
| Relationship to Person 1 | 28,065 | 971 | 1,326 | 3.46 | 4.73 |
Note: The ‘Total’ column refers to the number of people in
scope for the question, i.e.:
Age
Age was not reported or was out of range (born after Census day or more
than 110 years old) for 240,000 people. It was set to missing for a further
23,000 on grounds of inconsistency, mainly because people who were not
single and who had answered three or more employment questions had their
age captured as under 16.
The distribution of imputed ages followed that of the remainder of the
population except for a shortfall among the 0, 6-15 and 76-80 age groups.
This is primarily because some people were imputed as aged between 16
and 74 who may have been outside this age range because some employment
questions had been answered. The shortfall in babies under 1 year old
occurred where their address one year ago had not been stated as ‘no
usual address’. The effect in an area of 100,000 population would
typically be that 2 or 3 under 1’s would have been imputed as over
1.
Sex
Sex was missing for 185,000 people and multi-ticked for 14,000, 0.4% of
the population in total. There were no edit actions which directly affected
this question: if a husband and wife, or the parents of a child, were
of the same sex the relevant relationships were imputed. A further 20,000
had values imputed by 'son of fallback'.
The sexes were imputed in the ratio of 51:49 in favour of females, very
similar to the proportions among the remainder of the population. The
accuracy of imputations was assessed by comparing the imputed values with
people’s names in a sample of areas. This showed that 75 per cent
of imputations were correct. Among the incorrect values there was a very
slight bias towards imputing females. The net effect would be to count
four people out of every 100,000 as female rather than male.
Marital Status
There were 373,000 missing or multi-ticked cases for marital status, representing
0.8 per cent of the population. 232,000 of these were children under 16
who were set to Single in edit. A further 6,000 under 16s had marital
status changed to Single. Imputation was applied to the remainder. Married
and Re-married were less likely to be imputed than among the remainder
of the population.
Student
Question 5 on the person schedule asked whether a person was a schoolchild
or student in full-time education. 1.3 per cent of people failed to answer
or multi-ticked the question, of whom 13 per cent were imputed as students
compared with 21 per cent in the remainder of the population.
Country of Birth
Country of birth was omitted by 2.5 per cent of people. Of these, 88 per
cent were imputed as born in the United Kingdom, compared to 92 per cent
in the remainder of the population. People born in Africa, Asia and North
America were imputed in higher proportion than the remainder of the population.
Ethnic Group
The non-response rate for ethnic group was 2.9 per cent. 89 per cent of
these were imputed as White compared with 92 per cent in the remainder
of the population. There were higher proportions of imputed people in
the Mixed, Asian and Black groups.
| Ethnic group | Imputed (000s) | Imputed (%) | Total including imputed (000s) | Total including imputed (%) |
| White | 1,260 | 88.7 | 45,065 | 92.3 |
| Mixed | 24 | 1.7 | 605 | 1.2 |
| Asian | 80 | 5.6 | 1,925 | 3.9 |
| Black | 43 | 3.0 | 868 | 1.8 |
| Chinese and other | 13 | 0.9 | 382 | 0.8 |
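The percentage columns in the table above can be recomputed from the counts as a consistency check. The sketch below is illustrative only: the counts are copied from the table, and the function and variable names are not part of any ONS system.

```python
# Counts in thousands, copied from the table above.
imputed = {"White": 1260, "Mixed": 24, "Asian": 80, "Black": 43, "Chinese and other": 13}
total = {"White": 45065, "Mixed": 605, "Asian": 1925, "Black": 868, "Chinese and other": 382}

def shares(counts):
    """Return each group's share of the column total, as a percentage to one decimal place."""
    column_total = sum(counts.values())
    return {group: round(100 * n / column_total, 1) for group, n in counts.items()}

imputed_pct = shares(imputed)
total_pct = shares(total)

for group in imputed:
    print(f"{group:18s} {imputed_pct[group]:5.1f} {total_pct[group]:5.1f}")
```

Comparing the two percentage columns group by group shows the direction of the imputation bias described in the text.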
Welsh Language
The question asking whether people could understand spoken Welsh, or speak,
read or write the language, was asked of all people living in Wales. There
was a 5.5 per cent non-response rate. No knowledge of Welsh was imputed
slightly more often than for the remainder of the population.
Religion
As the question on religion was voluntary, non-responses were not imputed
but will appear in tables as ‘not stated’. The national non-response
rate was 7.6 per cent.
General Health
This question asked whether over the last twelve months a person’s
health had on the whole been good, fairly good or not good. The non-response
rate was 3.1 per cent, but an edit rule set the value to good unless Activity
Last Week was also missing. This reduced the number requiring imputation
to 1.1 per cent. Among these people, Fairly Good and Not Good were imputed
slightly more frequently than in the remainder of the population.
Carer
Question 12 referred to voluntary help or support given to family members,
friends or neighbours. The rate of non-response was 6.1 per cent. Missing
values were set to No by an edit rule unless Activity Last Week was also
missing, and children under 5 were assumed not to be providing care.
Of the remaining 1.3 per cent of the population, 11 per cent were imputed
as Carers in comparison to 10 per cent among the remainder of the population.
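The Carer edit rule described above can be sketched as a small function. This is a simplified illustration, not the EDIS implementation: the function name, the use of None for a missing answer and the order in which the conditions are tested are all assumptions.

```python
from typing import Optional

def edit_carer(carer: Optional[str],
               activity_last_week: Optional[str],
               age: int) -> Optional[str]:
    """Apply the Carer edit rule; None stands for a missing answer."""
    if carer is not None:
        return carer      # a reported answer is left alone
    if age < 5:
        return "No"       # children under 5 assumed not to be providing care
    if activity_last_week is not None:
        return "No"       # missing Carer set to No when Activity Last Week is present
    return None           # still missing: left for imputation

print(edit_carer(None, "Working", 30))   # set to No by the edit rule
print(edit_carer(None, None, 30))        # remains missing, so goes to imputation
```

The General Health rule follows the same pattern, with the missing value set to good unless Activity Last Week was also missing.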
Long-term Illness
There was a 3.9 per cent non-response rate to this question, which asked
about any long-term illness, health problem or disability which limited
the person’s daily activities or the work they could do. 22 per
cent of these were imputed as having such a condition in comparison with
18 per cent among the remainder of the population.
Address One Year Ago
This question had a non-response rate of 4.5 per cent. No usual address
was imputed more often than among the remainder of the population, mainly
because there was a high rate of non-response for children under 1.
Qualifications
This topic was covered by two questions, on educational and professional
qualifications, which had non-response rates of 6.2 per cent and 17.2
per cent respectively. Where the professional qualifications answer was
missing, it was set to None by an edit rule if the educational qualifications
question had been answered. If the educational qualifications question had
not been answered, professional qualifications was set to missing. Taking the
responses to the two questions together,
a new variable called highest qualification was derived. After applying
the edit rules, 6.1 per cent of people needed to have highest qualification
imputed. People with imputed values were more likely to have no qualifications
(Level 0) than the remainder of the population.
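The two edit rules for professional qualifications can be sketched as follows. This is a literal reading of the description above rather than the actual EDIS code; the names and the example category labels are hypothetical.

```python
MISSING = None  # sentinel for an unanswered question

def edit_professional(educational, professional):
    """Return the edited professional qualifications value."""
    if educational is MISSING:
        return MISSING     # set to missing when educational qualifications is unanswered
    if professional is MISSING:
        return "None"      # the category 'no professional qualifications'
    return professional    # reported answer kept

print(edit_professional("Level 2", MISSING))   # becomes the category "None"
print(edit_professional(MISSING, "Nurse"))     # set to missing (prints None)
```

Records left with a missing value after this step are the ones for which highest qualification then has to be imputed.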
Activity Last Week
This variable shows whether a person was working in the week prior to
Census day, and if not whether they were looking for work, waiting to
start a job, retired, student, looking after home/family, permanently
sick or disabled, or otherwise economically inactive. This information
is derived from Questions 18 to 22 on the Census form for people aged
16 to 74.
Problems were found with the pattern of responses to these and other employment
questions, caused by the format of Question 18 (Last week, were
you doing any work). Some people ticked No or multi-ticked this question,
but then went on to give details of their present job in answer to Questions
32 to 36. The filter rules were amended to accommodate this pattern so
that they were treated as working.
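The amended filter rule can be pictured with a short sketch. It covers only the case described above, and the function name, argument names and answer codes are assumptions made for illustration.

```python
def treated_as_working(q18_answer, has_current_job_details):
    """Decide whether a person is treated as working last week.

    q18_answer: "Yes", "No", "multi" (multi-ticked) or None (missing).
    has_current_job_details: True if Questions 32 to 36 were answered.
    """
    if q18_answer == "Yes":
        return True
    # Amended rule: a No or multi-tick is overridden by present-job details.
    if q18_answer in ("No", "multi") and has_current_job_details:
        return True
    return False

print(treated_as_working("No", True))    # treated as working under the amended rule
print(treated_as_working("No", False))   # not treated as working
```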
Non-response to working last week was 2.1 per cent. The value was changed
in certain cases depending on the pattern of responses to looking for
work etc (questions 19-22), ever worked and year last worked (question
23), details of current or last job at questions 25-30 and current job
at questions 32-35.
In total, 3.7 per cent of Activity Last Week values were imputed. These
were biased towards looking for work and most of the economically inactive
categories, especially retired and students. Only 34 per cent were imputed
as working compared with 64 per cent in the remainder of the population
aged 16-74. Generally it was people at the extremes of the age range who
failed to respond to these questions, which explains the preponderance
of retired people and students among the imputed values.
Employment Status
Question 25 asked whether each person who had ever worked was an employee,
or self-employed with or without employees in their current or last job.
Non-responses and multi-ticks amounted to 6.5 per cent of those who should
have answered the question. These all went through imputation, and ‘Employee’
was imputed more frequently than among the remainder of the population.
Size of Workplace
The non-response rate for this question was 13.9 per cent. An edit rule
was applied to set the number of workers to 1-9 where a person was self-employed
without employees. This left 6.5 per cent to be imputed, of whom slightly
fewer were set to 1-9 workers than among the remainder of the population,
and slightly more in the 10-24 and 25-499 ranges.
Occupation and Industry
The non-response rate for occupation was 3.2 per cent among currently
working people, including 0.7 per cent inadequately described responses.
When all people who had ever worked are considered, non-response rose
to 13.1 per cent. The imputed population was slightly biased towards people
in major groups 4 (administrative and secretarial), 7 (sales and customer
services), 8 (process, plant and machine operatives) and 9 (elementary
occupations). Occupation groups 2 (professional) and 3 (associate professional
and technical occupations) were under-represented.
A similar pattern can be found in non-response to the question on industry.
Non-response was 7.8 per cent among current workers, including 0.6 per
cent inadequately described, but reached 17.9 per cent taking into account
all people who have worked. Imputation created more people working in
sections A (agriculture), F (construction) and O (social and personal
services) and fewer in D (manufacturing), J (banking, finance, insurance),
L (public administration) and M (education).
It should be noted that the full codes were imputed for missing occupation
and industry data. However, the primary matching variables for these fields
were defined at the major group level. Thus if industry was reported but
occupation was missing, a donor would have been sought within the same
major industry group, and that person’s occupation copied into the
recipient’s record. In some cases an unlikely occupation/industry
combination may have been created at the individual code level.
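The donor matching described above can be illustrated with a minimal sketch. It assumes a donor is picked at random from records in the same major industry group, which is a simplification; the record layout and the codes shown are hypothetical placeholders, not real occupation or industry codes.

```python
import random

def impute_occupation(recipient, donors, rng=random):
    """Copy a full occupation code from a donor in the same major industry group."""
    pool = [d for d in donors
            if d["occupation"] is not None
            and d["industry_major"] == recipient["industry_major"]]
    if not pool:
        return recipient              # no donor found; left unchanged in this sketch
    donor = rng.choice(pool)          # ASSUMPTION: random choice among matching donors
    return {**recipient, "occupation": donor["occupation"]}

donors = [
    {"industry_major": "M", "industry": "M-801", "occupation": "OCC-A"},
    {"industry_major": "F", "industry": "F-452", "occupation": "OCC-B"},
]
recipient = {"industry_major": "M", "industry": "M-803", "occupation": None}
result = impute_occupation(recipient, donors)
print(result)
```

Because the match is made on the major group only, the recipient keeps its own detailed industry code while receiving the donor's detailed occupation code, which is how an unlikely pairing can arise at the individual code level.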
Supervisor
Question 29 asked whether people supervised any other employees in their
current or last job. The non-response rate was 6.8 per cent. An edit rule
set missing answers to No unless occupation was also missing. This accounted
for about half the non-response. Of the remainder, 25 per cent were imputed
as supervisors compared with 30 per cent among the remainder of the population.
Workplace Address
There was a 7.8 per cent rate of non-response to this question, but some
values could be deduced from the answers to method of travel to work.
This left 6.4 per cent to be imputed. Of these, fewer were imputed as
working at or from home than amongst the remainder of the population.
Method of Travel to Work
This question was asked only of currently working people. Non-response
was 6.3 per cent, which was reduced to 5.0 per cent by a set of edits.
The imputed values were biased towards public transport users and those
travelling on foot, and away from those working at or from home or driving a
car or van.
Hours worked
The non-response rate was 8.0 per cent, and imputation favoured the 0-19
hours per week range compared to the pattern among the remainder of the
population.
Household Variables
| Variable | Total including imputed (000s) | Non-response (000s) | Imputed (000s) | Non-response (%) | Imputed (%) |
| Accommodation type | 22,305 | 671 | 671 | 3.01 | 3.01 |
| Self-contained | 22,305 | 870 | 870 | 3.90 | 3.90 |
| Number of rooms | 20,542 | 1,117 | 1,116 | 5.44 | 5.21 |
| Bath/shower and toilet | 20,542 | 503 | 503 | 2.45 | 2.35 |
| Lowest floor level | 22,305 | 897 | 919 | 4.02 | 4.12 |
| Central heating | 20,542 | 539 | 442 | 2.62 | 2.17 |
| Number of cars | 20,542 | 669 | 554 | 3.26 | 2.72 |
| Tenure | 20,542 | 797 | 685 | 3.88 | 3.36 |
| Landlord | 6,582 | - | 175 | - | 2.94 |
Note: The ‘Total’ column refers to the number of households
in scope for the question, i.e.:
Accommodation Type
There was a 3.0 per cent non-response rate for this question, which was
asked of all households. Imputed values were more likely to be a purpose-built
flat, part of a converted or shared house, or a commercial building, and
less likely to be a detached or semi-detached house.
Self-contained
This question had a non-response rate of 3.9 per cent. Of imputed households,
1.5 per cent were given not self-contained status compared with 1.1 per
cent among the remainder of the household population.
Number of Rooms
Question H3 provided two boxes for the number of rooms occupied by a household,
so that any value from 1 to 99 could be entered. Early analysis of processed
data showed that there were some problems which needed to be addressed:
After carrying out an analysis of households with more than 10 rooms,
rules were put in place to set values to missing where they were greater
than a number which depended on accommodation type. Number of rooms was
subsequently imputed. No limit was applied to detached houses.
Imputation was slightly more likely to set a value of 3 or 4 rooms, and
less likely to impute 5 or more rooms, compared with the remainder of
the household population.
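The rule for implausibly large room counts can be sketched as below. The source does not give the limits that were used, so the thresholds and accommodation type labels here are purely hypothetical; only the shape of the rule (a limit that depends on accommodation type, with no limit for detached houses) comes from the text.

```python
# ASSUMPTION: hypothetical limits; the actual values are not given in the source.
MAX_ROOMS = {"semi-detached": 15, "terraced": 12, "flat": 10}

def edit_rooms(rooms, accommodation_type):
    """Set an implausibly large room count to missing (None) so that it is imputed."""
    limit = MAX_ROOMS.get(accommodation_type)   # None means no limit, e.g. detached houses
    if rooms is not None and limit is not None and rooms > limit:
        return None
    return rooms

print(edit_rooms(40, "flat"))       # above the limit: set to missing (prints None)
print(edit_rooms(40, "detached"))   # no limit for detached houses: value kept
```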
Bath/shower and toilet
Question H4 asked whether a bath/shower and toilet was available for use
only by the household. There was a non-response rate of 2.5 per cent and
slightly more households were imputed as lacking sole use than among the
remainder of the household population.
Lowest floor level
4.0 per cent of households failed to answer this question. Fewer were
imputed as having ground floor as their lowest level of accommodation
than the remainder of the household population.
Central heating
This question had a non-response rate of 2.6 per cent. Non-respondents
were slightly more likely to be imputed as lacking central heating than the
remainder of the household population.
Number of cars
There was a 3.3 per cent non-response to this question. 35 per cent of
these households were imputed as having no cars compared with 26 per cent
for the remainder of the household population.
Tenure and Landlord
Non-response to these questions was 3.9 per cent for tenure and 2.9 per
cent for landlord. Those not answering were more likely to be renting
and less likely to be outright owners than in the remainder of the population.
Among tenants, there was little bias towards any type of landlord among
the imputed group.
Comparison with 1991
In general, the biases found in the imputed values for the person and
household variables were in the same direction as those present in the
1991 Census data, but were less marked. For example, 52 per cent of those
imputed in 1991 for marital status were assigned as Single compared with
41 per cent in the Census population. In 2001 the corresponding proportions
were 49 per cent and 44 per cent. 53 per cent of non-respondents were
imputed as having no car in 1991, considerably higher than the 32 per
cent among reporting households. In 2001, when the non-response rate had
risen from 1.0 per cent to 3.3 per cent, the gap had narrowed to 34 per
cent among imputed households and 27 per cent in those who responded.
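The narrowing of the gaps quoted above can be made explicit by taking the difference, in percentage points, between the imputed and the comparison proportions. The figures are those given in the text; the dictionary layout and labels are only an illustration.

```python
# (imputed %, comparison %) pairs quoted in the text.
figures = {
    ("marital status: Single", 1991): (52, 41),
    ("marital status: Single", 2001): (49, 44),
    ("no car", 1991): (53, 32),
    ("no car", 2001): (34, 27),
}

# Gap in percentage points between imputed and comparison proportions.
gaps = {key: imputed - comparison for key, (imputed, comparison) in figures.items()}

for (item, year), gap in gaps.items():
    print(f"{item} ({year}): {gap} percentage points")
```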
How well did EDIS work?
Within EDIS, a number of assumptions were based on age being correct rather
than other items. However, year of birth was occasionally mis-stated,
not scanned correctly or given a wrong value during processing. Particularly
when there was an error in the next to last digit of the year, EDIS may
have imputed for a range of items where no value was needed, or conversely
set reported data to ‘no code required’.
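A small sketch shows why an error in the next to last digit of the year of birth matters. It uses the 16 to 74 age range that applies to the employment questions, approximates age as the census year minus the year of birth, and takes two hypothetical mis-read years as examples.

```python
CENSUS_YEAR = 2001

def in_employment_scope(year_of_birth):
    """True if the approximate age falls in the 16 to 74 range for employment questions."""
    age = CENSUS_YEAR - year_of_birth   # approximation: ignores month and day of birth
    return 16 <= age <= 74

# Hypothetical case 1: 1975 mis-read as 1915 (digit 7 read as 1).
print(in_employment_scope(1975), in_employment_scope(1915))

# Hypothetical case 2: 1995 mis-read as 1965 (digit 9 read as 6).
print(in_employment_scope(1995), in_employment_scope(1965))
```

In the first case a working-age person falls out of scope, so reported employment answers could be set to 'no code required'; in the second a child falls into scope, so employment items could be imputed where no value was needed.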
Late changes to the questionnaire design had some impact on EDIS. Splitting the qualifications questions into two parts, which occurred after the 1999 Rehearsal, meant that new rules had to be devised for 2001 to deal with professional qualifications. This turned out to be the question having the largest non-response rate as many people considered that it did not apply to them.
As extra room had to be found on the form for qualifications, the question on work last week was squashed into a smaller space. As a result, two of the bullet points which appeared separately on the Rehearsal form were conflated into one, and it appears from the pattern of responses to this and later questions that some form-fillers misunderstood the question and answered No when they were actually in work. A resolution was found to this problem by amending the filter rules for the derivation of Activity Last Week but a small number of answers may have been miscoded as a result of the extra complication which was introduced.
A single edit and imputation system was designed to deal with the censuses in England, Wales, Scotland and Northern Ireland, which all had slightly different requirements. Variations in the design of the Census form and in editing requirements meant that great attention had to be devoted to ensuring that the processing for each country was carried out to the desired standards.
Conclusion
EDIS was successful in its main aim of providing a complete and consistent
database of values for all people who completed Census returns. It did
so efficiently and largely followed standard principles of making minimum
changes to the data. There were complications in its development including
late amendments, some of which could have been avoided with earlier access
to live data and others which were due to changes between Rehearsal and
the final version of the Census. However, these issues were identified
at an early stage of Census processing.
Further results on the performance of EDIS will be reported in the 2001
Census Quality Report, which is due to be published later this year.
Last updated 25 April 2007