The accuracy and quality of data in the 2001 Census

1. Sources of information


Most of the information in this section is taken from the ONS web site at www.statistics.gov.uk/census2001/methodology.asp. Fuller information can be found on that site.


More information for Scotland can be found in 'Scotland: Taking Scotland’s 2001 Census' at
www.gro-scotland.gov.uk/grosweb/grosweb.nsf/pages/cencr102.


All this material remains Crown Copyright.

Census 2001 General report for England and Wales:

This reviews the entire Census operation from the early consultation and planning stages, to the production and dissemination of outputs and evaluation. It provides a wealth of detail about how the Census was carried out and what lessons have been learned to take forward in the plans for any future censuses. It is aimed at both the experienced and occasional user of census data, but it is hoped the wider public may also find it useful and informative. For the full report please see the ONS web site.


The Census 2001 Quality report for England and Wales provides information about all aspects of quality relating to the 2001 Census. It provides an overview of the quality issues and the studies and analyses that have been carried out to improve the quality of Census data. The report deals with the life cycle of the Census project stage by stage, and then provides measures of each of the attributes of quality as defined by the European Statistical System. The final part describes the components of quality of the data for each Census question. In conjunction with the Census 2001 General report for England and Wales, it provides a comprehensive evaluation of the strengths and weaknesses of the Census operation. For the full report please see the ONS web site.

2. Measures of quality in the 2001 Census

This section is extracted from the ONS Census 2001 Quality report for England and Wales. For more information see the ONS web site.

This section discusses the concept of quality that is often described as ‘fitness for purpose’ in terms of user needs. The Office for National Statistics (ONS) has adopted the European Statistical System for the description of quality, which is based on these attributes:


• Relevance
• Accuracy
• Timeliness
• Accessibility and clarity
• Comparability
• Coherence

The section defines each of these attributes and explains how it affects Census quality. For each attribute, the aim is to bring together the various processes undertaken during the Census that contribute to it.

Relevance:
Relevance reflects the degree to which statistical information meets the needs and priorities expressed by users. Users need input into the topics, concepts and definitions underlying data to ensure its relevance. ONS has maintained close links with users, and undertook extensive user consultations with the identified communities including Central Government, Local Authorities (LAs), the health service, business and academics to gain an understanding of users’ requirements for census information.

Accuracy:
Accuracy is the closeness between an estimated value and the unknown true population value. There is no single aggregate or overall measure of accuracy. However, it can be measured or described in terms of error, or the potential significance of error, introduced through sources such as coverage, response and processing.

The main errors that impacted on data accuracy were sampling and non-sampling error. As the 2001 Census was a measure of the whole population, there was no sampling error directly associated with it. However, as some under-enumeration occurred, sampling error was introduced by way of the imputation of additional people by the ONC process.

Timeliness:
Timeliness is the length of time between the date of the Census and the availability of data. The time lag between fieldwork and results should be minimised to permit the information to be of greatest value.

The 2001 Census was held on 29 April 2001 and the first results were published 17 months later on 30 September 2002. Data releases were staggered so that headline results were released as quickly as possible, with more detailed statistics released later. This allowed time for additional processing where required.

Accessibility and clarity:
Accessibility reflects the availability of information, taking into account the suitability of the form the information is available in, the media of dissemination and the availability of metadata. The affordability of that information to users in relation to its value to them is also important.

Access to the main 2001 Census results is free and use is unrestricted. This is a major change from previous censuses, and reflects wider policies on access to government information.

Comparability:
Statistics are most useful when they enable comparisons across space and time. The need for comparability with the 1991 Census was a key factor in the design of the 2001 Census. However, changes in questions, concepts and definitions between 1991 and 2001 were necessary to take into account the need for harmonisation with other government surveys, to reflect changing customer requirements and to take account of new and improved data collection and processing methodologies.

Key changes between the 1991 and the 2001 Census included:


• Changes in population definitions, eg the enumeration of students at their term-time residence rather than their vacation address as in 1991.
• Changes in geographic boundaries between 1991 and 2001.
• Changes in the methodologies used. The 2001 Census was the first to adjust for under enumeration using the One Number Census process.
• Changes to the questions asked in the Census.

Coherence:
Coherence of data and information reflects the degree to which data can be logically connected across other data sets. Statistics are coherent if they are based on common definitions, classifications and methodological standards. The messages that statistics convey to users will clearly relate to each other and not contradict if the data is coherent.

ONS has been developing a programme of work to join up different statistics, ensuring a coherent, integrated presentation of data to users. Definitions have been harmonised across surveys, but there are differences in approach between, for example, self-completion questionnaires and interview surveys. Thus the Census questions to establish economic activity rates were somewhat more limited than those asked on the Labour Force Survey, where the interviewer can probe more deeply if required.

3. Response to the 2001 census


The proportion of people returning a census form in England and Wales was 94 per cent. In Northern Ireland it was estimated that 95.2 per cent of the population responded to the census and 95.3 per cent of the population in private households.


The total overall response for England and Wales was 98 per cent – including 4 per cent of the population estimated to be resident in households identified by enumerators but who were imputed. Table 1 shows the components of response at the 1991 and 2001 censuses.

Table 1: Components of UK Census response and coverage rates for 1991 and 2001 - England and Wales
  England Wales England & Wales
  1991 2001 1991 2001 1991 2001
A: People on returned forms (Census Response Rate) 96 94 97 94 96 94
B: Other people in identified households 2 4 1 4 2 4
A+B: Total overall response 98 98 98 98 98 98
C: People not included on returned forms and people in wholly missed households 2 2 2 2 2 2
Total 100 100 100 100 100 100
Proportion of population covered in census results: Census Coverage Rate (1991: A+B; 2001: A+B+C) 98 100 98 100 98 100
Note: The 1991 rates shown are subject to slight change, but this does not affect the conclusions to be drawn from this analysis.

Response by age and sex


Under-enumeration does not occur uniformly across all age-sex groups. Response rates were lowest for persons in their twenties, particularly men. Response levels by age-sex group for England and Wales as a whole varied from 98 per cent for females aged 75-79 to 87 per cent for males aged 20-24. A spreadsheet showing detailed rates for 1991 and 2001 is available from the ONS web site.


Census response declined between 1991 and 2001 for most age-sex groups. Response rates for large-scale Government surveys also declined during the 1990s, and response rates for censuses conducted in other countries have fallen over the past decade. For example, the level of under-enumeration observed in the censuses of both Australia and New Zealand rose between 1996 and 2001.


A summary of response patterns is given below:

Response by area

Under-enumeration also varied by area, with the lowest response rates in inner city areas, where characteristics known to be related to census non-response (such as multi-occupancy and higher proportions of non-English-speaking residents) are most prevalent.

Table 2 shows response rates by area type. The 2001 census response was lower than in 1991 in every area category, with broadly similar proportionate drops across all areas except Inner London and Outer London, which show larger decreases in response rates. Inner London had the lowest response rate in 1991 and recorded the largest absolute drop in 2001.

Table 2: Census Response by area - England & Wales
  All people Male Female
  1991 2001 1991 2001 1991 2001
Inner London 88% 78% 86% 77% 90% 79%
Outer London 96% 90% 95% 89% 96% 90%
Main Metropolitan areas 94% 92% 92% 91% 96% 92%
Other metropolitan areas 97% 95% 96% 95% 98% 96%
Non-metropolitan cities 95% 94% 93% 93% 96% 94%
Other non-metropolitan areas 97% 96% 97% 96% 98% 97%
Cardiff, Newport & Swansea 95% 93% 94% 93% 97% 94%
Other Welsh areas 97% 94% 97% 94% 98% 95%
Total 96% 94% 95% 93% 97% 94%
Note: The 1991 rates shown are subject to slight change, but this does not affect the conclusions to be drawn from this analysis.

4. The One Number Census


The 2001 Census aimed to maximise coverage and to make an accurate estimate of the people missed. The 1991 Census was thought to have had a substantially larger under-count than in previous censuses with about 2 per cent of the population of GB missed entirely and a further 1.6 per cent for whom records were imputed.

The One Number Census was designed to produce figures from the 2001 Census that are adjusted for under-enumeration and are consistent across all forms of output, down to the smallest geographical area. The term ‘One Number Census’ indicates a departure from the 1991 Census, where preliminary figures from the census count were published first and figures adjusted for under-enumeration were published later. The One Number Census approach makes all adjustments as part of the census processing. Thus the One Number Census results in a database of the complete population for the UK from which all census outputs, including the SARs, are drawn.

Through the One Number Census the final census database should hold 100 per cent of the population. The One Number Census (ONC) aimed to integrate the 2001 Census counts with the estimated level of under-enumeration in the Census - that is the number of households and people not counted. It adjusted the Census database for the estimated undercount so that all statistics sum to ‘One Number’ - the national estimate of the population.


4.1 Step by step guide to the ONC

The One Number Census process involved a number of stages:

One of the key elements of the ONC was an independent follow-up survey, the Census Coverage Survey (CCS). This involved face-to-face interviews with a sample of 320,000 households from every local authority in the UK. By combining the results of the census and the CCS, it was possible in 2001 to estimate the total resident population - the 'one number' - to a high level of precision, plus or minus 0.2 per cent.

4.2 The Census Coverage Survey (CCS)


The CCS was specifically designed to enable census population counts to be adjusted for under-enumeration at the national, local and small area level. It consisted of a completely independent and intensive face-to-face survey of a sample of over 16,000 postcodes containing 320,000 households drawn from all local authorities in England and Wales. The sample design took into account the uneven distribution of under-enumeration across the country by stratifying by a 'Hard to Count' index based upon characteristics likely to be associated with under-enumeration, such as the number of multi-occupied addresses.

The CCS was operationally independent from the census enumeration exercise. The CCS sample postcodes were kept confidential, and CCS interviewers did not see either the address lists produced in carrying out the census or the census forms returned in the area in which they were interviewing. The interviewers focused on making as many calls as necessary to achieve an interview, and the timing of these calls was varied to maximise the probability of making contact.

The CCS in England and Wales achieved a response from 91 per cent of the households identified by interviewers. This is a high response rate for such a large-scale voluntary survey when compared to other national surveys. The survey succeeded in meeting its objective of identifying households and persons that had been missed by the 2001 census.


4.3 Quality assurance

All the ONC population estimates were subject to rigorous quality assurance. The population of each local authority by age and sex was considered in a consistent and detailed manner - this involved comparison against diagnostic ranges derived from rolled-forward population estimates and aggregated administrative sources (such as Birth Registration and Pensions data). Where the ONC estimates fell outside the diagnostic ranges, extensive checks of the ONC results were undertaken with respect to sample sizes, outliers etc, and contingency action was taken if any issues were identified.

The quality assurance process included analysis for each local authority of a number of specific population subgroups known from 1991 to be prone to under-enumeration. These were full-time students, home armed forces, foreign armed forces and their dependants, and prisoners. The estimates for these subgroups were compared with data from other official sources to determine whether the results were plausible.

4.4 Dependence between census and CCS

For the ONC process to produce unbiased estimates of the population it is necessary for the census and Census Coverage Survey to be as independent of each other as possible. Practical arrangements were put in place to achieve this with census and CCS operations being kept entirely separate on the ground. If the two attempts at enumerating the same population are independent, it is possible to not only estimate those missed by either the census or CCS but to also estimate those missed by both - the dual system approach.
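The dual system idea can be illustrated with the textbook two-list (Lincoln-Petersen) estimator. The sketch below assumes the two counts are fully independent, as described above; it is not the ONC estimation procedure itself, and the function name and example counts are hypothetical.

```python
def dual_system_estimate(census_count, ccs_count, matched_count):
    """Estimate the true population of an area from two independent counts.

    census_count:  people counted by the census in the area
    ccs_count:     people counted by the coverage survey in the same area
    matched_count: people found by both counts after record matching
    """
    if matched_count <= 0:
        raise ValueError("at least one matched person is needed")
    # Under independence, matched_count / ccs_count estimates the census
    # coverage rate, so the total is census_count divided by that rate.
    total = census_count * ccs_count / matched_count
    # People seen by at least one count: census + CCS - matched.
    seen_by_either = census_count + ccs_count - matched_count
    missed_by_both = total - seen_by_either
    return total, missed_by_both


# Hypothetical counts for one small area.
total, missed_by_both = dual_system_estimate(90, 80, 72)
print(total, missed_by_both)
```

With these illustrative counts, 98 people are seen by at least one of the two exercises, and the estimator attributes the rest of the total to people missed by both.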

Through this approach, independence of process was achieved. However, there is an additional component of dependence which needs to be taken into account: people who are difficult to count in a census are also difficult to count in a post-enumeration survey such as the CCS. This was expected, and a methodology was developed to identify those areas where dependence was marked and to adjust for it. This added 230,000 to the ONC population estimates for England and Wales as a whole.


4.5 Overcount

Part of the CCS interview was also aimed at identifying any potential overcount in the 2001 census, that is persons incorrectly enumerated as resident at more than one address. Examples include second homes and children from broken homes living a proportion of time with each parent. Analysis of responses to the CCS indicated that the level of overcount in the 2001 census was negligible - less than 0.1 per cent of the population were estimated to have been counted twice.


The One Number Census Guide provides full details of how the ONC was conducted and is available at www.statistics.gov.uk/census2001/onc.asp.

5. Quality of responses


In the 2001 Census a person was taken to exist if at least two of the name, date of birth and sex fields were completed. Generally, forms were accepted that contained a minimum of four items of information: name, date of birth, sex and marital status. Table 3, extracted from information on the ONS web site at www.statistics.gov.uk/census2001/proj_qr.asp, shows item non-response rates to topic areas for England and Wales.


It is evident that item non-response is lowest for age, sex and marital status and highest for company size and professional qualifications.

Table 3: Item non-response rates, 2001 Census: England and Wales
Topic England and Wales
Age 0.5%
Sex 0.4%
Marital status 0.8%
Student flag 1.3%
CoB 2.5%
Ethnic 2.9%
Welsh Language 5.5%
Religion 7.6%
Health 3.1%
Carer 6.1%
Long-term illness 3.9%
Address 1 year ago 4.5%
Quals 6.2%
Prof quals 17.2%
Work last week 2.1%
Employment status 6.6%
Company size 13.9%
Occupation * 3.2%
Supervisor 6.8%
Industry * 7.8%
Workplace postcode 7.8%
Method of Travel 6.3%
Hours worked 8.0%
Relationship to Person 1 3.5%
   
Accommodation type 3.0%
Self-contained 3.9%
Rooms 5.4%
Bath/shower 2.5%
Lowest floor level 4.0%
Central Heating 2.2%
No. of cars 2.7%
Tenure 3.4%
Landlord 2.9%

In 1991 the Census Validation Survey provided evidence of the accuracy – or at least the consistency – with which census questions were answered. The Census Coverage Survey did not fulfil this function and the only information available comes from the Census Test that was conducted in 1997. More details can be found at www.statistics.gov.uk/census2001/proj_qr.asp.


6. Edit and imputation


As part of the planning to take account of under-enumeration through the One Number Census, much more information was imputed in the 2001 Census than for 1991. It is of great importance to analysts, particularly those using microdata, to understand the methods used for editing and imputation. For the SARs, all imputed records have been flagged so that users can choose whether or not to include them in any analysis.


The material below is drawn from www.statistics.gov.uk/census2001/editimputevrep.asp. More details are available at that site.


An Edit and Donor Imputation System (EDIS) was devised for the 2001 census. Values would be set to missing for imputation if edit could not resolve an inconsistency. A person was taken to exist if at least two of the name, date of birth and sex fields were completed.

For the 2001 Census edit and imputation followed the principles below:


Methodology


EDIS can be sub-divided into five elements.


Multi-tick rules dealt with cases where more than one box was ticked but only one option was allowed. In some cases there was a rule for selecting one tick. If more than half the boxes were ticked or a set of priorities for accepting one tick could not sensibly be made, the answer was treated as missing and a value was supplied at the imputation stage.
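The multi-tick logic described above can be sketched as follows. This is an illustration only: the function name, the option labels and the idea of passing a priority list per question are assumptions, not details of EDIS.

```python
MISSING = None  # stands in for "set to missing for imputation"


def resolve_multi_tick(ticked, all_options, priority=None):
    """Return a single option, or MISSING if the answer must be imputed.

    ticked:      options the respondent ticked
    all_options: every box offered by the question
    priority:    hypothetical ordered list used to pick one tick, or None
                 when no sensible priority exists for the question
    """
    ticked_set = set(ticked)
    chosen = [opt for opt in all_options if opt in ticked_set]
    if not chosen:
        return MISSING          # nothing ticked at all
    if len(chosen) == 1:
        return chosen[0]        # a normal single-tick answer
    # More than half the boxes ticked, or no priority rule available.
    if len(chosen) > len(all_options) / 2 or priority is None:
        return MISSING
    for option in priority:
        if option in chosen:
            return option
    return MISSING


options = ["a", "b", "c", "d", "e"]
print(resolve_multi_tick(["a", "b"], options, priority=["b", "a"]))
```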


Range checks were applied to prevent answers being outside an acceptable range. These were set to missing for subsequent imputation. Examples were households with 0 or more than 99 rooms, or with more than 20 cars, people with a date of birth before 1891 or after Census Day, who last worked before 1941 or who worked more than 99 hours per week.
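As an illustration, the range checks listed above could be expressed as a small table of rules. The field names and the record layout are hypothetical; only the limits themselves, and the Census Day of 29 April 2001, come from the text.

```python
from datetime import date

CENSUS_DAY = date(2001, 4, 29)
MISSING = None  # stands in for "set to missing for imputation"

# Each check returns True when the value is inside the acceptable range.
RANGE_CHECKS = {
    "rooms": lambda v: 1 <= v <= 99,              # 0 or more than 99 fails
    "cars": lambda v: v <= 20,                    # more than 20 fails
    "date_of_birth": lambda v: date(1891, 1, 1) <= v <= CENSUS_DAY,
    "year_last_worked": lambda v: v >= 1941,      # before 1941 fails
    "hours_worked": lambda v: v <= 99,            # more than 99 fails
}


def apply_range_checks(record):
    """Return a copy of the record with out-of-range values set to MISSING."""
    cleaned = dict(record)
    for field, is_valid in RANGE_CHECKS.items():
        value = cleaned.get(field)
        if value is not None and not is_valid(value):
            cleaned[field] = MISSING
    return cleaned


household = {"rooms": 0, "cars": 3}
print(apply_range_checks(household))
```

Values that are already missing are left alone, since they will be picked up at the imputation stage in any case.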


Filter rules were applied to resolve some inconsistencies and to decide which fields should be set to 'No Code Required' where questions were answered but should not have been. For example, people under 16 or over 74 were not required to answer any of the employment questions. The variable Activity Last Week was also derived at this stage.


A set of Edit rules was applied to missing items or responses which appeared to be in error or inconsistent when compared with other data (such as married couples of the same sex, a child less than 13 years younger than its parents, or a married person under 16). These are known as hard checks.


In determining how to resolve such inconsistencies, the Fellegi/Holt principle of making the minimum number of changes was followed as far as possible. Thus if a person was under 16, married and had answered employment questions such as occupation, Age would be set to missing, since the inconsistency could be resolved with the least change by imputing a value for Age between 16 and 74.
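The minimum-change idea can be sketched as a small covering problem: find the fewest fields that, between them, are involved in every failed rule. This is a simplified illustration and not the EDIS implementation; the full Fellegi/Holt method also derives implied edits, which is omitted here, and the rules and field names are hypothetical.

```python
from itertools import combinations

# Each rule lists the fields it involves and a test that returns True
# when the record FAILS the rule.
EDIT_RULES = [
    ({"age", "marital_status"},
     lambda r: r["age"] < 16 and r["marital_status"] == "married"),
    ({"age", "occupation"},
     lambda r: r["age"] < 16 and r["occupation"] is not None),
]


def fields_to_blank(record, rules=EDIT_RULES):
    """Return a smallest set of fields that touches every failed rule."""
    failed = [fields for fields, fails in rules if fails(record)]
    if not failed:
        return set()
    candidates = sorted(set().union(*failed))
    # Try sets of one field, then two, and so on; the first hit is minimal.
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if all(fields & set(combo) for fields in failed):
                return set(combo)
    return set(candidates)


person = {"age": 14, "marital_status": "married", "occupation": "clerk"}
print(fields_to_blank(person))
```

For the example record both failed rules involve age, so blanking age alone resolves the inconsistency, which mirrors the case described in the text.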


Edit also identified unlikely, but not impossible responses. In some cases rules were applied to eliminate these, for example, a purpose-built flat was considered unlikely to have more than 10 rooms, and for reasons explained below the value was set to ‘Missing’ for imputation. In others no further action was taken, e.g. where people under 35 were retired from paid work. The number of these ‘soft checks’ was reported but the data were not changed as a result.


All items which were missing after the Edit stage were dealt with by the Imputation component, which is described below.


Imputation was applied when an item had no answer on the Census form, failed the multi-tick rules or was invalid, or was marked for imputation by the filter rules or Edit to resolve an inconsistency.


The principle of a Donor Imputation System is to search for a single donor household to supply all the missing variables in a recipient household. Exceptions are imputation for postcode of usual address one year ago and of workplace, which were carried out at a later stage than imputation for other variables.

The search looked at all records in an Estimation Area, a group of contiguous Local Authority Districts of about 500,000 population. The method searched for a donor using up to five matching variables, which were determined by the fields requiring imputation on the recipient record. Values were copied over from the donor household to fill the missing values on the recipient record. Consistency checks were then applied and the donor rejected if any check failed.


Potential donor households were scored using a second set of matching variables relating to all people in the household. In addition, potential donors were penalised if they had been used before as a donor or if any of their fields had been edited or imputed. A record could not be used as a donor if any of the fields to be imputed were also missing on the donor. If potential donors still scored equally, the donor geographically closest to the recipient was chosen. However, to improve efficiency of the searching procedure, if a suitable donor was found who lived within 5,000 metres of the recipient, this person was accepted and no further search took place to find a closer donor.
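A much reduced sketch of the donor search is given below. It works on single records rather than whole households and leaves out the consistency checks and the 5,000 metre early stop; the penalty weights, field names and coordinate fields are assumptions made for illustration, since the text does not give the actual scoring.

```python
import math


def choose_donor(recipient, candidates, missing_fields, match_fields,
                 used_penalty=1.0, edited_penalty=1.0):
    """Pick a donor record for the recipient, or None if no one qualifies.

    Records are dicts with hypothetical keys: the census fields plus
    "x" and "y" (metres), "times_used" and "was_edited".
    """
    best, best_key = None, None
    for donor in candidates:
        # A donor must hold a value for every field the recipient lacks.
        if any(donor.get(f) is None for f in missing_fields):
            continue
        # Fewer mismatches on the matching variables is better.
        score = sum(donor.get(f) != recipient.get(f) for f in match_fields)
        # Penalise donors used before or holding edited/imputed values.
        score += used_penalty * donor["times_used"]
        score += edited_penalty * (1 if donor["was_edited"] else 0)
        distance = math.hypot(donor["x"] - recipient["x"],
                              donor["y"] - recipient["y"])
        key = (score, distance)  # distance only breaks ties on score
        if best_key is None or key < best_key:
            best, best_key = donor, key
    return best


def impute_from_donor(recipient, donor, missing_fields):
    """Copy the missing fields from the donor into a copy of the recipient."""
    filled = dict(recipient)
    for field in missing_fields:
        filled[field] = donor[field]
    return filled
```

Sorting on the pair (score, distance) means that geographical closeness only matters between donors that score equally, as in the description above.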


The intention was to use joint imputation where possible, i.e. selecting a single donor household to impute for all the people with missing values in a recipient household so as to preserve the joint distributions between variables. If a suitable donor household could not be found for joint imputation, separate donors were sought to provide values for each person in the household requiring imputation, if necessary reducing the number of matching variables.


A fallback stage was also required as donor imputation failed to work for a few people. Most of these were imputed by testing possible values at random until one could be found which met the consistency criteria (a ‘cold deck’ approach). A small number of households could still not be completely resolved because of inconsistencies in age and relationships between people. As a final stage, if all else failed (‘son of fallback’) those containing up to eight people were completely replaced by synthetic households drawn at random from a set of the same household size, and households of nine or more people were corrected clerically.
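The cold deck step can be pictured as a simple retry loop. The sketch below assumes a single missing field, a caller-supplied consistency test and a cap on the number of attempts; none of these details are taken from EDIS, and the names are hypothetical.

```python
import random


def cold_deck_impute(record, field, possible_values, is_consistent,
                     rng=None, max_tries=1000):
    """Try random values for one field until the record is consistent.

    Returns a completed copy of the record, or None if no consistent
    value was found within max_tries (the case handed on to the final
    fallback in the text).
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        trial = dict(record, **{field: rng.choice(possible_values)})
        if is_consistent(trial):
            return trial
    return None


def married_means_adult(r):
    # Hypothetical consistency rule: a married person must be 16 or over.
    return not (r["marital_status"] == "married" and r["age"] < 16)


person = {"age": None, "marital_status": "married"}
result = cold_deck_impute(person, "age", list(range(111)),
                          married_means_adult, rng=random.Random(0))
```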


The aim of imputation was to estimate the distribution of missing values accurately, so as to take account of any differences between the characteristics of respondents and non-respondents (non-response bias). It was not expected that the imputed values for every individual would be precisely accurate.


In comparison with 1991, EDIS was more comprehensive. It was applied to all variables, including qualifications, relationships, occupation, industry, hours worked, workplace address and means of transport to work, which were only analysed for a 10 per cent sample of households and communal establishments in 1991.


There was some manual intervention in the 1991 processing system, such as clerical checking of missing or inconsistent items which exceeded certain tolerances. EDIS was almost entirely automatic as clerical intervention was limited to households of more than eight people which failed the fallback stage.


Edit


A total of 13.7 million edits were carried out on the data for 11.8m people. The base population for EDIS was 49.4m people in England and Wales, including some 0.6m students living away from home during term-time for whom only a few demographic and relationship questions applied at their home address. The eight most frequently executed edits accounted for 91 per cent of the total. These were:

4.50m Professional qualifications set to None where missing but educational qualifications was answered
2.29m Carer set to No where missing unless Activity Last Week was also missing
1.66m Workplace size set to 1-9 where person was self-employed
1.08m Travel to work set to “work mainly at/from home” where workplace address was “mainly work at/from home”
1.03m Supervisor set to No if missing, unless occupation was also missing
1.01m Health set to Good if missing, unless Activity Last Week was also missing
0.59m Professional qualifications set to missing if answered but educational qualifications was missing
0.40m Missing Country of birth set to that of either siblings, parents or other related people in the household who have the same Country of birth



Imputation


One or more items needed to be imputed for 13.8m people - that is 28.0 per cent of the population who returned Census forms. Of these, 4.7m were dealt with by joint imputation. 10.0m were imputed using individual imputation, including all those in single person households. 9.8m of the individual imputed cases used a donor household of the same size as the recipient’s and the remaining 0.2m a household of different size. 0.4m people required imputation using the cold deck fallback method. Over 1m people had some items imputed by one method and some by another, hence there is some double-counting.


23.4 per cent of the population were used once as donors, 2.1 per cent twice and 0.1 per cent three or more times. In the SARs the variable EDISDONO provides information for whether an individual was used as a donor and, if so, how many times.


For household variables, 2.5m needed imputation, 11 per cent of all households. 0.08m were dealt with by fallback and the remainder by joint imputation. Almost all the donor households for joint imputation were used once each.


Person variables

  Total (including imputed) Non-response Imputed Non-response Imputed
  000s 000s 000s % %
Age 49,359 262 278 0.53 0.56
Sex 49,359 199 219 0.40 0.44
Marital status 49,359 372 158 0.76 0.32
Student flag 49,359 622 641 1.26 1.30
Country of birth 48,848 1,211 829 2.48 1.70
Ethnic group 48,848 1,405 1,421 2.88 2.91
Welsh language 2,754 153 153 5.54 5.57
Religion 48,848 3,721 - 7.62 -
Health 48,848 1,525 531 3.12 1.09
Carer 48,848 2,967 693 6.07 1.42
Long-term illness 48,848 1,899 1,915 3.89 3.92
Address one year ago 48,848 2,198 2,213 4.50 4.53
Educational qualifications 35,367 2,187 - 6.18 -
Professional qualifications 35,367 6,094 - 17.23 -
Highest qualification 35,367 - 2,150 - 6.09
Working last week 35,367 737 - 2.08 -
Activity last week 35,367 - 1,301 - 3.69
Employment status 33,686 2,205 2,058 6.55 6.14
Workplace size 33,686 4,689 3,067 13.92 9.15
Supervisor 33,686 2,294 1,119 6.81 3.34
Occupation - currently working 21,741 694 759 3.19 3.48
Occupation - all ever worked 29,335 4,051 4,051 13.81 13.81
Industry - currently working 21,741 1,702 1,777 7.83 8.15
Industry - all ever worked 29,335 5,400 5,400 18.41 18.41
Workplace address 22,396 1,744 1,426 7.79 6.42
Method of travel 22,533 1,410 1,127 6.26 5.07
Hours worked 22,533 1,804 1,506 8.00 6.77
Relationship to Person 1 28,065 971 1,326 3.46 4.73


Note: The ‘Total’ column refers to the number of people in scope for the question, i.e.:


Age


Age was not reported or was out of range (born after Census day or more than 110 years old) for 240,000 people. It was set to missing for a further 23,000 on grounds of inconsistency, mainly because people who were not single and who had answered three or more employment questions had their age captured as under 16.


The distribution of imputed ages followed that of the remainder of the population except for a shortfall among the 0, 6-15 and 76-80 age groups. This is primarily because some people were imputed as aged between 16 and 74 who may have been outside this age range because some employment questions had been answered. The shortfall in babies under 1 year old occurred where their address one year ago had not been stated as ‘no usual address’. The effect in an area of 100,000 population would typically be that 2 or 3 under 1’s would have been imputed as over 1.


Sex


Sex was missing for 185,000 people and multi-ticked for 14,000, 0.4% of the population in total. There were no edit actions which directly affected this question: if a husband and wife, or the parents of a child, were of the same sex the relevant relationships were imputed. A further 20,000 had values imputed by 'son of fallback'.


The sexes were imputed in the ratio of 51:49 in favour of females, very similar to the proportions among the remainder of the population. The accuracy of imputations was assessed by comparing the imputed values with people’s names in a sample of areas. This showed that 75 per cent of imputations were correct. Among the incorrect values there was a very slight bias towards imputing females. The net effect would be to count four people out of every 100,000 as female rather than male.


Marital Status


There were 373,000 missing or multi-ticked cases for marital status, representing 0.8 per cent of the population. 232,000 of these were children under 16 who were set to Single in edit. A further 6,000 under 16s had marital status changed to Single. Imputation was applied to the remainder. Married and Re-married were less likely to be imputed than among the remainder of the population.


Student


Question 5 on the person schedule asked whether a person was a schoolchild or student in full-time education. 1.3 per cent of people failed to answer or multi-ticked the question, of whom 13 per cent were imputed as students compared with 21 per cent in the remainder of the population.


Country of Birth


Country of birth was omitted by 2.5 per cent of people. Of these, 88 per cent were imputed as born in the United Kingdom, compared to 92 per cent in the remainder of the population. People born in Africa, Asia and North America were imputed in higher proportion than the remainder of the population.


Ethnic Group


The non-response rate for ethnic group was 2.9 per cent. 89 per cent of these were imputed as White compared with 92 per cent in the remainder of the population. There were higher proportions of imputed people in the Mixed, Asian and Black groups.

  Imputed Total (including imputed)
  000s % 000s %
White 1,260 88.7 45,065 92.3
Mixed 24 1.7 605 1.2
Asian 80 5.6 1,925 3.9
Black 43 3.0 868 1.8
Chinese and other 13 0.9 382 0.8


Welsh Language


The question asking whether people could understand spoken Welsh, or speak, read or write the language, was asked of all people living in Wales. There was a 5.5 per cent non-response rate. No knowledge of Welsh was imputed slightly more often than for the remainder of the population.


Religion


As the question on religion was voluntary, non-responses were not imputed but will appear in tables as ‘not stated’. The national non-response rate was 7.6 per cent.


General Health


This question asked whether over the last twelve months a person’s health had on the whole been good, fairly good or not good. The non-response rate was 3.1 per cent, but an edit rule set the value to good unless Activity Last Week was also missing. This reduced the number requiring imputation to 1.1 per cent. Among these people, Fairly Good and Not Good were imputed slightly more frequently than in the remainder of the population.


Carer


Question 12 referred to voluntary help or support given to family members, friends or neighbours. The rate of non-response was 6.1 per cent. Missing values were set to No by an edit rule unless Activity Last Week was also missing, and children under 5 were also assumed to not be providing care. Of the remaining 1.3 per cent of the population, 11 per cent were imputed as Carers in comparison to 10 per cent among the remainder of the population.
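One possible reading of the two edit rules for the carer question is sketched below. The function name `edit_carer`, the `None` encoding for a missing answer and the order in which the rules are applied are assumptions for illustration; the report does not say how a stated answer for a child under 5 was treated, so the sketch leaves stated answers unchanged.

```python
from typing import Optional


def edit_carer(age: int, carer: Optional[str],
               activity_last_week: Optional[str]) -> Optional[str]:
    """Apply the edit rules; a return value of None means 'left for imputation'."""
    if carer is not None:
        # A stated answer is kept as reported (assumption).
        return carer
    if age < 5:
        # Children under 5 are assumed not to be providing care.
        return "No"
    if activity_last_week is not None:
        # Missing carer value with Activity Last Week present: set to No.
        return "No"
    # Both items missing: the value is passed to imputation.
    return None


print(edit_carer(30, None, "Working"))  # No
print(edit_carer(30, None, None))       # None
```

Only the last branch contributes to the 1.3 per cent of the population who needed imputation.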


Long-term Illness


There was a 3.9 per cent non-response rate to this question, which asked about any long-term illness, health problem or disability which limited the person’s daily activities or the work they could do. 22 per cent of these were imputed as having such a condition in comparison with 18 per cent among the remainder of the population.


Address One Year Ago


This question had a non-response rate of 4.5 per cent. No usual address was imputed more often than among the remainder of the population, mainly because there was a high rate of non-response for children under 1.


Qualifications


This topic was covered by two questions, on educational and professional qualifications, which had non-response rates of 6.2 per cent and 17.2 per cent respectively. Where missing, professional qualifications was set to None by an edit rule if the educational qualifications was answered. Professional qualifications was set to missing if educational qualifications was not answered. Taking the responses to the two questions together, a new variable called highest qualification was derived. After applying the edit rules, 6.1 per cent of people needed to have highest qualification imputed. People with imputed values were more likely to have no qualifications (Level 0) than the remainder of the population.


Activity Last Week


This variable shows whether a person was working in the week prior to Census day, and if not whether they were looking for work, waiting to start a job, retired, student, looking after home/family, permanently sick or disabled, or otherwise economically inactive. This information is derived from Questions 18 to 22 on the Census form for people aged 16 to 74.


Problems were found with the pattern of responses to these and other employment questions, caused by the format of Question 18 (Last week, were you doing any work). Some people ticked No or multi-ticked this question, but then went on to give details of their present job in answer to Questions 32 to 36. The filter rules were amended to accommodate this pattern so that these people were treated as working.


Non-response to working last week was 2.1 per cent. The value was changed in certain cases depending on the pattern of responses to looking for work etc (questions 19-22), ever worked and year last worked (question 23), details of current or last job at questions 25-30 and current job at questions 32-35.


In total, 3.7 per cent of Activity Last Week values were imputed. These were biased towards looking for work and most of the economically inactive categories, especially retired and students. Only 34 per cent were imputed as working compared with 64 per cent in the remainder of the population aged 16-74. Generally it was people at the extremes of the age range who failed to respond to these questions, which explains the preponderance of retired people and students among the imputed values.


Employment Status


Question 25 asked whether each person who had ever worked was an employee, or self-employed with or without employees in their current or last job. Non-responses and multi-ticks amounted to 6.5 per cent of those who should have answered the question. These all went through imputation, and ‘Employee’ was imputed more frequently than among the remainder of the population.


Size of Workplace


The non-response rate for this question was 13.9 per cent. An edit rule was applied to set the number of workers to 1-9 where a person was self-employed without employees. This left 6.5 per cent to be imputed, of whom slightly fewer were set to 1-9 workers than among the remainder of the population, and slightly more in the 10-24 and 25-499 ranges.


Occupation and Industry


The non-response rate for occupation was 3.2 per cent among currently working people, including 0.7 per cent inadequately described responses. When all people who had ever worked are considered, non-response rose to 13.1 per cent. The imputed population was slightly biased towards people in major groups 4 (administrative and secretarial), 7 (sales and customer services), 8 (process, plant and machine operatives) and 9 (elementary occupations). Occupation groups 2 (professional) and 3 (associate professional and technical occupations) were under-represented.


A similar pattern can be found in non-response to the question on industry. Non-response was 7.8 per cent among current workers, including 0.6 per cent inadequately described, but reached 17.9 per cent taking into account all people who have worked. Imputation created more people working in sections A (agriculture), F (construction) and O (social and personal services) and fewer in D (manufacturing), J (banking, finance, insurance), L (public administration) and M (education).


It should be noted that the full codes were imputed for missing occupation and industry data. However, the primary matching variables for these fields were defined at the major group level. Thus if industry was reported but occupation was missing, a donor would have been sought within the same major industry group, and that person’s occupation copied into the recipient’s record. In some cases an unlikely occupation/industry combination may have been created at the individual code level.
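The way a donor matched only at major group level can produce an unlikely combination is illustrated by the sketch below. It is a simplified, deterministic stand-in for donor imputation: the function `impute_occupation`, the record layout and the text labels used in place of real occupation and industry codes are all hypothetical, and the actual donor search in EDIS is not described in this section in enough detail to reproduce.

```python
from typing import Dict, List, Optional

Person = Dict[str, Optional[str]]


def impute_occupation(recipient: Person, donors: List[Person]) -> Person:
    """Copy the full occupation from the first donor in the same major industry group."""
    filled = dict(recipient)  # work on a copy so the input record is not altered
    if filled["occupation"] is not None:
        return filled
    for donor in donors:
        same_group = donor["industry_group"] == recipient["industry_group"]
        if same_group and donor["occupation"] is not None:
            filled["occupation"] = donor["occupation"]
            break
    return filled


# Illustrative labels only, not real classification codes.
donors = [
    {"industry_group": "F", "industry": "F-roofing", "occupation": "roofer"},
    {"industry_group": "M", "industry": "M-primary", "occupation": "primary school teacher"},
]
recipient = {"industry_group": "M", "industry": "M-driving-school", "occupation": None}

result = impute_occupation(recipient, donors)
print(result["industry"], "/", result["occupation"])
```

Here the recipient keeps its own detailed industry but receives an occupation from a donor who only shares the major group, which is how an implausible pairing can arise at the individual code level.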


Supervisor


Question 29 asked whether people supervised any other employees in their current or last job. The non-response rate was 6.8 per cent. An edit rule set missing answers to No unless occupation was also missing. This accounted for about half the non-response. Of the remainder, 25 per cent were imputed as supervisors compared with 30 per cent among the remainder of the population.


Workplace Address


There was a 7.8 per cent rate of non-response to this question, but some values could be deduced from the answers to method of travel to work. This left 6.4 per cent to be imputed. Of these, fewer were imputed as working at or from home than amongst the remainder of the population.


Method of Travel to Work


This question was asked only of currently working people. Non-response was 6.3 per cent, which was reduced to 5.0 per cent by a set of edits. The imputed values were biased towards public transport users and those travelling on foot, and away from working at/from home or driving a car or van.


Hours worked


The non-response rate was 8.0 per cent, and imputation favoured the 0-19 hours per week range compared to the pattern among the remainder of the population.


Household Variables

                        Total (including imputed)   Non-response   Imputed   Non-response   Imputed
                        000s                        000s           000s      %              %
Accommodation type      22,305                      671            671       3.01           3.01
Self-contained          22,305                      870            870       3.90           3.90
Number of rooms         20,542                      1,117          1,116     5.44           5.21
Bath/shower and toilet  20,542                      503            503       2.45           2.35
Lowest floor level      22,305                      897            919       4.02           4.12
Central heating         20,542                      539            442       2.62           2.17
Number of cars          20,542                      669            554       3.26           2.72
Tenure                  20,542                      797            685       3.88           3.36
Landlord                6,582                       -              175       -              2.94


Note: The ‘Total’ column refers to the number of households in scope for the question, i.e.:


Accommodation Type


There was a 3.0 per cent non-response rate for this question, which was asked of all households. Imputed values were more likely to be a purpose-built flat, part of a converted or shared house, or a commercial building, and less likely to be a detached or semi-detached house.


Self-contained


This question had a non-response rate of 3.9 per cent. Of imputed households, 1.5 per cent were given not self-contained status compared with 1.1 per cent among the remainder of the household population.


Number of Rooms


Question H3 provided two boxes for the number of rooms occupied by a household, so that any value from 1 to 99 could be entered. Early analysis of processed data showed that there were some problems which needed to be addressed:


After carrying out an analysis of households with more than 10 rooms, rules were put in place to set values to missing where they were greater than a number which depended on accommodation type. Number of rooms was subsequently imputed. No limit was applied to detached houses.
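A rule of this kind might look like the sketch below. The cap values are HYPOTHETICAL placeholders, since the report does not give the actual thresholds; only the absence of a limit for detached houses comes from the text. The names `MAX_ROOMS` and `edit_rooms` are likewise assumptions.

```python
from typing import Dict, Optional

# HYPOTHETICAL caps by accommodation type; the real thresholds are not stated.
MAX_ROOMS: Dict[str, Optional[int]] = {
    "detached": None,      # no limit applied, as stated in the text
    "semi-detached": 15,   # placeholder value
    "terraced": 12,        # placeholder value
    "flat": 10,            # placeholder value
}


def edit_rooms(accommodation_type: str, rooms: Optional[int]) -> Optional[int]:
    """Set implausibly large room counts to missing (None) so they are imputed later."""
    cap = MAX_ROOMS.get(accommodation_type)  # unknown types get no cap in this sketch
    if rooms is not None and cap is not None and rooms > cap:
        return None
    return rooms


print(edit_rooms("flat", 42))      # None, so the value would go to imputation
print(edit_rooms("detached", 42))  # 42, kept because no limit applies
```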


Imputation was slightly more likely to set a value of 3 or 4 rooms, and less likely to impute 5 or more rooms, compared with the remainder of the household population.

Bath/shower and toilet


Question H4 asked whether a bath/shower and toilet was available for use only by the household. There was a non-response rate of 2.5 per cent and slightly more households were imputed as lacking sole use than among the remainder of the household population.

Lowest floor level


4.0 per cent of households failed to answer this question. Fewer were imputed as having ground floor as their lowest level of accommodation than the remainder of the household population.

Central heating


This question had a non-response rate of 2.6 per cent. Non-respondents were slightly more likely to be imputed as lacking central heating than the remainder of the household population.

Number of cars


There was a 3.3 per cent non-response to this question. 35 per cent of these households were imputed as having no cars compared with 26 per cent for the remainder of the household population.

Tenure and Landlord


Non-response to these questions was 3.9 per cent for tenure and 2.9 per cent for landlord. Those not answering were more likely to be renting and less likely to be outright owners than in the remainder of the population. Among tenants, there was little bias towards any type of landlord among the imputed group.

Comparison with 1991


In general, the biases found in the imputed values for the person and household variables were in the same direction as those present in the 1991 Census data, but were less marked. For example, 52 per cent of those imputed in 1991 for marital status were assigned as Single compared with 41 per cent in the Census population. In 2001 the corresponding proportions were 49 per cent and 44 per cent. 53 per cent of non-respondents were imputed as having no car in 1991, considerably higher than the 32 per cent among reporting households. In 2001, when the non-response rate had risen from 1.0 per cent to 3.3 per cent, the gap had narrowed to 34 per cent among imputed households and 27 per cent in those who responded.
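The narrowing can be made explicit by taking the difference, in percentage points, between the imputed share and the comparison share for each year. The short calculation below uses only the figures quoted in the paragraph above; the variable names are illustrative.

```python
# (imputed %, comparison %) as quoted in the text
shares = {
    ("marital status: Single", 1991): (52, 41),
    ("marital status: Single", 2001): (49, 44),
    ("no car", 1991): (53, 32),
    ("no car", 2001): (34, 27),
}

# Gap in percentage points between imputed and comparison populations.
gaps = {key: imputed - other for key, (imputed, other) in shares.items()}

for (item, year), gap in gaps.items():
    print(f"{item}, {year}: {gap} percentage points")
```

The gap for Single falls from 11 to 5 percentage points, and the gap for households with no car from 21 to 7.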

How well did EDIS work?


Within EDIS, a number of assumptions were based on age being correct rather than other items. However, year of birth was occasionally mis-stated, not scanned correctly or given a wrong value during processing. Particularly when there was an error in the next to last digit of the year, EDIS may have imputed for a range of items where no value was needed, or conversely set reported data to ‘no code required’.
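The effect of an error in the next to last digit can be seen with a small calculation. The sketch assumes age is approximated as 2001 minus year of birth, ignoring the month of birth, and uses the 16 to 74 age range that applied to the Activity Last Week questions; the function names are hypothetical.

```python
CENSUS_YEAR = 2001


def approx_age(year_of_birth: int) -> int:
    """Approximate age in the Census year (ignores month of birth)."""
    return CENSUS_YEAR - year_of_birth


def in_scope_for_activity(year_of_birth: int) -> bool:
    """Activity Last Week was derived for people aged 16 to 74."""
    return 16 <= approx_age(year_of_birth) <= 74


# A child born in 1990 whose year is captured as 1970 appears to be in scope,
# so employment items would be imputed where no value was needed.
print(in_scope_for_activity(1990), in_scope_for_activity(1970))  # False True

# A worker born in 1950 whose year is captured as 1920 appears to be out of scope,
# so reported employment data could be set to 'no code required'.
print(in_scope_for_activity(1950), in_scope_for_activity(1920))  # True False
```

Because the digit in question shifts age by multiples of ten years, a single misread character is enough to move a person across the age boundaries on which the filter rules depend.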

Late changes to the questionnaire design had some impact on EDIS. Splitting the qualifications questions into two parts, which occurred after the 1999 Rehearsal, meant that new rules had to be devised for 2001 to deal with professional qualifications. This turned out to be the question having the largest non-response rate as many people considered that it did not apply to them.

As extra room had to be found on the form for qualifications, the question on work last week was squashed into a smaller space. As a result, two of the bullet points which appeared separately on the Rehearsal form were conflated into one, and it appears from the pattern of responses to this and later questions that some form-fillers misunderstood the question and answered No when they were actually in work. A resolution was found to this problem by amending the filter rules for the derivation of Activity Last Week but a small number of answers may have been miscoded as a result of the extra complication which was introduced.

A single edit and imputation system was designed to deal with the censuses in England, Wales, Scotland and Northern Ireland, which all had slightly different requirements. Variations in the design of the Census form and in editing requirements meant that great attention had to be devoted to ensuring that the processing for each country was carried out to the desired standards.

Conclusion


EDIS was successful in its main aim of providing a complete and consistent database of values for all people who completed Census returns. It did so efficiently and largely followed standard principles of making minimum changes to the data. There were complications in its development including late amendments, some of which could have been avoided with earlier access to live data and others which were due to changes between Rehearsal and the final version of the Census. However, these issues were identified at an early stage of Census processing.


Further results on the performance of EDIS will be reported in the 2001 Census Quality Report, which is due to be published later this year.



Last updated 25 April 2007
