Methodological developments
Much of the work on the SARs has exploited the novelty of the data (particularly the sample size and the relatively detailed geography of the 2% SAR) to make methodological developments in a number of areas. The increasing use of multilevel modelling techniques by social scientists, along with the availability of the SARs, has led to developments in the understanding of the ecological fallacy and of spatial variations in unemployment, deprivation and health. The SARs have also contributed to improvements in population projections, in small area estimation and in micro-simulation of whole populations. Many other methodological developments have been made outside the academic arena.
- The ecological fallacy.
Recent years have seen an increase in the analysis of deprivation in Britain. In most studies the unit of analysis has been a geographical unit such as a local government ward or district. This reflects, in part, a reliance on small area statistics and local base statistics from the censuses of population. Although useful in identifying specific problem areas, this type of approach may be subject to the ecological fallacy, which 'arises when results from an analysis based on area-level aggregate statistics are incorrectly assumed to apply at the individual level'; this happens because of 'within-area homogeneity', whereby individuals in the same area tend to have similar characteristics (Tranmer and Steel, 1998, 817). Because local areas are homogeneous, a correlation calculated from aggregated data is likely to be a highly inflated representation of the true correlation between individuals. Tranmer and Steel (1998) investigate four regions - three in England and one in Australia - and find that the 'grouping variables' most associated with homogeneity of enumeration districts are similar in each region: age structure, housing type and ethnicity. They show that knowledge of the behaviour of the grouping variables allows the correlation between other variables, calculated from aggregated data, to be adjusted. This adjustment greatly reduces the aggregation bias, and so offers a way of avoiding the ecological fallacy.
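The inflation of area-level correlations can be illustrated with a small simulation. The sketch below is not the Tranmer and Steel adjustment; it only demonstrates the problem that the adjustment addresses. All names and parameter values (`simulate`, 200 areas, 50 people per area, the noise scale) are assumptions chosen for illustration. Two variables share a common area-level component, so people in the same area resemble one another, but are otherwise unrelated at the individual level.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_areas=200, n_per_area=50, seed=0):
    """Return (individual-level r, area-level r) for simulated data.

    Hypothetical set-up: each area has one shared factor that shifts both
    variables (within-area homogeneity); individual noise is independent.
    """
    rng = random.Random(seed)
    xs, ys, area_x, area_y = [], [], [], []
    for _ in range(n_areas):
        shared = rng.gauss(0, 1)  # area-level factor common to x and y
        ax = [shared + rng.gauss(0, 3) for _ in range(n_per_area)]
        ay = [shared + rng.gauss(0, 3) for _ in range(n_per_area)]
        xs.extend(ax)
        ys.extend(ay)
        area_x.append(mean(ax))  # what an area-level table would report
        area_y.append(mean(ay))
    return pearson(xs, ys), pearson(area_x, area_y)


individual_r, area_r = simulate()
print(f"individual-level correlation: {individual_r:.2f}")
print(f"area-level correlation:       {area_r:.2f}")
```

Averaging within areas removes most of the individual noise but leaves the shared area component intact, which is why the correlation between area means is much stronger than the correlation between individuals.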
- Using the SARs for assessing the relative significance of deprived place and deprived people
Areas with high levels of deprivation may be home to high proportions of particular social or demographic groups, but it cannot be automatically assumed that these groups are themselves deprived. Although some studies have been based on purpose-designed individual-level survey data, these often lack the sample sizes needed to analyse small subgroups of the population effectively or to allow geographical disaggregation. Fieldhouse and Tye (1996) use the SAR data to investigate the social, demographic and geographical dimensions of deprivation. The distribution of individual-level deprivation (deprived people) is compared with an equivalent area-level index constructed from standard census output using conventional techniques.
- Using area classifications in analysis of neighbourhood effects
A classification of census enumeration districts has been added to both SARs. Using multilevel modelling techniques, Fieldhouse and Tranmer (forthcoming) use the area classification information on the SAR to investigate geographical differences in unemployment. Other research has indicated that where a person lives can affect their propensity to unemployment (see above). However, the understanding of these relationships is confounded by the reciprocal nature of the relationship between unemployment, housing and geographical location. Fieldhouse and Tranmer examine the relative importance of the individual, the type of neighbourhood of residence, and the local labour market in which one lives in explaining variations in unemployment risk. The paper concludes that most neighbourhood-level variation in unemployment is due to housing market effects, particularly through neighbourhood selection.
- Using the SARs for population projections
Unlike sample surveys, which omit the institutional population, the SARs have data for people living in residential homes, hospitals, prisons or army quarters. This has proved very important for research. For example, Murphy and Wang (1996) use the SARs to make marital status population projections for England and Wales using a two-sex life table multi-state model called LIPRO. Whereas the conventional 'atomistic' approach tends not to take account of the wider socio-demographic context, the model used by the authors emphasises the role of multiple transitions in producing the observed numbers in a particular state. For instance, the model produces macro-level estimates of population parameters, such as the annual numbers of births and the population (1981-2040) by sex, age and marital status, based on the application of transition rates. The model emphasises the interconnectedness of stocks and flows across time and social space and produces easily comprehensible summary measures. The model provides a valuable insight into the processes at work and is the appropriate method for forecasting such systems.
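The idea of applying transition rates to stocks can be sketched in a few lines. The example below is not LIPRO and does not reproduce Murphy and Wang's model: it is a deliberately simplified, single-sex, closed-population illustration with no ageing, births or deaths. The state names and every transition probability in `TRANSITIONS` are hypothetical values invented for the example.

```python
STATES = ["single", "married", "widowed", "divorced"]

# Hypothetical annual transition probabilities (origin -> destination).
# Each row sums to 1; these are NOT estimates from the SARs.
TRANSITIONS = {
    "single":   {"single": 0.95, "married": 0.05, "widowed": 0.00, "divorced": 0.00},
    "married":  {"single": 0.00, "married": 0.96, "widowed": 0.02, "divorced": 0.02},
    "widowed":  {"single": 0.00, "married": 0.01, "widowed": 0.99, "divorced": 0.00},
    "divorced": {"single": 0.00, "married": 0.04, "widowed": 0.00, "divorced": 0.96},
}


def project_one_year(stock, transitions):
    """Move each state's stock along its outgoing flows for one year."""
    new_stock = {state: 0.0 for state in stock}
    for origin, count in stock.items():
        for destination, probability in transitions[origin].items():
            new_stock[destination] += count * probability
    return new_stock


def project(stock, transitions, years):
    """Return the list of stocks for year 0, 1, ..., years."""
    path = [dict(stock)]
    for _ in range(years):
        path.append(project_one_year(path[-1], transitions))
    return path


start = {"single": 1000.0, "married": 1000.0, "widowed": 0.0, "divorced": 0.0}
for year, stock in enumerate(project(start, TRANSITIONS, 3)):
    print(year, {state: round(count, 1) for state, count in stock.items()})
```

Each year's stock in a state depends on flows out of every other state, which is the interconnectedness of stocks and flows that the multi-state approach emphasises. A fuller model would hold separate rates by age and sex and would add births, deaths and the two-sex consistency constraints.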
- Small area estimation using census microdata
The focus in the use of small area statistics is on the characteristics of a local area, usually in order to provide the information needed to plan services or meet customer demand. In Britain, standard census output provides tabulations for local areas using 100 per cent of the census records. Simpson (2000) shows that census microdata can help fill the gaps in the small area tabulations and can be used to estimate local area information that cannot be obtained from the census tabulations. For example, the number of young people living alone in a local authority district is not available from the 100 per cent census tabulations, but an estimate can be extracted from the sample microdata. The SARs can be used for small area estimation in a variety of ways. The methods improve the reliability of the direct estimate from microdata by combining it with more reliable national data and with relevant data from the 100 per cent census tabulations. Characteristics not recorded by the census can be inferred for small areas from knowledge of their relationship with census variables. In addition, aggregate data, including non-census data, can be adjusted to avoid the ecological fallacy by using microdata to establish the basis for correlation between individuals or households. Finally, sub-samples of microdata can be used to simulate areas smaller than those to which the microdata are coded. The following are some of the methods discussed.
- Supplementing a small sample
The finest geographic units identified in the SARs have populations of around 120,000. For labour force forecasts in multi-ethnic areas, estimation of economic activity rates by age and sex for each ethnic group is not feasible from the 100 per cent tabulations, and hardly so from the SARs. To improve the reliability of the estimates, Bradford Council (1996) supplemented the SAR data for Bradford with SAR data from successively larger areas containing Bradford - West Yorkshire, then England and Wales - until the sample reached a minimum of 100 for a given cell.
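The widening procedure described above might be sketched as follows. This is an illustration under stated assumptions, not Bradford Council's actual implementation: the record layout (dictionaries with an `area` key, cell-defining keys and an `active` flag) and the function names `supplement_sample` and `activity_rate` are hypothetical.

```python
def supplement_sample(records, cell, levels, minimum=100):
    """Widen the geography until the sample for `cell` reaches `minimum`.

    records: list of dicts, each with an "area" key plus the cell variables.
    cell:    dict of variable -> value defining the cell of interest,
             for example an ethnic group, sex and age band.
    levels:  list of (name, set_of_areas), ordered from the target area to
             the widest area that contains it.
    Returns (name_of_level_used, matching_records). If no level reaches
    the minimum, the widest level is returned.
    """
    if not levels:
        raise ValueError("at least one geographic level is required")
    for name, areas in levels:
        sample = [
            r for r in records
            if r["area"] in areas and all(r[k] == v for k, v in cell.items())
        ]
        if len(sample) >= minimum:
            break  # stop at the smallest geography with enough records
    return name, sample


def activity_rate(sample):
    """Share of the sampled records that are economically active."""
    return sum(1 for r in sample if r["active"]) / len(sample)
```

A call would pass levels such as Bradford alone, then the areas making up West Yorkshire, then all areas in England and Wales. The estimate gains precision at the cost of assuming that the wider area's rate for the cell resembles Bradford's.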
- Consistency with 100 per cent tabulations using iterative proportional fitting (IPF)
Because the SARs are samples, and because they include no imputed records for households missed by the census, they may give results that are inconsistent with the 100 per cent tabulations. The inconsistency is likely to be largest for the smallest samples. To gain consistency and greater accuracy, the microdata results can be scaled to agree with the 100 per cent tabulations using the IPF method, as Bradford Council did in developing labour force forecasts. (See below on SARs for policy use).
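For a two-way table, IPF alternately rescales the rows and the columns of a seed table until both sets of margins match their targets. The following is a minimal sketch in plain Python. The function name `ipf`, the tolerance and the example figures are assumptions for illustration; in practice the seed would be a cross-tabulation from the SARs and the targets would be margins from the 100 per cent tabulations.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Two-dimensional iterative proportional fitting.

    seed:        list of lists of non-negative counts (e.g. from microdata).
    row_targets: required row totals; col_targets: required column totals.
    The two sets of targets should sum to the same grand total.
    Returns a new table; the seed is not modified.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(table[i]) - t) <= tol
               for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical sample counts and hypothetical 100 per cent margins.
sample_table = [[10.0, 20.0], [30.0, 40.0]]
fitted = ipf(sample_table, row_targets=[400.0, 600.0],
             col_targets=[450.0, 550.0])
for row in fitted:
    print([round(value, 1) for value in row])
```

The fitted table reproduces the target margins while preserving the interaction structure (the odds ratios) of the sample table, which is why the method suits combining the detail of microdata with the reliability of full-count margins.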
- Micro-simulation of whole populations
Micro-simulation attempts to assemble complete individual census records for the smallest areas - enumeration districts in England and Wales. For an enumeration district of n households, the method seeks a sub-sample of n records from the household microdata that best matches the 100 per cent and 10 per cent tabulations for the enumeration district. Williamson et al (1998) describe and compare algorithms for searching the SARs for sub-samples that match chosen local tabulations. They aim to reduce the discrepancy between the local tabulations and a random sub-sample of the microdata. The best methods reduce the discrepancy by around 70 per cent, thus recreating a considerable amount of the diversity between enumeration districts - although not all of it. The authors validate the simulation by recreating tabulations that were not used in the simulation itself and find good results.
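The search for a well-matching sub-sample can be illustrated with a simple hill-climbing routine. This is a sketch chosen for brevity, not a reproduction of the algorithms compared by Williamson et al. The record layout, the use of one-way tabulations as targets, the total absolute error as the measure of discrepancy, and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def tabulate(records, variables):
    """One-way counts for each variable: {(variable, category): count}."""
    counts = Counter()
    for record in records:
        for variable in variables:
            counts[(variable, record[variable])] += 1
    return counts


def discrepancy(counts, targets):
    """Total absolute difference between simulated and target counts."""
    keys = set(counts) | set(targets)
    return sum(abs(counts.get(k, 0) - targets.get(k, 0)) for k in keys)


def hill_climb(microdata, targets, n, variables, iterations=5000, seed=0):
    """Pick n records (with replacement) whose tabulations match `targets`.

    Starts from a random sub-sample and repeatedly tries replacing one
    chosen record with another, keeping swaps that do not worsen the fit.
    Returns (selected_records, starting_discrepancy, final_discrepancy).
    """
    rng = random.Random(seed)
    chosen = [rng.randrange(len(microdata)) for _ in range(n)]
    counts = tabulate([microdata[i] for i in chosen], variables)
    start = best = discrepancy(counts, targets)
    for _ in range(iterations):
        position = rng.randrange(n)
        candidate = rng.randrange(len(microdata))
        old, new = microdata[chosen[position]], microdata[candidate]
        trial = counts.copy()
        for variable in variables:
            trial[(variable, old[variable])] -= 1
            trial[(variable, new[variable])] += 1
        trial_fit = discrepancy(trial, targets)
        if trial_fit <= best:  # accept improvements and sideways moves
            chosen[position], counts, best = candidate, trial, trial_fit
    return [microdata[i] for i in chosen], start, best
```

Comparing the starting and final discrepancy gives the kind of percentage reduction reported above, although the figure obtained depends entirely on the data and targets supplied. Hill climbing can stall in local optima, which is one reason alternative search strategies are worth comparing.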