Population

| File | Country | Source(s) |
| --- | --- | --- |
| population_xx.csv | DE | Rosés-Wolf database on regional GDP (version 6, 2020) (pre-1990); Eurostat (post-1990) |
| population_xx.csv | FR | INSEE |
| population_xx.csv | GB | Vision of Britain (pre-1981); ONS (post-1981); Wiki (Northern Ireland); Census (London) |
| population_xx.csv | US | Fabian Eckert, Andrés Gvirtz, Jack Liang, and Michael Peters, "A Method to Construct Geographical Crosswalks with an Application to US Counties since 1790", NBER Working Paper #26770, 2020 (1830-1970); Census (1970-2010); NBER data (2010-2018) |

Coverage

| Country | Geographical level | Period |
| --- | --- | --- |
| DE | 2 (NUTS2) | 1900-2017 |
| FR | 3 (NUTS3) | 1876-2017 |
| GB | 2 (NUTS2) | 1851-2017 |
| US | 3 (county) | 1830-2018 |

Annual data

The source data are not available for every year. We linearly interpolate the population_raw column between observed years to obtain a value for each year in the population column.
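The interpolation step can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function name `interpolate_years` and the sample values are hypothetical, and it assumes the input is a mapping from observed years to population in thousands.

```python
def interpolate_years(known):
    """Linearly interpolate a {year: population} mapping over every year it spans."""
    years = sorted(known)
    if not years:
        return {}
    filled = {}
    # Walk through consecutive pairs of observed years and fill the gap between them.
    for start, end in zip(years, years[1:]):
        p0, p1 = known[start], known[end]
        for year in range(start, end):
            share = (year - start) / (end - start)
            filled[year] = p0 + share * (p1 - p0)
    # The last observed year is not covered by the loop above.
    filled[years[-1]] = known[years[-1]]
    return filled


# Hypothetical observations (in thousands), not taken from the dataset.
population_raw = {1901: 100.0, 1911: 120.0, 1921: 110.0}
population = interpolate_years(population_raw)
```

Observed years keep their original value; only the years in between are imputed.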

Variables

| Variable | Description | Type |
| --- | --- | --- |
| country_code | Country code | str |
| statisticalAreaCode | Statistical area code (NUTS/FIPS) | str |
| statisticalAreaName | Statistical area name (literal) | str |
| year | Year | int |
| population | Population in the statistical area (in thousands) | float |
| population_raw | Population in the statistical area before correction (in thousands). Relevant for GB only (see notes below) | float |

Focus on US data

For the post-1970 period, we obtain US population data by aggregating county data using David Dorn's crossover table.

Focus on GB data

GB population data are not available at a sufficiently detailed NUTS level over a long period, or at least we could not find any. For instance, Rosés and Wolf (2020) only provide data at the NUTS1 level for GB. Hence, we had to build the GB population data at the NUTS2 level ourselves. This involves three main stages: 1. pre-1981 data collection, 2. post-1981 data collection, 3. data harmonization.

Focus on DE data

Following Rosés and Wolf (2020), we have merged the regions of Darmstadt and Giessen into one entity, and likewise Braunschweig and Hannover. While each of these areas corresponds to a NUTS2 region, the multiple border changes make it impossible to track population estimates over time without merging them.

Pre-1981 data collection

We use Vision of Britain (VoB) population data, except for London, where we use Census data. Some VoB geographic entities have no population data, however. In these cases, we did our best to reconstruct the data from smaller entities with known population data. The table below details how these entities are constructed:

| VoB | Construction |
| --- | --- |
| Tweeddale | Peebles + Selkirkshire |
| Roxburgh Ettrick and Lauderdale | Roxburghshire + Selkirkshire + Berwickshire/4 + Midlothian/4 |
| Cheshire | Halton + Warrington + Cheshire East + Cheshire West and Chester |
| Mid Glamorgan | Caerphilly/2 + Bridgend + Merthyr Tydfil + Rhondda; Cynon; Taff |
| South Glamorgan | Vale of Glamorgan + Cardiff |
| Clwyd | Flintshire + Wrexham + Denbighshire |
| Dyfed | Carmarthenshire + Ceredigion + Pembrokeshire |
| Gwent | Blaenau Gwent + Caerphilly/2 + Monmouthshire + Newport + Torfaen |
| Vale of Glamorgan | Glamorganshire |
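
As an illustration of how such a construction rule can be applied, the sketch below encodes two of the rules listed above as weighted sums, with a term such as Caerphilly/2 becoming a weight of 0.5. The names `CONSTRUCTION` and `build_entity` are hypothetical, and the population figures are made up for the example.

```python
# Weights for two of the constructed entities; "Caerphilly/2" becomes a weight of 0.5.
CONSTRUCTION = {
    "Gwent": {
        "Blaenau Gwent": 1.0,
        "Caerphilly": 0.5,
        "Monmouthshire": 1.0,
        "Newport": 1.0,
        "Torfaen": 1.0,
    },
    "South Glamorgan": {"Vale of Glamorgan": 1.0, "Cardiff": 1.0},
}


def build_entity(name, component_population):
    """Return the weighted sum of component populations for one constructed entity."""
    weights = CONSTRUCTION[name]
    return sum(weight * component_population[area] for area, weight in weights.items())


# Made-up component populations for a single year (in thousands).
components = {
    "Blaenau Gwent": 70.0,
    "Caerphilly": 170.0,
    "Monmouthshire": 90.0,
    "Newport": 140.0,
    "Torfaen": 90.0,
}
gwent = build_entity("Gwent", components)
```

Splitting Caerphilly evenly between Gwent and Mid Glamorgan means the two halves still add up to the full component population.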

Missing VoB data (concentrated in 1871, 1901 and 1941) are filled with linear interpolation.

Once we have data for all VoB entities (real or imputed), we aggregate them to obtain population data at the NUTS2 level using the conversion table reported in statisticalareasvob_gb.csv.
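
A minimal sketch of this aggregation step is shown below. The layout of statisticalareasvob_gb.csv is not described here, so the example assumes the conversion table boils down to a mapping from each VoB entity to exactly one NUTS2 region; the entity names, region labels and figures are placeholders.

```python
from collections import defaultdict


def aggregate_to_nuts2(rows, area_to_nuts2):
    """Sum (area, year, population) rows into {(nuts2, year): population}."""
    totals = defaultdict(float)
    for area, year, population in rows:
        totals[(area_to_nuts2[area], year)] += population
    return dict(totals)


# Placeholder mapping: assumes each entity belongs to a single NUTS2 region.
area_to_nuts2 = {"entity_a": "NUTS2_X", "entity_b": "NUTS2_X", "entity_c": "NUTS2_Y"}

# Placeholder populations (in thousands).
rows = [
    ("entity_a", 1951, 100.0),
    ("entity_b", 1951, 200.0),
    ("entity_c", 1951, 50.0),
    ("entity_a", 1961, 110.0),
    ("entity_b", 1961, 210.0),
]
nuts2_population = aggregate_to_nuts2(rows, area_to_nuts2)
```

The same pattern would apply to any of the aggregation steps in this document, with a different mapping file.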

Post-1981 data collection

After 1981, the ONS provides data at the local authority level for each year. As before, we aggregate them to obtain population data at the NUTS2 level using the conversion table reported in statisticalareaslau_gb.csv. The conversion table is based on the local authority to NUTS crossover table and the Scottish Review of NUTS boundaries.

Data harmonization

Pre-1981 data are constructed from a collection of sources, which may introduce flaws or approximations. We therefore compare the two datasets in 1981 (the only year of overlap) and compute, for each NUTS2 region, a correction coefficient defined as \(\frac{\text{population in 1981 using ONS data}_{\text{NUTS2}}}{\text{population in 1981 using VoB data}_{\text{NUTS2}}}\). We then apply this coefficient to all pre-1981 data so that the time series of each NUTS2 region remains consistent despite the change of data source.
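
The correction can be sketched for a single NUTS2 region as follows. The function names and figures are hypothetical. In terms of the variables described earlier, the uncorrected input would presumably correspond to population_raw and the rescaled output to population, but that mapping is an assumption.

```python
def correction_coefficient(ons_1981, vob_1981):
    """Ratio of the ONS-based to the VoB-based 1981 population of one NUTS2 region."""
    return ons_1981 / vob_1981


def harmonize_pre_1981(vob_series, ons_1981):
    """Rescale every pre-1981 VoB value by the 1981 correction coefficient."""
    coefficient = correction_coefficient(ons_1981, vob_series[1981])
    return {
        year: population * coefficient
        for year, population in vob_series.items()
        if year < 1981
    }


# Made-up VoB-based series for one region (in thousands) and its ONS value in 1981.
vob_series = {1961: 640.0, 1971: 720.0, 1981: 800.0}
corrected = harmonize_pre_1981(vob_series, ons_1981=1000.0)
```

Here the coefficient is 1000 / 800 = 1.25, so the whole pre-1981 series is shifted up by 25% and joins the ONS series without a jump in 1981.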

Note that for East Wales and Scotland, 1981 (and 1971 for East Wales) data are missing from VoB. We used the 1971 data and applied the national population growth rate to (roughly) estimate the VoB data and hence the correction coefficient.
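
One way to write this approximation down, under notation that is ours rather than the source's: let \(P^{VoB}_{r,t_0}\) be the VoB-based population of region \(r\) in the base year \(t_0\) (1971 in the text above), and \(P^{nat}_{t}\) the national population in year \(t\). The estimated VoB value for 1981 and the resulting correction coefficient \(c_r\) would then be:

\[
\hat{P}^{VoB}_{r,1981} = P^{VoB}_{r,t_0} \times \frac{P^{nat}_{1981}}{P^{nat}_{t_0}},
\qquad
c_r = \frac{P^{ONS}_{r,1981}}{\hat{P}^{VoB}_{r,1981}}
\]

This assumes the region grew at the national rate between \(t_0\) and 1981, which is why the resulting estimate is only rough.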