(Drawing) Rings Around The World: data

Wednesday, 23 January 2013

A better map of population density

If you want to produce a map of population density the usual way to go about it is to get some Census data on density in different zones (wards, Census tracts, etc) and plot it like the map below, which shows 2011 population density in London at Middle Super Output Area (MSOA) level (darker colours represent higher density).

This approach has the virtues of being quick and fairly standard, but there are serious drawbacks too. The most serious one is that this map doesn't really show you the distribution of population, because it hides the fact that across large swathes of London there are no people whatsoever. Much of London's area comprises water (only partially represented in the map above in the shape of the Thames), parkland, transport, industrial or commercial property, or some other non-residential use.

Not only do choropleth maps of this kind not show these variations in land use, but by analogy the calculation of population density across the whole area of each zone will greatly understate the 'real' density of population in zones with little residential land. Look at the very centre of the map, for example. That white blob is the City of London, and this map is telling us that it has very low population density, similar to London's semi-rural outskirts. In the sense of 'population per hectare of all land', that's true, because most of the City comprises commercial property with nobody living there. But there are some residential areas in the City and in these areas people live at fairly high densities. So in terms of 'population per hectare of residential land' the map is quite misleading.

We may get a more realistic picture from a dasymetric map. This type of map combines the same kind of population data with separate data on land use, so that only the relevant areas are highlighted. For our purposes we are interested in residential land, and for that I went to the European Environment Agency's Urban Atlas maps of urban land use based on 2006 satellite data. Using R I extracted from the London map the area covered by continuous or discontinuous 'urban fabric' and also any construction sites, as most of these will be for housing. While 'urban fabric' sounds a bit general there are categories for industrial, commercial, transport, water, green, forest, leisure and other land uses so I was fairly confident that it represented residential land reasonably accurately.

Using QGIS I joined this residential land layer to the same data on population at MSOA level from the Census, recalculated population density in each MSOA on the basis of the residential land only, linked the results back onto the residential layer, and mapped it:
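To make the recalculation concrete, here is a minimal Python sketch of the gross versus net density arithmetic. It is not the R and QGIS workflow described above, just the same division done two ways; the zone names, populations and areas are made up for illustration, and the function names are hypothetical.

```python
def density_per_hectare(population, area_ha):
    """Return people per hectare, or None if the zone has no such land."""
    if area_ha <= 0:
        return None
    return population / area_ha


def gross_and_net(zones):
    """Compute gross (all land) and net (residential land only) density.

    `zones` is a list of (name, population, total_ha, residential_ha) tuples.
    """
    results = {}
    for name, population, total_ha, residential_ha in zones:
        results[name] = {
            "gross": density_per_hectare(population, total_ha),
            "net": density_per_hectare(population, residential_ha),
        }
    return results


# Made-up zones: zone_b is mostly non-residential, like the City of London.
zones = [
    ("zone_a", 8000, 100.0, 80.0),
    ("zone_b", 7000, 290.0, 35.0),
]
results = gross_and_net(zones)
for name, d in results.items():
    print(name, round(d["gross"], 1), round(d["net"], 1))
```

In this toy example zone_b looks sparse on the gross measure but turns out to be the denser of the two once only residential land is counted, which is the same effect seen in the City and Tower Hamlets.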

Click on the image for a bigger version (or find the full size 7mb behemoth here).

What we end up with is, I think, a much better map of London's population density, because it shows only the residential areas (or a close approximation) and it doesn't artificially reduce density in mostly non-residential areas like the City or indeed Bromley or neighbourhoods bordering the Lea valley.

Using this approach also changes the ranking of boroughs in terms of population density. Measured in gross terms (that is, across all land), Islington had the highest population density of any London borough in 2011 at 139 people per hectare. But looking only at residential land Islington's net population density was 181 people per hectare - higher, but not nearly as high as Tower Hamlets at 256. And this makes sense - Tower Hamlets has large areas of non-residential land (much of Canary Wharf, for example), but what residential land it does have tends to be pretty densely occupied.

I should say that this map is far from a perfect representation of reality. It has a number of flaws, such as the combination of land use data from 2006 with population data from 2011, so that it undoubtedly misses out some residential areas created in the interim. It divides the entire range of population densities into only four categories which are then treated as internally identical. And similarly, like all spatially aggregated data it hides variation within each zone, in this case MSOAs. I could have used the smaller Output Area geography, but it would have taken more time and more computing power than I wanted.

Update, 1 Feb: Here's a scrollable, zoomable version of the full-size map for you to explore:

Saturday, 24 November 2012

Map of 2001-2011 population change in London

Update: Those intelligent people in the Intelligence team at the Greater London Authority have now made a better map of population change between 2001 and 2011, which I think you should look at rather than mine (there's more GLA analysis of the Census here).

The GLA map is better because (a) it uses ward boundaries, which unlike the statistical boundaries I used have not changed over time and therefore offer a like-for-like comparison; (b) it compares Census 2011 population to the 2001 mid-year population, which the GLA thinks is a more reliable figure than the 2001 Census figure, and (c) it's interactive! So I've put my map below the fold here, just for reference.

Land values and urban history

Via the Urban Demographics blog, here's a short video of Dr Gabriel Ahlfeldt of the LSE discussing his analysis of a unique dataset of land values in Chicago over time. Apart from looking pretty, this kind of analysis is of great interest to urban economists since land values are both fundamental to understanding cities and very difficult to observe in practice, because the value of land is usually mixed in with the value of structures on it. The data Dr Ahlfeldt analyses manages to separate the two out, allowing us to see how much people are willing to pay for 'pure' location as distinct from whatever happens to be built there.

You can see from the video that land values are very high in Chicago's central business district but then drop off sharply as you move out, a sign that people will pay a very high premium to locate their home or workplace (mostly the latter, in this case) in that spot. And as Dr Ahlfeldt says, that particular location has been far more valuable than any other in Chicago for the whole period covered by the data.

So even though vast numbers of boats carrying corn, lumber and pork no longer come and go via Chicago's small harbour on Lake Michigan, the legacy of that waterborne trade and the density of businesses and institutions that built up around it can still be seen in the pattern of industrial and commercial location today. This suggests a very important role for path dependency, history and perhaps chance in explaining urban form.

You can see the whole of Dr Ahlfeldt's lecture and many others at the Lincoln Land Institute here.

Monday, 23 April 2012

How Energy Performance Certificate data could be really useful, but isn't

One of the frustrating things about discussing housing in this country is that we have historically lacked some key data which would allow us to compare ourselves more accurately with other countries. For example, there are no good statistics on the size of homes we're building now. It is commonly accepted that we build very small homes compared to other countries, but ~~the only~~ some of the evidence for this is very out of date (see this Policy Exchange report from 2005 which cites these EU statistics from 2002 which cite English House Condition Survey data from 1996). ~~It may well be true that we are building small homes at the moment but we just don't have good enough data to say for sure.~~ [Update: I completely forgot about RIBA's excellent research on this very topic. Thanks to Rebecca for reminding me. So the data gap isn't quite as large as I thought, though much of the below still applies.]

Relatedly, we can't really compare our house prices with those in other countries because the simplest consistent comparison, price per square foot or square metre, is not available to us. This matters to people who make housing policy, but it also matters to people thinking of moving house between different countries.

There is a solution in sight, however. The law requires an Energy Performance Certificate to be produced for every house that is sold or rented out. An EPC is drawn up by an expert after looking over the house, and captures key information about the energy efficiency of the house. But it also captures other information, notably the type of house, its size in square metres, and its exact address. There are now about 7 million domestic EPCs, all held on a single register and as of today searchable by address. I just looked up the EPC for a house down the road from me, which is the same kind of Victorian mid-terrace as I share with friends. It pretty much confirms what I thought, which is that our house retains heat about as well as a sieve.

To get back to my point though, what this means is that we've got a huge and growing database of home sizes. And because the Land Registry has recently started releasing its data on house prices, again with the exact address provided, it should be possible to link the two datasets together to calculate the average price per square metre in different parts of the country, for new as well as old houses. More sophisticated analysis could also reveal the extent to which people are willing to pay more for energy efficient homes.
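A minimal sketch of what that linkage might look like, assuming each dataset can be reduced to a list of records keyed by address. The field names (`address`, `price`, `floor_area_sqm`), the crude address normalisation and the sample records are all hypothetical; real EPC and Land Registry files would need far more careful address matching than this.

```python
from statistics import mean


def normalise_address(address):
    """Crude matching key: upper-case, drop commas, collapse whitespace."""
    return " ".join(address.upper().replace(",", " ").split())


def price_per_sqm(sales, epcs):
    """Return price per square metre for every sale with a matching EPC."""
    floor_area = {
        normalise_address(e["address"]): e["floor_area_sqm"] for e in epcs
    }
    ratios = []
    for sale in sales:
        area = floor_area.get(normalise_address(sale["address"]))
        if area:  # skip sales with no matching EPC (or a zero floor area)
            ratios.append(sale["price"] / area)
    return ratios


# Made-up records for illustration only.
sales = [
    {"address": "1 Example Street, Anytown", "price": 200000},
    {"address": "2 Example Street, Anytown", "price": 300000},
    {"address": "3 Example Street, Anytown", "price": 250000},  # no EPC
]
epcs = [
    {"address": "1 example street anytown", "floor_area_sqm": 80},
    {"address": "2 Example Street,  Anytown", "floor_area_sqm": 100},
]

ratios = price_per_sqm(sales, epcs)
average = mean(ratios)
print(ratios, average)
```

This averages the per-property ratios; dividing total price by total floor area would be another defensible definition of 'average price per square metre' and would generally give a slightly different figure.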

I don't know whether anyone in government is working on this. As far as I can see they're not, and that wouldn't surprise me as the key department (Communities and Local Government) is these days shedding statisticians and generally doing less analytical work.

But it should be possible for academics and laypeople to analyse the data in this way. The problem is that the government has decided that EPC data should only be available in bulk to certain organisations and only if they are prepared to stump up the money for it. The costs range from 1p to 10p per record depending on how much detail you want, but in any case this quickly mounts up if you want any kind of comprehensive database at local or regional level.

The government says these prices are to cover the costs of disseminating the data. Maybe that's fair enough and maybe it isn't, but it does mean that the kind of useful analysis I've described above can't be performed by anyone outside central government. So if CLG is determined to ration access to the EPC data by price I think it should really be doing its own analysis and making the most of this data on our behalf.

Monday, 9 April 2012

People are using their cars less (but still quite a lot)

There has been much talk here and in other countries about the apparent declines in personal car travel seen in the last few years. Generally these are relatively small declines compared to the huge increases in car travel seen in previous decades, but if per capita car travel has peaked and is falling then that would at least be noteworthy from a social point of view, and possibly quite important for transport policy - see for example this letter from the head of the Transport Planning Society arguing that the Department for Transport's London traffic forecasts are out of whack.

The chart below shows my calculation of daily car travel in miles per person in Britain from 1949 to 2010, derived from these DfT statistics on car travel and population data from the Census and from ONS mid-year estimates.

According to these figures, per capita daily car travel peaked in 2004 at 11.7 miles and has been trending slowly downwards since then. A couple of caveats are probably in order at this point: we are in a recession, which usually reduces travel, and while these figures are per person, strong population growth could in future increase total car travel even if the per capita average continues falling. Finally, the reduction in car traffic is slightly offset by a rise in light van traffic over the same period.
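For anyone wanting to reproduce this kind of figure, the arithmetic is simple enough to sketch in Python. This assumes you have an annual total of car miles and a population estimate for the same year; the function name is hypothetical and the inputs below are round made-up numbers, not the DfT or ONS figures.

```python
def daily_miles_per_person(annual_car_miles, population, days_in_year=365):
    """Convert an annual total of car miles into miles per person per day."""
    return annual_car_miles / population / days_in_year


# Made-up inputs: 219 billion car miles in a year, 60 million people.
example = daily_miles_per_person(219e9, 60e6)
print(round(example, 1))
```

Because the result is a ratio, it can fall even while total car miles rise, if population grows faster than traffic. That is the caveat about population growth working in reverse.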

We've also got data for London, though only going back to 1993. The chart below uses population data from ONS but published on the London datastore. The trend here is quite different, already flatlining in 1993 and falling fairly consistently from the turn of the millennium. While car travel per capita has fallen across London, the drop is particularly large in Inner London, down 28% over the period. The average Inner Londoner now travels a shade under four miles a day by car, compared to just over six for Outer Londoners and eleven for the average Briton.

Lastly, much of the talk in the US is about whether car travel is particularly falling among younger people. To try and get a feel for this I looked at data from the Labour Force Survey on the main mode of transport people use for getting to work. Obviously the caveat here is that commuting is only a subset of all travel, but it's the best we can do for now. The chart below shows the proportion of people in broad age groups who reported travelling to work by car in 2004 (the earliest year I could find) and in 2011.

It's really important to emphasise here that this is based on a sample survey, so the estimates have confidence intervals around them (the little black lines). This means that in most cases the change is not statistically significant - including the apparent increase in car use among 16-19 year olds. The only age bands in which there was a clear, statistically significant change over the period were 25-29, 30-34, 35-39 and 50-54 year olds, all of whom were less likely to drive to work in 2011 than in 2004. So there's evidence of a fall in car commuting, but mainly by the 'young-ish' rather than the young (only around half of whom commute by car anyway).
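To illustrate why sample size matters so much here, below is a minimal Python sketch of a confidence interval for a survey proportion and a test for the difference between two of them. It uses the textbook normal approximation and assumes simple random sampling, which is not how the Labour Force Survey is actually designed, so it should not be read as the method behind the chart. The proportions and sample sizes are made up.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Approximate 95% confidence interval for a sample proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def difference_is_significant(p1, n1, p2, n2, z=1.96):
    """True if two independent proportions differ at roughly the 5% level."""
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p1 - p2) > z * se_diff


# Made-up example: the same five-point change in two age bands.
big_band = difference_is_significant(0.70, 2000, 0.65, 2000)
small_band = difference_is_significant(0.50, 200, 0.55, 200)
print(big_band, small_band)
```

The same five percentage point change passes the test in the band with 2,000 respondents per year but not in the band with 200, which is why an apparent rise among a small group like 16-19 year olds can easily be noise.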

Friday, 2 September 2011

We need to measure subjective cycle safety

Following the recent tragic death of a cyclist on Cavendish Road in Clapham, a local resident who witnessed the crash asked TfL what they could do to make the road safer, especially in light of a report back in 2008 which recommended that the junction in question needed to be redesigned to make it safer.

TfL said no. Here's their reasoning, posted by LCC on one of their comment threads:

...We found that in this section of Cavendish Road there had been 11 collisions in the last three years (up to 30 April 2011; collision data is supplied to us by the Police, and this is the most recent data available). None of these collisions involved cyclists and two resulted in serious injury. This particular section carries a large volume of traffic, more so in fact than the majority of other A roads in the borough. Given the volume of traffic and when compared to other, similar roads there have been a relative low number of collisions on this section of Cavendish Road. As I said, TfL is data led in its approach to progressing schemes on the Transport for London Road Network. We could not progress a safety scheme at a location at which there had been fewer collisions than at other areas. For this reason, we have no plans to progress a scheme at this section. I appreciate that this is a sensitive issue, and would be happy to discuss further...

TfL appear to be saying that they will not consider trying to make a road safer for cyclists until a sufficient number of them are killed or injured on it. The obvious problem with this is that people tend to avoid cycling on a road if it seems unsafe or threatening, but by TfL's logic a road with no cyclists must be a road that is perfectly safe to cycle on.

I think this calls for cycle campaigners to try and produce some better data on how safe and friendly cyclists or would-be cyclists perceive different streets to be. We need objective data on subjective perceptions, in other words. This kind of thing is relatively easily available at large scales from sample surveys (see this recent one from Manchester for example), but what we're really interested in is identifying which individual streets and junctions are the most cycle friendly, and which need the most work to make them safe and inviting.

A while back I proposed something along these lines on the GB Cycling Embassy forums, with the idea being to get people to rate images taken from Google Streetview, but I haven't developed it any further, mainly because I don't have any IT skills.

But now some smart people from MIT have come along and produced something very similar called Place Pulse, which posts two random Streetview images from a handful of cities in the US and Europe and asks people which looks safer. Screenshot below:

They then analyse the results to see which places look safer. It's striking how the places rated as safest are all in Austria and mostly very pedestrian friendly while the least safe are all car-dominated streetscapes in the US.

Place Pulse is focused on 'safety' in the widest sense, but it shows that you could do something similar for cycle safety, and then use the results to understand what people associate with safe or unsafe cycling environments. And then maybe we could get TfL to understand that they should be proactively trying to build streets that invite cycling.

Tuesday, 29 March 2011

Urban myths and the misuse of urban data

Following up my post about the mismeasurement of urbanisation in Egypt, I would highly recommend that anyone interested in this area read David Satterthwaite's article on 'Urban myths and the mis-use of urban data' (pdf).

Satterthwaite's article covers a lot of ground, including alarmism over 'out of control' urbanisation in Africa, the extreme paucity of census data in some areas, and problems in measuring city size and therefore various indicators of city performance. But I'm going to focus on his discussion of how the comparability of the UN's urbanisation statistics is undermined by the use of different definitions of urbanisation at country level. For example:

China’s level of urbanization in 1999 could have been 24 per cent, 31 per cent or 73 per cent, depending on which of three official definitions of urban populations was used. India appears to be a predominantly rural nation, but most of India’s rural population lives in settlements with between 500 and 5,000 inhabitants, which are considered as villages and therefore classified as rural; many more live in settlements with more than 5,000 inhabitants, which are still classified as rural. If these were classified as ‘urban’ (as they would be by some national urban definitions), India would suddenly have a predominantly urban population.

And then there's Egypt:

in 1996, 18 per cent of Egypt’s population lived in settlements with between 10,000 and 20,000 inhabitants that had many urban characteristics, including significant non-agricultural economies and occupational structures. These were not classified as urban areas – although they would have been in most other nations. If they were considered urban, this would mean that Egypt was much more urbanized, causing major changes to urban growth rates.

Remember that Egypt's central government had an incentive to systematically under-estimate its urban population, as granting city status to an area meant allocating it more funding and representation in parliament.

The lesson here is that people should be careful about using data on cross-country comparisons, they should be extra careful when it comes to data on topics like urbanisation where there is no standard definition, and they should probably be extra extra careful about data that is produced by governments like Egypt's. After all, if your argument is that the Egyptian government is dysfunctional, then shouldn't you be at least slightly sceptical about the data that government produces?

Sunday, 17 October 2010

Learning from bike hire visualisations - cheers for Dublin, jeers for Melbourne

Municipal bike hire schemes are increasingly popular around the world, and here in London our own version seems to be bedding in pretty well. One of the most interesting side-effects of the new scheme has been the number of innovative visualisations popping up, of which probably the best looking is Oliver O'Brien's. It shows the location of each docking station, the number of bikes available, which stations are completely empty or completely full, and the total number of bikes in use, along with various statistics comparing usage over time. You can even run a very cool animation of the pattern of usage over the last 48 hours.

Now Oliver has outdone himself by creating similar interactive maps for 14 other city bike hire schemes from around the world (select them from the drop-down list on the main map page). They differ not just in terms of scale (from Paris's vast Velib to Girona's tiny scheme) but also in patterns of usage, with some looking more monocentric than others.

On top of the maps, there is now also a neat set of gauges allowing you to instantly compare scheme usage (and 'imbalance' between docking stations) across all 15 bike schemes. As well as being fascinating there is a lot of potential here for other cities thinking of setting up schemes of their own to learn what works and what doesn't. As an example, Mikael at Copenhagenize.com ran a sort-of scientific comparison of usage at 8am across the European cities in the sample. I was pleasantly surprised to see my home town of Dublin coming out with the highest usage, with 20% of bikes in use. Surprisingly Paris had relatively low usage at 4.3%, though of course a scheme that starts out concentrating on a small but very busy part of town is likely to see higher average usage than one which tries to cover a huge part of the city.

Looking outside Europe, Mikael notes that Melbourne's scheme looks distinctly unpopular. In what is probably not a coincidence, it is also the only one with a law making helmets mandatory for all cyclists. There's a lesson there ...