Last year we wrote a journal paper analyzing the OpenStreetMap (OSM) dataset of the United States; the paper was published in the journal Transactions in GIS on May 28th, 2013. You can download a free pre-print version here. It appeared just in time to add to the discussion at the upcoming State of the Map United States conference, which will take place in San Francisco and includes several presentations about data imports to OSM. Unfortunately, Dennis and I cannot attend the conference this year, so we decided to write a blog post with some additional, up-to-date numbers.
In January there was an announcement on the OSM mailing list that many connectivity errors in the United States OSM dataset had been fixed over the previous few months. Many of these fixes can probably be attributed to Martijn's MapRoulette website or to Geofabrik's OSM Inspector (OSMI) Routing View. However, a short discussion followed on the mailing list about how many errors are left and how long it would take to fix them all. We therefore downloaded four OSM planet files, dated January 4th 2012, June 13th 2012, January 2nd 2013 and June 2nd 2013, to get some new results. After cutting the United States dataset from each planet file, we applied the same algorithm that is used in OSMI's Routing View to obtain statistics about the street network in each US dataset.
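The exact checks behind OSMI's Routing View are not spelled out in this post, so the following Python sketch is only a simplified illustration of the two error types, assuming a hypothetical in-memory model in which each way ID maps to its ordered list of node IDs. It flags a way as unconnected when none of its nodes is shared with any other way, and as a duplicate when it shares at least one segment (a pair of consecutive nodes) with another way. The real Routing View checks may differ, for example by also taking the distance to nearby ways into account; the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical model: way ID -> ordered list of node IDs.
Ways = dict[int, list[int]]


def unconnected_ways(ways: Ways) -> set[int]:
    """Simplified check: ways that share no node with any other way."""
    node_use: dict[int, set[int]] = defaultdict(set)  # node ID -> IDs of ways using it
    for way_id, nodes in ways.items():
        for node in nodes:
            node_use[node].add(way_id)
    return {
        way_id
        for way_id, nodes in ways.items()
        # Every node is used by this way only, so nothing connects to it.
        if all(node_use[n] == {way_id} for n in nodes)
    }


def duplicate_ways(ways: Ways) -> set[int]:
    """Simplified check: ways sharing at least one segment with another way."""
    seg_use: dict[frozenset, set[int]] = defaultdict(set)
    for way_id, nodes in ways.items():
        for a, b in zip(nodes, nodes[1:]):
            # frozenset makes the segment direction-independent (a-b == b-a).
            seg_use[frozenset((a, b))].add(way_id)
    dups: set[int] = set()
    for way_ids in seg_use.values():
        if len(way_ids) > 1:
            dups |= way_ids
    return dups


ways = {
    1: [10, 11, 12],
    2: [12, 13],      # shares node 12 with way 1
    3: [20, 21],      # touches no other way -> unconnected
    4: [11, 12, 14],  # repeats segment 11-12 of way 1 -> duplicate
}
print(sorted(unconnected_ways(ways)))  # [3]
print(sorted(duplicate_ways(ways)))    # [1, 4]
```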
First of all, the following image shows the number of errors in each of the datasets included in the analysis, separated into unconnected ways and duplicate ways. You can find additional information about both error types here.
As you can see, the number of unconnected OSM ways dropped sharply over the past 17 months, from around 141,000 to 19,000, with most of the drop occurring between June 2012 and January 2013. Over the same period the number of "duplicate way" errors fell from 17,500 to 11,500. The exact numbers are listed in the following table, and an updated error layer is available on the OSMI website mentioned above. In some cases the duplicate way check produced several errors for one and the same way; for these cases we counted the number of unique OSM way IDs.
Date – Unconnected ways – Duplicate ways: unique way IDs (overall errors)
- Jan 4th, 2012 – 141,578 – 17,563 (535,923)
- Jun 13th, 2012 – 145,468 – 17,977 (518,536)
- Jan 2nd, 2013 – 15,911 – 12,287 (257,388)
- Jun 2nd, 2013 – 19,073 – 11,582 (220,451)
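Counting unique way IDs, as done for the duplicate way column, amounts to collapsing the error records by way ID. The Python sketch below assumes hypothetical error records of the form (way ID, location label); the record format and function names are illustrative only. It also computes the relative reductions between the first and last rows of the table above.

```python
def summarize(records: list[tuple[int, str]]) -> tuple[int, int]:
    """Return (total error records, number of distinct way IDs among them)."""
    return len(records), len({way_id for way_id, _ in records})


def relative_reduction(before: int, after: int) -> float:
    """Fraction by which a count dropped between two snapshots."""
    return (before - after) / before


# Toy records: way 7 is flagged three times (e.g. once per overlapping segment).
records = [(7, "seg-a"), (7, "seg-b"), (7, "seg-c"), (9, "seg-x")]
print(summarize(records))  # (4, 2)

# Counts from the table (Jan 4th, 2012 vs. Jun 2nd, 2013).
print(f"{relative_reduction(141_578, 19_073):.1%}")  # 86.5% (unconnected ways)
print(f"{relative_reduction(17_563, 11_582):.1%}")   # 34.1% (unique duplicate ways)
```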
Overall, the length of the US street network barely changed. At the beginning of 2012 it was around 11.07 million km, and in June 2013 it is 11.1 million km, an increase of around 30,000 km. The following image shows how the US street network is distributed across the different OSM road classes.
The length of residential roads is still decreasing (-496,000 km), just as we observed in the analysis for our paper, while the length of secondary/tertiary roads (+205,000 km) and of the other road types (+276,000 km) is increasing. This is the result of a massive retagging of the imported TIGER/Line data in OSM, which Dennis already mentioned in his SotM US 2012 presentation. Motorways also grew by around 44,000 km in 2012. You will find additional, quite interesting statistics, charts and of course maps in the aforementioned journal publication, including further thoughts and facts about the effect and impact of data imports on the United States OSM dataset.
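To illustrate why retagging shrinks one road class while the total length barely moves, here is a minimal Python sketch. It assumes a hypothetical in-memory model (way ID mapped to a highway tag and node IDs; node ID mapped to latitude/longitude) and measures segments with the haversine formula on a spherical Earth; it is not necessarily how the lengths reported here were computed, and the function names are hypothetical. The last lines sum the per-class changes quoted above, assuming all four figures cover the same comparison.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def length_by_class(ways: dict[int, tuple[str, list[int]]],
                    coords: dict[int, tuple[float, float]]) -> dict[str, float]:
    """Sum way lengths (km) per highway tag."""
    totals: dict[str, float] = defaultdict(float)
    for tag, nodes in ways.values():
        for a, b in zip(nodes, nodes[1:]):
            totals[tag] += haversine_km(*coords[a], *coords[b])
    return dict(totals)


def class_deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Length change per class; a class missing from one snapshot counts as 0."""
    return {c: after.get(c, 0.0) - before.get(c, 0.0) for c in before.keys() | after.keys()}


# Toy snapshots: identical geometry, but way 2 is retagged residential -> secondary.
coords = {1: (0.0, 0.0), 2: (0.0, 1.0), 3: (1.0, 1.0)}
old = length_by_class({1: ("residential", [1, 2]), 2: ("residential", [2, 3])}, coords)
new = length_by_class({1: ("residential", [1, 2]), 2: ("secondary", [2, 3])}, coords)
deltas = class_deltas(old, new)
print({c: round(d, 1) for c, d in sorted(deltas.items())})
# {'residential': -111.2, 'secondary': 111.2}
print(math.isclose(sum(deltas.values()), 0.0, abs_tol=1e-9))  # True: length only moved

# Per-class changes quoted in the text, in km.
reported_km = {"residential": -496_000, "other": 276_000,
               "secondary/tertiary": 205_000, "motorway": 44_000}
print(sum(reported_km.values()))  # 29000, close to the ~30,000 km net growth
```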