Tag: Quality

Detecting vandalism in OpenStreetMap – A case study

This blog post is a summary of my talk at the FOSSGIS & OpenStreetMap conference 2017 (German slides). Some of the content might be suitable for a research article, but here we go:

Vandalism is (still) an omnipresent issue for any kind of open data project. Over the past few years, OpenStreetMap (OSM) data has been used in a number of applications. In my opinion, this is one of the most important reasons why we have to take our quality assurance to the next level. But do we really have a vandalism issue? Yes, we do. First, however, we should take a closer look at the different types of vandalism.

A comparative study between different OpenStreetMap contributor groups – Outline 2016

Over the past few years I have written several blog posts about the (non-)activity of newly registered OpenStreetMap (OSM) members (2015, 2014, 2013). As in the previous posts, the following image shows the gap between the number of registered and the number of active OSM members. Although the project still attracts millions of new registrations, “only” several hundred thousand of these registrants actually edited at least one object. Simon showed similar results in his yearly changeset studies.

[Figure: registered vs. active OSM members, 2016]

The following image shows that the project still has some loyal contributors. More specifically, it shows the increase in monthly active members over the past few years and their consistent data contributions, based on each member's first and latest changeset:

[Figure: monthly active OSM members based on first and latest changeset, 2016]

This time, however, I would like to combine the current study with some additional research. For the following analysis, I tried to identify three different OSM contributor groups, based on the hashtag in a contributor’s changeset comment or the editor used:

Unmapped Places of OpenStreetMap – 2016

Back in 2010 & 2011 I conducted several studies to detect underrepresented regions, a.k.a. “unmapped” places, in OpenStreetMap (OSM). More than five years later, some people asked if I could rerun the analysis. According to the latest OSM planet dump file and Taginfo, almost 1 million places have been tagged as villages. Furthermore, around 59 million streets have a residential, unclassified or service highway value. My algorithm to find unmapped places works as follows:

  1. Take every place node of the OSM dataset that has a village tag (place=village).
  2. Search within a radius of ca. 700 m for a street with one of the following highway values: residential, unclassified or service.
  3. If no street can be found, mark the place as “unmapped”!
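The three steps above can be illustrated with a small Python sketch. It is not the original implementation: it assumes that streets are represented by their individual nodes (vertices of highway ways) with a highway value attached, and it uses a brute-force great-circle (haversine) distance check. The names Place, StreetNode, haversine_m and find_unmapped are hypothetical and chosen only for this example.

```python
import math
from dataclasses import dataclass, field

# Highway values that count as "a street was found" (step 2).
STREET_VALUES = {"residential", "unclassified", "service"}
SEARCH_RADIUS_M = 700.0        # "ca. 700 m" from step 2
EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius


@dataclass
class Place:
    """Hypothetical representation of an OSM place node."""
    name: str
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class StreetNode:
    """Hypothetical representation of one vertex of a highway way (assumption)."""
    lat: float
    lon: float
    highway: str


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_unmapped(places, street_nodes, radius_m=SEARCH_RADIUS_M):
    """Return the village places with no qualifying street node within radius_m."""
    unmapped = []
    for place in places:
        # Step 1: only consider place nodes tagged place=village.
        if place.tags.get("place") != "village":
            continue
        # Step 2: look for a residential/unclassified/service street nearby.
        has_street = any(
            s.highway in STREET_VALUES
            and haversine_m(place.lat, place.lon, s.lat, s.lon) <= radius_m
            for s in street_nodes
        )
        # Step 3: no street found, so mark the place as "unmapped".
        if not has_street:
            unmapped.append(place)
    return unmapped
```

For example, a village with a residential street node about 330 m away is treated as mapped, whereas a village whose only nearby way is highway=track is reported as unmapped. Checking vertices instead of full way geometry can miss a long street segment that passes close to a place without having a vertex nearby. For the whole planet, the linear scan would also have to be replaced by a spatial index or a spatial database query.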

My results for the entire OSM planet can be found on the following webpage.

[Figure: unmapped places]

Verified OpenStreetMap contributor profiles?

The reputation of a contributor in OpenStreetMap (OSM) plays a significant role, especially when assessing the quality of the collected data. Sometimes it’s difficult to make a meaningful statement about a contributor simply by looking at the raw mapping work, represented by the number of created objects or used tags. Therefore, it would be really helpful to have some additional information about the person who contributes to the project. For example: Does she/he help other contributors? Is her/his work documented somehow, or based on one of the “discussed” proposals? Or does she/he work as a lone warrior in the OSM world?

In 2010 I created “How did you contribute to OpenStreetMap?” (HDYC) as a fun side project. Nowadays many people use it to get detailed information about OSM contributors. Some of you are probably familiar with the “verified” icon used on some celebrity Twitter accounts. I added a similar feature to the HDYC page: if you connect your related OSM accounts, your profile will be marked as “verified”.

How to detect suspicious OpenStreetMap Changesets with incorrect edits?

Since its rise in popularity, the well-known online encyclopedia Wikipedia has been struggling with manipulation or, in the worst case, vandalism. Similarly, over the past few years the OpenStreetMap (OSM) project has repeatedly suffered from incorrect map edits. Such erroneous edits can come from (new) contributors or from illegal data imports (or automated edits) that were not discussed in advance with the community or the Data Working Group (DWG) and that corrupted existing project data. The current OSM wiki page gives a great overview of general guidelines and, e.g., types of vandalism. Another wiki page also mentions a prototype of a rule-based system for the automatic detection of vandalism in OSM, which I developed in 2012. However, the system has never actually been implemented. Today, OSM contributors can use a variety of tools to inspect an area or particular map changes. A few of them are listed below (a complete list can be found here):

The State of the Map. United States. Street Network. 2013

Last year we wrote a journal paper analyzing the OpenStreetMap (OSM) dataset of the United States; it was published on May 28th, 2013 in the journal Transactions in GIS. You can download a free pre-print version here. The paper was published just in time to add to the discussion at the upcoming State of the Map United States conference, which will take place in San Francisco and includes several presentations about data imports to OSM. Unfortunately, Dennis and I cannot attend the conference this year, so we decided to write a blog post with some additional and up-to-date numbers.

Updated Status for Unmapped Places

My last unmapped places analysis for OpenStreetMap was nearly eight months ago, so I figured it was about time to create a new one. The last blog post explains in detail how my algorithm works.

At the moment (Nov. 4th, 2011), according to the Geofabrik extract, OSM contains about 597 000 places located within “Europe”. This is an overall increase of about 90 000 places within the past eight months. They can be separated by place type as follows:

  • City: 1093 (as of March 11th, 2011 it was 1055 ; +3.6%)
  • Town: 16213 (as of March 11th, 2011 it was 16106 ; +0.7%)
  • Suburb: 29642 (as of March 11th, 2011 it was 24913 ; +19.0%)
  • Village: 301638 (as of March 11th, 2011 it was 278691 ; +8.2%)
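The percentages in the list above are relative changes with respect to the March 11th, 2011 counts. As a quick check, the following Python snippet recomputes them from the quoted numbers only; the function name pct_change is just an illustrative choice.

```python
# Place counts for Europe as quoted in the post:
# place type -> (count on March 11th, 2011, count on Nov. 4th, 2011)
counts = {
    "city": (1055, 1093),
    "town": (16106, 16213),
    "suburb": (24913, 29642),
    "village": (278691, 301638),
}


def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Absolute and relative growth per place type
for place_type, (old, new) in counts.items():
    print(f"{place_type:8s} {new - old:+7d} ({pct_change(old, new):+.1f}%)")
```

Rounded to one decimal, this reproduces the values in the list: +3.6% for cities, +0.7% for towns, +19.0% for suburbs and +8.2% for villages.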

Routing View Europe 2011-05

First of all, sorry that I did not publish new Routing View statistics last month. To all new readers: every month I usually create an analysis of the Routing View of the OpenStreetMap Inspector for Europe. You can find more information about the OSM Inspector (OSMI) here. The Routing View within the OSMI “shows problems in the data related to routing and navigation”. You can read more about it here … A direct link to the OSMI Routing View is here!

Here are the new stats for May 2011: we have a total of about 124000 “Unconnected Roads” and about 108000 “Duplicate Ways” (number of duplicate segments). Overall, this means about 17000 *new* “Unconnected Roads” errors, while only ca. 1300 “Duplicate Ways” have been fixed in Europe. Over the past three months, the number of OSM way segments relevant for routing has grown by about 2850000 (May 7th: 34500000, February 20th: 31700000, January 20th: 30600000).

“Unmapped” Places in Europe?

Recently, posts on the German OpenStreetMap mailing list about the coverage of not-yet-mapped regions in OpenStreetMap have been accumulating, prompted by the recent clearance of the Bing aerial images for use in OSM.

In a blog post I wrote back in August this year, I introduced an analysis that searched for “places” such as small villages in Germany that probably had not been mapped in OSM at that time (the post in German can be found here).

I have now repeated this analysis using the Routing View database, this time expanding the study area to all of Europe. At the moment, OSM contains a total of 477591 places in Europe. They can be separated into the following place types:

  • city=1045
  • town=16032
  • suburb=23563
  • village=271147
  • hamlet=165804
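As a small consistency check, the five place-type counts above add up exactly to the stated total of 477591. The Python snippet below uses nothing but the numbers from the list and also prints the share of each type; villages account for roughly 57% and hamlets for roughly 35% of all places.

```python
# Place counts for Europe as listed in the post
place_counts = {
    "city": 1045,
    "town": 16032,
    "suburb": 23563,
    "village": 271147,
    "hamlet": 165804,
}

total = sum(place_counts.values())
print(f"total: {total}")

# Share of each place type in the overall number of places
for place_type, count in place_counts.items():
    print(f"{place_type:8s} {count:7d} ({count / total:.1%})")
```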

Routing View EU 2010-11

And again, here are the new statistics for the “Routing View EU”.

Overall (according to the Geofabrik extract), the following numbers of errors appear for Europe as of mid-November 2010:

  • Unconnected Roads: ca. 107500
  • Duplicate Ways (number of duplicate segments): ca. 160000

Unfortunately, this means that overall only 500 unconnected streets and 22000 duplicate way segment errors have been fixed (last month we had 108000 unconnected roads and 180000 duplicate way segment errors). As always, the following image shows the number of errors by country:

Wow, Italy! It’s really nice to see what’s happening there! During the past month they fixed more than 9000 errors again, and now they are really catching up with Germany 🙂 Mappers in several other countries also reduced the number of errors, e.g. in Albania, Denmark, Greece, Iceland, Norway and Sweden. More than 1000 errors were fixed in each of these countries 🙂