Category: Analyses

Detecting vandalism in OpenStreetMap – A case study

This blog post is a summary of my talk at the FOSSGIS & OpenStreetMap conference 2017 (German slides). I guess some of the content might be suitable for a research article; however, here we go:

Vandalism is (still) an omnipresent issue for any kind of open data project. Over the past few years, OpenStreetMap (OSM) project data has been integrated into a number of applications. In my opinion, this is one of the most important reasons why we have to bring our quality assurance to the next level. Do we really have a vandalism issue after all? Yes, we do. But first we should take a closer look at the different vandalism types.

It is important to distinguish between different vandalism types. Not every unusual map edit should be considered vandalism. Based on the OSM wiki page, I created the following breakdown. Generally speaking, damaging edits can occur intentionally and unintentionally. Therefore we should distinguish between vandalism and bad-map-editing-behavior. New contributors often make mistakes because they lack expert mapping knowledge; these mistakes are not vandalism. In my opinion, only intentional map edits such as mass-deletions or “graffiti” are real cases of vandalism.

A comparative study between different OpenStreetMap contributor groups – Outline 2016

Over the past few years I have written several blog posts about the (non-) activity of newly registered OpenStreetMap (OSM) members (2015, 2014, 2013). Similarly to the previous posts, the following image shows the gap between the number of registered and the number of active OSM members. Although the project still shows millions of new registrations, “only” several hundred thousand of these registrants actually edited at least one object. Simon showed similar results in his yearly changeset studies.

[Figure: registered vs. active OSM members, 2016]

The following image shows that the project still has some loyal contributors. More specifically, it shows the increase in monthly active members over the past few years and their consistent data contributions based on the first and latest changeset:

[Figure: monthly active OSM members, 2016]

However, this time I would like to combine the current study with some additional research. I tried to identify three different OSM contributor groups, based on the hashtag in a contributor’s comment or the utilized editor, for the following analysis:

Unmapped Places of OpenStreetMap – 2016

Back in 2010 & 2011 I conducted several studies to detect underrepresented regions a.k.a. “unmapped” places in OpenStreetMap (OSM). More than five years later, some people asked if I could rerun the analysis. Based on the latest OSM planet dump file and Taginfo, almost 1 million places have been tagged as villages. Furthermore, around 59 million streets have a residential, unclassified or service highway value. My algorithm to find unmapped places works as follows:

  1. Use every place node of the OSM dataset which has a village-tag (place=village).
  2. Search in a radius of ca. 700 m for a street with one of the following highway-values: residential, unclassified or service.
  3. If no street can be found, mark the place as “unmapped”!
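The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used for the analysis: the input structures (`villages`, `streets`) and the function names are hypothetical, the distance is measured to the street's nodes rather than to its line segments, and the brute-force loop would need a spatial index to cope with the full planet file.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
STREET_VALUES = {"residential", "unclassified", "service"}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_unmapped(villages, streets, radius_m=700.0):
    """Return the names of villages with no qualifying street nearby.

    villages: list of (name, lat, lon) for nodes tagged place=village
    streets:  list of (highway_value, [(lat, lon), ...]) for ways
    (both structures are assumptions made for this sketch)
    """
    unmapped = []
    for name, lat, lon in villages:
        has_street = any(
            haversine_m(lat, lon, slat, slon) <= radius_m
            for highway, coords in streets
            if highway in STREET_VALUES
            for slat, slon in coords
        )
        if not has_street:
            unmapped.append(name)
    return unmapped
```

For example, a village whose nearest residential street node lies roughly 110 m away counts as mapped, while a village that only has a primary road nearby is reported as “unmapped”.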

My results for the entire OSM planet can be found on the following webpage.

[Figure: unmapped places]

OpenStreetMap Crowd Report – Season 2015

Almost one year has passed again. This means it’s time for the fourth OpenStreetMap (OSM) member activity analysis. The previous editions are online here: 2014, 2013 and 2012. Simon Poole already posted some interesting stats about the past few years. You can find all his results on the OSM wiki page. However, similar to last year, I will try to dig a little deeper into some aspects.

Overall, the OSM project officially has more than 2.2 million registered members (Aug 9th, 2015). For several of my OSM related webpages I create a personal OSM contributor database, based on the official OSM API v0.6. However, when using this API, the final table shows a list with more than 3 million individual OSM accounts (Aug 9th, 2015). I’m not sure what causes this gap of almost 1 million members between the official number and the member number extracted with the API. Maybe some of you have a possible explanation? My guess is that many accounts are created by spammers or bots.

Counting changes per Country – A different approach

OSMstats contains several statistics about the OpenStreetMap (OSM) project, such as daily-created objects, the number of active contributors or detailed numbers for individual countries. One way to determine the sum of created or modified Node objects is to use the minutely, hourly or daily OSM replication change files and count the values for each country of the world. Sadly, this approach has some drawbacks. Firstly, the official files do not contain, for example, all Nodes of a modified way, which are required when trying to find the country where the change took place. Furthermore, determining the country for a specific OSM object depends heavily on the border’s level of detail: more detailed country borders make the processing quite time-consuming. Some of you have probably experienced this problem before when using Osmosis or a different OSM processing tool. Anyway, for calculating additional country statistics I tried a new approach:

  1. Determine the country of a changeset based on its center position.
  2. Use the changeset country information for all objects within this changeset.
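The two steps above could look roughly like the following Python sketch. It is a simplified illustration under several assumptions: each changeset is given as a bounding box plus a change count, the center is taken as the midpoint of that bounding box, country borders are plain lists of (lat, lon) vertices, and the country lookup uses a basic ray-casting test. All names and data structures here are hypothetical, and bounding boxes that cross the antimeridian are not handled.

```python
def changeset_center(min_lat, min_lon, max_lat, max_lon):
    """Midpoint of a changeset bounding box as (lat, lon)."""
    return ((min_lat + max_lat) / 2.0, (min_lon + max_lon) / 2.0)


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def country_of_changeset(bbox, countries):
    """Step 1: look up the country that contains the changeset center."""
    lat, lon = changeset_center(*bbox)
    for code, polygon in countries.items():
        if point_in_polygon(lat, lon, polygon):
            return code
    return None  # e.g. a center over open water


def count_changes_per_country(changesets, countries):
    """Step 2: credit all changes of a changeset to its country.

    changesets: list of ((min_lat, min_lon, max_lat, max_lon), num_changes)
    countries:  dict mapping a country code to its border polygon
    """
    counts = {}
    for bbox, num_changes in changesets:
        code = country_of_changeset(bbox, countries)
        if code is not None:
            counts[code] = counts.get(code, 0) + num_changes
    return counts
```

The trade-off is that only one point-in-polygon test is needed per changeset instead of one per object, at the cost of misattributing objects in changesets that span a border.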

A précis: Where are the US mappers at?

This blog post is a summary of Dennis’ and my State of the Map (SotM) United States presentation. Maybe some of you already know about our publication: “Comparison of Volunteered Geographic Information Data Contributions and Community Development for Selected World Regions”. From the abstract: “Our findings showed significantly different results in data collection efforts and local OSM community sizes. European cities provide quantitatively larger amounts of geodata and number of contributors in OSM …”. “Furthermore, the results showed significant data contributions by members whose main territory of interest lies more than one thousand kilometers from the tested areas.” Especially the last finding is quite interesting when considering “arm-chair-mapping” in OSM.

However, for our SotM US session we repeated some of the conducted analyses for 50 urban areas in the United States to see whether similar patterns could be determined. You can find the session abstract here; additionally the ppt slides and also a video are online. The following animation shows how the number of contributors in the US evolved from 2007 to 2014.

The State of the Map. United States. Street Network. 2013

Last year we wrote a journal paper in which we analyzed the OpenStreetMap (OSM) dataset of the United States. It was published on May 28th, 2013 in the journal Transactions in GIS. You can download a free pre-print version here. The paper has been published just in time to add to the discussion at the upcoming State of the Map United States conference, which will take place in San Francisco and includes some presentations about data imports to OSM. Unfortunately, Dennis and I cannot attend the conference this year, so we decided to write a blog post with some additional and up-to-date numbers.

Introducing OpenStreetMap Contributor Activity Areas

One month ago I wrote a blog post about a new website which allows you to see other OpenStreetMap contributors in your area. Overall the feedback was very positive, thank you very much for that! However, now it is time for a new extension to the “How did you contribute to OpenStreetMap?” (HDYC) webpage. As I mentioned in my last blog post, I used an algorithm (which is described in a paper that I wrote here) to compute and determine the activity area of a contributor based on her/his changeset centers. The following figure shows the new function that was added to the HDYC website visualizing the activity area of a contributor! Sorry Harry, as always you have to be our guinea pig, but you have a really awesome activity area 🙂

Which country has the most OpenStreetMap GPS Points?

Some of you might already know that OpenStreetMap released a first bulk GPS point dataset last weekend. It contains almost 2.8 milliard (or for readers in the US 2.8 billion) points and is provided in its raw format, which means that only coordinate information is available for each point. Unfortunately it does not include any additional information or metadata. You can read more about it at the OSM Foundation Blog.

The first idea that came to my mind was a simple comparison analysis to answer the following questions: Where are all those points located and which country has the most GPS points? A first attempt produced results showing that the points are distributed over 238 countries. For my analysis I used the OSM Mapnik world boundaries from the wiki. As you can see in the following pie chart, nearly 21% of the points are located in Russia (about 570 million points) and another 18% in Germany (about 500 million points). Does Russia have so many GPS points because of the country size or is the community just exceptionally active with GPS devices? However, the strange thing is that Germany is, with about 18%, “only” in second place this time, weird isn’t it? 😉

What Impact has the OSM License Change in Germany on the Street Network Length? – 1st Attempt –

The OpenStreetMap project will possibly finalize its license change on April 1st 2012. There are certain concerns in the community about possible data losses, and to keep them as small as possible, several remapping activities have been started. A really nice overview of “Remapping principles” and “Tools to help you” can be found here.

Frederik’s OSMInspector (OSMI) and Simon’s CLEANMAP are two very handy remapping tools. Both display data that will likely be removed after April 1st because it was collected by contributors who did not accept the license change. In Germany you will find several areas that are affected by these changes, which might even leave some new blank spots in the map. But what impact do these changes have on the total length in kilometers per street category in Germany?