Tag: Analyse

Detecting vandalism in OpenStreetMap – A case study

This blog post is a summary of my talk at the FOSSGIS & OpenStreetMap conference 2017 (German slides). I guess some of the content might be suitable for a research article; however, here we go:

Vandalism is (still) an omnipresent issue for any kind of open data project. Over the past few years the OpenStreetMap (OSM) project data has been integrated into a number of applications. In my opinion, this is one of the most important reasons why we have to bring our quality assurance to the next level. Do we really have a vandalism issue after all? Yes, we do. But first we should take a closer look at the different vandalism types.

It is important to distinguish between different vandalism types. Not each and every unusual map edit should be considered as vandalism. Based on the OSM wiki page, I created the following breakdown. Generally speaking, vandalism can occur intentionally and unintentionally. Therefore we should distinguish between vandalism and bad-map-editing-behavior. Oftentimes new contributors make mistakes which are not vandalism because they do not have the expert mapper knowledge. In my opinion, only intentional map edits such as mass-deletions or “graffiti” are real cases of vandalism.

To get an impression of the state of vandalism in the OSM project, I conducted a case study over a four-week timeframe (between January 5th and February 12th, 2017). During my study I analyzed OSM edits that mostly deleted objects, as well as edits from new contributors who created fictitious data or changesets for the Pokemon game. If you have not heard or read about OSM’s Pokemon phenomenon, you can read more about it here. The OSM wiki page for quality assurance lists some tools that can be used for vandalism detection. However, for this study I used my self-developed “OSM suspicious” webpage and the quite useful augmented OSM change viewer (Achavi). Furthermore, a webpage that lists the newest OSM contributors may also be of interest to you.

So what can you do when you find a strange map edit that could be a case of vandalism? The OSM help page contains an answer for that. First of all: Keep calm! Use changeset comments and ask in a friendly manner about the reasons for the suspicious mapping.

Results of the study: Overall I commented on 283 changesets in the aforementioned timeframe. Unfortunately I did not count the number of analyzed changesets, but I assume that it should be around 1,200 (±200). The following chart shows the commented changesets per day. The weekends tend to have a larger number of commented/discussed changesets.

As mentioned above, we should distinguish between different vandalism types. The following image shows the numbers for each category. In my prototype study, 45% of the commented changesets were vandalism related, and 24% had already been reverted without this being documented in the changeset discussion. Sometimes I also found imported test and fictitious data that the initial contributor of the changeset didn’t revert. It should be made clear to everyone that the live database should never be used for testing purposes. Interested developers can use the test API and a test database (see sandbox for editing).

Responses and spatial distribution: Overall I received 70 responses for the discussed changesets, sadly only 20 of them from the owner/contributor of the changeset. But more or less every response was friendly. Most often the contributors wrote “thank you” or “I didn’t know that my changes are going to be saved in the live database”. Furthermore, if I received a response, it arrived within 24 hours.

The following map contains some clustered markers. Each one highlights areas where the discussed changesets are located. As you can see on the map, the commented changesets are spread almost all over the world. In some areas they tend to correlate with the number of active OSM’ers. However, here is some additional information about three selected areas: 1: USA – Several cases of Pokemon Go related and fictitious map edits. 2: Japan/China – Some mass deletions and 3: South Africa – Oftentimes new MissingMaps or HOT contributors tend to delete and redraw more or less the same objects such as buildings. I guess it was not explained well enough to these editors that this destroys the object history? However, the article about “Good practice” in the OSM wiki is quite useful in this case.

Conclusion: The study reveals that there is an ongoing issue with vandalism in OSM’s map data. I think we need to simplify the tools for detecting vandalism. In particular, we should avoid duplicated work where several users review identical suspicious map edits. Maybe the best possible solution would be a tool that is integrated directly into the OSM.org infrastructure. My presentation also contained some statistics and charts about the OSM changeset discussions feature. This will be the content of a separate blog post in the following weeks. Also, the prototype introduced at the end of my talk will (hopefully) be presented in the next few months.

Thanks to maɪˈæmɪ Dennis.

A comparative study between different OpenStreetMap contributor groups – Outline 2016

Over the past few years I have written several blog posts about the (non-) activity of newly registered OpenStreetMap (OSM) members (2015, 2014, 2013). Similarly to the previous posts, the following image shows the gap between the number of registered and the number of active OSM members. Although the project still shows millions of new registrations, “only” several hundred thousand of these registrants actually edited at least one object. Simon showed similar results in his yearly changeset studies.

2016members

The following image shows that the project still has some loyal contributors. More specifically, it shows the increase in monthly active members over the past few years and their consistent data contributions based on the first and latest changeset:

2016months

However, this time I would like to combine the current study with some additional research. I tried to identify three different OSM contributor groups, based on the hashtag in a contributor’s comment or the utilized editor, for the following analysis:

  1. Contributors of the MissingMaps project: contributors of the project usually use #missingmaps in their changeset comments.
  2. Contributors that utilized the Maps.Me app: The ‘created_by’-tag contains ‘MAPS.ME’.
  3. All other ‘regular’ contributors of the OSM project, who have no #missingmaps in their changesets and did not use the maps.me editor either.
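The grouping rule above can be sketched in a few lines of Python. This is only an illustration, not the script used for the study: the function name `contributor_groups`, the representation of a changeset as a plain dict of tags, and the case-insensitive matching are assumptions. Because the post does not say how contributors who match both criteria were handled, the sketch simply returns every matching group.

```python
def contributor_groups(changesets):
    """Return the set of groups a contributor falls into.

    `changesets` is a list of tag dicts, one per changeset of the
    contributor (hypothetical input format, e.g. parsed from a
    changeset dump).
    """
    groups = set()
    for tags in changesets:
        # Group 1: the changeset comment carries the #missingmaps hashtag.
        if "#missingmaps" in tags.get("comment", "").lower():
            groups.add("missingmaps")
        # Group 2: the created_by tag names the MAPS.ME editor.
        if "maps.me" in tags.get("created_by", "").lower():
            groups.add("maps.me")
    # Group 3: neither criterion matched in any changeset.
    return groups or {"regular"}


example = [
    {"comment": "Added buildings #MissingMaps", "created_by": "iD 2.0.2"},
    {"comment": "Fixed a street name", "created_by": "JOSM/1.5"},
]
print(contributor_groups(example))
```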

In the past 12 months, almost 1.53 million members registered for the OSM project. So far, only 12% (181k) have created at least one map edit: almost 12,000 members created at least one changeset with the #missingmaps hashtag, over 70,000 used the maps.me editor, and 99,000 mapped without #missingmaps and without the maps.me editor. The following diagram shows the number of new OSM contributors per month for the three aforementioned groups.

2016permonth

The release of the maps.me app (more specifically its OSM editor functionality) clearly has an impact on the monthly number of new mappers. Time for a more detailed analysis of the contributions and mapping times: The majority of the members of all groups don’t show more than two mapping days (What is a mapping day, you ask? Well, my definition would be: a mapping day is a day on which a contributor created at least one changeset). Only around 6% of the newly active members contribute for more than 7 days.
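The mapping-day definition can be expressed as a small Python sketch. It assumes that each changeset is represented by an ISO 8601 timestamp string and that a "day" means a calendar day of that timestamp; the function name `mapping_days` is made up for this illustration.

```python
from datetime import datetime


def mapping_days(changeset_timestamps):
    """Count the distinct calendar days with at least one changeset."""
    days = {datetime.fromisoformat(ts).date() for ts in changeset_timestamps}
    return len(days)


timestamps = [
    "2016-05-01T10:15:00",
    "2016-05-01T18:40:00",  # same day as the first changeset
    "2016-05-03T09:00:00",
]
print(mapping_days(timestamps))
```

Two changesets on the same day count as one mapping day, so this contributor would still fall into the "not more than two mapping days" majority.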

2016mappingdays

Some members of the #missingmaps group also contributed some changesets without the hashtag. But many of those members (70%) only contributed #missingmaps changesets. Furthermore, 95% of this adjusted group doesn’t map for more than two days. Anyway, despite identifying three different contributor groups, the results are looking somewhat similar. Let’s have a look at the number of map changes. The relative comparison shows that the smaller #missingmaps group produces a large number of edits. The maps.me group only generates small numbers of map changes to the project’s database.

2016mapchanges

Lastly, I conducted an analysis for three selected tag-keys: building, highway and name. The comparison shows that the #missingmaps group generates a larger number of building and highway features. In contrast, “regular” OSM’ers and maps.me users contributed more primary keys such as the name- or amenity-tag.

2016tags

I think the diagrams in this blog post are quite interesting because they show that the #missingmaps mapathons can activate members who contribute many map objects. But they also indicate that the majority of these elements are traced from satellite imagery without primary attributes. In contrast, the maps.me editor functionality proved to be successful thanks to its in-app integration and its ease of use, which resulted in a huge number of new contributors. In summary, I think it would be good to motivate contributors not only to participate in humanitarian mapathons but also to map their neighborhood, so that they stick with the project. Also, I guess it would be great if the maps.me editor team worked on the next steps in providing easy mapping functionality for its users (of course with some sort of validation to reduce questionable edits).

Thanks to maɪˈæmɪ Dennis.

What Impact has the OSM License Change in Germany on the Street Network Length? – 1st Attempt –

The OpenStreetMap project will possibly finalize its license change on April 1st 2012. There are concerns in the community about possible data losses, and to keep these as small as possible, several remapping activities have been started. A really nice overview of “Remapping principles” and “Tools to help you” can be found here.

Frederik’s OSMInspector (OSMI) and Simon’s CLEANMAP are two very handy remapping tools. Both display data that will likely be removed after April 1st because it was collected by contributors who did not accept the license change. In Germany you will find several areas that are affected, and the changes might even leave some new blank spots on the map. But what impact do these changes have on the total length in kilometers per street category in Germany?

You can find several files regarding the OSMI license change view on a Geofabrik server here. Based on the “ways” shapefile that you can find there, it is possible to calculate the total length of the ways that will likely be removed with the license change. Sadly, the “ways” shapefile does not include any “highway” attribute, but luckily it includes the OSM IDs. This means that for a street network analysis of Germany you will have to download the Geofabrik Germany OSM *.pbf file. By applying a short script you can get all OSM way IDs in Germany with their highway=* key/value pair. Combining these with the “ways” shapefile allows us to calculate the total lengths of each highway type for “Germany” (based on the Geofabrik extract!).
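The combination step is essentially a join on the OSM way ID followed by a sum per highway type. A minimal Python sketch, assuming the lengths from the “ways” shapefile and the highway values from the *.pbf extract have already been read into two plain dictionaries (the function name, the dictionary layout and the sample IDs are hypothetical, not the original script):

```python
from collections import defaultdict


def length_by_highway(way_lengths_km, highway_by_id):
    """Sum the length of affected ways per highway type.

    way_lengths_km: {way_id: length in km} from the license change shapefile
    highway_by_id:  {way_id: highway value} from the OSM extract
    """
    totals = defaultdict(float)
    for way_id, length_km in way_lengths_km.items():
        highway = highway_by_id.get(way_id)
        if highway is None:
            continue  # way has no highway=* tag, so it is not a street
        totals[highway] += length_km
    return dict(totals)


# Made-up sample data for illustration only.
affected = {101: 1.5, 102: 2.0, 103: 0.5, 104: 4.0}
highways = {101: "residential", 102: "residential", 103: "primary"}
print(length_by_highway(affected, highways))
```

Ways without a highway value (way 104 in the sample) are skipped, which is why the totals only describe the street network and not all affected ways.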

The following image shows the results of a first attempt to visualize the values per street category. Overall this means that, based on the current (January 15th 2012) license agreement/disagreement situation, about 5.4% (94000 km) of the current street network in Germany will be removed after the license change in April. The relative difference for each highway type lies between 3% and 8%. Last week (Jan. 7th, 2012) the total amount in Germany was 5.9% and 103000 km.

The OSMI License Change view contains not only the data that will potentially be removed in the future but also some information on two additional feature types: features that have been modified, and features that have been modified in some minor way, by a contributor who declines the license change. In the first case a total street network length of about 58000 km is affected, and in the second case about 17000 km. Remember, these numbers only reflect the situation in Germany! You can find more information about the different feature types here: “Understanding the Colour Scheme“.

Notice: This was a short hack done last night, but I think those numbers look realistic. Can anyone confirm this for Germany? I am very curious how and if these numbers will decrease in the next few weeks. What do you guys think?

thx @ maɪˈæmɪ Dennis

OSM Routing View Worldwide 2011-11

Really great news for all our non-European OpenStreetMap.org Mappers: Since last month, the OSM Routing View is available for the whole world. You can read more in Frederik’s blog post. Yesterday he sent me the latest results of the view and I did some analysis with it. To all new readers: you can find more information about the OSM Inspector (OSMI) here. The Routing View within the OSMI “shows problems in the data, related to routing and navigation” (direct link).

Here are the new *worldwide* stats for November 2011: we have a total of about 1.3 million errors. We can divide them into the following groups:

  • Unconnected 1 meter: 248000
  • Unconnected 2 meter: 62000
  • Unconnected 5 meter: 170000
  • Duplicate (number of duplicate segments): 833000
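As a quick plausibility check, the four groups can be added up and expressed as shares of the total. The following Python lines only use the numbers from the list above; the dictionary keys are shortened labels chosen for this sketch.

```python
errors = {
    "unconnected_1m": 248000,
    "unconnected_2m": 62000,
    "unconnected_5m": 170000,
    "duplicate_segments": 833000,
}

total = sum(errors.values())
print(f"total errors: {total}")

# Share of each error group in percent of all reported errors.
for name, count in errors.items():
    print(f"{name}: {100.0 * count / total:.1f}%")
```

The sum is 1313000, which matches the rounded total of about 1.3 million, and duplicate segments make up well over half of all reported errors.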

The following diagram shows the amount of errors per continent:

In the following charts you can see the amount of errors separated by country and the amount of errors in detail per country for “Europe”:

*NEW*: All other non-European countries with more than 5000 errors are listed in the following chart:

The “big three” countries with the highest amount of errors are in the last chart:

As you can see in the charts, the United States in particular needs a lot of work. Furthermore, it seems that something went wrong in Ethiopia. Was there a data import or something similar? Frederik currently does not have a sponsor for running this routing view worldwide on a daily basis, so please contact him if you would like to support us! The last Routing View blog post is online here.

thx @ *Fab*

The OpenStreetMap Evolution of Austria (2007-2011)

Currently I am working on a research paper about the OpenStreetMap evolution of Germany. For the last AGIT conference in Salzburg and the upcoming State of the Map Europe (SotM-EU) conference in Vienna I did a similar analysis about the OpenStreetMap Evolution of Austria. You can see the results in the following posters in English and German:

The OpenStreetMap Evolution of Austria (2007–2011)

Die OpenStreetMap Entwicklung in Österreich (2007–2011)

Another nice visualization of the OpenStreetMap data in Austria for the year 2010 can be found in a blog post by Max Kossatz.

A comparison of several routing-engines – Which one is the fastest?

In my last blog post I wrote about the newest changes and encoding techniques that have been implemented in the Open Source Routing Machine (OSRM). So I think it is time to do a little comparison of the request/response times of several routing APIs. The main question I wanted to answer was: “Is an OpenStreetMap direction service faster than G**gle?” I tested the following direction APIs for cars (fastest): MapQuest, CloudMade, G**gle and finally OSRM. For the analysis I wrote a small Java tool, which measured the time needed to get a result from a routing service. I did all tests at home with a “regular” 12kbit/s internet connection. I tested several distance levels, and the results can be seen in the following table. It shows the average times of five requests for each route, with a delay of 3 seconds between each request/response. Overall I did this analysis three times.

The results are quite impressive. OSRM returns its result fastest for all five test routes! Unfortunately we do not have any information about the server infrastructure at G**gle, MapQuest or CloudMade, but the OSRM engine is running on a virtual server with limited hardware resources. It seems that the CloudMade directions service does not like Paris very much, as can be seen in the following image 😉
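The measurement procedure (five requests per route, a 3-second pause in between, then the average) can be sketched as follows. This is a Python illustration and not the Java tool mentioned above; the function name `average_response_time` is made up, and the request itself is passed in as a callable so that no particular routing API or URL is assumed.

```python
import time


def average_response_time(request, repeats=5, delay_s=3.0):
    """Call `request` several times and return the mean duration in seconds.

    `request` is any zero-argument callable that performs one routing
    request and returns once the response has been received.
    """
    durations = []
    for i in range(repeats):
        start = time.perf_counter()
        request()
        durations.append(time.perf_counter() - start)
        if i < repeats - 1:
            time.sleep(delay_s)  # pause between consecutive requests
    return sum(durations) / len(durations)


# Stand-in for a real HTTP call, used here only to show the usage.
def fake_request():
    time.sleep(0.01)


print(average_response_time(fake_request, repeats=3, delay_s=0.0))
```

Measured this way, the time includes network latency as well as the server's computation time, so the numbers describe the service as seen from one client, not the routing algorithm alone.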

I will try to do a second comparison with Bing Maps, YOURS and Routino. So stay tuned …

— Update —

>> The second blog post is here: Comparison of (OSM) routing-engines – Reloaded <<

thx @ maɪˈæmɪ Dennis 🙂

Edit Stats for OSM Japan

Kate created some editing stats for OpenStreetMap Japan last Thursday. You can find her blog post here: “Quick Japan Editing Stats for OpenStreetMap”

While creating the layers for the “Road Status in Japan”, I also log some editing information about OpenStreetMap. As I mentioned in my blog post, I use the Geofabrik extracts for Japan (Sendai region only). They are clipped to the following bounding polygon (thx Frederik):

polygon
1
1.412259E+02 3.663895E+01
1.427964E+02 4.038643E+01
1.411296E+02 4.038351E+01
1.394639E+02 3.665750E+01
1.412259E+02 3.663895E+01
END
END
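The listing above follows the plain-text polygon layout commonly used for OSM extracts: a name line, a section header, one "longitude latitude" pair per line, and an END line for the section and for the file. A minimal Python parser for exactly this simple case might look as follows; the function name `parse_poly` is an assumption, and features such as hole sections are deliberately not handled.

```python
POLY_TEXT = """polygon
1
1.412259E+02 3.663895E+01
1.427964E+02 4.038643E+01
1.411296E+02 4.038351E+01
1.394639E+02 3.665750E+01
1.412259E+02 3.663895E+01
END
END
"""


def parse_poly(text):
    """Return (name, rings); each ring is a list of (lon, lat) tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name, rings, current = lines[0], [], None
    for ln in lines[1:]:
        if ln == "END":
            if current is None:
                break  # second END in a row closes the file
            rings.append(current)
            current = None
        elif current is None:
            current = []  # section header line, e.g. "1"
        else:
            lon, lat = map(float, ln.split())
            current.append((lon, lat))
    return name, rings


name, rings = parse_poly(POLY_TEXT)
lons = [lon for lon, _ in rings[0]]
lats = [lat for _, lat in rings[0]]
print(name, min(lons), min(lats), max(lons), max(lats))
```

The printed minimum and maximum values give the enclosing bounding box of the four-corner polygon.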

And here are several diagrams of the editing in Japan (Sendai region only):

In numbers (March 20th, 2011 12:50):

  • Overall amount of OSM Nodes: 5138123
  • Overall amount of OSM Ways: 149978
  • Overall amount of Highways: 47156
  • Number of Barrier Nodes: 528
  • Impassable Ways: 463
  • Number of Users (Contributors): 308
  • Length of OSM Ways [km]: 29049.71
  • Length of impassable Ways [km]: 222.58

thx @ Dennis and best of luck for tomorrow!

[Update #1 – March 25th, 2011 21:00] – I have updated all diagrams above!

  • Overall amount of OSM Nodes: 5258135
  • Overall amount of OSM Ways: 169557
  • Overall amount of Highways: 59120
  • Number of Barrier Nodes: 549
  • Impassable Ways: 801
  • Number of Users (Contributors): 414
  • Tsunami:Damage Polygons: 608
  • Length of OSM Ways [km]: 32408.75
  • Length of impassable Ways [km]: 371.67

[Update #2 – April 08th, 2011] – I have updated all diagrams above!

  • Overall amount of OSM Nodes: 6304539
  • Overall amount of OSM Ways: 271768
  • Overall amount of Highways: 150556
  • Number of Barrier Nodes: 597
  • Impassable Ways: 879
  • Number of Users (Contributors): 435
  • Tsunami:Damage Polygons: 622
  • Length of OSM Ways [km]: 58700.66
  • Length of impassable Ways [km]: 387.60

The Return of “Unmapped Places in OSM EU”

My last blog post about “Unmapped Places in Europe” was read by more than 800 people. So I think it’s time to repeat the analysis after three months. At the moment (March 11th, 2011) we have (according to the Geofabrik extract) 505091 places in OpenStreetMap Europe. They can be separated into the following place-types:

  • city=1055 (as of Dec. 5th, 2010 it was 1045 -> +1%)
  • town=16106 (as of Dec. 5th, 2010 it was 16032 -> +0.5%)
  • suburb=24913 (as of Dec. 5th, 2010 it was 23563 -> +6%)
  • village=278691 (as of Dec. 5th, 2010 it was 271147 -> +3%)
  • hamlet=184326 (as of Dec. 5th, 2010 it was 165804 -> +11%)

During my last analysis and also during this one, I *only* used those places that are tagged with the “village” value. So far, my tool works as follows:

  1. Get only places with a village-tag.
  2. Search nearby (ca. 600m distance) for a street with one of the following highway-types: residential, service, living_street, cycleway, footway, pedestrian, steps or platform.
  3. If no street can be found, mark the place as “unmapped”!
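The three steps can be sketched in Python as a brute-force search. This is an illustration under several assumptions and not the original tool: places and street nodes are given as simple tuples, the distance is the great-circle (haversine) distance to individual street nodes rather than to street geometries, and all names and sample coordinates are made up.

```python
import math

STREET_TYPES = {
    "residential", "service", "living_street", "cycleway",
    "footway", "pedestrian", "steps", "platform",
}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def unmapped_villages(places, street_nodes, max_dist_m=600.0):
    """places: (name, place_type, lat, lon); street_nodes: (highway, lat, lon)."""
    unmapped = []
    for name, place_type, lat, lon in places:
        if place_type != "village":
            continue  # step 1: only villages are checked
        nearby = any(
            highway in STREET_TYPES
            and haversine_m(lat, lon, s_lat, s_lon) <= max_dist_m
            for highway, s_lat, s_lon in street_nodes
        )  # step 2: look for a qualifying street within the radius
        if not nearby:
            unmapped.append(name)  # step 3: no street found
    return unmapped


places = [
    ("A", "village", 49.0, 8.0),
    ("B", "village", 50.0, 9.0),
    ("C", "hamlet", 51.0, 10.0),
]
streets = [("residential", 49.001, 8.0), ("primary", 50.0, 9.0)]
print(unmapped_villages(places, streets))
```

For a dataset of several hundred thousand villages, a spatial index would be needed instead of the nested loop; the sketch only shows the decision rule.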

Why did I only use villages? Well, in the case of Germany it can be assumed that places with “higher” place-type tags such as “town” or “city” have already been mapped.

What are the “false positives”, and why is your village marked as unmapped? A village should usually have at least one of the roads mentioned above; otherwise the place should be mapped as a hamlet. Would you agree with this?

The results show that of the 278691 villages located within “Europe”, about 152337 (55%; in Dec. 2010 it was 58%) have not been mapped yet, i.e. no matching street was detected nearby. The following diagram shows the distribution of the numbers by country.

YAY, Germany!!!

The results can again be displayed as a GPX-overlay on a map which can be found here http://resultmaps.neis-one.org :

(Remarks for http://resultmaps.neis-one.org: Not each and every country is available as an overlay. Some countries such as France, Poland and Ukraine showed longer browser loading times to display the GPX-overlays!)

thx @ Dennis

Growing agreement & relicensing OSM -Update-

My last blog post about the growing agreement to the new CTs was published nearly three months ago. Time for a short update: During this time frame, about 32 contributors accepted the new CTs every day. Overall this means that since October 21, 2010, on average about 43 contributors per day have accepted the new CTs. I updated my diagram with the latest numbers:

In December I conducted an analysis about the “Change of OSM object numbers through relicensing”. This time I only declared the last modifier of an OSM object (node/way/relation) as the owner of the object! The last and the new results can be seen in the following diagrams:

In my OSM-user-database of March 9th, 2011 a total of 120456* members are the “owners” of the following OSM objects (* Notice: Not every member of the OSM project (>350000 members) has contributed!):

  • Number of nodes: 1007604532
  • Number of ways: 85365727
  • Number of relations: 899145

As of March 9th, 2011, 8124 users have accepted the new license. 35678 new OSM members (uid >= 286582) have accepted the new contributor terms automatically. From this I calculated the following numbers of OSM objects that will be available for relicensing (if you assume that the last modifier is the owner of the object):

  • Number of nodes: 801700665 (79.56%) (as of Dec. 15th, 2010 it was 66.52%)
  • Number of ways: 66236798 (77.59%) (as of Dec. 15th, 2010 it was 61.68%)
  • Number of relations: 716130 (79.65%) (as of Dec. 15th, 2010 it was 62.14%)
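The percentages relate the relicensable objects to the totals listed further above. A short Python sketch of that calculation, using only the numbers from the two lists (the helper name `share_percent` is made up):

```python
def share_percent(relicensable, total):
    """Share of relicensable objects in percent of all objects."""
    return 100.0 * relicensable / total


totals = {"nodes": 1007604532, "ways": 85365727, "relations": 899145}
relicensable = {"nodes": 801700665, "ways": 66236798, "relations": 716130}

for kind in totals:
    print(f"{kind}: {share_percent(relicensable[kind], totals[kind]):.3f}%")
```

The computed values agree with the listed percentages to within rounding in the second decimal place.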

Are you still interested in any analysis regarding this topic?

thx @ Dennis

[Update – March 24th, 2011]
The following diagram shows the above numbers of March 9th, 2011 in percent:

Updated Error Summary for Europe

This month I tried something new. But first we will start with the usual monthly stats of the OSM Inspector Routing View for Europe, this time for the middle of February 2011. Overall the following amounts of errors appear for “Europe”: Unconnected Roads: ca. 107000 and Duplicate Ways (number of duplicate segments): ca. 109000 (in the OSM Wiki you can find more information about the error types). This means that altogether 2600 unconnected streets and 16900 duplicate way segment errors have been fixed. In total we have an increment of 1111000 new OSM way segments for routing during the past 4 weeks in Europe (01/20/2011: 30600000, 02/20/2011: 31710000).

The following image shows the amount of errors divided by country for today’s Europe OpenStreetMap dataset:

In the past month several countries were able to reduce their amount of errors, for example France (-1600), Italy (-1600), Poland (-1900), Sweden (-2300) and the United Kingdom (-8000!!!). So congratulations to the UK, this is your month 🙂

Now let’s take a look at the new diagram: The following image shows the amount of errors per 100 km of OpenStreetMap street network data for each country.
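The normalization behind this diagram is a simple ratio. A minimal Python sketch, where the function name and the sample values are hypothetical and do not come from the actual statistics:

```python
def errors_per_100km(error_count, network_length_km):
    """Number of routing errors per 100 km of street network."""
    if network_length_km <= 0:
        raise ValueError("network length must be positive")
    return 100.0 * error_count / network_length_km


# Made-up example: 500 errors on a 10000 km street network.
print(errors_per_100km(500, 10000))
```

Normalizing by network length makes countries of very different size comparable, which the absolute error counts per country do not allow.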

Do you have any other ideas for additional diagrams? I think dividing the amount of errors for each country by the number of OSM ways or segments could be an interesting approach, what do you think? The last image shows the amount of errors divided by country:

thx @ Dennis