Detecting vandalism in OpenStreetMap – A case study

This blog post is a summary of my talk at the FOSSGIS & OpenStreetMap conference 2017 (German slides). I guess some of the content might be suitable for a research article; however, here we go:

Vandalism is (still) an omnipresent issue for any kind of open data project. Over the past few years, OpenStreetMap (OSM) data has been used in a growing number of applications. In my opinion, this is one of the most important reasons why we have to take our quality assurance to the next level. Do we really have a vandalism issue after all? Yes, we do. But first, let's take a closer look at the different types of vandalism.

It is important to distinguish between different types of vandalism, because not every unusual map edit should be considered vandalism. Based on the OSM wiki page, I created the following breakdown. Generally speaking, vandalism can occur intentionally and unintentionally, so we should distinguish between vandalism and bad map editing behavior. New contributors often make mistakes that are not vandalism; they simply lack the knowledge of experienced mappers. In my opinion, only intentional map edits such as mass deletions or “graffiti” are real cases of vandalism.

To get an impression of the state of vandalism in the OSM project, I conducted a case study over a four-week timeframe (between January 5th and February 12th, 2017). During the study, I analyzed edits by new contributors, mostly changesets that deleted objects, added fictitious data or were made for the Pokemon Go game. If you have not heard or read about OSM’s Pokemon phenomenon, you can read more about it here. The OSM wiki page on quality assurance lists some tools that can be used for vandalism detection. For this study, however, I used my own OSM Suspicious webpage and the quite useful augmented OSM change viewer (Achavi). A webpage that lists the newest OSM contributors may also be of interest to you.

So what can you do when you find a strange map edit that could be vandalism? The OSM help page answers that question. First of all: keep calm! Use changeset comments and ask the contributor, in a friendly manner, about the reasons for the suspicious edit.
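
Changeset comments can also be written programmatically: the OSM API v0.6 accepts a POST to /api/0.6/changeset/{id}/comment with a `text` parameter. The following minimal Python sketch only prepares such a request with the standard library and does not send it. It assumes OAuth 2.0 bearer-token authentication; the token and the changeset ID are hypothetical placeholders, and obtaining a token is out of scope. For a friendly first contact, the comment form on the changeset page is of course the more natural choice.

```python
import urllib.parse
import urllib.request

# Production endpoint of the OSM API v0.6; for experiments, point this at the
# test/sandbox API instead of the live database.
OSM_API = "https://api.openstreetmap.org/api/0.6"


def build_comment_request(changeset_id: int, text: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that adds a comment to a changeset."""
    url = f"{OSM_API}/changeset/{changeset_id}/comment"
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Assumption: an OAuth 2.0 access token obtained elsewhere (hypothetical here).
    req.add_header("Authorization", f"Bearer {token}")
    return req


if __name__ == "__main__":
    request = build_comment_request(
        12345678,  # hypothetical changeset ID
        "Hi and welcome to OSM! Could you tell me why these buildings were deleted?",
        token="YOUR-ACCESS-TOKEN",
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually post the comment.
```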

Results of the study: Overall, I commented on 283 changesets during the four-week timeframe. Unfortunately, I did not count the number of analyzed changesets, but I estimate that it was around 1,200 (±200). The following chart shows the commented changesets per day; weekends tend to have a larger number of commented/discussed changesets.

As mentioned in the introduction, we should distinguish between different types of vandalism. The following image shows the numbers for each category. In my prototype study, 45% of the commented changesets were vandalism related, and 24% had already been reverted without this being documented in the changeset discussion. Sometimes I also found imported test data and fictitious data that the original contributor of the changeset had not reverted. It should be made clear to everyone that the live database must never be used for testing purposes. Interested developers can use the test API and a test database (see sandbox for editing).

Responses and spatial distribution: Overall, I received 70 responses to the discussed changesets, but sadly only 20 of them came from the owner/contributor of the changeset. Almost every response was friendly. Most often, the contributors wrote “thank you” or “I didn’t know that my changes are going to be saved in the live database”. Whenever I did receive a response, it came within 24 hours.

In the following map, clustered markers highlight the areas where the discussed changesets are located. As you can see, the commented changesets are spread across almost the entire world, and in some areas they tend to correlate with the number of active OSM’ers. Here is some additional information about three selected areas:

  1. USA: Several cases of Pokemon Go related and fictitious map edits.
  2. Japan/China: Some mass deletions.
  3. South Africa: New MissingMaps or HOT contributors often delete and redraw more or less the same objects, such as buildings. I guess it was not explained well enough to these editors that this destroys the object history. The article about “Good practice” in the OSM wiki is quite useful in this case.

Conclusion: The study reveals an ongoing issue with vandalism in OSM’s map data. I think we need to simplify the tools for detecting vandalism. In particular, we should avoid duplicated work, where several users review the same suspicious map edits. Perhaps the best solution would be a tool integrated directly into the OSM.org infrastructure. My presentation also contained some statistics and charts about the OSM changeset discussions feature; these will be the subject of a separate blog post in the following weeks. The prototype introduced at the end of my talk will also (hopefully) be presented in the next few months.

Thanks to maɪˈæmɪ Dennis.

Reviewing OpenStreetMap contributions 1.0 – Managed by changeset comments and discussions?

The OSM project still records around 650 new contributors each day (out of almost 5,000 new registrations per day). Some countries (such as Belgium or Spain) already provide platforms to coordinate the introduction of new mappers to OSM. Others use special scripts or intense manual work to send newly registered contributors e-mails with useful information (Washington or the Netherlands). As expected, however, new contributors often make beginner mistakes. I frequently find unconnected ways, wrong tags or, more rarely, fictitious data. Unfortunately, (new) members sometimes also delete existing map data, intentionally or unintentionally.

At the end of 2014, the long-awaited changeset discussions feature was introduced. A few months later, I developed a page that finds the latest discussions around the world or in your country. By now, many OSM members use changeset discussions to comment on or question other members’ map edits.

Almost exactly one year ago, I wrote a blog post about a webpage for detecting suspicious OSM edits. In the updated version, I combine the aforementioned changeset discussions with comments about suspicious edits, so that you can communicate with members more directly. The following image shows the revised webpage.

Furthermore, you can request all changesets of a contributor that have received comments. The same page can also show all comments written by a selected contributor (along with all comments of the respective changeset). I think these last two features are really helpful for keeping track of your own and other changeset discussions. They should also simplify the review of changesets and map edits.
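
The two lookups described above (commented changesets of an owner, comments written by an author) can be sketched in Python on top of the changeset XML that the OSM API v0.6 returns when `include_discussion=true` is set. The sample document below is hypothetical and trimmed to the parts the sketch reads (`id`, `user` and the `discussion/comment/text` elements); the actual webpage may well work from a database instead.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the shape of GET /api/0.6/changeset/#id?include_discussion=true,
# trimmed to the fields used below.
SAMPLE = """<osm>
  <changeset id="101" user="alice" uid="1" comments_count="2">
    <tag k="comment" v="Added buildings"/>
    <discussion>
      <comment date="2016-02-01T10:00:00Z" uid="2" user="bob"><text>Were these traced from imagery?</text></comment>
      <comment date="2016-02-01T12:00:00Z" uid="1" user="alice"><text>Yes, from Bing.</text></comment>
    </discussion>
  </changeset>
  <changeset id="102" user="alice" uid="1" comments_count="0">
    <tag k="comment" v="Fixed a typo"/>
    <discussion/>
  </changeset>
</osm>"""


def index_discussions(xml_text):
    """Return (commented changeset IDs per owner, (changeset ID, text) per comment author)."""
    root = ET.fromstring(xml_text)
    commented_by_owner = defaultdict(list)
    comments_by_author = defaultdict(list)
    for cs in root.iter("changeset"):
        cs_id = int(cs.get("id"))
        comments = cs.findall("./discussion/comment")
        if comments:  # only changesets that actually received comments
            commented_by_owner[cs.get("user")].append(cs_id)
        for comment in comments:
            comments_by_author[comment.get("user")].append((cs_id, comment.findtext("text")))
    return commented_by_owner, comments_by_author


if __name__ == "__main__":
    owners, authors = index_discussions(SAMPLE)
    print(dict(owners))
    print(dict(authors))
```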

As mentioned at the beginning of this blog post, some OSM groups send a welcome e-mail to new contributors. I also noticed that some mappers in Taiwan welcome new members with a changeset comment and some information on their first changeset. Pretty neat stuff if you ask me.

Latest OSM Changeset Discussions: http://resultmaps.neis-one.org/osm-discussions
Find Suspicious OSM Changesets: http://resultmaps.neis-one.org/osm-suspicious

Thanks to maɪˈæmɪ Dennis.

A comparative study between different OpenStreetMap contributor groups – Outline 2016

Over the past few years I have written several blog posts about the (non-) activity of newly registered OpenStreetMap (OSM) members (2015, 2014, 2013). Similarly to the previous posts, the following image shows the gap between the number of registered and the number of active OSM members. Although the project still shows millions of new registrations, “only” several hundred thousand of these registrants actually edited at least one object. Simon showed similar results in his yearly changeset studies.

The following image shows that the project still has some loyal contributors. More specifically, it shows the increase in monthly active members over the past few years and how consistently they contribute data, based on their first and latest changesets:

This time, however, I would like to combine the current study with some additional research. For the following analysis, I tried to identify three different OSM contributor groups, based on the hashtags in a contributor’s changeset comments or the editor used:

  1. Contributors of the MissingMaps project: Contributors to the project usually add #missingmaps to their changeset comments.
  2. Contributors that utilized the Maps.Me app: The ‘created_by’-tag contains ‘MAPS.ME’.
  3. All other ‘regular’ contributors of the OSM project, who used neither #missingmaps in their changesets nor the maps.me editor.
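
A minimal sketch of the grouping rules above, assuming each contributor’s changesets are available as dictionaries of changeset tags (`comment`, `created_by`). The case-insensitive matching and the precedence of #missingmaps over MAPS.ME for contributors who did both are my assumptions; the rules above do not say how such overlaps were resolved.

```python
def classify_contributor(changesets):
    """Assign one contributor to 'missingmaps', 'maps.me' or 'regular'.

    changesets: list of dicts holding the changeset tags of a single contributor.
    Assumption: #missingmaps takes precedence if a contributor matches both rules.
    """
    has_missingmaps = any(
        "#missingmaps" in cs.get("comment", "").lower() for cs in changesets
    )
    used_mapsme = any(
        "MAPS.ME" in cs.get("created_by", "").upper() for cs in changesets
    )
    if has_missingmaps:
        return "missingmaps"
    if used_mapsme:
        return "maps.me"
    return "regular"


if __name__ == "__main__":
    print(classify_contributor([{"comment": "Buildings #MissingMaps #hotosm"}]))
    print(classify_contributor([{"created_by": "MAPS.ME android 7.0"}]))
```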

In the past 12 months, almost 1.53 million members registered with the OSM project. So far, only 12% (181k) of them have created at least one map edit: almost 12,000 members created at least one changeset with the #missingmaps hashtag, over 70,000 used the maps.me editor and 99,000 mapped with neither #missingmaps nor the maps.me editor. The following diagram shows the number of new OSM contributors per month for the three aforementioned groups.

The release of the maps.me app (more specifically, its OSM editor functionality) clearly had an impact on the monthly number of new mappers. Time for a more detailed analysis of the contributions and mapping times: the majority of members in all three groups have no more than two mapping days. (What is a mapping day, you ask? Well, my definition would be: a mapping day is a day on which a contributor created at least one changeset.) Only around 6% of the newly active members contributed on more than 7 days.
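
The mapping-day definition translates directly into code: count the distinct calendar days on which a contributor created at least one changeset. This sketch assumes ISO 8601 timestamps in UTC (such as the changeset `created_at` attribute) and adds a hypothetical helper that computes the share of contributors above a threshold such as the 7 days mentioned above.

```python
from datetime import datetime


def mapping_days(timestamps):
    """Count distinct UTC calendar days that contain at least one changeset."""
    days = set()
    for ts in timestamps:
        # Older Python versions' fromisoformat() does not accept a trailing "Z".
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        days.add(dt.date())
    return len(days)


def share_with_more_than(contributors, min_days=7):
    """Fraction of contributors (name -> list of timestamps) with more than min_days mapping days."""
    if not contributors:
        return 0.0
    active = sum(1 for ts in contributors.values() if mapping_days(ts) > min_days)
    return active / len(contributors)


if __name__ == "__main__":
    print(mapping_days(["2016-03-01T08:00:00Z", "2016-03-01T20:00:00Z", "2016-03-05T09:30:00Z"]))
```

Two changesets on the same day count as a single mapping day, independent of the registration date.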

Some members of the #missingmaps group also contributed changesets without the hashtag, but most of them (70%) contributed only #missingmaps changesets. Furthermore, 95% of this narrower group did not map on more than two days. So, although we identified three different contributor groups, the results look somewhat similar. Let’s have a look at the number of map changes. The relative comparison shows that the smaller #missingmaps group produces a large number of edits, whereas the maps.me group contributes only small numbers of map changes to the project’s database.

Lastly, I conducted an analysis of three selected tag keys: building, highway and name. The comparison shows that the #missingmaps group generates a larger number of building and highway features. In contrast, “regular” OSM’ers and maps.me users contributed more primary attributes such as the name or amenity tag.

I think the diagrams in this blog post are quite interesting because they show that the #missingmaps mapathons can activate members who contribute many map objects. But they also indicate that the majority of these elements are traced from satellite imagery without primary attributes. In contrast, the maps.me editor functionality proved successful thanks to its in-app integration and ease of use, which resulted in a huge number of new contributors. In summary, I think it would be good to motivate contributors not only to participate in humanitarian mapathons but also to map their neighborhood, so that they stick with the project. I also think it would be great if maps.me continued to work on easy mapping functionality for its users (of course with some sort of validation to reduce questionable edits).

Thanks to maɪˈæmɪ Dennis.

Unmapped Places of OpenStreetMap – 2016

Back in 2010 & 2011, I conducted several studies to detect underrepresented regions, a.k.a. “unmapped” places, in OpenStreetMap (OSM). More than five years later, some people asked if I could rerun the analysis. According to the latest OSM planet dump file and Taginfo, almost 1 million places have been tagged as villages. Furthermore, around 59 million streets have a residential, unclassified or service highway value. My algorithm to find unmapped places works as follows:

  1. Use every place node of the OSM dataset which has a village-tag (place=village).
  2. Search in a radius of ca. 700 m for a street with one of the following highway-values: residential, unclassified or service.
  3. If no street can be found, mark the place as “unmapped”!
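
The three steps above can be sketched as follows. This is a brute-force illustration, not the original implementation: it assumes places and street nodes have already been extracted into simple tuples, measures the great-circle (haversine) distance to street nodes rather than to way segments, and compares every pair, so a planet-wide run would need a spatial index.

```python
import math

STREET_VALUES = {"residential", "unclassified", "service"}
RADIUS_M = 700.0  # search radius from step 2
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_unmapped(places, street_nodes, radius_m=RADIUS_M):
    """places: (id, lat, lon, tags); street_nodes: (lat, lon, highway_value).

    Returns the IDs of place=village nodes without a qualifying street node in range.
    """
    candidates = [(lat, lon) for lat, lon, hw in street_nodes if hw in STREET_VALUES]
    unmapped = []
    for place_id, lat, lon, tags in places:
        if tags.get("place") != "village":  # step 1: villages only
            continue
        near = any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m for s_lat, s_lon in candidates)
        if not near:  # step 3: no street found -> "unmapped"
            unmapped.append(place_id)
    return unmapped


if __name__ == "__main__":
    places = [(1, 0.0, 0.0, {"place": "village"}), (2, 1.0, 1.0, {"place": "village"})]
    streets = [(0.005, 0.0, "residential"), (1.0, 1.001, "primary")]
    print(find_unmapped(places, streets))
```

In the usage example, place 2 is reported although a road lies next to it, because `primary` is not one of the three street values; this mirrors the “are the nearby highways tagged correctly?” caveat.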

My results for the entire OSM planet can be found on the following webpage.

Overall, we have more than 440,000 unmapped places in OSM. As you can see in the picture above, most of them are in Central Africa, Saudi Arabia or China. I hope that this analysis helps to complete some of the missing areas or to revise some incorrect map data. Some remarks about “false positives”, or why is your village marked as unmapped? Possible reasons: Is the tag used for your place correct? Compare the wiki page for further information; sometimes “hamlet” is the correct tag value. Are the nearby highways tagged correctly? (OSM wiki)

Amount of unmapped places for each continent:

  • Africa 119,084
  • Asia 241,833
  • Australia 212
  • Europe 44,819
  • North America 16,464
  • Oceania 837
  • South America 15,576

Technical Stuff: The OSM data for the analysis is prepared by a custom OSM PBF reader. The webpage, which shows the results, is based on Leaflet 1.0.0-rc1 and the really fast PruneCluster plugin.

*Update*: You’ll find the date of the latest data update in the header -> “(Date: Apr. 9th, 2018)”

Thanks to maɪˈæmɪ Dennis.

Verified OpenStreetMap contributor profiles?

The reputation of a contributor in OpenStreetMap (OSM) plays a significant role, especially when assessing the quality of the collected data. Sometimes it’s difficult to make a meaningful statement about a contributor by simply looking at the raw mapping work, i.e. the number of created objects or used tags. Therefore, it would be really helpful to have some additional information about the person who contributes to the project. For example: Does she/he help other contributors? Is her/his work documented somewhere, or based on one of the “discussed” proposals? Or does she/he work as a lone warrior in the OSM world?

In 2010 I created “How did you contribute to OpenStreetMap?” (HDYC) as a kind of fun side project. Nowadays many people use it to get some detailed information about OSM contributors. Some of you are probably familiar with the “verified” icon used on some celebrity Twitter accounts. I created a similar new feature for the aforementioned HDYC page. If you connect your related OSM accounts, your profile will be marked as “verified”.

What do you have to do to get a verified contributor profile, you ask? First, you have to create at least 100 OSM changesets. Second, you need a login (username) for the OSM Help Forum, the general OSM Forum and the OSM Wiki. Last but not least, you have to list your OSM-related accounts on your OSM profile page. After that, you should be able to see your accounts in your HDYC profile, and your account will automatically be marked as verified.
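
The verification rules boil down to a simple predicate. In this sketch, the `Profile` structure and the account keys (`help_forum`, `forum`, `wiki`) are hypothetical; parsing the account names out of the OSM profile text, which the actual script does during its daily run, is left out.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the three accounts that must be listed on the OSM profile.
REQUIRED_ACCOUNTS = ("help_forum", "forum", "wiki")


@dataclass
class Profile:
    changesets: int
    listed_accounts: dict = field(default_factory=dict)  # key -> username


def is_verified(profile: Profile, min_changesets: int = 100) -> bool:
    """At least min_changesets changesets and a non-empty username for every required account."""
    if profile.changesets < min_changesets:
        return False
    return all(profile.listed_accounts.get(key) for key in REQUIRED_ACCOUNTS)


if __name__ == "__main__":
    accounts = {"help_forum": "mapper", "forum": "mapper", "wiki": "Mapper"}
    print(is_verified(Profile(150, accounts)))
```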

Malenki has already listed his usernames in his profile as an example, and he described the process in a short OSM diary entry. Overall, this feature is optional: if you don’t want to “connect” or show your accounts for privacy reasons, simply don’t mention them on your OSM profile. My script checks the OSM profiles of the most recently active OSM contributors every 24 hours. That’s it.

The HDYC profile now also shows the number of your changeset discussions and, if mentioned in your OSM profile, the page shows your Mapillary account as well.

Notice: If someone tries to cheat by using other people’s accounts, I will blacklist her/his username.

Thanks to maɪˈæmɪ Dennis.

Good #Hashtags in OpenStreetMap Changesets

#Hashtags are commonly used on Twitter to find content on a specific topic. They are also popular in the OpenStreetMap (OSM) universe, where they mark changesets contributed during a special event, such as a mapping party or a HOT task. In most cases, they are added to the changeset comment. Back in November 2015, several people discussed the pros and cons of (only) this approach. You can find a general overview of good changeset comments here. The aforementioned wiki page also explains why it is important to write a “concise and adequate” description of the edit. I also agree that we should not make it a general rule to add hashtags only to our changeset comments. I prefer an alternative approach in which the contributor adds an extra changeset tag for the hashtag(s). For example, the widely used JOSM editor allows optional changeset tags (as you can see here). The iD editor, on the other hand, which is used in many cases by new contributors, doesn’t offer this feature. However, I am sure this could be fixed with some minor changes. A more or less complete set of recommended or mandatory changeset tags can be found here.

As a first step, I optimized my webpage to find and visualize OSM changesets with a specific comment (blog post). You can now search for any term in any tag value of all OSM changesets. Previously, the search only considered the changeset comments; now you can also search for other values, such as the editor that was used or the source (imagery).
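
Searching every tag value rather than only the comment can be sketched like this, assuming each changeset is given as an ID plus a dictionary of its tags. The `hashtags` helper additionally reads a separate changeset tag named `hashtags` with semicolon-separated values; that tag name and format are assumptions used here to illustrate the extra-tag approach, not something defined in this post.

```python
import re


def search_changesets(changesets, term, keys=None):
    """Return IDs of changesets whose tag values contain `term` (case-insensitive).

    changesets: iterable of (id, tags_dict); keys: optional list restricting the search.
    """
    needle = term.lower()
    hits = []
    for cs_id, tags in changesets:
        values = tags.values() if keys is None else (tags.get(k, "") for k in keys)
        if any(needle in value.lower() for value in values):
            hits.append(cs_id)
    return hits


def hashtags(tags):
    """Collect hashtags from the comment and from an (assumed) ';'-separated 'hashtags' tag."""
    found = set(re.findall(r"#(\w+)", tags.get("comment", "")))
    found.update(
        h.strip().lstrip("#") for h in tags.get("hashtags", "").split(";") if h.strip()
    )
    return {h.lower() for h in found}


if __name__ == "__main__":
    data = [
        (1, {"comment": "Mapathon #missingmaps", "created_by": "iD 1.9.2"}),
        (2, {"comment": "fix", "created_by": "JOSM/1.5 (8800 en)", "source": "Bing"}),
    ]
    print(search_changesets(data, "josm"))
```

Restricting `keys` to `["comment"]` reproduces the old comment-only behavior.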

For example, you can now create interesting statistics, such as a comparison of the editors used in OSM. Have a look at the types of created objects, the amount of map changes or the countries …

JOSM/1.5

iD 1.9.2

Thanks to maɪˈæmɪ Dennis.

How to detect suspicious OpenStreetMap Changesets with incorrect edits?

Since its rise in popularity, the well-known online encyclopedia Wikipedia has been struggling with manipulation or, in the worst case, vandalism attempts. Similarly, the OpenStreetMap (OSM) project has suffered several times over the past few years from incorrect map edits. These erroneous edits can stem from (new) contributors or from illegal data imports (or automated edits) that were not discussed in advance with the community or the Data Working Group (DWG) and corrupted existing project data. The current OSM wiki page gives a great overview of general guidelines and, e.g., types of vandalism. Another page in the wiki also mentions a prototype of a rule-based system for the automatic detection of vandalism in OSM, which I developed in 2012. However, the system was never actually put into production. Today, OSM contributors can use a variety of tools to inspect an area or particular map changes. A few of them are listed below (a complete list can be found here):

Based on the database that I use for multiple other services, I created an easy-to-use webpage to find suspicious OSM changesets with possibly incorrect map edits. The webpage offers some filter options, such as the boundary of a country or the type of object change of interest. In contrast to the aforementioned webpages, you can also filter changesets based on the contributor’s active “mapping days”. A “mapping day” is a day on which the contributor created at least one changeset, independent of the registration date. I am also planning to add further user reputation information, such as used editors or tagging behavior. And of course I am going to add some RSS feeds in the next version. The first version can be found here.
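
To illustrate filtering on contributor activity, here is a hypothetical filter that keeps changesets (optionally within one country) whose author has few mapping days and which delete many objects. The `ChangesetSummary` fields and both thresholds are invented for the example; the actual webpage offers its own set of filters.

```python
from dataclasses import dataclass


@dataclass
class ChangesetSummary:
    """Hypothetical per-changeset record; field names are illustrative only."""
    id: int
    country: str
    mapping_days: int  # author's mapping days when the changeset was uploaded
    created: int
    modified: int
    deleted: int


def find_suspicious(changesets, country=None, max_mapping_days=3, min_deleted=50):
    """Return IDs of changesets by inexperienced authors that delete many objects."""
    return [
        cs.id
        for cs in changesets
        if (country is None or cs.country == country)
        and cs.mapping_days <= max_mapping_days
        and cs.deleted >= min_deleted
    ]


if __name__ == "__main__":
    sample = [ChangesetSummary(1, "DE", 1, 0, 0, 120), ChangesetSummary(2, "DE", 40, 0, 0, 500)]
    print(find_suspicious(sample))
```

The second changeset deletes even more objects but is skipped, because its author already has many mapping days; experienced mappers cleaning up data are a common, legitimate source of mass deletions.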

What makes all of this different from other tools? Well, I think one of the major advantages is the simplicity of the webpage and the ability to filter changesets based on the contributor’s activity and/or the changeset edits. In contrast to other tools, you can find changesets not only based on your area of interest, but also based on potential beginner mistakes and, hopefully only rarely, vandalism attempts or fictional/non-existent map data.

Find Suspicious OSM Changesets here: http://resultmaps.neis-one.org/osm-suspicious

Thanks to maɪˈæmɪ Dennis.

OpenStreetMap Crowd Report – Season 2015

Another year has almost passed, which means it’s time for the fourth OpenStreetMap (OSM) member activity analysis. The previous editions are online here: 2014, 2013 and 2012. Simon Poole already posted some interesting stats about the past few years; you can find all his results on the OSM wiki page. As last year, I try to dig a little deeper into some aspects.

Overall, the OSM project officially has more than 2.2 million registered members (Aug. 9th, 2015). For several of my OSM-related webpages, I maintain my own OSM contributor database based on the official OSM API v0.6. However, the table built from this API lists more than 3 million individual OSM accounts (Aug. 9th, 2015). I’m not sure what causes this gap of more than 800,000 accounts between the official number and the number extracted via the API. Maybe some of you have a possible explanation? My guess is that many of these accounts were created by spammers or bots.

The following chart shows a trend similar to that of previous years: the project attracts a large number of newly registered members, but the number of contributors who actively work on the project is fairly small. As mentioned in earlier posts, this phenomenon is not unusual for an online community project and has already been analyzed for previous years.

Described in numbers (July 31st, 2015):

  • Registered OSM Members (OSM API): 3,032,954
  • Registered OSM Members (Official): 2,201,519
  • Members who created 1 Changeset: 562,670
  • Members who performed >= 10 Edits: 343,523
  • Members who created >=10 Changesets: 137,591

Personally, I really like the following diagram: it shows the increase in monthly contributor numbers over the past few years and how consistently members collect OSM data, based on their first and latest contributed changesets. It’s great to see that at least some experienced mappers are still contributing to the project after more than five years.

Some background information on how I created the stats: To retrieve the registration date of the members, I used the aforementioned OSM API. The other numbers are based on the OSM changeset dump, which is available for download here.
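
The member counts listed above can be derived from the changeset dump by aggregating per user. This sketch assumes the dump has already been parsed into `(uid, num_changes)` pairs, one per changeset; the thresholds mirror the list above, while the parsing step and the result keys are assumptions.

```python
from collections import Counter


def member_stats(changesets):
    """changesets: iterable of (uid, num_changes) pairs, one per changeset."""
    changesets_per_user = Counter()
    edits_per_user = Counter()
    for uid, num_changes in changesets:
        changesets_per_user[uid] += 1
        edits_per_user[uid] += num_changes
    return {
        "with_changeset": len(changesets_per_user),
        "edits_ge_10": sum(1 for n in edits_per_user.values() if n >= 10),
        "changesets_ge_10": sum(1 for n in changesets_per_user.values() if n >= 10),
    }


if __name__ == "__main__":
    print(member_stats([(1, 5)] * 10 + [(2, 12)] + [(3, 1)] * 3))
```

The registration dates, and thus the number of registered members, are not part of the changeset dump and have to come from the OSM API, as described above.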

In addition to the results presented above, you can find daily updated statistics about the OSM project on OSMstats.

Thanks to maɪˈæmɪ Dennis.

Counting changes per Country – A different approach

OSMstats contains several statistics about the OpenStreetMap (OSM) project, such as daily created objects, the number of active contributors or detailed numbers for individual countries. One way to determine the sum of created or modified Node objects is to use the minutely, hourly or daily OSM replication change files and count the values for each country of the world. Sadly, this approach has some drawbacks. Firstly, the official files do not contain, for example, all Nodes of a modified way, which are required to find the country where the change took place. Furthermore, determining the country of a specific OSM object depends heavily on the border’s level of detail: more detailed country borders make the processing quite time-consuming. Some of you have probably experienced this problem when using Osmosis or a different OSM processing tool. Anyway, to calculate additional country statistics, I tried a new approach:

  1. Determine the country of a changeset based on its center position
  2. Use the changeset country information for all objects within this changeset.
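
The two steps above can be sketched in Python: take the center of the changeset’s bounding box, find the country polygon containing it with an even-odd ray-casting test, and credit all of the changeset’s changes to that country. The square “countries” in the test data are hypothetical, and the bounding box order `(min_lon, min_lat, max_lon, max_lat)` is my assumption; real borders would come from a boundary dataset and would usually be multipolygons.

```python
def bbox_center(min_lon, min_lat, max_lon, max_lat):
    """Center of a changeset bounding box as (lon, lat)."""
    return ((min_lon + max_lon) / 2.0, (min_lat + max_lat) / 2.0)


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) vertices (implicitly closed)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def changeset_country(bbox, countries):
    """Step 1: country whose polygon contains the bbox center, or None."""
    cx, cy = bbox_center(*bbox)
    for name, polygon in countries.items():
        if point_in_polygon(cx, cy, polygon):
            return name
    return None


def count_per_country(changesets, countries):
    """Step 2: credit every change of a changeset to the changeset's country."""
    totals = {}
    for bbox, num_changes in changesets:
        country = changeset_country(bbox, countries) or "unknown"
        totals[country] = totals.get(country, 0) + num_changes
    return totals
```

Only one point per changeset has to be tested against the borders, instead of every node, which is where the speed-up comes from. The second changeset in the test data straddles the border between A and B and is credited entirely to B, the kind of error discussed below.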

Of course, the country determined for the changeset can “only” be generalized to the entire changeset content, but how does it compare with the method currently used in OSMstats? I compared last week’s OSMstats numbers for each country of the world with the newly introduced approach. The number of active members per country differs for each weekday by around 3% (min. 1% and max. 5%). The average difference in created, modified and deleted Nodes per country is similar, at 4% (min. 2% and max. 9%). The presented approach can produce partially incorrect results whenever a changeset contains changes in two or more countries or when the center of the changeset lies in the wrong country. But IMHO, using the changeset centers is sufficient to determine changes per country. As you can see in the figure above, most OSM changesets cover a manageable area within one country. Yes, I know, the exception proves the rule.
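
The comparison between both methods amounts to a per-country relative difference, summarized by its minimum, mean and maximum. A minimal sketch, assuming both methods’ results are available as dictionaries mapping a country to a count (the example values are invented):

```python
def relative_differences(reference, candidate):
    """Percent difference per country between two counting methods.

    reference: counts from the existing method; candidate: counts from the
    changeset-center method. Countries with a reference count of zero are
    skipped to avoid division by zero.
    """
    diffs = {}
    for country, ref in reference.items():
        if ref == 0:
            continue
        diffs[country] = 100.0 * abs(candidate.get(country, 0) - ref) / ref
    return diffs


def summarize(diffs):
    """Return (min, mean, max) of the per-country differences."""
    values = list(diffs.values())
    return min(values), sum(values) / len(values), max(values)


if __name__ == "__main__":
    d = relative_differences({"DE": 1000, "FR": 500}, {"DE": 970, "FR": 520})
    print(d, summarize(d))
```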

So, why am I doing this? The main idea is to switch the entire OSMstats processing over to this approach within the coming weeks, so that the changes per country will be based on the introduced approach. Another advantage is that the information gathered from the changesets can be used to create additional contributor statistics.

Thanks to maɪˈæmɪ Dennis.

489 Pages about OpenStreetMap

The first book about the OpenStreetMap (OSM) project was written in 2008 by Frederik Ramm and Jochen Topf, two well-known OSM enthusiasts. It was first published in German and later translated into an improved English version. It contains information similar to that in the book by Jonathan Bennett, published in 2010, detailing how the project’s geodata is collected, which editors can be used, how tags, keys and values work, and how the rendering stack works. Both books are great resources to learn the OSM basics and to get an overview of useful software.

However, besides these more technical books, the research community has been very active in recent years and has published several articles about OSM data quality, conflation with other datasets or the contributors of the project. Each of us (Dennis Zielstra and I) wrote a dissertation on different aspects of crowdsourced geodata and the OSM project: Dennis’ work is about OSM data quality in comparison to proprietary and governmental data, with an emphasis on pedestrian shortest-path routing and data imports. Pascal’s work tackled the question of how user-generated geodata can be utilized for route planning for people with disabilities. Together, the two dissertations contain more than 13 publications.

Now the important part for you: Both dissertations are now freely available. You can download Dennis’ work here and Pascal’s thesis here. Combined, that is more than 480 pages about the OpenStreetMap project!

What can you expect from our dissertations? Our work had to be more science-oriented (after all, it had to fulfill the strict guidelines our universities set for earning a PhD). This means it contains a lot of information that can be useful to other researchers, for example, methods to analyze geodata quality or an introduction to parameters that are important for disabled people in a road network. However, we always tried to make the results and findings as understandable to the general public as possible. We always felt that VGI research about an open source project such as OSM should not generate results so convoluted that only a handful of researchers worldwide would understand the concepts in the end. Any OSM contributor should benefit from the findings published in these dissertations, and we hope we accomplished this goal. We also wish we could have published each article in open-access journals to make the results freely available to everyone, but that is a whole new topic for a different blog post. Anyway, by providing the dissertations for free, we have now basically accomplished this too.

And what should you not expect from our work? We do not describe how your object of interest should be tagged or how you should run a mapping event. We feel there are already enough sources out there that tackle these issues.

Anyway, we bet you will find some information about the OSM project in the dissertations that you have not heard about yet, such as the evolution of the German or the United States OSM street network, analyses of data imports, several research projects about contributor behavior and vandalism detection, and a quite comprehensive overview of recent developments and future trends in VGI research in general.

Let us know what you think and enjoy the information overload 🙂

Dennis & Pascal (@pascal_n)