Tag: Quality

New metric for measuring the “qualitative nature” of OpenStreetMap activities @ How did you contribute ?

Back in June we had a twitter chat about potential new features for the “How did you contribute to OpenStreetMap” (HDYC) website. One suggestion was to “show more relevant information about skills, tagging system or the quality of contributions” of a project member (by J-Louis). Overall I really like the following summary by Claudius: “HDYC started off with a strong focus on quantitative metrics and you expanded it lately a lot to reflect the qualitative nature of contributions. I think there’s value to show more about which area of data someone contributed: Auto/bike/railway/water infrastructure, amenities…”.

So I finally started searching in the OpenStreetMap (OSM) wiki for any feasible information about “groups of tags” or “tag categories”. Altogether, I couldn’t discover any solution that fits perfectly to determine the areas of data a mapper contributed in. However, later I got a hint from the JOSM developers to use the presets of the well-known and popular editor. You may ask, ‘What are presets?’ “Presets in JOSM are menu-driven shortcuts to tag common object types in OpenStreetMap. They provide you with a user friendly interface to edit one or more objects at a time, suggest additional keys and values you may wish to add to those objects, and most importantly, prevent you from having to enter keys and values by hand.” You can find many different presets at the aforementioned JOSM page. However, during my data processing I utilized the “default presets”. The XML file contains many combinations of popular or established tag combinations, which contributors use when they are mapping.

So far so good. As a first step I released a new version of “Find Suspicious OpenStreetMap Changesets”. It shows the utilized presets for each changeset. This can already indicate some quality aspects such as attribute (tag) accuracy or completeness. Now, after some weeks and some minor adjustments, I started to use this collected information about applied presets to expand the metrics of a mapper’s profile. The HDYC page now also lists which presets the mapper recently utilized during her/his contributions, such as adding, modifying or removing map elements. I think this is a really useful next step towards better quality assurance, which the OSM project urgently needs.

Some technical details: The database behind the “Find Suspicious OpenStreetMap Changesets” webpage uses the augmented diff files of the Overpass API. The utilized “default” preset list of the JOSM editor can be found here (Internal Preset list). The entire processing tool was developed in Java and uses a Postgres database to store the results. For now, only the presets used during the past 60 days of a contributor’s activity are evaluated and presented.
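To illustrate the idea of matching an element's tags against presets, here is a minimal Python sketch. It is not the Java tool described above. The tiny XML sample is made up for this example and simplified: the real JOSM default presets file is much larger, nests groups and declares an XML namespace. The function names load_presets and matching_presets are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up stand-in for a presets file (assumption: the real JOSM
# file is far larger, nests groups and declares an XML namespace).
SAMPLE_PRESETS = """
<presets>
  <group name="Highways">
    <item name="Residential">
      <key key="highway" value="residential"/>
    </item>
    <item name="Service">
      <key key="highway" value="service"/>
    </item>
  </group>
  <group name="Facilities">
    <item name="Restaurant">
      <key key="amenity" value="restaurant"/>
    </item>
  </group>
</presets>
"""


def load_presets(xml_text):
    """Return a list of (group name, preset name, required tags)."""
    root = ET.fromstring(xml_text)
    presets = []
    for group in root.findall("group"):
        for item in group.findall("item"):
            # Fixed key/value pairs that define the preset.
            required = {k.get("key"): k.get("value") for k in item.findall("key")}
            if required:
                presets.append((group.get("name"), item.get("name"), required))
    return presets


def matching_presets(tags, presets):
    """Return (group, preset) pairs whose required tags all appear on the element."""
    return [
        (group, name)
        for group, name, required in presets
        if all(tags.get(key) == value for key, value in required.items())
    ]


PRESETS = load_presets(SAMPLE_PRESETS)
element_tags = {"highway": "residential", "name": "Main Street"}
print(matching_presets(element_tags, PRESETS))
```

Counting the matched group names per contributor would then give a rough picture of the areas of data someone works on.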

However, thank you very much for all your feedback. Hope that it helps.

Thanks to maɪˈæmɪ Dennis.

Detecting vandalism in OpenStreetMap – A case study

This blog post is a summary of my talk at the FOSSGIS & OpenStreetMap conference 2017 (German slides). I guess some of the content might be suitable for a research article; however, here we go:

Vandalism is (still) an omnipresent issue for any kind of open data project. Over the past few years the OpenStreetMap (OSM) project data has been implemented in a number of applications. In my opinion, this is one of the most important reasons why we have to bring our quality assurance to the next level. Do we really have a vandalism issue after all? Yes, we do. But first we should take a closer look at the different vandalism types.

It is important to distinguish between different vandalism types. Not each and every unusual map edit should be considered as vandalism. Based on the OSM wiki page, I created the following breakdown. Generally speaking, vandalism can occur intentionally and unintentionally. Therefore we should distinguish between vandalism and bad-map-editing-behavior. Oftentimes new contributors make mistakes which are not vandalism because they do not have the expert mapper knowledge. In my opinion, only intentional map edits such as mass-deletions or “graffiti” are real cases of vandalism.

To get an impression of the state of vandalism in the OSM project, I conducted a case study for a four week timeframe (between January 5th and February 12th, 2017). During my study I analyzed OSM edits, mostly ones that deleted objects or that came from new contributors who created fictitious data or changesets for the Pokemon game. If you did not hear or read about OSM’s Pokemon phenomenon, you can read more about it here. The OSM wiki page for quality assurance lists some tools that can be used for vandalism detection. However, for this study I used my own “Find Suspicious OpenStreetMap Changesets” webpage and the quite useful augmented OSM change viewer (Achavi). Furthermore, a webpage that lists the newest OSM contributors may also be of interest to you.

So what can you do when you find a strange map edit that could be a vandalism case? The OSM help page contains an answer for that. First of all: Keep calm! Use changeset comments and try to ask in a friendly manner for the suspicious mapping reasons.

Results of the study: Overall I commented on 283 changesets in the aforementioned timeframe of four weeks. Unfortunately I did not count the number of analyzed changesets, but I assume that it should be around 1,200 (±200). The following chart shows the commented changesets per day. The weekends tend to have a larger number of commented/discussed changesets.

As mentioned in the introduction of the vandalism types, we should distinguish between different vandalism types. The following image shows the numbers for each category. In my prototype study, 45% of the commented changesets were vandalism related and 24% have already been reverted which was not documented in the discussion of the changeset. Sometimes I also found imported test- and fictitious data, which the initial contributor of the changeset didn’t revert. It should be clarified to everyone that the live-database should never be used for testing purposes. Interested developers can use the test API and a test database (see sandbox for editing).

Responses and spatial distribution: Overall I received 70 responses for the discussed changesets, sadly only 20 from the owner/contributor of the changeset. But, more or less every response was in a friendly manner. Most often the contributors wrote “thank you” or “I didn’t know that my changes are going to be saved in the live database”. Furthermore, if I received a response, it was within 24 hours.

The following map contains some clustered markers. Each one highlights areas where the discussed changesets are located. As you can see on the map, the commented changesets are spread almost all over the world. In some areas they tend to correlate with the number of active OSM’ers. However, here is some additional information about three selected areas: 1: USA – Several cases of Pokemon Go related and fictitious map edits. 2: Japan/China – Some mass deletions and 3: South Africa – Oftentimes new MissingMaps or HOT contributors tend to delete and redraw more or less the same objects such as buildings. I guess it was not explained well enough to these editors that this destroys the object history? However, the article about “Good practice” in the OSM wiki is quite useful in this case.

Conclusion: The study reveals that there is an ongoing issue with vandalism in OSM’s map data. I think we do need to simplify the tools for detecting vandalism. In particular we should avoid duplicated work where several users review identical suspicious map edits. Maybe the best possible solution would be a tool which is integrated directly in the OSM.org infrastructure. However, my presentation also contained some statistics and charts about the OSM changeset discussions feature. This will be the content of a separate blog post in the following weeks. Also, the prototype introduced at the end of my talk will (hopefully) be presented in the next few months.

Thanks to maɪˈæmɪ Dennis.

A comparative study between different OpenStreetMap contributor groups – Outline 2016

Over the past few years I have written several blog posts about the (non-) activity of newly registered OpenStreetMap (OSM) members (2015, 2014, 2013). Similarly to the previous posts, the following image shows the gap between the number of registered and the number of active OSM members. Although the project still shows millions of new registrations, “only” several hundred thousand of these registrants actually edited at least one object. Simon showed similar results in his yearly changeset studies.

2016members

The following image shows that the project still has some loyal contributors. More specifically, it shows the increase in monthly active members over the past few years and their consistent data contributions based on the first and latest changeset:

2016months

However, this time I would like to combine the current study with some additional research. I tried to identify three different OSM contributor groups, based on the hashtag in a contributor’s comment or the utilized editor, for the following analysis:

  1. Contributors of the MissingMaps-Project: Contributors of the project usually use #missingmaps in their changeset comment.
  2. Contributors that utilized the Maps.Me app: The ‘created_by’-tag contains ‘MAPS.ME’.
  3. All other ‘regular’ contributors of the OSM project, who neither have #missingmaps in any of their changesets nor used the maps.me editor.
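The grouping above can be sketched roughly as follows. This is only an illustration in Python, not the code used for the study. It assumes each changeset is available as a dict with the changeset tags 'comment' and 'created_by', and it assumes that #missingmaps takes precedence when a contributor matches both criteria; the function name classify_contributor is hypothetical.

```python
def classify_contributor(changesets):
    """Assign a contributor to one of the three groups.

    changesets: list of dicts holding the changeset tags 'comment' and
    'created_by' (assumed input format for this sketch).
    """
    uses_missingmaps = any(
        "#missingmaps" in cs.get("comment", "").lower() for cs in changesets
    )
    uses_mapsme = any(
        "maps.me" in cs.get("created_by", "").lower() for cs in changesets
    )
    # Assumption: the hashtag wins if both criteria apply.
    if uses_missingmaps:
        return "missingmaps"
    if uses_mapsme:
        return "maps.me"
    return "regular"


example = [
    {"comment": "Added buildings #MissingMaps", "created_by": "iD 2.0"},
    {"comment": "Fixed a road", "created_by": "JOSM"},
]
print(classify_contributor(example))
```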

In the past 12 months, almost 1.53 million members registered to the OSM project. So far, only 12% (181k) ever created at least one map edit: Almost 12,000 members created at least one changeset with the #missingmaps hashtag. Over 70,000 used the maps.me editor and 99,000 mapped without #missingmaps and the maps.me editor. The following diagram shows the number of new OSM contributors per month for the three aforementioned groups.

2016permonth

The release of the maps.me app (more specifically the OSM editor functionality) clearly has an impact on the monthly number of new mappers. Time for a more detailed analysis of the contributions and mapping times: The majority of the members of the groups don’t show more than two mapping days (What is a mapping day, you ask? Well, my definition would be: A mapping day is a day on which a contributor created at least one changeset). Only around 6% of the newly active members have contributed on more than 7 days.
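The mapping day definition is easy to express in code. A minimal Python sketch, assuming the changeset timestamps are given as UTC strings in the form used by the OSM API (for example 2016-03-01T10:15:00Z) and that days are counted in UTC; the function name mapping_days is made up for this example:

```python
from datetime import datetime


def mapping_days(changeset_timestamps):
    """Count the distinct calendar days (UTC) with at least one changeset."""
    days = {
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").date()
        for ts in changeset_timestamps
    }
    return len(days)


timestamps = [
    "2016-03-01T10:15:00Z",
    "2016-03-01T18:40:00Z",  # same day as the first changeset
    "2016-03-05T09:00:00Z",
]
print(mapping_days(timestamps))
```

Several changesets on the same day count only once, so a contributor with many changesets can still have very few mapping days.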

2016mappingdays

Some members of the #missingmaps group also contributed some changesets without the hashtag. But many of those members (70%) only contributed #missingmaps changesets. Furthermore, 95% of this adjusted group doesn’t map for more than two days. Anyway, despite identifying three different contributor groups, the results are looking somewhat similar. Let’s have a look at the number of map changes. The relative comparison shows that the smaller #missingmaps group produces a large number of edits. The maps.me group only generates small numbers of map changes to the project’s database.

2016mapchanges

Lastly, I conducted an analysis for three selected tag-keys: building, highway and name. The comparison shows that the #missingmaps group generates a larger number of building and highway features. In contrast “regular” OSM’ers and maps.me users contributed more primary keys such as the name- or amenity-tag.

2016tags

I think the diagrams in this blog post are quite interesting because they show that the #missingmaps mapathons can activate members that contribute many map objects. But they also indicate that the majority of these elements are traced from satellite imagery without primary attributes. In contrast, the maps.me editor functionality proved to be successful with its in-app integration and its easy usability, which resulted in a huge number of new contributors. In summary, I think it would be good to motivate contributors not only to participate in humanitarian mapathons but also to map their neighborhood, in the hope that they stick with the project. Also, I guess it would be great if the maps.me editor would work on the next steps in providing easy mapping functionality for its users (of course with some sort of validation to reduce questionable edits).

Thanks to maɪˈæmɪ Dennis.

Unmapped Places of OpenStreetMap – 2016

Back in 2010 & 2011 I conducted several studies to detect underrepresented regions a.k.a. “unmapped” places in OpenStreetMap (OSM). More than five years later, some people asked if I could rerun the analysis. Based on the latest OSM planet dump file and Taginfo, almost 1 million places have been tagged as villages. Furthermore, around 59 million streets have a residential, unclassified or service highway value. My algorithm to find unmapped places works as follows:

  1. Use every place node of the OSM dataset which has a village-tag (place=village).
  2. Search in a radius of ca. 700 m for a street with one of the following highway-values: residential, unclassified or service.
  3. If no street can be found, mark the place as “unmapped”!
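The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not the actual implementation: it assumes streets are represented by the coordinates of their nodes, uses a brute-force search instead of a spatial index, and measures distances with the haversine formula. All function and variable names are made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000
SEARCH_RADIUS_M = 700
STREET_VALUES = {"residential", "unclassified", "service"}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def unmapped_villages(villages, street_nodes):
    """Return the names of villages without a qualifying street nearby.

    villages: list of (name, lat, lon) for place=village nodes.
    street_nodes: list of (highway_value, lat, lon) for nodes of streets.
    """
    candidates = [
        (lat, lon) for highway, lat, lon in street_nodes if highway in STREET_VALUES
    ]
    unmapped = []
    for name, lat, lon in villages:
        has_street = any(
            haversine_m(lat, lon, slat, slon) <= SEARCH_RADIUS_M
            for slat, slon in candidates
        )
        if not has_street:
            unmapped.append(name)
    return unmapped


villages = [("A", 50.0, 8.0), ("B", 51.0, 9.0)]
street_nodes = [
    ("residential", 50.003, 8.0),  # roughly 330 m north of A
    ("motorway", 51.001, 9.0),     # close to B, but not a qualifying value
    ("service", 51.02, 9.0),       # roughly 2.2 km north of B
]
print(unmapped_villages(villages, street_nodes))
```

For a planet-wide run the brute-force loop would be far too slow; a grid or R-tree index over the street nodes would be the obvious replacement.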

My results for the entire OSM planet can be found under the following webpage.

unmapped

Overall we have more than 440,000 unmapped places in OSM. As you can see in the picture above, most of the places are around Central Africa, Saudi Arabia or China. However, I hope that this analysis helps to complete some of the missing areas or to revise some incorrect map data. Some remarks about “false positives”, or why your village is marked as unmapped. Some possible reasons: Is the used tag for your place correct? Compare the wiki page for further information. Sometimes “hamlet” could be the correct tag value. Are the nearby highways tagged correctly? (OSM wiki)

Amount of unmapped places for each continent:

  • Africa 119,084
  • Asia 241,833
  • Australia 212
  • Europe 44,819
  • North America 16,464
  • Oceania 837
  • South America 15,576

Technical Stuff: The OSM data for the analysis is prepared by a custom OSM PBF reader. The webpage, which shows the results, is based on Leaflet 1.0.0-rc1 and the really fast PruneCluster plugin.

*Update*: You’ll find the date of the latest data update in the header -> “(Date: Apr. 9th, 2018)”

Thanks to maɪˈæmɪ Dennis.

Verified OpenStreetMap contributor profiles?

The reputation of a contributor in OpenStreetMap (OSM) plays a significant role, especially when considering the quality assessment of the collected data. Sometimes it’s difficult to make a meaningful statement about a contributor by simply looking at the raw mapping work represented by the number of created objects or used tags. Therefore, it would be really helpful if we would have some additional information about the person who contributes to the project. For example: Does she/he help other contributors? Is her/his work somehow documented or based on one of the “discussed” proposals? Or does she/he work as a lone warrior in the OSM world?

In 2010 I created “How did you contribute to OpenStreetMap?” (HDYC) as a kind of fun side project. Nowadays many people use it to get some detailed information about OSM contributors. Some of you are probably familiar with the “verified” icon used on some celebrity Twitter accounts. I created a similar new feature for the aforementioned HDYC page. If you connect your related OSM accounts, your profile will be marked as “verified”.

verified

What do you have to do to get a verified contributor profile, you ask? First of all, you have to create at least 100 OSM changesets. Secondly, you need a login (username) for the OSM Help Forum, for the common OSM Forum and for the OSM Wiki. Last but not least, you have to list your OSM related accounts on your OSM profile page. After that, you should be able to see your accounts in your HDYC profile and your account will be automatically marked as verified.
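The requirements above boil down to a simple rule. A minimal Python sketch, assuming the accounts listed on the OSM profile page have already been extracted into a set of service names; how the real script reads the profile is not described here, so the names help_forum, forum, wiki and is_verified are hypothetical:

```python
MIN_CHANGESETS = 100
# Hypothetical labels for the three accounts that have to be listed.
REQUIRED_ACCOUNTS = {"help_forum", "forum", "wiki"}


def is_verified(changeset_count, listed_accounts):
    """Apply the rule: at least 100 changesets and all three accounts listed."""
    return (
        changeset_count >= MIN_CHANGESETS
        and REQUIRED_ACCOUNTS <= set(listed_accounts)
    )


print(is_verified(250, {"help_forum", "forum", "wiki", "mapillary"}))
print(is_verified(250, {"forum", "wiki"}))
```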

Malenki already mentioned his usernames as an example in his profile. He also described it in a tiny OSM diary. Overall this feature is optional. So if you don’t want to “connect” or show your accounts for privacy protection, please don’t mention them on your OSM profile. My script checks the OSM profiles of the latest active OSM contributors every 24h. That’s it.

The HDYC profile now also shows the number of your changeset discussions and, if mentioned in your OSM profile, the page shows your Mapillary account as well.

Notice: If someone is trying to cheat with other people’s accounts, I will blacklist her/his username.

Thanks to maɪˈæmɪ Dennis.

How to detect suspicious OpenStreetMap Changesets with incorrect edits?

Since its rise in popularity, the well-known online encyclopedia Wikipedia has been struggling with manipulation or, in the worst case, vandalism attempts. Similarly, the OpenStreetMap (OSM) project has suffered several times over the past few years from cases where incorrect map data edits were made. These erroneous edits can stem at times from (new) contributors or from illegal data imports (or automated edits) which have not been discussed in advance with the community or the Data Working Group (DWG) and which corrupted existing project data. The current OSM wiki page gives a great overview of general guidelines and, for example, types of vandalism. Another page in the wiki also mentions a prototype of a rule based system for the automatic detection of vandalism in OSM, which I developed in 2012. However, the system has never actually been implemented. Today, the contributors of OSM can use a variety of different tools to inspect an area or particular map changes. A few of them are listed below (complete list can be found here):

Based on the database which I use for multiple other services, I created an easy to use webpage to find suspicious OSM changesets with possibly incorrect map edits. The webpage offers some filter options such as the boundary of a country or the object change of interest. In contrast to the other aforementioned webpages you can also filter changesets based on the active “mapping days” of the contributor. A “mapping day” is a day on which the contributor created at least one changeset, independent from the registration date. I am also planning on adding additional user reputation information such as used editors or tagging behavior. And of course I am going to add some RSS feeds in the next version. The first version can be found here.
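To make the filter idea more concrete, here is a small Python sketch. It is not the code behind the webpage. It assumes each changeset record carries the contributor's number of mapping days and the number of deleted objects; the field names and the default thresholds are made up for this example.

```python
def filter_changesets(changesets, max_mapping_days=3, min_deleted=50):
    """Return the ids of changesets with many deletions by inexperienced contributors.

    changesets: list of dicts with the hypothetical fields 'id',
    'mapping_days' (of the contributor) and 'deleted' (object count).
    """
    return [
        cs["id"]
        for cs in changesets
        if cs["mapping_days"] <= max_mapping_days and cs["deleted"] >= min_deleted
    ]


changesets = [
    {"id": 1, "mapping_days": 1, "deleted": 120},    # new contributor, mass deletion
    {"id": 2, "mapping_days": 400, "deleted": 300},  # experienced contributor
    {"id": 3, "mapping_days": 2, "deleted": 0},      # new contributor, no deletions
]
print(filter_changesets(changesets))
```

A changeset that passes such a filter is only a candidate for a closer look, not proof of a bad edit.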

OSMSuspicious

What makes all of this different from other tools? Well, I think one of the major advantages is the simplicity of the webpage and that you can filter changesets based on the contributor activity and/or the changeset edits. In contrast to other tools, you can find changesets not only based on your area of interest, but also based on potential beginner mistakes and (hopefully not) vandalism attempts or fictional/non-existent map data.

Find Suspicious OSM Changesets here: http://resultmaps.neis-one.org/osm-suspicious

Thanks to maɪˈæmɪ Dennis.

The State of the Map. United States. Street Network. 2013

Last year we wrote a journal paper analyzing the OpenStreetMap (OSM) dataset of the United States; it was published on May 28th, 2013 in the journal Transactions in GIS. You can download a free pre-print version here. The paper has been published just in time to add to the discussion at the upcoming State of the Map United States conference, which will take place in San Francisco and includes some presentations about data imports to OSM. Unfortunately, Dennis and I cannot attend the conference this year, so we decided to write a blog post with some additional and up-to-date numbers.

In January there was an announcement on the OSM mailing list that in the past few months many connectivity errors in the United States OSM dataset had been fixed. Probably a lot of these fixes can be attributed to Martijn’s Maproulette website or to Geofabrik’s OSM Inspector (OSMI) Routing View. However, a short discussion started on the mailing list about the total number of errors that are left and how long it would take to fix all those errors. Thus, we downloaded four OSM planet files dated Jan 4th 2012, June 13th 2012, Jan 2nd 2013 and Jun 2nd 2013 to get some new results. After cutting the United States dataset from the planet files, we used the same algorithm as utilized in OSMI’s Routing View, to receive some stats about the street network of the US datasets.

First of all, the following image shows the number of errors for each dataset that we included in the analysis. The errors that were detected are separated into unconnected and duplicate ways. You can find some additional information about both error types here.

As you can see, the number of unconnected OSM ways has been rapidly reduced in the past 17 months from around 141,000 to 19,000. The number of “duplicate way” errors has been reduced from 17,500 to 11,500. You can find the exact numbers in the following table and an updated error layer on the mentioned OSMI website. In certain cases the duplicate way error created several errors for one and the same way. For these particular cases the number of unique OSM way IDs were counted.

Date – Unconnected Ways – Duplicate Ways

  • Jan 4th, 2012 – 141,578 – unique 17,563 (overall errors: 535,923)
  • June 13th, 2012 – 145,468 – unique 17,977 (overall errors: 518,536)
  • Jan 2nd, 2013 – 15,911 – unique 12,287 (overall errors: 257,388)
  • Jun 2nd, 2013 – 19,073 – unique 11,582 (overall errors: 220,451)
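The difference between the “overall” and the “unique” duplicate way numbers comes from counting distinct way IDs instead of error records. A minimal Python sketch of that counting step, assuming the detected errors are available as a plain list of OSM way IDs (the IDs below are invented):

```python
def count_duplicate_way_errors(error_way_ids):
    """Return (overall error count, number of unique affected way IDs)."""
    return len(error_way_ids), len(set(error_way_ids))


# One way can trigger several duplicate errors, for example one per segment.
errors = [101, 101, 102, 103, 103, 103]
overall, unique = count_duplicate_way_errors(errors)
print(overall, unique)
```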

Overall the length of the US street network did not really change a lot. At the beginning of 2012 it was around 11.07 million km while in 2013 it is 11.1 million km, which means an increase of around 30,000 km. The following image shows the distribution of the US street network divided by different OSM road classes.

The length of the residential roads is still decreasing (-496,000 km), similar to what we saw during the analysis for our paper, while the length of the other road types (+276,000 km) and secondary/tertiary roads (+205,000 km) is increasing. This is the result of a massive retagging process of the imported TIGER/Line dataset in OSM. Dennis mentioned this already in his SotM US 2012 presentation. Motorways also experienced an increase of around +44,000 km in 2012. You will find some additional, quite interesting statistics, charts and of course maps in the aforementioned journal publication. In particular a few more thoughts and facts about the effect and impact of data imports on OSM can be found in our research study about the United States OSM dataset.

Updated Status for Unmapped Places

The last unmapped places analysis for OpenStreetMap that I conducted was nearly eight months ago. So I figured it was about time to create a new one. You can read in the last blog post exactly how my algorithm works.

However, at the moment (Nov. 4th, 2011) we have (according to the Geofabrik extract) about 597 000 entries in OSM for places that are located within “Europe”. This means we have an overall increase of about 90 000 places within the past eight months. We can separate them into several types with different values:

  • City: 1093 (as of March 11th, 2011 it was 1055 ; +3.6%)
  • Town: 16213 (as of March 11th, 2011 it was 16106 ; +0.7%)
  • Suburb: 29642 (as of March 11th, 2011 it was 24913 ; +19.0%)
  • Village: 301638 (as of March 11th, 2011 it was 278691 ; +8.2%)
  • Hamlet: 238717 (as of March 11th, 2011 it was 184326 ; +29.5%)
  • Isolated dwelling: 9064 (new in my stats)
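The percentages in the list are the relative change between the two dates. A short Python sketch that recomputes them from the counts given above (the variable and function names are just for illustration):

```python
# (count on March 11th, 2011, count on Nov. 4th, 2011), taken from the list above
counts = {
    "City": (1055, 1093),
    "Town": (16106, 16213),
    "Suburb": (24913, 29642),
    "Village": (278691, 301638),
    "Hamlet": (184326, 238717),
}


def percent_change(old, new):
    """Relative change from old to new in percent."""
    return (new - old) / old * 100


for place, (old, new) in counts.items():
    print(f"{place}: {percent_change(old, new):+.1f}%")
```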

The results showed that of the total 301638 village entries for Europe in the database, about 154445 (51%) (in March 2011 it was 55%) have not been detected or mapped yet. Further it is possible that the places are tagged incorrectly (e.g. villages vs. hamlet). Anyway, the following figures show the distribution of the values for each country (in different scales).

It is nice to see that Austria (-688), Czech Republic (-633), France (-1978), Georgia (-721), Germany (-1192), Italy (-926), Poland (-2364), Spain (-1472) and the United Kingdom (-829) were able to reduce their “unmapped places” quite substantially. As usual you can find my results as a GPX-overlay here: http://resultmaps.neis-one.org

(Remarks for http://resultmaps.neis-one.org: Not each and every country is available as an overlay. Some countries such as France or Poland showed longer browser loading times to display the GPX-overlays!)

UPDATE: Download the complete GPX-files of this analysis here.

thx @ maɪˈæmɪ Dennis

Routing View Europe 2011-05

First of all, sorry that I did not create a new stat regarding the Routing View last month. To all the new readers: Usually I create an analysis about the Routing View of the OpenStreetMap Inspector for each month for Europe. You can find more information about the OSM Inspector (OSMI) here. The Routing View within the OSMI “shows problems in the data related to routing and navigation”. You can read more about it here … A direct link to the OSMI Routing View is here!

However, here are the new stats for May 2011: we have a total of about 124000 “Unconnected Roads” and about 108000 “Duplicate Ways” (number of duplicate segments). Overall this means that we have about 17000 *new* “Unconnected Roads” errors and only ca. 1300 “Duplicate Ways” have been fixed in Europe. For the past three months we have an increment of about 2850000 new OSM way segments for routing. (May 7th: 34500000, February 20th: 31700000, January 20th: 30600000)

In the following images you can see the amount of errors divided by country and the amount of errors in detail per country for “Europe”:

For this month only a few countries were able to reduce their errors. France (-2200) and Poland (-4800) are ahead of everyone else, so Poland, this is your month 🙂 Here you can find the February stat of the OSMI Routing View. Hopefully this is going to be better next month :S …

thx @ maɪˈæmɪ Dennis 🙂

“Unmapped” Places in Europe?

Recently, posts about regions that are not yet covered in OpenStreetMap have been accumulating on the German OpenStreetMap mailing list, prompted by the recent release of the Bing aerial images for use in OSM.

In one of my former blog posts that I wrote back in August this year, I introduced an analysis that included the search for “places” such as small villages etc. in Germany that probably had not been mapped in OSM at that time (the post in German language can be found here).

However, I repeated this analysis using the database of the routing view. This time I expanded the research area to all of Europe. In total there are 477591 places in Europe covered in OSM (at the moment). They can be separated into the following place-types:

  • city=1045
  • town=16032
  • suburb=23563
  • village=271147
  • hamlet=165804

During my analysis I *only* used those places that had a corresponding “village-tag”. For the case of Germany it can be assumed that places with “higher” place-type tags such as “town” or “city” have already been mapped. In the special case of the “hamlet”-tag there were too many “false positives” included, thus they could not be considered during the analysis.

The results showed that of the total 271147 villages located within Europe, about 156940 (58%) have not been detected or mapped yet (according to the Geofabrik extract). The following diagram shows the distribution of the numbers by country.

The results can either be displayed as a GPX-overlay on an OSM-map, which can be found at http://resultmaps.neis-one.org, or they can be downloaded as a *.zip file that includes the results for all countries that have been included in the analysis (see the end of this post).

Remarks for http://resultmaps.neis-one.org :

  1. Not each and every country is available as an overlay.
  2. Some countries such as France, Poland and Ukraine showed longer browser loading times to display the GPX-overlays.

There is a possibility that some of the “places” have been mapped by now. Currently there is a lot of work being contributed to OSM with the help of the new BING aerial images!

thx @ VBA Dennis 😉

Download Unmapped Places GPX *.zip file: 20101205_results_unmapped_eu.zip