Tag: Planet

New metric for measuring the “qualitative nature” of OpenStreetMap activities @ How did you contribute?

Back in June we had a Twitter chat about potential new features for the “How did you contribute to OpenStreetMap” (HDYC) website. One suggestion was to “show more relevant information about skills, tagging system or the quality of contributions” of a project member (by J-Louis). Overall I really like the following summary by Claudius: “HDYC started off with a strong focus on quantitative metrics and you expanded it lately a lot to reflect the qualitative nature of contributions. I think there’s value to show more about which area of data someone contributed: Auto/bike/railway/water infrastructure, amenities…”.

So I finally started searching the OpenStreetMap (OSM) wiki for usable information about “groups of tags” or “tag categories”. Altogether, I couldn’t find any solution that fits perfectly for determining the areas of data a mapper contributed in. However, I later got a hint from the JOSM developers to use the presets of this well-known and popular editor. You may ask, ‘What are presets?’ “Presets in JOSM are menu-driven shortcuts to tag common object types in OpenStreetMap. They provide you with a user friendly interface to edit one or more objects at a time, suggest additional keys and values you may wish to add to those objects, and most importantly, prevent you from having to enter keys and values by hand.” You can find many different presets on the aforementioned JOSM page. For my data processing, however, I utilized the “default presets”. The XML file contains many popular and established tag combinations, which contributors use when they are mapping.
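To illustrate how such preset information could be extracted, here is a small Python sketch (my actual processing tool is written in Java) that reads the fixed key/value pairs of every preset item from a locally downloaded copy of JOSM’s defaultpresets.xml. The element and attribute names follow JOSM’s tagging-preset schema; the function name is my own.

```python
import xml.etree.ElementTree as ET

def preset_tags(path):
    """Map each preset name to its fixed key/value pairs.

    Assumes a locally downloaded copy of JOSM's defaultpresets.xml;
    element names below follow its tagging-preset XML schema."""
    presets = {}
    for _, elem in ET.iterparse(path):
        tag = elem.tag.split('}')[-1]          # drop the XML namespace
        if tag == 'item':
            name = elem.get('name')
            # <key> children are the tags a preset always applies
            fixed = {k.get('key'): k.get('value')
                     for k in elem
                     if k.tag.split('}')[-1] == 'key'}
            if name and fixed:
                presets[name] = fixed
    return presets
```

A changeset’s edits can then be compared against these key/value combinations to decide which presets were (most likely) applied.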

So far so good. As a first step, I released a new version of “Find Suspicious OpenStreetMap Changesets”. It shows the utilized presets for each changeset, which can already indicate some quality aspects such as attribute (tag) accuracy or completeness. Now, after some weeks and some minor adjustments, I started to use this collected information about applied presets to expand the metrics of a mapper’s profile. The HDYC page now also lists which presets the mapper recently utilized during her/his contributions, such as adding, modifying or removing map elements. I think this is a really useful next step towards the quality assurance that the OSM project urgently needs.

Some technical details: The database behind the “Find Suspicious OpenStreetMap Changesets” webpage uses the augmented diff files of the Overpass API. The utilized “default” preset list of the JOSM editor can be found here (Internal Preset list). The entire processing tool was developed in Java and uses a Postgres database to store the results. Currently, only presets used within the past 60 days of a contributor’s activity are processed and presented.
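The 60-day window could be implemented along these lines. This is a minimal Python sketch (the actual tool is Java with Postgres); the (timestamp, preset) tuples are assumed to come out of the results database:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def recent_presets(usages, days=60, now=None):
    """Count preset usages within the last `days` days.

    `usages` is an iterable of (timestamp, preset_name) tuples as they
    might be stored per contributor; illustrative only."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # keep only usages inside the window, then count per preset name
    return Counter(name for ts, name in usages if ts >= cutoff)
```

The resulting counter is exactly the kind of aggregate that a profile page can render as “recently used presets”.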

Again, thank you very much for all your feedback. I hope this helps.

Thanks to maɪˈæmɪ Dennis.

Additional insights about OSM changeset discussions: Who requests, receives and responds?

Last year I wrote two blog posts about the OpenStreetMap (OSM) feature that allows commenting on a contributor’s map changes within a changeset. The first blog post showed some general descriptive statistics about the number of created changeset discussions, the affected countries, the origin of the commenting contributors and their mapping reputation. The second post described a newly introduced feature through which contributors can flag their changesets so that their map edits can be reviewed. This blog post follows up on that topic and conducts some similar but updated research.

The first chart shows the number of created comments (discussed changesets) and the contributors involved over the last 15 months. The number of created comments and discussed changesets fluctuates over time, whereas the number of contributors who take part in changeset discussions stays consistent at around 1,500 per month. Each month, around 3,200 contributors received a comment on at least one of their changesets’ map edits.

After publishing the aforementioned blog posts, people asked for numbers showing the commented changesets grouped by the editing application that was used. The results show that these numbers stayed more or less the same, with two thirds of all commented changesets (almost 160,000) having been edited with the iD editor. This is not very surprising, since this particular editor is used by many OSM beginners for their first edits. It’s also interesting to see whether the changeset author responded (again grouped by the OSM editor that was used). Overall, only around 32,000 contributors responded to their changeset comment. You can find some additional charts about the comments per discussed changeset in the previous blog post. Again, the majority (around 71%) of the changeset discussions contain only one comment.

Since last August, contributors can mark their changesets with a “review_requested” flag. After a few months, I think it’s time for a first look at the numbers. The following charts display the number of requested reviews by contributors and their flagged changesets. First of all, almost every month around 7,000 contributors asked for at least one review. On average, almost 36,000 changesets were marked for review each month. If we take a closer look and filter changesets by hashtags, we can see that large numbers of these changesets are sometimes contributed by #HOTOSM or #MissingMaps members.

The following diagram shows probably the most disappointing result: the number of requested reviews that were actually carried out in the end. Whether or not a changeset carries the #HOTOSM or #MissingMaps hashtag, the share of reviewed changesets lies only between 6% and 18%. To be honest, I’m also a bit surprised that only a few #HOTOSM or #MissingMaps changesets have been reviewed so far.

So, what do you think? Do you review contributions without commenting on the changesets? Do we need more attention here, or is it just boring to look after changesets which are marked for review? I think it’s obvious that we need more contributors who review map changes, or at least “document” their review work. But can we handle this? Or do we need better tools?

Thanks to maɪˈæmɪ Dennis.

Adding Indicators to OSM Map Edits Assessment

Almost two years ago I published a web service that finds suspicious OpenStreetMap (OSM) map changes. You can use the service here and find some more information in previous blog posts. Changeset discussions in particular have proven to be more or less the de facto standard for communication between contributors during map change reviews.

However, when I am inspecting map changes, I sometimes see new contributors using uncommon OSM tags. Therefore, I think it could be useful to add an additional assessment parameter to the aforementioned suspicious OSM map changes page. The newly introduced indicator states the matching ratio between the contributed tags and the most popular OSM tags. This means that if the changeset contributor used many uncommon tags on the edited objects, the matching rate will be low. If the contributor applied many common (“popular”) tags, the matching rate will approach 100%. For the calculation I used Jochen Topf’s taginfo API to get commonly used OSM tags. An API description can be found here. Furthermore, I added the average age (in days) of modified and deleted objects. This indicator shows whether the contributor edited objects that were mapped today (0 days) or have existed for a longer period of time, e.g. 1,566 days. The values for the average version numbers are computed in a similar fashion.
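To illustrate how such a matching ratio might be computed, here is a small Python sketch. The function name and the idea of pre-fetching a popular-tag set (e.g. once from taginfo) are my own assumptions about one plausible implementation, not the actual code behind the page:

```python
def matching_ratio(contributed_tags, popular_tags):
    """Share of a changeset's tags that appear in the popular-tag set.

    `contributed_tags`: iterable of (key, value) pairs from the map edits;
    `popular_tags`: set of (key, value) pairs, e.g. fetched once from
    the taginfo API. Illustrative sketch only."""
    tags = list(contributed_tags)
    if not tags:
        return 1.0          # nothing contributed, so nothing uncommon
    hits = sum(1 for t in tags if t in popular_tags)
    return hits / len(tags)
```

A contributor using only well-established tags scores close to 1.0; many uncommon tags push the ratio towards 0.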

Last but not least, the number of contributors affected by the changeset is calculated. If a contributor only changes objects on which she or he is the latest modifier, this number will be ‘0’. Otherwise, the value represents the number of unique mappers whose contributions have been changed. I hope that, overall, the newly added indicators can be useful for identifying changesets which need a closer look. The suspicious OSM map changes website has also received some style updates, which should help to highlight the most important parameters. I also added an aggregation of the latest changesets of a specific contributor. I guess this could be really useful to see the “big picture” of individual mapping activities.

The aforementioned service is online here –> “Find Suspicious OpenStreetMap Changesets”

Thanks to maɪˈæmɪ Dennis.

Review requests of OpenStreetMap contributors
– How you can assist! –

The latest version of the OpenStreetMap editor iD has a new feature: “Allow user to request feedback when saving”. This idea was mentioned in a 2016 diary post by Joost Schouppe about “Building local mapping communities” (at that time: “#pleasereview”). The blog post also contains some additional good thoughts and is definitely worth reading.

However, based on the newly implemented feature, any contributor can flag her/his changeset and ask for feedback. Now it’s your turn! How can you find and support those OSM’ers?

  • Step 1: Based on the “Find Suspicious OpenStreetMap Changesets” page you can search for flagged changesets, e.g. limited to your country only: Germany or UK.
  • Step 2: Leave a changeset comment where you e.g. welcome the contributor and (if necessary) give her/him some feedback about the map changes. You could also add some additional information, such as links to wiki pages of tags (map features), good mapping practices, the OSM forum, OSM help or mailing lists. Based on the changeset comment, other contributors can see that the original contributor of this changeset has already been provided with some feedback.
  • Step 3: Finally, you could create & save a feed URL of your changeset search. That’s it.
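As a sketch of how flagged changesets could also be found directly via the OSM API (instead of my webpage), this hypothetical Python helper parses a changeset query response and keeps only those carrying review_requested=yes. The endpoint URL in the docstring is the standard OSM API 0.6 changeset query:

```python
import xml.etree.ElementTree as ET

def review_requested_ids(changesets_xml):
    """Return ids of changesets flagged with review_requested=yes.

    `changesets_xml` is the response body of a changeset query such as
    https://api.openstreetmap.org/api/0.6/changesets?bbox=5.8,47.2,15.1,55.1
    (the API itself cannot filter by tag, so we filter client-side)."""
    root = ET.fromstring(changesets_xml)
    ids = []
    for cs in root.findall('changeset'):
        tags = {t.get('k'): t.get('v') for t in cs.findall('tag')}
        if tags.get('review_requested') == 'yes':
            ids.append(cs.get('id'))
    return ids
```

Each returned id can then be opened on osm.org to inspect and comment on the edits.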

Personally, I really like this new feature. It provides an easy way to find contributors who are asking for feedback about their map edits. Thanks to all iD developers for implementing this idea. What do you think? Should I add an extra score to “How did you contribute to OpenStreetMap” where every answer to a feedback-requested changeset is counted?

Some statistics? There you go: “OSM Changesets of the last 30 Days”

Thanks to maɪˈæmɪ Dennis.

Who is commenting?
An Overview about OSM Changeset Discussions

As mentioned in my previous blog post about detecting vandalism in OpenStreetMap (OSM) edits, it’s highly recommended that contributors use public changeset discussions when contacting other mappers regarding their edits. This feature was introduced at the end of 2014 and is used widely by contributors today. Each and every comment is listed publicly and every contributor can read the communication and, if necessary, add further comments or thoughts. In most cases where questions about a specific map edit come up, it is desirable that contributors take this route of communication instead of private messaging each other.

For my presentation at the German FOSSGIS & OpenStreetMap conference I created several statistics about the aforementioned changeset discussion feature. For this blog post I reran all analyses and created some new charts and statistics. Let’s start with the first image (above): It shows the number of commented or discussed changesets per month since the feature’s introduction. The peak in January 2017 is due to a revert involving several thousand changesets.

In total, more than 92,000 changesets have been discussed in the past few years, with around 151,000 comments. All comments were created by almost 14,000 different contributors. So far most changesets were commented in Germany, the United States, Russia and the UK, as you can see in the following images. This correlates to some extent, with the exception of Kazakhstan, with the number of active contributors in each country (see e.g. OSMstats for active contributors). As shown on the right side, many changesets (71%) received only one comment. This means that in most cases the commented changeset did not receive a response from the owner/contributor of the changeset.

Which changesets are discussed and who writes the comments? I think it’s not surprising that most changesets receiving a comment are by new contributors. However, as the following charts show, there are also changesets by long-time contributors that spark discussions. It’s also quite interesting to see that all kinds of contributors (new and long-time) start discussions. I would have expected a trend towards contributors with a higher number of mapping days.

What is the origin of the contributors who write the comments? Again, not surprisingly, this correlates with the number of active OSM contributors per country, as mentioned above. A contributor’s origin is determined by her/his main activity areas, which you can see on “How did you contribute to OpenStreetMap?”.

Some additional numbers about the text content of the changeset discussions: Roughly 22% of the changeset comments contain the word “revert”. On the other hand, more than 17% include some sort of “Welcome”, “Willkommen”, “Hello”, “Ciao”, “Hola”, “Bonjour”, “nǐ hǎo!” or “привет!” text. The following image shows a word cloud of the most used words in the changeset discussions:
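A keyword share like the ones above can be approximated with a simple case-insensitive substring check. This Python sketch is only an illustration, not the exact method I used:

```python
def keyword_share(comments, keywords):
    """Fraction of comments containing at least one of the keywords.

    Uses a case-insensitive substring match as a rough approximation;
    real text analysis would need tokenization and language handling."""
    if not comments:
        return 0.0
    kws = [k.lower() for k in keywords]
    hits = sum(1 for c in comments if any(k in c.lower() for k in kws))
    return hits / len(comments)
```

Running it once with ["revert"] and once with the list of greetings gives the two percentages quoted above.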

The last chart shows the cumulative distribution of changeset discussion contributors and comments. Almost 63% of all discussion comments were created by around 2% of the contributors. However, I assume this looks very similar to other long-tail charts of OpenStreetMap contributions. What do you think?

Want to see the latest OSM discussions in your area or country? Check this webpage.

Thanks to maɪˈæmɪ Dennis.

Detecting vandalism in OpenStreetMap – A case study

This blog post is a summary of my talk at the FOSSGIS & OpenStreetMap conference 2017 (German slides). I guess some of the content might be suitable for a research article. However, here we go:

Vandalism is (still) an omnipresent issue for any kind of open data project. Over the past few years, OpenStreetMap (OSM) data has been integrated into a number of applications. In my opinion, this is one of the most important reasons why we have to take our quality assurance to the next level. Do we really have a vandalism issue after all? Yes, we do. But first we should take a closer look at the different vandalism types.

It is important to distinguish between different vandalism types. Not each and every unusual map edit should be considered vandalism. Based on the OSM wiki page, I created the following breakdown. Generally speaking, vandalism can occur intentionally and unintentionally. Therefore we should distinguish between vandalism and bad map-editing behavior. Often, new contributors make mistakes which are not vandalism simply because they lack expert mapper knowledge. In my opinion, only intentional map edits such as mass deletions or “graffiti” are real cases of vandalism.

To get an impression of the state of vandalism in the OSM project, I conducted a case study over a four-week timeframe (between January 5th and February 12th, 2017). During my study I analyzed OSM edits, mostly object deletions by new contributors, fictitious data, or changesets related to the Pokémon game. If you have not heard or read about OSM’s Pokémon phenomenon, you can read more about it here. The OSM wiki page for quality assurance lists some tools that can be used for vandalism detection. However, for this study I applied my self-developed suspicious OSM changesets webpage and the quite useful augmented OSM change viewer (Achavi). Furthermore, a webpage that lists the newest OSM contributors may also be of interest to you.

So what can you do when you find a strange map edit that could be a case of vandalism? The OSM help page contains an answer. First of all: keep calm! Use changeset comments and, in a friendly manner, ask for the reasons behind the suspicious mapping.

Results of the study: Overall, I commented on 283 changesets in the aforementioned four-week timeframe. Unfortunately, I did not count the number of analyzed changesets, but I assume it was around 1,200 (±200). The following chart shows the commented changesets per day. Weekends tend to have a larger number of commented/discussed changesets.

As mentioned in the introduction, we should distinguish between different vandalism types. The following image shows the numbers for each category. In my prototype study, 45% of the commented changesets were vandalism-related, and 24% had already been reverted without this being documented in the changeset discussion. Sometimes I also found imported test data and fictitious data which the initial contributor of the changeset didn’t revert. It should be clear to everyone that the live database should never be used for testing purposes. Interested developers can use the test API and a test database (see sandbox for editing).

Responses and spatial distribution: Overall, I received 70 responses to the discussed changesets, sadly only 20 of them from the owner/contributor of the changeset. However, almost every response was friendly. Most often the contributors wrote “thank you” or “I didn’t know that my changes are going to be saved in the live database”. Furthermore, when I received a response, it arrived within 24 hours.

The following map contains some clustered markers. Each one highlights areas where the discussed changesets are located. As you can see on the map, the commented changesets are spread almost all over the world. In some areas they tend to correlate with the number of active OSM’ers. However, here is some additional information about three selected areas: 1. USA: several cases of Pokémon Go-related and fictitious map edits. 2. Japan/China: some mass deletions. 3. South Africa: oftentimes new MissingMaps or HOT contributors tend to delete and redraw more or less the same objects, such as buildings. I guess it was not explained well enough to these editors that this destroys the object history. However, the article about “Good practice” in the OSM wiki is quite useful in this case.

Conclusion: The study reveals that there is an ongoing issue with vandalism in OSM’s map data. I think we need to simplify the tools for detecting vandalism. In particular, we should avoid duplicated work where several users review the same suspicious map edits. Maybe the best solution would be a tool that is integrated directly into the OSM.org infrastructure. My presentation also contained some statistics and charts about the OSM changeset discussion feature; this will be the content of a separate blog post in the following weeks. Also, the prototype introduced at the end of my talk will (hopefully) be presented in the next few months.

Thanks to maɪˈæmɪ Dennis.

Reviewing OpenStreetMap contributions 1.0 – Managed by changeset comments and discussions?

The OSM project still records around 650 new contributors each day (out of almost 5,000 newly registered members per day). Some countries (such as Belgium or Spain) already provide platforms to coordinate the introduction of new mappers to OSM. Others use special scripts or intense manual work to send newly registered contributors mails with useful information (Washington or the Netherlands). However, new contributors often make, as expected, beginner mistakes. Personally, I often detect unconnected ways, wrong tags or, more rarely, fictitious data. Unfortunately, sometimes (new) members also delete existing map data, intentionally or unintentionally.

At the end of 2014, many people were anticipating the newly introduced changeset discussions feature. A few months later, I developed a page that finds the latest discussions around the world or in your country. By now, many OSM members use changeset discussions for commenting or questioning map edits of other members.

[image: main]

However, one year ago, almost to the day, I wrote a blog post about a webpage for detecting suspicious OSM edits. In the newly updated version, I would like to combine the aforementioned changeset discussions and comments about suspicious edits to communicate with members in a more direct way. The following image shows the revised webpage.

[image: map]

Furthermore, you can request all changesets of a contributor which have been commented on. The same page can also show all comments written by a selected contributor (with all comments of the particular changeset). I think these last two features are really helpful for keeping track of your own and other changeset discussions. This should also simplify the reviewing process of changesets and map edits.

[image: overview]

As mentioned at the beginning of this blog post, some OSM groups send a welcoming e-mail to new contributors. I also saw that some mappers are welcoming new members in Taiwan with a changeset comment and information on their first changeset. Pretty neat stuff if you ask me.

Latest OSM Changeset Discussions: http://resultmaps.neis-one.org/osm-discussions
Find Suspicious OSM Changesets: http://resultmaps.neis-one.org/osm-suspicious

Thanks to maɪˈæmɪ Dennis.

A comparative study between different OpenStreetMap contributor groups – Outline 2016

Over the past few years I have written several blog posts about the (non-)activity of newly registered OpenStreetMap (OSM) members (2015, 2014, 2013). As in the previous posts, the following image shows the gap between the number of registered and the number of active OSM members. Although the project still sees millions of new registrations, “only” several hundred thousand of these registrants have actually edited at least one object. Simon showed similar results in his yearly changeset studies.

[image: 2016members]

The following image shows that the project still has some loyal contributors. More specifically, it shows the increase of monthly active members over the past few years and their consistent data contributions based on their first and latest changesets:

[image: 2016months]

However, this time I would like to combine the current study with some additional research. For the following analysis I tried to identify three different OSM contributor groups, based on the hashtag in a contributor’s changeset comment or the editor used:

  1. Contributors of the MissingMaps project: contributors of this project usually use #missingmaps in their changeset comments.
  2. Contributors who used the Maps.Me app: the ‘created_by’ tag contains ‘MAPS.ME’.
  3. All other ‘regular’ contributors of the OSM project, who neither used #missingmaps in their changesets nor the Maps.Me editor.
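The grouping above can be sketched as a simple classification over changeset tags. The dictionary layout below is a hypothetical simplification of a changeset’s ‘comment’ and ‘created_by’ tags:

```python
def contributor_group(changeset_tags):
    """Assign a changeset to one of the three groups described above.

    `changeset_tags` is a dict with the changeset's 'comment' and
    'created_by' tags; illustrative sketch only."""
    comment = (changeset_tags.get('comment') or '').lower()
    created_by = changeset_tags.get('created_by') or ''
    if '#missingmaps' in comment:
        return 'missingmaps'
    if 'MAPS.ME' in created_by:
        return 'mapsme'
    return 'regular'
```

Applying this to all changesets of a member’s first year yields the per-group counts discussed below.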

In the past 12 months, almost 1.53 million members registered with the OSM project. So far, only 12% (181k) have ever created at least one map edit: almost 12,000 members created at least one changeset with the #missingmaps hashtag, over 70,000 used the Maps.Me editor, and 99,000 mapped without the #missingmaps hashtag or the Maps.Me editor. The following diagram shows the number of new OSM contributors per month for the three aforementioned groups.

[image: 2016permonth]

The release of the Maps.Me app (more specifically, its OSM editor functionality) clearly has an impact on the monthly number of new mappers. Time for a more detailed analysis of contributions and mapping times: The majority of the members of all groups don’t show more than two mapping days. (What is a mapping day, you ask? Well, my definition would be: a day on which a contributor created at least one changeset.) Only around 6% of the newly active members contribute for more than 7 days.
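My definition of a mapping day can be expressed in a few lines of Python; this is just an illustrative sketch:

```python
def mapping_days(changeset_timestamps):
    """A mapping day is a day on which a contributor created at least
    one changeset, so we count distinct calendar dates.

    `changeset_timestamps` is an iterable of datetime objects."""
    return len({ts.date() for ts in changeset_timestamps})
```

Two changesets on the same day still count as a single mapping day.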

[image: 2016mappingdays]

Some members of the #missingmaps group also contributed changesets without the hashtag, but many of those members (70%) contributed #missingmaps changesets only. Furthermore, 95% of this adjusted group didn’t map for more than two days. Anyway, despite the three different contributor groups, the results look somewhat similar. Let’s have a look at the number of map changes. The relative comparison shows that the smaller #missingmaps group produces a large number of edits, whereas the Maps.Me group contributes only small numbers of map changes to the project’s database.

[image: 2016mapchanges]

Lastly, I conducted an analysis for three selected tag keys: building, highway and name. The comparison shows that the #missingmaps group generates a larger number of building and highway features. In contrast, “regular” OSM’ers and Maps.Me users contributed more primary attributes such as the name or amenity tag.

[image: 2016tags]

I think the diagrams in this blog post are quite interesting because they show that the #missingmaps mapathons can activate members who contribute many map objects. But they also indicate that the majority of these elements are traced from satellite imagery without primary attributes. In contrast, the Maps.Me editor functionality proved successful with its in-app integration and easy usability, which resulted in a huge number of new contributors. In summary, I think it would be good to motivate contributors not only to participate in humanitarian mapathons but also to map their own neighborhood, so that they stick with the project. Also, I guess it would be great if the Maps.Me editor would take the next steps in providing easy mapping functionality for its users (of course with some sort of validation to reduce questionable edits).

Thanks to maɪˈæmɪ Dennis.

Unmapped Places of OpenStreetMap – 2016

Back in 2010 & 2011 I conducted several studies to detect underrepresented regions, a.k.a. “unmapped” places, in OpenStreetMap (OSM). More than five years later, some people asked if I could rerun the analysis. Based on the latest OSM planet dump file and Taginfo, almost 1 million places are tagged as villages. Furthermore, around 59 million streets have a residential, unclassified or service highway value. My algorithm to find unmapped places works as follows:

  1. Use every place node of the OSM dataset which has a village-tag (place=village).
  2. Search in a radius of ca. 700 m for a street with one of the following highway-values: residential, unclassified or service.
  3. If no street can be found, mark the place as “unmapped”!
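The three steps above can be sketched in Python as follows. The brute-force nearest-road check and the data layout are illustrative simplifications (the real tool processes the planet file with a custom PBF reader and would need a spatial index to be fast):

```python
from math import radians, sin, cos, asin, sqrt

ROAD_VALUES = {'residential', 'unclassified', 'service'}

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371000 * asin(sqrt(a))

def is_unmapped(village, road_nodes, radius_m=700):
    """True if no suitable highway node lies within `radius_m` of the village.

    `village` is a (lat, lon) pair of a place=village node;
    `road_nodes` is an iterable of (lat, lon, highway_value) tuples,
    e.g. sampled from ways carrying a highway tag."""
    lat, lon = village
    return not any(
        hv in ROAD_VALUES and haversine_m(lat, lon, nlat, nlon) <= radius_m
        for nlat, nlon, hv in road_nodes)
```

Every village for which `is_unmapped` returns True ends up as a marker on the results map.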

My results for the entire OSM planet can be found on the following webpage.

[image: unmapped]

Overall, we have more than 440,000 unmapped places in OSM. As you can see in the picture above, most of these places are in Central Africa, Saudi Arabia or China. However, I hope that this analysis helps to complete some of the missing areas or to revise some incorrect map data. Some remarks about “false positives”, i.e. why your village might be marked as unmapped: Is the tag used for your place correct? Compare the wiki page for further information; sometimes “hamlet” would be the correct tag value. Are the nearby highways tagged correctly? (OSM wiki)

Number of unmapped places for each continent:

  • Africa 119,084
  • Asia 241,833
  • Australia 212
  • Europe 44,819
  • North America 16,464
  • Oceania 837
  • South America 15,576

Technical Stuff: The OSM data for the analysis is prepared by a custom OSM PBF reader. The webpage, which shows the results, is based on Leaflet 1.0.0-rc1 and the really fast PruneCluster plugin.

*Update*: You’ll find the date of the latest data update in the header -> “(Date: Apr. 9th, 2018)”

Thanks to maɪˈæmɪ Dennis.

Verified OpenStreetMap contributor profiles?

The reputation of a contributor in OpenStreetMap (OSM) plays a significant role, especially when considering the quality assessment of the collected data. Sometimes it’s difficult to make a meaningful statement about a contributor by simply looking at the raw mapping work represented by the number of created objects or used tags. Therefore, it would be really helpful to have some additional information about the person who contributes to the project. For example: Does she/he help other contributors? Is her/his work somehow documented or based on one of the “discussed” proposals? Or does she/he work as a lone warrior in the OSM world?

In 2010 I created “How did you contribute to OpenStreetMap?” (HDYC) as a kind of fun side project. Nowadays many people use it to get some detailed information about OSM contributors. Some of you are probably familiar with the “verified” icon used on some celebrity Twitter accounts. I created a similar new feature for the aforementioned HDYC page. If you connect your related OSM accounts, your profile will be marked as “verified”.

[image: verified]

What do you have to do to get a verified contributor profile, you ask? First of all, you have to have created at least 100 OSM changesets. Secondly, you need a login (username) for the OSM Help Forum, the general OSM Forum and the OSM Wiki. Last but not least, you have to list your OSM-related accounts on your OSM profile page. After that, you should be able to see your accounts in your HDYC profile, and your account will automatically be marked as verified.

Malenki already lists his usernames as an example in his profile. He also described it in a short OSM diary entry. Overall, this feature is optional, so if you don’t want to “connect” or show your accounts for privacy reasons, simply don’t mention them on your OSM profile. My script checks the OSM profiles of the latest active OSM contributors every 24 hours. That’s it.

The HDYC profile now also shows the number of your changeset discussions and, if mentioned in your OSM profile, the page shows your Mapillary account as well.

Notice: If someone is trying to cheat with other people’s accounts, I will blacklist her/his username.

Thanks to maɪˈæmɪ Dennis.