Counting changes per Country – A different approach
by Pascal Neis - Published: April 17th, 2015
OSMstats contains several statistics about the OpenStreetMap (OSM) project, such as daily-created objects, the number of active contributors or detailed numbers for individual countries. One way to determine the number of created or modified Node objects per country is to use the minutely, hourly or daily OSM replication change files and count the values for each country of the world. Sadly, this approach has some drawbacks. Firstly, the official files do not contain, for example, all Nodes of a modified Way, which are required when trying to find the country where the change took place. Furthermore, determining the country for a specific OSM object really depends on the border’s level of detail: more detailed country borders make the processing quite time-consuming. Some of you have probably experienced this problem before when using Osmosis or a different OSM processing tool. Anyway, for calculating additional country statistics I tried a new approach:
- Determine the country of a changeset based on its center position.
- Use the changeset country information for all objects within this changeset.
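The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code running behind OSMstats: it assumes the changeset center is the midpoint of the changeset's bounding box, uses a plain ray-casting point-in-polygon test, and works on made-up rectangular "countries". The dictionary keys `bbox` and `num_changes` and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical, heavily simplified country outlines as rings of (lon, lat).
# Real country borders are far more detailed and may consist of many polygons.
COUNTRIES = {
    "A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "B": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}


def bbox_center(min_lon, min_lat, max_lon, max_lat):
    """Assumption: the changeset center is the midpoint of its bounding box."""
    return ((min_lon + max_lon) / 2.0, (min_lat + max_lat) / 2.0)


def point_in_ring(lon, lat, ring):
    """Ray-casting test: count how many ring edges a ray to the east crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def country_of(lon, lat, countries=COUNTRIES):
    """Return the first country whose ring contains the point, else None."""
    for name, ring in countries.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None


def changes_per_country(changesets, countries=COUNTRIES):
    """Step 2: assign every change of a changeset to the changeset's country."""
    totals = Counter()
    for cs in changesets:
        lon, lat = bbox_center(*cs["bbox"])
        totals[country_of(lon, lat, countries)] += cs["num_changes"]
    return totals
```

Note that a changeset with the bounding box `(8, 2, 14, 4)` straddles the border between the two toy countries, yet all of its changes are counted for "B" because its center lies there. That is exactly the generalization this approach accepts.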
Of course, the country determined for a changeset can “only” be generalized to the entire changeset content, but how does it compare with the current method utilized in OSMstats? I compared last week’s numbers of OSMstats for each country of the world with the newly introduced approach. In total, the number of active members per country differs for each weekday by around 3% (min. 1% and max. 5%). The average difference of created, modified and deleted Nodes per country is quite similar at 4% (min. 2% and max. 9%). The presented approach can produce partially incorrect results whenever a changeset contains changes spanning the border of two or more countries, or if the center of the changeset lies in the wrong country. But IMHO using the changeset centers is a sufficient approximation to determine changes per country. As you can see in the figure above, most OSM changesets cover a manageable area within one country. Yes I know, exceptions prove the rule.
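How the percentage differences were computed is not spelled out here. One plausible reading, shown purely as an assumption, is a relative difference per country that is then averaged over all countries. The function names and the sample numbers below are hypothetical and are not taken from OSMstats.

```python
def relative_difference(reference, candidate):
    """Relative deviation of the new count from the reference count."""
    return abs(candidate - reference) / reference


def mean_difference(ref_counts, new_counts):
    """Average the per-country relative differences.

    Countries missing from either method, or with a zero reference count,
    are skipped to avoid dividing by zero.
    """
    diffs = [
        relative_difference(ref_counts[c], new_counts[c])
        for c in ref_counts
        if c in new_counts and ref_counts[c] > 0
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Made-up counts for two countries, only to show the call:
current_method = {"A": 100, "B": 200}
changeset_method = {"A": 104, "B": 196}
average = mean_difference(current_method, changeset_method)
```

With these made-up counts the per-country differences are 4% and 2%, so the average comes out at about 3%.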
So, why am I doing this? The main idea is to switch the entire processing task for OSMstats to this approach within the coming weeks. The changes per country will then be based on the changeset centers. Another advantage is that the newly created information, gathered from the changesets, can be used to create additional contributor statistics.
Thanks to maɪˈæmɪ Dennis.