dzejkop.space

Serde by Example 2: OpenStreetMap


Finally! After a long wait, here's the second entry in the Serde by example series.

In the previous post, I showed you how to deserialize a JSON RPC v2 response. In this post I want to take a look at the response data returned by the OpenStreetMap API. Apart from deserialization, we'll also take a basic look at how to make these requests with the reqwest crate.

OpenStreetMap API

Version 0.6 of the API is available at https://api.openstreetmap.org/ and luckily, we don't need to authenticate if we're only reading data. OpenStreetMap (which I'll be referring to as OSM from now on) supports both a JSON and an XML data format. In this article we'll only look at JSON, but I assume some of the knowledge gained can also be applied to the XML variant. To request the data in the JSON format we need to either add a .json postfix to the request URL path or add an Accept: application/json header. We'll stick with the postfix.

The OSM API supports the following actions regarding the map data:

  1. Retrieving map data by bounding box - GET /api/0.6/map
  2. Reading element data - GET /api/0.6/[node|way|relation]/#id
  3. Reading element history - GET /api/0.6/[node|way|relation]/#id/history
  4. Reading element version - GET /api/0.6/[node|way|relation]/#id/#version
  5. Fetching multiple elements - GET /api/0.6/[nodes|ways|relations]?#parameters
  6. Fetching relations of a given element - GET /api/0.6/[node|way|relation]/#id/relations
  7. Fetching ways for a given node - GET /api/0.6/node/#id/ways
  8. Reading full data of an element - GET /api/0.6/[way|relation]/#id/full

We're only interested in the map data side of OSM, not necessarily the history, versioning, etc., so we'll only be focusing on the endpoints numbered 1, 2, 5, 6 and 7. Luckily for us, each of these endpoints returns the same data format.
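As a rough sketch of how the paths above turn into request URLs, here's a small helper for endpoint 2 that also appends the .json postfix. The ElementKind enum and the element_url function are names I made up for illustration, they don't come from any crate.

```rust
/// The three OSM element types, as they appear in the API paths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementKind {
    Node,
    Way,
    Relation,
}

impl ElementKind {
    /// Singular path segment, e.g. `node` in `/api/0.6/node/#id`.
    pub fn as_str(self) -> &'static str {
        match self {
            ElementKind::Node => "node",
            ElementKind::Way => "way",
            ElementKind::Relation => "relation",
        }
    }
}

/// Base URL of the API, as given in the article.
const BASE: &str = "https://api.openstreetmap.org";

/// Endpoint 2 (reading element data), requesting JSON via the `.json` postfix.
pub fn element_url(kind: ElementKind, id: i64) -> String {
    format!("{BASE}/api/0.6/{}/{id}.json", kind.as_str())
}
```

Calling element_url(ElementKind::Node, 309274935) gives a URL that can be handed to reqwest::get just like the hard-coded one used later in this post.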

We'll get to the structure of the response later, but what you need to know for now is that OSM deals with 3 element types:

  1. Node - a single element like a statue
  2. Way - an ordered list of nodes, like a footpath or the outline of a fountain
  3. Relation - a group of related elements, for example a number of ways can make up a city square

And the response of each endpoint simply returns a collection of these elements.

Fetching the data

Follow along guide

If you want to follow along on your own machine - follow these instructions to set up your environment.

I'm assuming you have a version of Rust installed and cargo available in your shell. If not, refer to the Rust website for installation instructions.

If you're using Rust 1.62.0 or newer, you can use the built-in cargo add command to install the dependencies.

cargo add anyhow serde_json log
cargo add serde --features derive
cargo add tokio --features full
cargo add reqwest --features json

If not, you can use the cargo-edit crate, but the command syntax is slightly different.

cargo add anyhow serde_json log
cargo add serde +derive
cargo add tokio +full
cargo add reqwest +json

Otherwise just paste this into your Cargo.toml replacing the empty [dependencies] section.

[dependencies]
anyhow = "1.0.64"
log = "0.4.17"
reqwest = { version = "0.11.11", features = ["json"] }
serde = { version = "1.0.144", features = ["derive"] }
serde_json = "1.0.85"
tokio = { version = "1.21.0", features = ["full"] }

I'm going to be looking at a small rectangular area in the city center of Wrocław in Poland.

The bounding box we're interested in is given by the following coordinates:

min_longitude: 17.030873894691467
max_longitude: 17.03128159046173
min_latitude: 51.110227939761934
max_latitude: 51.110551258091014

We can fetch this data using the following code:

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let response = reqwest::get("https://api.openstreetmap.org/api/0.6/map.json?bbox=17.030873894691467,51.110227939761934,17.03128159046173,51.110551258091014").await?;

    let text = response.text().await?;

    println!("{text}");

    Ok(())
}

If you run this you'll have a terminal full of clumped together text that's hard to read. We can make our lives easier by parsing the data into a generic JSON value and pretty-printing it.

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let response = reqwest::get("https://api.openstreetmap.org/api/0.6/map.json?bbox=17.030873894691467,51.110227939761934,17.03128159046173,51.110551258091014").await?;

    let json: serde_json::Value = response.json().await?;

    println!("{json:#}");

    Ok(())
}
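The long URL in both snippets packs the four coordinates into a single bbox parameter in the order min_longitude, min_latitude, max_longitude, max_latitude. A small struct can make that less error-prone. This is only a minimal sketch: BoundingBox and its two methods are hypothetical names of my own.

```rust
/// A bounding box in degrees. Field names mirror the coordinate list above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BoundingBox {
    pub min_longitude: f64,
    pub min_latitude: f64,
    pub max_longitude: f64,
    pub max_latitude: f64,
}

impl BoundingBox {
    /// Value of the `bbox` query parameter, in the same order as the
    /// hard-coded URL: min lon, min lat, max lon, max lat.
    pub fn to_query(&self) -> String {
        format!(
            "{},{},{},{}",
            self.min_longitude, self.min_latitude, self.max_longitude, self.max_latitude
        )
    }

    /// Full URL of the map endpoint (endpoint 1) with the `.json` postfix.
    pub fn map_url(&self) -> String {
        format!(
            "https://api.openstreetmap.org/api/0.6/map.json?bbox={}",
            self.to_query()
        )
    }
}
```

The string returned by map_url can be passed to reqwest::get in place of the hard-coded URL.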

Inspecting the data

So what do we get back? Well, you're free to run this code and take a look yourself or, if you, like me, can't be bothered, here's the raw json.

That's still quite a lot of data to take in all at once, so let's boil it down to the basic components instead. First, ignoring the 'elements' field, we have the header:

{
  "attribution": "http://www.openstreetmap.org/copyright",
  "bounds": {
    "maxlat": 51.1105513,
    "maxlon": 17.0312816,
    "minlat": 51.1102279,
    "minlon": 17.0308739
  },
  "copyright": "OpenStreetMap and contributors",
  "generator": "CGImap 0.8.8 (3729426 spike-07.openstreetmap.org)",
  "license": "http://opendatacommons.org/licenses/odbl/1-0/",
  "version": "0.6"
}

Which contains the data we passed to it, as well as some copyright info, the version, the license, etc.

We're frankly not really interested in this. What we are interested in though are the elements.

Again, there are too many to just list them all here, so we'll take a look at a selection. Let's start off with some nodes:

[
  {
    "changeset": 126142387,
    "id": 309274935,
    "lat": 51.1104557,
    "lon": 17.0312014,
    "tags": {
      "artwork_type": "statue",
      "brand:wikidata": "Q910242",
      "brand:wikipedia": "pl:Wrocławskie krasnale",
      "name": "Życzliwek",
      "network": "Wrocławskie krasnale",
      "source": "survey",
      "tourism": "artwork",
      "wheelchair": "yes"
    },
    "timestamp": "2022-09-13T16:31:04Z",
    "type": "node",
    "uid": 15262530,
    "user": "Pavol33",
    "version": 9
  },
  {
    "changeset": 96980809,
    "id": 2819200844,
    "lat": 51.1103102,
    "lon": 17.0310379,
    "timestamp": "2021-01-05T12:45:17Z",
    "type": "node",
    "uid": 8826419,
    "user": "Mordechai23",
    "version": 2
  },
  {
    "changeset": 96980809,
    "id": 2819200858,
    "lat": 51.1105131,
    "lon": 17.0310927,
    "timestamp": "2021-01-05T12:45:17Z",
    "type": "node",
    "uid": 8826419,
    "user": "Mordechai23",
    "version": 2
  }
]

You'll see that the nodes actually make up the majority of the response. We know that these are nodes because of the "type": "node" field. Every element has a set of common attributes. These are, in no particular order:

  1. id - A 64-bit signed integer which identifies the given element. Negative values are used when creating new elements, but elements which are already created will always have a positive id. Notably it's possible for two elements to share the same id if they're of a different type, i.e. there can exist a node with id equal to 2819200844 and there can also exist a way with that same id.
  2. user - The display name of the user that last modified this object.
  3. uid - The id of said user.
  4. timestamp - Time (in a W3C timestamp format) of the last modification.
  5. visible - Not present in the response, but listed in the wiki. Would have a value of false if the element was deleted.
  6. version - The version of the object. All elements start with version 1 and this value is incremented with each change.
  7. changeset - The changeset id, that contained the latest change to this element. Changesets are out of scope of this article since I only want to describe reading data from OSM not writing.
  8. tags - All element types can have tags, which are a text-to-text mapping, used to describe certain features of a given element, e.g. in the data above we see the first element has a tag "artwork_type": "statue", signalling that it's a statue.

It's actually a small bronze-ish dwarf figurine, there's a ton of them all around the city. I'd argue with calling it a "statue", but I suppose it works for lack of a better term.

The remaining fields are unique to the Node element type, and they are the lat and lon - the latitude and longitude coordinates of the node, in degrees. They are decimal numbers with 7 decimal places. The latitude ranges from -90 to 90 degrees. The longitude -180 to 180.
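If you'd like to sanity-check coordinates before using them, something like the following would do. It's only a sketch: is_valid_position and to_fixed are made-up names, and storing a coordinate as an integer scaled by 10^7 is just my way of illustrating what 7 decimal places means, not something the JSON format requires.

```rust
/// Returns true if the pair lies within the ranges described above:
/// latitude in [-90, 90] and longitude in [-180, 180].
pub fn is_valid_position(lat: f64, lon: f64) -> bool {
    (-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon)
}

/// Converts degrees to an integer count of 1e-7 degree steps.
/// 180 * 10^7 = 1_800_000_000 still fits in an i32.
pub fn to_fixed(degrees: f64) -> i32 {
    (degrees * 1e7).round() as i32
}
```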

Next up, we have some ways:

[
  {
    "changeset": 94262272,
    "id": 21587265,
    "nodes": [5505422911, 5505422888],
    "tags": {
      "highway": "footway",
      "lit": "yes",
      "surface": "paving_stones"
    },
    "timestamp": "2020-11-17T09:30:38Z",
    "type": "way",
    "uid": 2455523,
    "user": "StalkerOSM",
    "version": 10
  },
  {
    "changeset": 119788184,
    "id": 277426784,
    "nodes": [3851723164, 2819200858, 2819200861, 2819200844, 3851723164],
    "tags": {
      "amenity": "fountain",
      "image": "https://photos.app.goo.gl/QRNBitJYADTAzSLp7",
      "name": "Fontanna \"Zdrój\"",
      "natural": "water",
      "operator": "ZDiUM Wrocław",
      "url": "https://polska-org.pl/510091,Wroclaw,Fontanna_Zdroj.html",
      "url:0": "https://wroclaw.fotopolska.eu/12146,obiekt.html"
    },
    "timestamp": "2022-04-16T15:17:20Z",
    "type": "way",
    "uid": 3476229,
    "user": "maro21",
    "version": 7
  }
]

Ways are collections of nodes that comprise, well... a way of sorts. The first way in the above data is a stone paved path. The second is a fountain. Or rather it describes the outline of the fountain.

You'll have noticed that ways share all the common fields and also contain a field nodes which is a list of nodes that comprise the given way.
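Notice that the fountain's node list starts and ends with the same id (3851723164), which is how a way describes an outline rather than an open path. Here's a small sketch of a check for that. The function name is_closed is my own, and I'm assuming that "first node equals last node" is a good enough test for this purpose.

```rust
/// Returns true if the way loops back to its starting node.
/// Requires at least three entries so that a two-node path is never
/// considered closed.
pub fn is_closed(nodes: &[i64]) -> bool {
    nodes.len() >= 3 && nodes.first() == nodes.last()
}
```

With the data above, the footway's node list is open and the fountain's is closed.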

And finally, the third element type is a relation:

{
  "changeset": 97517773,
  "id": 21981,
  "members": [
    {
      "ref": 277426784,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 892961287,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 892961288,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 892961289,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 892961290,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 101129503,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 277426794,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 25708300,
      "role": "inner",
      "type": "way"
    },
    {
      "ref": 25708301,
      "role": "outer",
      "type": "way"
    },
    {
      "ref": 743442967,
      "role": "outer",
      "type": "way"
    }
  ],
  "tags": {
    "area": "yes",
    "bicycle:conditional": "yes @ (05:00-09:00)",
    "highway": "pedestrian",
    "lit": "yes",
    "name": "Rynek",
    "place": "square",
    "type": "multipolygon"
  },
  "timestamp": "2021-01-14T22:05:21Z",
  "type": "relation",
  "uid": 553541,
  "user": "B_KSL",
  "version": 18
}

There's only one relation within this bounding box and it's the one that describes the city square. Relations have a field members, each member is an object composed of the following fields:

  1. ref - An id of an element that's a part of this relation.
  2. role - An optional text field which describes the role of the element in the relation.
  3. type - The type of the given element - this is necessary since different element types don't share the same id space.
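Since an id is only unique within its element type, any lookup table you build from the response needs to be keyed by both the type and the id. A minimal sketch of that idea, using only the standard library. ElementKind, ElementKey and build_index are hypothetical names, and the labels in the usage are made up.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ElementKind {
    Node,
    Way,
    Relation,
}

/// An id alone is ambiguous, so the key carries the element type as well.
pub type ElementKey = (ElementKind, i64);

/// Builds a lookup table from (type, id) to a human-readable label.
pub fn build_index(entries: &[(ElementKind, i64, &str)]) -> HashMap<ElementKey, String> {
    entries
        .iter()
        .map(|&(kind, id, label)| ((kind, id), label.to_string()))
        .collect()
}
```

A member's ref and type fields together form exactly such a key, so resolving a member becomes a single map lookup.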

And that's all there is really to it, now we can get to what's interesting, which is how to structure our code around this data type. For clarity I'll be ignoring the "header" fields from the response and will instead focus on the actual OSM data.

Basic structure

At the very "top" of our data format we have a response struct, which contains a list of elements within the elements field.

use std::collections::HashMap;
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
struct Response {
  pub elements: Vec<Element>
}

An element is one of three things: a Node, a Way or a Relation. Being the Rust veterans that we are, we know to use enums in such situations.

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Element {
  Node(Node),
  Way(Way),
  Relation(Relation)
}

Let's also describe all the structs

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Node {
  pub changeset: u64,
  pub id:        i64,
  pub lat:       f64,
  pub lon:       f64,
  pub timestamp: String,
  pub type:      String,
  pub uid:       u64,
  pub user:      String,
  pub version:   u64,
  pub tags:      HashMap<String, String>
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Way {
  pub changeset: u64,
  pub id:        i64,
  pub nodes:     Vec<i64>,
  pub timestamp: String,
  pub type:      String,
  pub uid:       u64,
  pub user:      String,
  pub version:   u64,
  pub tags:      HashMap<String, String>
}


#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Relation {
  pub changeset: u64,
  pub id:        i64,
  pub members:   Vec<Member>,
  pub timestamp: String,
  pub type:      String,
  pub uid:       u64,
  pub user:      String,
  pub version:   u64,
  pub tags:      HashMap<String, String>
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Member {
  pub ref: i64,
  pub role: String,
  pub type: MemberType,
}

pub enum MemberType {
  Node,
  Way,
  Relation,
}

You'll notice that the code above doesn't actually compile, for many reasons. One reason is that some of the fields are named type or ref, which are reserved keywords in Rust. We could go ahead and rename the type field to something like kind and then use the #[serde(rename = "type")] annotation, but notice that we don't need the type field at all. The information about the type of the element is already given to us by its Rust type (the enum variant), and we only need the type field to discriminate between the enum variants during deserialization. So let's go ahead and remove that field from our structs and use the internally tagged enum representation for the Element enum.

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
#[serde(rename_all = "camelCase")]
pub enum Element {
  Node(Node),
  Way(Way),
  Relation(Relation)
}

We also had to rename all the variants to camelCase, which turns single-word names like Node into "node". Otherwise the deserialization code would expect types like "Node" or "Way".

Before moving on to the modified structs, let's also change one thing. The structs share a lot of common fields. We can extract these into a struct called ElementInfo and then use #[serde(flatten)] so that the fields of the nested struct get treated as if they were members of the element struct.

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Node {
  #[serde(flatten)]
  pub info:  ElementInfo,
  pub lat:   f64,
  pub lon:   f64,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Way {
  #[serde(flatten)]
  pub info:   ElementInfo,
  pub nodes:  Vec<i64>,
}


#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Relation {
  #[serde(flatten)]
  pub info:     ElementInfo,
  pub members:  Vec<Member>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ElementInfo {
  pub changeset: u64,
  pub id:        i64,
  pub timestamp: String,
  pub uid:       u64,
  pub user:      String,
  pub version:   u64,
  pub tags:      HashMap<String, String>
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Member {
  pub ref: i64,
  pub role: String,
  pub type: MemberType,
}

pub enum MemberType {
  Node,
  Way,
  Relation,
}

That's much better! Another issue is that the Member struct has two fields whose names clash with reserved keywords. Also, the MemberType enum is missing its derives and does not have the correct case specified for deserialization. To fix the reserved keyword clashes we can either use raw identifiers or use the #[serde(rename)] attribute again.

I'm going to rename the field type to kind and ref to reference, and use the rename tag.

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Member {
  #[serde(rename = "ref")]
  pub reference: i64,
  pub role: String,
  #[serde(rename = "type")]
  pub kind: MemberType,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub enum MemberType {
  Node,
  Way,
  Relation,
}

We're almost there!

Some final tweaks. First, an element can, but doesn't have to contain tags. Currently serde will generate code that expects at least an empty tags object on each element. In order to allow for no tags we have to mark it with #[serde(default)].

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ElementInfo {
  pub changeset: u64,
  pub id:        i64,
  pub timestamp: String,
  pub uid:       u64,
  pub user:      String,
  pub version:   u64,
  #[serde(default)]
  pub tags:      HashMap<String, String>
}

And in a similar manner the role on a member of a relation is optional, so let's change it to an Option.

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Member {
  #[serde(rename = "ref")]
  pub reference: i64,
  pub role: Option<String>,
  #[serde(rename = "type")]
  pub kind: MemberType,
}

And that's it! All that remains is to change our fetching code to deserialize the response into our type. Luckily that's as easy as changing the type.

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let response = reqwest::get("https://api.openstreetmap.org/api/0.6/map.json?bbox=17.030873894691467,51.110227939761934,17.03128159046173,51.110551258091014").await?;

    let json: Response = response.json().await?;

    println!("{json:#?}");

    Ok(())
}

This concludes the second entry in the Serde by example series. I hope you've found it educational or at least interesting. If you want to take a look at the full final code, then check out this link.