Serving live weather data

After having a chat with Mike Kittridge about our project that serves live weather data, I thought I'd write down some thoughts to keep progress going on this project. So here is a flow of thoughts on where we are at and where we want to go!

An interesting request

A few months ago, Justin approached me for help making some of their live weather data available via a webpage. The whole concept was quite neat: they already had weather stations that were gathering weather data and sending it as short-burst data (SBD) via the Iridium satellite network. They were receiving the data as attachments to emails, and were hoping to develop a more effective method of gathering and sharing the data, both for teaching and as a public-facing interface. I was fairly confident I could help, at least on the front end to display the data, but I would need to brush up on some skills to accept the data and make it available.

Setting up the server

I had never heard of SBD before. While reading up about it, I discovered it was possible to have the data sent to an ip:port address, and I also found a Rust crate that would listen for the messages for me (thanks for the excellent crate, Pete!). Seemed like a good time to finally hone those Rust skills!

Accepting the data was painless, and it is stored in small files. I reverse engineered the script that our previous technician had put together, and converted the SBD data to something human-readable. I found Rust to be a challenging programming language to wrap my head around (I'm still not sure I fully understand the borrow checker), but I found writing tests, building and deploying to be a piece of cake. The strict typing the language requires makes me confident that most edge cases are handled elegantly. I think I lost a few hours sorting out how to use the chrono crate to convert between time zones and provide default values. So many special cases!

Given the tiny size of the data (currently, 4 months of data comes to about 260 kB), I didn't bother implementing any type of formal database. Instead, I just read all the data into memory, store each set of data under a time/station key combination, and serve the station/date ranges that are requested. So far it is fairly efficient, but I don't know how well this will scale.
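To make the in-memory approach concrete, here is a minimal sketch of the idea. It is not the code that is actually running: the `Reading` fields, the `Store` type and the use of Unix timestamps are all assumptions for illustration. It keeps everything in a `BTreeMap` keyed by (station, timestamp), so a station/date range request becomes a single range lookup.

```rust
use std::collections::BTreeMap;

/// One decoded set of observations. The fields are illustrative,
/// not the real schema of the weather stations.
#[derive(Debug, Clone, PartialEq)]
pub struct Reading {
    pub air_temp_c: f64,
    pub wind_speed_ms: f64,
}

/// Key: (station name, Unix timestamp in seconds). Tuples sort by station
/// first and then by time, so each station's readings sit together in order.
pub type Key = (String, i64);

#[derive(Default)]
pub struct Store {
    readings: BTreeMap<Key, Reading>,
}

impl Store {
    /// Add (or overwrite) the reading for a station at a given time.
    pub fn insert(&mut self, station: &str, timestamp: i64, reading: Reading) {
        self.readings.insert((station.to_string(), timestamp), reading);
    }

    /// Readings for `station` with `start <= timestamp <= end`, oldest first.
    pub fn range(&self, station: &str, start: i64, end: i64) -> Vec<(i64, &Reading)> {
        // BTreeMap::range panics on an inverted range, so bail out early.
        if start > end {
            return Vec::new();
        }
        let lo = (station.to_string(), start);
        let hi = (station.to_string(), end);
        self.readings
            .range(lo..=hi)
            .map(|((_, ts), reading)| (*ts, reading))
            .collect()
    }
}
```

With a structure like this, a request for the past week for one station only touches that station's slice of the map, which is probably fine at the current data size; whether it holds up with many more stations is a separate question.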

Current state

The page is currently hosted on a temporary dev server (apologies if the link is broken!), and all the basic functionality is there. I've gotten in touch with our IT server team to set up a permanent server.

Even though the first request was for live data only (I think we had said the past week), I added the option to increase the range to the past year, as the data is there and available. It does greatly increase the size of the data transfer and slows down the rendering of the graph, so I might have to restrict this in the future, or explore rendering on a WebGL canvas instead of as an SVG.

I've started setting up a map to display the latest data. You can select a station and all the latest data will be displayed. Displaying the data graphically by observation type (e.g. temperature or wind direction) might be useful, however it is complicated by the fact that the data comes in a burst every 10 hours, and that sometimes some data is missing. I think a time slider might work, along with a menu to choose which dataset to display on the map.

Figure

live data.png

A few things have come up that are neat to explore and think about, and they made me reflect on how to handle live data like this.

Intermittent data

The Miers Valley station is set in a very secluded location, and it regularly runs out of power. You can see from the data that after a period of low wind speed, the battery voltage starts draining and the station eventually stops transmitting. Currently, the page only displays the weather station option if data is available.

Figure

weather graph.png
Data from Miers Valley weather station. Note the large gap in time.

One weather station occasionally sends an SBD message, but apparently without a payload. I'm not sure what the cause is and need to investigate. Here is an example log excerpt.

(2024-08-29 02:04:33) DEBUG: Handling TcpStream from 
(2024-08-29 02:04:33) INFO: Recieved message from IMEI 300434066116570 with MOMN 1445 and 280 byte payload
(2024-08-29 02:04:33) INFO: Stored message
(2024-08-29 08:37:56) DEBUG: Handling TcpStream from 
(2024-08-29 08:37:56) ERROR: Error when reading message: NoPayload

Options

The current title row of the page allows you to choose the station, export data, and choose the time interval. Here Miers Valley is a visible option, but if we choose to only display the data from the last week, the button disappears. Not sure I like that. Maybe I should make the button visible but disabled? How does it scale, and what does it mean when we add more weather stations?
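One way the "visible but disabled" idea could work is for the server to report every known station along with a flag saying whether it has data in the requested interval. The sketch below is only an illustration of that idea: `StationOption`, `station_options` and the map of latest timestamps are hypothetical names, not part of the current code.

```rust
use std::collections::BTreeMap;

/// What the front end would need to draw one station button.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct StationOption {
    pub name: String,
    /// false means: show the button, but disabled.
    pub has_data: bool,
}

/// `latest` maps each known station to the Unix timestamp of its most
/// recent reading; `window_start` is the start of the requested interval.
pub fn station_options(latest: &BTreeMap<String, i64>, window_start: i64) -> Vec<StationOption> {
    latest
        .iter()
        .map(|(name, &ts)| StationOption {
            name: name.clone(),
            // A station has data in the window if its newest reading
            // is no older than the start of the window.
            has_data: ts >= window_start,
        })
        .collect()
}
```

Because the list always contains every station, the header layout stays stable when the interval changes, and the same list could later be grouped by region if more stations are added.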

Figure

title row.png
The header

I'm not satisfied with the way the time interval is shown in small text at the top right. The time interval choice dictates the range of data the server sends to the page, so I chose to make the past week the default option to keep the size manageable. This has the consequence of sometimes hiding some station options, depending on whether data is available for that interval. Perhaps changing the 'Past Data' title to reflect the time interval would help.

Handling outliers

With some observations, it makes sense to restrict the range that is visible on the graph, e.g. average wind direction cannot be panned above 360 or below 0. However, how do you handle data that is obviously wrong? If you look at the Miers Ridge data, you can see that the barometric pressure and relative humidity drop to near 0 on a regular basis. I'm not sure what the cause of those observations is, but I'm fairly sure they're not real! The consequence is that the actual variation in atmospheric pressure is difficult to read in the data, as the curve becomes flat.

A quick fix would be to again restrict the range of values on the y-axis, or at the bare minimum set a starting range that can be panned later. However, this would have to be done on a station-by-station basis, as the observed pressure depends on the altitude at which the observations are made. Perhaps we should sanitise incoming data, and remove obviously erroneous values from the SBD data when they fall outside a given range for some observations?
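As a rough sketch of what sanitising could look like, the function below blanks out values that fall outside a per-station plausible range and leaves everything else alone. All of the names (`Observation`, `Limits`, `sanitise`) are made up for illustration, and the units are assumed (hPa and percent); the actual limits would need to be chosen per station, since pressure depends on altitude.

```rust
use std::ops::RangeInclusive;

/// A cut-down observation with only the two suspicious fields.
/// (Illustrative; the real messages carry more fields.)
#[derive(Debug, Clone, PartialEq)]
pub struct Observation {
    pub timestamp: i64,
    pub pressure_hpa: Option<f64>,
    pub humidity_pct: Option<f64>,
}

/// Plausible ranges for one station (assumed values, set per station).
pub struct Limits {
    pub pressure_hpa: RangeInclusive<f64>,
    pub humidity_pct: RangeInclusive<f64>,
}

/// Keep a value only if it lies inside the plausible range.
/// NaN is never inside a range, so it is dropped as well.
fn keep_if_plausible(value: Option<f64>, limits: &RangeInclusive<f64>) -> Option<f64> {
    value.filter(|v| limits.contains(v))
}

/// Return a copy of `obs` with implausible values replaced by `None`,
/// so the graph shows a gap instead of a spike towards zero.
pub fn sanitise(obs: &Observation, limits: &Limits) -> Observation {
    Observation {
        timestamp: obs.timestamp,
        pressure_hpa: keep_if_plausible(obs.pressure_hpa, &limits.pressure_hpa),
        humidity_pct: keep_if_plausible(obs.humidity_pct, &limits.humidity_pct),
    }
}
```

Replacing a bad value with `None` rather than dropping the whole observation keeps the other measurements from the same burst, and the raw SBD files stay untouched in case the limits turn out to be wrong.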

Figure

newplot.png
Miers Ridge data showing suspicious measurements

Looking forwards

There are plans in the works to add several live weather stations around NZ, which will complicate the backend data structure and the front-end user interface. For the user interface, we really need to decide how the page will be used, and what data we want to be visible. Do we want all the potential station options to be obvious from the start? Do we want different pages for different geographical areas (e.g. Antarctica and NZ)?

As for the data structure, I might have to bite the bullet and set up a proper database. After my chat with Mike, we decided that Tethys might not be the most efficient data model, as it puts the dataset on top. He suggested using a data model that puts the instrumentation on top, in the form of: station -> sensor -> observation. I guess an SBD could be a form of 'sensor'. As for a proper database to store that data instead of the current SBD format, I will look into MongoDB, again on Mike's suggestion.
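To get a feel for the station -> sensor -> observation layout, here is a small sketch of how it might look as plain Rust types. This is one possible reading of Mike's suggestion, not a settled design: the type and field names are assumptions, and here each measured quantity is treated as its own sensor.

```rust
/// A single measured value at a point in time.
#[derive(Debug, Clone, PartialEq)]
pub struct Observation {
    pub timestamp: i64, // Unix seconds (assumed)
    pub value: f64,
}

/// One instrument or measured quantity on a station.
#[derive(Debug, Clone, PartialEq)]
pub struct Sensor {
    pub name: String, // e.g. "air_temp" (illustrative)
    pub unit: String,
    pub observations: Vec<Observation>,
}

/// The top of the hierarchy: a station owns its sensors.
#[derive(Debug, Clone, PartialEq)]
pub struct Station {
    pub name: String,
    pub sensors: Vec<Sensor>,
}

impl Station {
    /// Most recent observation for the named sensor, if there is one.
    pub fn latest(&self, sensor_name: &str) -> Option<&Observation> {
        self.sensors
            .iter()
            .find(|s| s.name == sensor_name)?
            .observations
            .iter()
            .max_by_key(|o| o.timestamp)
    }
}
```

A nested shape like this would map fairly directly onto a document in a document store, which is part of why it seems worth trying before committing to a schema.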