First of all, we love TimescaleDB. It's great for time-series data. It's an extension installed on top of an existing PostgreSQL instance that adds a lot of time-series-friendly functionality, which reduces time to market and shortens the learning curve for anyone with a background in SQL.
Recently we worked with an IoT company managing hundreds of thousands of sensors, each of which checked in with a heartbeat every few minutes. They were using PostgreSQL for the time-series data but had several MySQL databases managing metadata (basically their dimensions). PostgreSQL was a better fit for the high volume of data they were ingesting thanks to its background workers and autovacuum features.
Enter TimescaleDB
We quickly identified that TimescaleDB would be a great fit for them, since they didn't need high resolution on old data but wanted to maintain high resolution on recent data. Timescale's compression features allowed them to do just that. So while they generated several GB of data per hour, they could compress data older than 24 hours into a format that took up less than 5% of the original space, and drop chunks entirely after a week.
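Here's a minimal sketch of what that setup looks like using TimescaleDB 2.x's policy API. The table and column names are illustrative, not the customer's actual schema, and older 1.x releases used differently named policy functions:

```sql
-- A simple heartbeat table, converted into a hypertable partitioned on time.
CREATE TABLE sensor_data (
    time      timestamptz NOT NULL,
    sensor_id bigint      NOT NULL,
    battery   double precision
);
SELECT create_hypertable('sensor_data', 'time');

-- Enable native compression, segmenting compressed rows by sensor.
ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id'
);

-- Compress chunks once they're older than 24 hours...
SELECT add_compression_policy('sensor_data', INTERVAL '24 hours');

-- ...and drop raw chunks entirely after a week.
SELECT add_retention_policy('sensor_data', INTERVAL '7 days');
```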
The real selling point of Timescale, though, was the continuous aggregate feature: ingesting a large amount of data and turning it into actionable insight within a couple of minutes. And the calculated metrics were persistent. Even after chunks were compressed and subsequently dropped, the continuous aggregate data stayed intact. The footprint of the continuous aggregate data was almost nothing (each day was less than 15 MB), so they stored it indefinitely.
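As a rough sketch, a continuous aggregate over the hypothetical sensor_data table above might roll heartbeats up by the hour. Again, the column names are ours, and the refresh offsets are just plausible values:

```sql
-- An hourly per-sensor rollup, materialized and refreshed automatically.
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    sensor_id,
    time_bucket('1 hour', time) AS bucket,
    count(*)     AS heartbeats,
    avg(battery) AS avg_battery
FROM sensor_data
GROUP BY sensor_id, bucket;

-- Refresh recent buckets every five minutes so insights stay near real time.
SELECT add_continuous_aggregate_policy('sensor_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '5 minutes',
    schedule_interval => INTERVAL '5 minutes');
```

Because the view's data is materialized separately from the hypertable, it survives the compression and retention policies above.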
Legacy MySQL System
First we tried putting them on Timescale Cloud. We managed the migration and set everything up for them, but we quickly ran into an issue. They had a lot of data stored in various MySQL databases, and for a number of reasons they didn't want all of it stored in their Timescale instance.
We assumed adding a foreign data wrapper (FDW) wouldn't be an issue, which would let them query their remote MySQL databases from within PostgreSQL to fetch the additional metadata. (If you've used linked servers in SQL Server, this is basically the same idea.)
When we reached out to the Timescale team, though, they told us that they only supported PostgreSQL FDWs, meaning that within Timescale Cloud you can only query other PostgreSQL servers. So we asked them to enable mysql_fdw. Having used the extension before, we knew this would only take a few minutes.
No go on the MySQL FDWs…
Unfortunately, this feature wasn't a common enough request, so they informed us that they wouldn't be able to add it. And we respect that they're careful about adding extensions to their system, for various reasons (support, security, etc.). So we went back to our customer and asked whether they would be willing to move the MySQL metadata into the PostgreSQL instance.
The customer couldn't move the data in, though. Too many of their endpoints relied on connections to the MySQL databases, and refactoring that code would have been a big project. So we had to roll our own Timescale instance in GCP.
Setting up a PostgreSQL 11 instance in GCP with a read-only replica (keeping reads and writes separate is a good practice in a production environment when possible) and the required MySQL FDW took less than two days.
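For the curious, the mysql_fdw side of that setup looks roughly like this. The host, credentials, and table definitions below are placeholders, not the customer's real values:

```sql
-- Load the wrapper and point it at the remote MySQL server.
CREATE EXTENSION mysql_fdw;

CREATE SERVER mysql_meta
    FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host '10.0.0.5', port '3306');

-- Map the local Postgres role to a read-only MySQL account.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER mysql_meta
    OPTIONS (username 'readonly', password 'secret');

-- Expose a MySQL table as a local foreign table.
CREATE FOREIGN TABLE device_metadata (
    sensor_id   bigint,
    customer_id bigint,
    label       text
)
SERVER mysql_meta
OPTIONS (dbname 'inventory', table_name 'devices');

-- Time-series data in Postgres now joins directly to metadata in MySQL.
SELECT d.label, count(*) AS heartbeats
FROM sensor_data s
JOIN device_metadata d ON d.sensor_id = s.sensor_id
WHERE s.time > now() - INTERVAL '1 day'
GROUP BY d.label;
```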
Deploying a robust solution isn't usually this fast, and there was definitely a lot we learned during the initial Timescale rollout that made the subsequent self-installed project faster. But if you're willing to roll up your sleeves and need a reliable time-series solution, we recommend you consider Timescale. And of course, if you don't want to go it alone, we're always here to help.