Dynamically binning timeseries data

oek · February 9, 2023, 12:28am

Hello,

I have some timeseries data that exceeds the 10,000 point limit for line charts. I need to display as much of the data as possible to give my customers as much context into the data as possible.

This data is partitioned from different “log files” that live in our database and its data rate is 1hz.

To make this data visible to consumers, I “bin” the data for each log file by time (e.g. apply an aggregation over a window). I “find the maximum for each data point in minute 1 and plot that, same for minute 2, etc”. This would result in 60 data points in each bin and can visualize up to a 166 minute log (10,000pts/60s).

I currently have a “binning parameter” that users can configure which defaults to 60s. This works; however, some of my log files are much longer or shorter than 166 minutes … So I would like to dynamically set bin based on the size of the log. It is very cumbersome and unprofessional to have to communicate to all my consumers: “see if there is a ‘other’ category in the legend of your plot, if so, decrease the bin size until it goes away” or “if the resolution looks too small, increase the bin size parameter until it looks better, but not too far”.

Attempt 1: Dynamically setting a Parameter default.
I have a Parameter that controls everything set to 60s. If I could dynamically set that based on number of data points, then that would solve my issue. I dont see any way to do this.

Attempt 2: Build a calculated field for the bin size
I have built up something to the effect of:

bin_size_s = lastValue(dataSet)-firstValue(dataSet) / 10000
timestamp_binned_s = decimalToInt({timestamp_s} / decimalToInt(bin_size_s)) * bin_size_s

I am able to visualize this in a table and confirm that the timestamp_binned_s is what I expect.

I can use timestamp_binned_s as my X axis and apply an aggregation for my Y data for the each…

However, QuickSight says I cannot use a custom aggregation as a dimension field. This binning that im building must be a dimension field to be the X-axis in my line chart.

Does anyone have any suggestions or even different approaches to my high-level goal to “display my timeseries data with the 10,000 data point limitation”?

Thanks,
Oliver

Max · February 9, 2023, 9:49pm

For your first attempt you could look to set up a dynamic default. This would require you to create a separate dataset with your group number and all of the users that are logging in to set this dynamically.

For your second attempt can you instead do this in SQL?

oek · February 9, 2023, 10:09pm

Thanks,

I was hoping to not have to touch my dataset at all.
More context: I’m trying to do this for tens of dashboards which all have their own dataset. To create a new dataset or switch to SQL would mean I would have to build out support for programmatically doing this with the “QuickSight definition API thing”. I currently generate all the dashboards programmatically but was hoping to not have to do the datasets.

While it sounds possible for your two suggestions, I was hoping not to .

Do you understand the crux of the “custom aggregation cannot be a dimension”? That part left me pretty confused.

Max · February 28, 2023, 9:26pm

Yes you can’t add aggregation as dimensions.

sum({field}) cannot be used as a group. Essentially this aggregation needs to be grouped itself and cannot be a group. The only have to make it a group is to transfer the logic to SQL so that it is already calculated and not aggregated in quicksight.

oek · March 1, 2023, 10:14pm

Understood. Thank you for the response