Hello,
I have some timeseries data that exceeds the 10,000 point limit for line charts. I need to display as much of the data as possible to give my customers as much context into the data as possible.
This data is partitioned from different “log files” that live in our database and its data rate is 1hz.
To make this data visible to consumers, I “bin” the data for each log file by time (e.g. apply an aggregation over a window). I “find the maximum for each data point in minute 1 and plot that, same for minute 2, etc”. This would result in 60 data points in each bin and can visualize up to a 166 minute log (10,000pts/60s).
I currently have a “binning parameter” that users can configure which defaults to 60s. This works; however, some of my log files are much longer or shorter than 166 minutes … So I would like to dynamically set bin based on the size of the log. It is very cumbersome and unprofessional to have to communicate to all my consumers: “see if there is a ‘other’ category in the legend of your plot, if so, decrease the bin size until it goes away” or “if the resolution looks too small, increase the bin size parameter until it looks better, but not too far”.
Attempt 1: Dynamically setting a Parameter default.
I have a Parameter that controls everything set to 60s. If I could dynamically set that based on number of data points, then that would solve my issue. I dont see any way to do this.
Attempt 2: Build a calculated field for the bin size
I have built up something to the effect of:
bin_size_s = lastValue(dataSet)-firstValue(dataSet) / 10000
timestamp_binned_s = decimalToInt({timestamp_s} / decimalToInt(bin_size_s)) * bin_size_s
I am able to visualize this in a table and confirm that the timestamp_binned_s is what I expect.
I can use timestamp_binned_s as my X axis and apply an aggregation for my Y data for the each…
However, QuickSight says I cannot use a custom aggregation as a dimension field. This binning that im building must be a dimension field to be the X-axis in my line chart.
Does anyone have any suggestions or even different approaches to my high-level goal to “display my timeseries data with the 10,000 data point limitation”?
Thanks,
Oliver