QuickSight generation of CDF plots

I’m looking to generate the CDF (cumulative distribution function) plot for a given data series.

Taking the following example data series:

data_series = [584,441,114,237,774,332,361,214,478,366,121,398,246,646,117,447]

The CDF of this distribution should look like this:

The idea is that the data series entries are represented in the X-axis while their corresponding cumulative probability is recorded in the Y-axis.
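For reference, the mapping described above (each entry on the X-axis, its cumulative probability on the Y-axis) can be sketched outside QuickSight in a few lines of Python. This is only an illustration of the intended values, not a QuickSight feature. The helper name `empirical_cdf` is made up for this example, and it assumes the usual empirical CDF definition: the fraction of entries less than or equal to each value.

```python
def empirical_cdf(values):
    """Return (value, cumulative probability) pairs, sorted by value.

    Assumes the usual empirical CDF: P(X <= x) = (count of entries <= x) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for rank, x in enumerate(ordered, start=1):
        if points and points[-1][0] == x:
            # Repeated value: keep a single point carrying the highest rank.
            points[-1] = (x, rank / n)
        else:
            points.append((x, rank / n))
    return points


data_series = [584, 441, 114, 237, 774, 332, 361, 214, 478, 366, 121, 398, 246, 646, 117, 447]
cdf_points = empirical_cdf(data_series)

for x, p in cdf_points:
    print(x, p)
```

With the 16 entries above, the smallest value (114) gets 1/16 = 0.0625 and the largest (774) gets 1.0. These are the (X, Y) pairs the plot would need to connect.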

Any advice will be greatly appreciated.

Hello @fmacias, welcome to the QuickSight community!

I think the most straightforward way to do this would actually be in SQL. Below is an example of how the function works for Redshift:

You could try the following calculated field:

CDF = runningSum(count({data_series}), [{data_series} ASC]) / countOver({data_series}, [], PRE_AGG)

Let me know if that works or you receive an error!

Hi fmacias,

After reading both your posts I’m going to make some assumptions. See if this helps.

Understanding Your Issue

You’re trying to generate a CDF plot for a given data series. Here’s a summary of your approach:

  1. CDF Plot Generation: You provided an example data series and a sample CDF plot illustrating how you want the data series entries represented on the X-axis, with their corresponding cumulative probability on the Y-axis.
  2. Line Chart Limitation: You tried using a line chart in QuickSight, but you encountered a limitation. The X-axis in a line chart doesn’t support the granularity you need, as it aggregates data points, preventing you from accurately plotting each data point.
  3. Scatter Plot Limitation: You then attempted to use a scatter plot, which can plot each data point individually. However, you found that there’s no option to connect the dots with a line in a scatter plot in QuickSight.

Why QuickSight Doesn’t Support This

Currently, QuickSight has limitations in both line and scatter plots that prevent the exact representation you’re aiming for:

  • Line Charts: QuickSight’s line charts aggregate data points, making it impossible to plot each individual point without aggregation.
  • Scatter Plots: While scatter plots allow individual points to be plotted, QuickSight does not support connecting these points with a line, which is crucial for representing a CDF.

No Workarounds Available

As of now, there are no direct workarounds in QuickSight to achieve the precise CDF plot you’ve demonstrated. QuickSight doesn’t support the combination of granularity and connectivity of data points required for such a visualization.

Feature Request

I recommend submitting a feature request to the QuickSight team. Adding support for connecting dots in scatter plots or allowing non-aggregated line plots would be valuable features for many users.

Alternative Visualization

While QuickSight might not support the exact visualization you need, you can consider the following alternative to represent cumulative probability:

  • Step Plot with Aggregation: Create a step plot where you round your data to the nearest 100 and aggregate, then use a line chart to plot these points. While this aggregates some data, it can still provide a visual representation of cumulative probability.

c_round:
round({data_series} / 100) * 100

Then set the line chart's line style to a step line.
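To show what the rounded, aggregated version would plot, here is a rough Python sketch of the same idea: bucket each entry to the nearest 100 (mirroring c_round), count entries per bucket, and accumulate. The helper name `binned_cdf` is hypothetical, and it assumes halves round up, which may not match QuickSight's round() in every edge case.

```python
import math
from collections import Counter


def binned_cdf(values, width=100):
    """Return (bucket, cumulative probability) pairs for values rounded to `width`.

    Assumption: halves round up (floor(x / width + 0.5)); none of the sample
    values sit exactly on a half, so the choice does not matter here.
    """
    buckets = Counter(math.floor(v / width + 0.5) * width for v in values)
    n = len(values)
    running = 0
    points = []
    for bucket in sorted(buckets):
        running += buckets[bucket]          # running count up to this bucket
        points.append((bucket, running / n))
    return points


data_series = [584, 441, 114, 237, 774, 332, 361, 214, 478, 366, 121, 398, 246, 646, 117, 447]
binned_points = binned_cdf(data_series)

for bucket, p in binned_points:
    print(bucket, p)
```

For the sample series this gives seven points, from (100, 0.1875) up to (800, 1.0), so the step line is a coarser version of the full 16-point CDF. A smaller rounding width gets closer to the exact curve at the cost of more points on the X-axis.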

Hi duncan,

Thanks for the reply. Tried that calculated field and I’m getting a mismatched aggregation error:

“Mismatched aggregation. Custom aggregations can’t contain both aggregated and nonaggregated fields, in any combination.”

Hi robdhondt,

Thanks for the reply. Yes, both questions were related to this same topic. Scatter plot seems to be doing the trick, however having no connecting line between the dots makes the plot very difficult to interpret.

Alternatively, a line plot that allows independent X and Y inputs would also help. I can generate the CDF probability for each of the measurements in the data_series on my end, so if that were supported I could just plot CDF vs. data_series.

Unfortunately, as you mentioned, it seems like none of these is currently supported. What is the process to submit these as a feature request for the QuickSight team to implement?

Hello @fmacias!

I can tag this topic as a feature request for the QuickSight team for you.