Enhancing Support for HyperLogLog Functions in QuickSight

MANUEL_G_P_FILHO · June 12, 2024, 1:36pm

Our use case involves creating a dashboard that allows users to see how many unique customers purchased different [category1 / category2 … category5 / brand / SKU] for arbitrary periods with daily granularity.

To achieve this with satisfactory performance and cost, we are using Athena’s hyperloglog functions (approx_set) to pre-aggregate the data at the SKU/DAY granularity.

The sketches resulting from approx_set can then be re-aggregated using merge() and the number of unique customers can be estimated using “cardinality”.

We have successfully incorporated the Athena query consolidating data using cardinality(merge(unique_clients_sketch)) in a dataset with custom SQL.

However, this approach does not allow us to define the dimensions used for aggregation during analysis creation in QuickSight - the dimensions need to be hardcoded in the query. This significantly limits the usability of these wonderful Athena functions.

An alternative to solve this would be for QuickSight to have a “pass-through” function equivalent to Tableau’s RAWSQL function, allowing custom aggregation functions with SQL so that we can inform QuickSight that the unique_clients_sketch field should be aggregated with the cardinality(merge(…)) function.

Another possibility would be for QuickSight to support HyperLogLog functions that are available in most services like Athena, Redshift, and Postgres.

I would greatly appreciate it if anyone in the community could share similar experiences and if there is a better approach with the current technology to use QuickSight more effectively with HLL.

I would also like to suggest that this be considered in the solution’s roadmap. I believe it would be a significant differentiator.

Cheers!

Koushik_Muthanna · June 13, 2024, 10:38am

@MANUEL_G_P_FILHO ,

Thanks for your feedback. Before I tag this as a feature request, as I understand, the limitation you are facing today is that if you have many dimensions, your sql query is hardcoded to specific dimensions and you want the flexibility to use any dimension ?

For example, in the below , dynamically change to use product_id instead of country.

WITH mainquery AS (
  SELECT 'us' country, 1 product_id, 100 sales UNION ALL
  SELECT 'us', 2, 200  UNION ALL
  SELECT 'us', 3, 300   UNION ALL
  SELECT 'us', 4, 400       UNION ALL
  SELECT 'us', 5, 500       UNION ALL
  SELECT 'ca', 3, 600       UNION ALL
  SELECT 'ca', 4, 700       UNION ALL
  SELECT 'ca', 5, 800       UNION ALL
  SELECT 'ca', 6, 900       UNION ALL
  SELECT 'ca', 7, 1000       UNION ALL
  SELECT 'ca', 8, 1100       UNION ALL
  SELECT 'ca', 9, 1200
),
agg as ( select country, approx_set(sales) as unique_sketch from mainquery group by country ) 
select country, cardinality(merge(unique_sketch)) from agg group by country

Kind regards,
Koushik

MANUEL_G_P_FILHO · June 13, 2024, 1:19pm

Hi Koushik,

Thank you for your response.

Yes, you are correct. The limitation we are facing is that our SQL query is hardcoded to specific dimensions, and we would like the flexibility to use any dimension dynamically during analysis creation in QuickSight.

For example, in your provided SQL, we would like to have the ability to dynamically change the dimension used for grouping, such as using product_id instead of country without having to modify the underlying SQL query each time.

We are seeking a way to leverage QuickSight to dynamically aggregate these pre-aggregated sketches using different dimensions without modifying the SQL query. Specifically, a pass-through function similar to Tableau’s RAWSQL function could allow us to apply custom aggregation functions within QuickSight.

Cheers
Manuel

Koushik_Muthanna · June 13, 2024, 1:29pm

Thanks for the additional details. I will tag this as a feature request.

Kind regards,
Koushik