Our use case involves creating a dashboard that allows users to see how many unique customers purchased different [category1 / category2 … category5 / brand / SKU] for arbitrary periods with daily granularity.
To achieve this with satisfactory performance and cost, we are using Athena’s hyperloglog functions (approx_set) to pre-aggregate the data at the SKU/DAY granularity.
The sketches resulting from approx_set can then be re-aggregated using merge() and the number of unique customers can be estimated using “cardinality”.
We have successfully incorporated the Athena query consolidating data using cardinality(merge(unique_clients_sketch)) in a dataset with custom SQL.
However, this approach does not allow us to define the dimensions used for aggregation during analysis creation in QuickSight - the dimensions need to be hardcoded in the query. This significantly limits the usability of these wonderful Athena functions.
An alternative to solve this would be for QuickSight to have a “pass-through” function equivalent to Tableau’s RAWSQL function, allowing custom aggregation functions with SQL so that we can inform QuickSight that the unique_clients_sketch field should be aggregated with the cardinality(merge(…)) function.
Another possibility would be for QuickSight to support HyperLogLog functions that are available in most services like Athena, Redshift, and Postgres.
I would greatly appreciate it if anyone in the community could share similar experiences and if there is a better approach with the current technology to use QuickSight more effectively with HLL.
I would also like to suggest that this be considered in the solution’s roadmap. I believe it would be a significant differentiator.
Cheers!
@MANUEL_G_P_FILHO ,
Thanks for your feedback. Before I tag this as a feature request, as I understand, the limitation you are facing today is that if you have many dimensions, your sql query is hardcoded to specific dimensions and you want the flexibility to use any dimension ?
For example, in the below , dynamically change to use product_id instead of country.
WITH mainquery AS (
SELECT 'us' country, 1 product_id, 100 sales UNION ALL
SELECT 'us', 2, 200 UNION ALL
SELECT 'us', 3, 300 UNION ALL
SELECT 'us', 4, 400 UNION ALL
SELECT 'us', 5, 500 UNION ALL
SELECT 'ca', 3, 600 UNION ALL
SELECT 'ca', 4, 700 UNION ALL
SELECT 'ca', 5, 800 UNION ALL
SELECT 'ca', 6, 900 UNION ALL
SELECT 'ca', 7, 1000 UNION ALL
SELECT 'ca', 8, 1100 UNION ALL
SELECT 'ca', 9, 1200
),
agg as ( select country, approx_set(sales) as unique_sketch from mainquery group by country )
select country, cardinality(merge(unique_sketch)) from agg group by country
Kind regards,
Koushik
Hi Koushik,
Thank you for your response.
Yes, you are correct. The limitation we are facing is that our SQL query is hardcoded to specific dimensions, and we would like the flexibility to use any dimension dynamically during analysis creation in QuickSight.
For example, in your provided SQL, we would like to have the ability to dynamically change the dimension used for grouping, such as using product_id
instead of country
without having to modify the underlying SQL query each time.
We are seeking a way to leverage QuickSight to dynamically aggregate these pre-aggregated sketches using different dimensions without modifying the SQL query. Specifically, a pass-through function similar to Tableau’s RAWSQL function could allow us to apply custom aggregation functions within QuickSight.
Cheers
Manuel
Thanks for the additional details. I will tag this as a feature request.
Kind regards,
Koushik