Pre-Filter Issue

maltiza_ivanova · May 28, 2024, 4:23pm

Hi Guys!

I hope you can help me with the following issue that I encountered:

I have a dataset A where each row represents a meeting and for each meeting, we have several participants. There could be more than one meeting per day, but each meeting is unique and has a unique meeting_id associated. We want to compute the total number of participants in such a way that it will not be changed by filters, except if the filter is applied to the meeting date.

Moreover, as a next step, we join this dataset with a second one, dataset B, where we have all the participants’ info, grouped by meeting_id. This will be a left join, with dataset A being on the left and with the meeting_id as the join key. To accomplish this, we are creating two different fields on the dataset A level (so prior to the join):

number_of_participants_per_day :

sumOver(minOver({number_of_participants}, [{meeting_id}],PRE_FILTER) / countOver({number_of_participants}, [{meeting_id}], PRE_FILTER),[year, month, day],PRE_FILTER)

total_number_of_participants :

sum(minOver({number_of_participants_per_day}, [year, month, day], PRE_AGG) / countOver({number_of_participants_per_day}, [year, month, day], PRE_AGG))

The issue: This setup works correctly until we filter for something which is excluding an entire day, here an example: if on a specific day we only had meetings where all the participants’ ages were over 50 and we filter for age < 50 then we also see the total_number_of_participants changing (decreasing of a factor which is exactly the number of participants present on that day)! However, as the sumOver is applied as a PRE_FILTER condition, I would not have expected this to change, why is that? What could I do?

NOTE: in number_of_participants_per_day we are nesting a minOver()/countOver() as after the join we will have duplicate rows for the meeting_id values.

What I already tried:

Creating the fields after the join
removing the nested minOver()/countOver()
Full join
Right join

Thanks for your help!

Xclipse · May 29, 2024, 3:41am

Hi @maltiza_ivanova, welcome to the QuickSight Community! To ensure the total number of participants is calculated correctly and remains unaffected by filters on participant attributes, we need to adjust our calculations to ensure the total_number_of_participants correctly aggregates the daily participants without being affected by non-date filters.

Try these steps, you may have to adjust the syntax

Create a calculated field number_of_participants_per_day_static in dataset A:

sumOver(minovery({number_of_participants}, [{meeting_id}], PRE_FILTER) / countOver({number_of_participants}, [{meeting_id}], PRE_FILTER), [year, month, day], PRE_FILTER)

Ensure this field is not affected by non-date filters:

Use it in a way that it remains static regardless of participant-level filters.

Calculate total_number_of_participants using this static field:

sum({number_of_participants_per_day_static}, [year, month, day], PRE_AGG)

If the above adjustments still do not fully solve the issue, consider using a custom SQL. For example:

WITH participants_per_meeting AS (
    SELECT 
        meeting_id,
        COUNT(participant_id) AS number_of_participants
    FROM 
        dataset_b
    GROUP BY 
        meeting_id
),
participants_per_day AS (
    SELECT 
        a.meeting_date,
        SUM(p.number_of_participants) AS number_of_participants_per_day
    FROM 
        dataset_a a
    JOIN 
        participants_per_meeting p
    ON 
        a.meeting_id = p.meeting_id
    GROUP BY 
        a.meeting_date
),
total_participants AS (
    SELECT 
        SUM(number_of_participants_per_day) AS total_number_of_participants
    FROM 
        participants_per_day
)
SELECT 
    tp.total_number_of_participants
FROM 
    total_participants tp;

Static reference point in your calculation can be tricky. Here is a good reference : Level-Aware Calculations: Tips and Best Practices

Did this solution work for you? I am marking this reply as, “Solution,” but let us know if this is not resolved. Thanks for posting your questions on the QuickSight Community!

paolo_r · May 29, 2024, 10:41am

Hi @Xclipse,
Thank you for your answer!
I am also having the same issue @maltiza_ivanova has shared and the solution you proposed doesn’t seem to work.
Given that the two fields are created on a dataset level (before the join) using LAC inside sum is not possible. Moreover it is not possible to specify PRE_AGG inside sum.
Finally, after the data is joined there will be duplicates in the meeting_id column, considering that on dataset B each row represent a single participant of a single meeting of a single day: this means that the simple sum approach you proposed will result in a computed value which is much bigger than the real one.
About the SQL approach, it seems quite interesting but I have a question: can I use it if I am directly uploading a csv file? If so, how?

Could it be possible to reopen this discussion?

Thanks in advance!
Paolo

David_Wong · May 30, 2024, 1:16am

Hi @paolo_r,

I would suggest creating a separate topic with specifics about your dataset so we can better help you. In your new topic, please specify if you’re using LAC-A or LAC-W, and what type of visual you’re trying to create.

Although PRE_FILTER isn’t supported with LAC-A, I’ve had pretty good success using LAC-W with PRE_FILTER and then wrapping it with an LAC-A function.

number_of_participants_per_meeting (fixed total)

minOver({number_of_participants}, [{meeting_id}], PRE_FILTER)

number_of_participants_per_day (fixed total, assuming that day is the dimension in the visual)

sum(min({number_of_participants_per_meeting}, [{meeting_id}]))

This helps to handle the duplicates without having to divide by the count but it also depends what type of visual you’re trying to create.

paolo_r · May 30, 2024, 11:18am

Hi @David_Wong,

Thanks for your answer! @maltiza_ivanova and I are struggling with the exact same issue as we are colleagues. Could it be possible to remove the “Solution” marker from this post?
I will also answer your questions.
This is what we are trying to achieve exactly: we want to have the total_number_of_participants as a number which can be filtered only by date. E.g.: let’s pretend that we already know the results of my analysis and the total number of participants for the whole 2023 were 100 while on January 2023 they were 20. I want my dashboard to show the total number of participants in a (KPI) visual and I want it to show that if I change my filter from the hole 2023 to only January 2023 the numbers go down from 100 to 20; moreover and most importantly, I want this visual to be only and exclusively affected by the date filter: if I filter for “age < 50 AND date == January” I still see 20, or if I select “whole 2023 AND age < 50” I still see 100.
In addition to showing values on a visual, we need to be able to use the total_number_of_participants for computing frequencies (e.g. people attending on Mondays/total_number_of_participants)
The two calculated fields number_of_participants_per_day and total_number_of_participants in @maltiza_ivanova’s post are working fine for what I just described but there is one exception (highlighted in section The issue: of the original post): when we apply a filtering condition (example: participant age) which is unluckily excluding all the participants from a single day, we also see this reflected in the total number of participants calculations: we would like to avoid this and have the total_number_of_participants changing only when filtering by date.
Is this possible?
Thanks in advance

Xclipse · June 5, 2024, 1:17am

Hi @paolo and @maltiza_ivanova, besides using SQL and the solutions offered by David – I’m out of ideas. Sorry, currently this is not possible but I’m marking this for feature request. At AWS, our roadmap is primarily driven by our customers. Your feedback helps us build a better service. I have tagged this as a feature request. More features are being added on a regular basis, so please keep an eye on the What’s New / Blog . You can set up a Watching Alert by clicking on the bell icon.