Hi Guys!
I hope you can help me with the following issue that I encountered:
I have a dataset A where each row represents a meeting and for each meeting, we have several participants. There could be more than one meeting per day, but each meeting is unique and has a unique meeting_id associated. We want to compute the total number of participants in such a way that it will not be changed by filters, except if the filter is applied to the meeting date.
Moreover, as a next step, we join this dataset with a second one, dataset B, where we have all the participants’ info, grouped by meeting_id. This will be a left join, with dataset A being on the left and with the meeting_id as the join key. To accomplish this, we are creating two different fields on the dataset A level (so prior to the join):
- number_of_participants_per_day :
sumOver(minOver({number_of_participants}, [{meeting_id}],PRE_FILTER) / countOver({number_of_participants}, [{meeting_id}], PRE_FILTER),[year, month, day],PRE_FILTER)
- total_number_of_participants :
sum(minOver({number_of_participants_per_day}, [year, month, day], PRE_AGG) / countOver({number_of_participants_per_day}, [year, month, day], PRE_AGG))
The issue: This setup works correctly until we filter for something which is excluding an entire day, here an example: if on a specific day we only had meetings where all the participants’ ages were over 50 and we filter for age < 50 then we also see the total_number_of_participants changing (decreasing of a factor which is exactly the number of participants present on that day)! However, as the sumOver is applied as a PRE_FILTER condition, I would not have expected this to change, why is that? What could I do?
NOTE: in number_of_participants_per_day we are nesting a minOver()/countOver() as after the join we will have duplicate rows for the meeting_id values.
What I already tried:
- Creating the fields after the join
- removing the nested minOver()/countOver()
- Full join
- Right join
Thanks for your help!