Return the most common value and if tie, then return the latest by date

This has proven (to me at least) to be more difficult and convoluted to implement than it ought to be.

Here is a sample of the data:

Driver_ID Date OrderNumber Branch Rating
111 July 1 2024 1234 A 1
111 July 1 2024 2345 A 1
111 July 2 2024 4321 B 3
222 July 1 2024 5555 F 2
222 July 1 2024 6666 F 5
222 June 12 2024 8888 G 4
222 July 7 2024 9999 G 5

I want to create a table that shows the driver, and their most frequent branch. if the most frequent branches are a tie then return the latest one.

this is the desired output:

Driver_ID Rank By Avg Rating Avg Rating Most Frequent Branch
111 2 3 A
222 1 4 G

I have created a table where the DRIVER_ID is the primary dimension, then I created this calculated field:

ifelse(countOver(Branch, [{DRIVER_ID}, Branch],PRE_FILTER) = maxOver(countOver(Branch, [{DRIVER_ID}, Branch],PRE_FILTER), [{DRIVER_ID}],PRE_FILTER), Branch, NULL)

and I added it to the table as a dimension.

But this calculated field returns two values 1- the most frequent branch 2- NULL. so I have to add a filter to the table to exclude the NULL values. and convert all calculation in the table to PRE_FILTER and use ifelse to filter by parameters.

this seems to work at first but when a driver has a tie for the most frequest branch then this calculated field will return both branches.

so far Iā€™m unable to modify this calculated field to break the tie by choosing the latest branch instead. Please help

1 Like

Hello @Ali_B, I think I was able to sort this out so it will return the latest branch if there is a tie amongst most frequent. I will walk through the steps below:

I created this calculated field to create a rank for each branch partitioned by Driver ID:
Branch Rank per Driver =

denseRank
(
  [countOver({Order Number}, [{Driver_ID}, {Branch}], PRE_FILTER) DESC], 
  [{Driver_ID}],
  PRE_FILTER
)

Now, we can use this to check if a Driver has more than 1 branch as Rank 1 or not. Then we can return the expected Branch value:
Rank 1 or Most Recent =

ifelse(
    distinctCountOver(
        ifelse(
            minOver({Branch Rank per Driver}, [{Driver_ID}, {Branch}], PRE_FILTER) = 1,
            {Branch},
            NULL
        ), 
    [{Driver_ID}], PRE_FILTER) = 1 
    AND {Branch Rank per Driver} = 1, 
    {Branch},  
    distinctCountOver(
        ifelse(
            minOver({Branch Rank per Driver}, [{Driver_ID}, {Branch}], PRE_FILTER) = 1,
            {Branch},
            NULL
        ), 
    [{Driver_ID}], PRE_FILTER) > 1
    AND maxOver({Date}, [{Driver_ID}], PRE_FILTER) = {Date},
    {Branch},
    NULL
)

That will check if we have a single Rank 1 and if it matches the Rank in the field value, or if there are multiple rank 1s, then the Branch returned will match the max date value for that Driver ID.

I will mark this response as the solution, but please let me know if you have any remaining questions. I am happy to guide you further!

1 Like