I am analyzing employee GitHub issue/commit data. The dataset joins two CSV files together: one file (“commits”) has each row showing each commit published, and the other file (“issues”) has each row showing each issue and its details. For example, let’s say Bob worked published 2 commits for issue #1234, and Jane published 3 commits for the same issue: the “commits” file would show 5 rows for issue #1234, 2 for Bob and 3 for Jane, and the “issues” file would only show 1 row for issue #1234.
Is there a way in QuickSight (analysis or dataset) to calculate who the first employee to make a commit was? I tried using firstValue(), and the formula had no errors:
{Commit Date} and {Issue Num} are from the original “issues” file, while {Employee} is a calculated field I created in QuickSight using the “Email” column in the “issues” file.
However, when I tried adding this column to a “Table” being grouped by “Issue” in an analysis, it wouldn’t accept the calculated field as a dimension, and when I added it as a measure, I got the following error message:
“Table calculation attribute reference(s) are missing in field wells”
Hi @michaelkhosrofian,
What if you tried using a denseRank function and partition it by your issue #, then you can use an ifelse function to show employee name for when the ranking = 1.
Let me know if this works for your case or if you have any additional questions!
I tried using a denseRank function instead, but I got the same error message when I added it to the analysis table.
Even if I didn’t get an error, I don’t think that would work for what I’m looking for, because I assume even if I added the function to the table, the table would be aggregated by both issue and employee, so only the stats from the first employee would show. I just need to aggregate by issue, and would like to know who the first employee who made a commit for each issue.
If you add all other fields, the error goes away but you end up with all the other fields that you don’t necessarily want in your visual. Even if you hide those unwanted columns, you still have the extra rows.
The key here is to use PRE_AGG with the denseRank function so that it doesn’t force you to include every field that’s in your calculated field in your visual.
Thanks for the clarification regarding denseRank and PRE_AGG, I was able to create a denseRank function in my analysis. Unfortunately, filtering for only the #1 ranked employee (first committer) will filter out the stats of any other employee(s) who made commits to the issue, because it still aggregates by both issue_number and employee, when I only need it to aggregate by issue_number.
To clarify, I would like my analysis table to show the Issue number, a column that shows the employee with either the first/most commit(s) per issue, and then any other stats (for this example, just the total number of commits) for all employees, not just the one with the first/most commit(s).
Below is a table that uses the example from my initial post, assuming Bob was the first committer and Jane had the most commits:
Are you still working on this issue or were you able to find a solution?
I think the easiest/shortest solution here would be to pre-calculate these fields using SQL in the join of these two CSVs. Would that be possible or do you need this solution to exist in QuickSight?
Are you still working on this issue or were you able to find a solution?
The firstValue and denseRank suggestions were helpful, and I was able to create a table that included one of these functions. I still couldn’t figure out how to create a table that included one of these functions and the general aggregate stats by issue, without grouping by the issue and the first/most committer.