I have a dataset that records the status of each step of a data pipeline as an example below:
tableName | dataSource | stage | status |
---|---|---|---|
Apple | A | ingestion | COMPLETE |
Apple | A | preprocessor | COMPLETE |
Apple | A | data_lake | COMPLETE |
Banana | A | ingestion | COMPLETE |
Banana | A | preprocessor | COMPLETE |
Banana | A | preprocessor | COMPLETE |
Banana | A | preprocessor | COMPLETE |
Banana | A | preprocessor | COMPLETE |
Banana | A | preprocessor | FAIL |
Banana | A | preprocessor | FAIL |
Mango | B | ingestion | COMPLETE |
Cherry | B | preprocessor | COMPLETE |
Raspberry | C | ingestion | COMPLETE |
I want to create a table on QuickSight Analysis to see the number of COMPLETE and FAIL for each dataSource and stage.
dataSource | stage | COMPLETE | FAIL |
---|---|---|---|
A | ingestion | 2 | 0 |
A | preprocessor | 1 | 1 |
A | data_lake | 1 | 0 |
B | ingestion | 1 | 0 |
B | preprocessor | 1 | 0 |
C | ingestion | 1 | 0 |
However, some of the tables in some stages have more than one data points and are counted toward the statuses when they’re summed in the calculated fields I created [COMPLETE asifelse(status = "COMPLETE",1,0)
and FAIL as ifelse(status = "FAIL",1,0)
] which make the table look like:
dataSource | stage | COMPLETE | FAIL |
---|---|---|---|
A | ingestion | 2 | 0 |
A | preprocessor | 5 | 2 |
A | data_lake | 1 | 0 |
B | ingestion | 1 | 0 |
B | preprocessor | 1 | 0 |
C | ingestion | 1 | 0 |
I am using CountOver to sum the number of COMPLETE and FAIL in each stage for each table then use a condition to check whether there is a FAIL
COMPLETE > ifelse({status}="FAIL" AND countOver({status},[{status},{tableName},{stage}])=0, 1, 0)
FAIL> ifelse({status}="FAIL" AND countOver({status},[{status},{tableName},{stage}])>1, 1, 0)
But this doesn’t seem to work. Does any have any idea how to countOver with condition?