I have a dataset that records the status of each step of a data pipeline as an example below:
| tableName | dataSource | stage | status |
|---|---|---|---|
| Apple | A | ingestion | COMPLETE |
| Apple | A | preprocessor | COMPLETE |
| Apple | A | data_lake | COMPLETE |
| Banana | A | ingestion | COMPLETE |
| Banana | A | preprocessor | COMPLETE |
| Banana | A | preprocessor | COMPLETE |
| Banana | A | preprocessor | COMPLETE |
| Banana | A | preprocessor | COMPLETE |
| Banana | A | preprocessor | FAIL |
| Banana | A | preprocessor | FAIL |
| Mango | B | ingestion | COMPLETE |
| Cherry | B | preprocessor | COMPLETE |
| Raspberry | C | ingestion | COMPLETE |
I want to create a table on Quick Sight Analysis to see the number of COMPLETE and FAIL for each dataSource and stage.
| dataSource | stage | COMPLETE | FAIL |
|---|---|---|---|
| A | ingestion | 2 | 0 |
| A | preprocessor | 1 | 1 |
| A | data_lake | 1 | 0 |
| B | ingestion | 1 | 0 |
| B | preprocessor | 1 | 0 |
| C | ingestion | 1 | 0 |
However, some of the tables in some stages have more than one data points and are counted toward the statuses when they’re summed in the calculated fields I created [COMPLETE asifelse(status = "COMPLETE",1,0) and FAIL as ifelse(status = "FAIL",1,0)] which make the table look like:
| dataSource | stage | COMPLETE | FAIL |
|---|---|---|---|
| A | ingestion | 2 | 0 |
| A | preprocessor | 5 | 2 |
| A | data_lake | 1 | 0 |
| B | ingestion | 1 | 0 |
| B | preprocessor | 1 | 0 |
| C | ingestion | 1 | 0 |
I am using CountOver to sum the number of COMPLETE and FAIL in each stage for each table then use a condition to check whether there is a FAIL
COMPLETE > ifelse({status}="FAIL" AND countOver({status},[{status},{tableName},{stage}])=0, 1, 0)
FAIL> ifelse({status}="FAIL" AND countOver({status},[{status},{tableName},{stage}])>1, 1, 0)
But this doesn’t seem to work. Does any have any idea how to countOver with condition?
