Good morning,
in my dataset there are two columns (“T040” and “T041”) that contain temperature values. I would like to remove all the rows where T040 and T041 are equal to the previous row. I need something like this:
Original dataset

| TimeStamp | T040 | T041 |
|-----------|------|------|
| 16:15:00  | 28.3 | 27.8 |
| 16:15:01  | 28.2 | 27.9 |
| 16:15:02  | 28.2 | 27.9 |
| 16:15:03  | 28.2 | 27.9 |
| 16:15:04  | 28.4 | 27.6 |
Desired dataset

| TimeStamp | T040 | T041 |
|-----------|------|------|
| 16:15:00  | 28.3 | 27.8 |
| 16:15:01  | 28.2 | 27.9 |
| 16:15:04  | 28.4 | 27.6 |
Is there a way to do this? I’m using DirectQuery against my S3 storage.
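For reference, the consecutive-duplicate removal described above can be sketched in pandas (e.g. as a preprocessing step before the data reaches QuickSight). This is a minimal illustration with the column names and values from the example, not QuickSight syntax:

```python
import pandas as pd

# Sample data matching the example above.
df = pd.DataFrame({
    "TimeStamp": ["16:15:00", "16:15:01", "16:15:02", "16:15:03", "16:15:04"],
    "T040": [28.3, 28.2, 28.2, 28.2, 28.4],
    "T041": [27.8, 27.9, 27.9, 27.9, 27.6],
})

# Keep a row only if T040 OR T041 differs from the previous row.
# The first row is always kept (shift() yields NaN, which compares unequal).
changed = (df["T040"] != df["T040"].shift()) | (df["T041"] != df["T041"].shift())
deduped = df[changed].reset_index(drop=True)

print(deduped)  # rows 16:15:00, 16:15:01 and 16:15:04 remain
```

Note that this drops only *consecutive* duplicates, exactly as in the desired dataset: the pair (28.2, 27.9) is kept the first time it appears.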
This way, vibrationMaxValue will have the max value within the t042 partition. This is right, but I need to exclude all the duplicated values and I don’t know how to do it. I tried to change that field in
This way you can filter the data directly there and make sure that all the analyses that use it receive the filtered data.
However, remember that it is best practice to work with already clean and massaged datasets, to avoid overloading the BI layer with calculations and filters so your visuals can render blazing fast.
I would recommend creating a data pipeline using AWS Glue or your favorite ETL tool, to ensure that the data that arrives at the consumption layer (QuickSight in this case) doesn’t contain duplicates.
Hope it helps! Please mark this reply as the solution if that’s the case, also to help other members of the community; otherwise let us know.
@EnriqueS thank you very much!! As a first step I will set up filters in a not-so-best-practice way; in the meantime I will study Glue and use it as soon as I’m ready.
The Glue approach is absolutely the correct one, and I think I will use it a lot in the future.
Do you run this code in QuickSight, or do you do it in SQL?

`is_duplicate_t040 = ifelse(lead({t040},[{timestamp} ASC],1)=firstValue({t040},[{timestamp} ASC],[{timestamp}]),'Duplicate','Keep')`
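As a side note, the expression above looks ahead with lead() (the *next* row), while flagging consecutive duplicates normally means comparing each row with the *previous* one (a lag-style comparison). The intended ‘Duplicate’/‘Keep’ flag can be illustrated in plain Python (this is just the logic, not QuickSight syntax; the values are the ones from the example):

```python
# Rows as (TimeStamp, T040, T041), sorted by TimeStamp ascending.
rows = [
    ("16:15:00", 28.3, 27.8),
    ("16:15:01", 28.2, 27.9),
    ("16:15:02", 28.2, 27.9),
    ("16:15:03", 28.2, 27.9),
    ("16:15:04", 28.4, 27.6),
]

# Flag a row as 'Duplicate' when BOTH temperatures equal the previous row.
flags = []
prev = None
for ts, t040, t041 in rows:
    if prev is not None and (t040, t041) == prev:
        flags.append((ts, "Duplicate"))
    else:
        flags.append((ts, "Keep"))
    prev = (t040, t041)

print(flags)
```

Filtering out the rows flagged ‘Duplicate’ reproduces the desired dataset from the first post (16:15:00, 16:15:01, 16:15:04).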