Last minute of data

mikkel1 · December 27, 2024, 5:52am

Hi Community

I have a dataset consisting of timestamp, person_id and acceleration. The data is regular csv data. I am having trouble figuring out how to only work with the last minute of data.

I have around 25 accelerometer data points per second. I want to count how many times those data points are outside a threshold, within the last recorded minute. I cant filter the time, as I don’t have any set times for recording. So for each person_id, I want to get roughly the last 1500 rows, based on timestamp and count how many times acc > 0.3 for example.

How can i achieve this?

apjvinod · December 27, 2024, 7:17am

Hi @mikkel1

Welcome to QuickSight Community , sounds like interesting use case. Could you please upload sample data and sample visual in the QuickSight Arena and share it to us to work together on this usecase.

You want to find out the latest recorded datapoints for every person_id and for each person-id there will be many datapoints with the same latest recorded timestamp. Wondering is the threshold is static number(0.3) or any logic needs to be applied and computed for every person-id?

mikkel1 · January 2, 2025, 8:53am

Hi @apjvinod, thank you.
I hope screenshots are sufficient, otherwise let me know.

Here is what I’m looking for in Quicksight on the top and a sheet for later:

In this image, I’ve used this:

countIf({g_deviation}, {g_deviation} > 0.2)

To count the number of times the acceleration deviates (don’t be confused be the name. It doesn’t count pr. second here). As you can see, it aggregates over the whole day. I want it over the last minute, or 250 measurements.

You can see what I mean in the above sheet.
So the green might be the latest 5 measurements, and because 3/5 are over 0.3, the patient is moving. The yellow might be the previous 5 measurements and as 3/5 is under 0.3, the patient is classified as non moving.

I want to continuously evaluate the newest 5 measurements and set “is_moving” based on that.

Does that make sense?

rajencho · January 2, 2025, 10:22am

Hi @mikkel1

Thank you for reaching out to Quicksight Community.

In this case you can use countOver function.

Since you would want to use two conditions here :

calculate only for last one minute
calculate acc > 0. 3

you can create two calculated fields (ifelse function) for above two.
Eg : 1. acc_condition = ifelse( acc > 0.3,1,0)
2. is_one_minute = ifelse condition based on date.

Now you can use these two calculated fields as partition by under first countOver function.
Let us know if this works.

prantika_sinha · January 2, 2025, 10:37am

Hi @mikkel1 ,

For this use case, the steps I can think of are:

Step 1: Calculate minute against all records based on timestamp field. You can truncate your timestamp to the nearest minute.
Step 2: Identify the latest minute. Since you are looking for a dynamic approach, you can use maxover to fetch the latest minute timestamp. You should use the person id as partition since you intend to calculate for each person.
Step 3: write the conditions as @rajencho mentioned above. ifelse( truncated timestamp = max timestamp and deviation>0.2, return person_id, else NULL). Since you have calculated deviation already, have not broken down that step.
Step 4. Now to calculate no. of times deviated, you can do count of new field from step 3.

Also, a sample dataset upload in the Arena can help us share with you the solution.

Thanks,
Prantika

mikkel1 · January 4, 2025, 5:12am

Hi @rajencho and @prantika_sinha, thanks for the answers.

I got over my lazyness and tried out arena. It seems good to learn, so thanks for the push haha. Here is a link:
last_minute_of_data_sample

The table I included has all the times in the dataset and all the people, but as mentioned above, I want one row for each person id, with a label “has_moved” (or something like that). This label should be true if “g_deviation” > 0.3 50% of the time within the last minute available for the person (just as a concrete example. I might tweek the specifics).

I tried the countOver, but it seems like I’ve used it wrong.
I would also love if you could explain how I truncate time and calculate over time in more detail, this is the part I’m struggling with.

prantika_sinha · January 4, 2025, 3:16pm

Hi @mikkel1 ,

Have tried to create a sample solution. Please check the second table and the calculated fields created.
sample solution 40403

Let me know if the requirement was interpreted incorrectly. A quick rundown of the sample solutions is:

Created nearest minute timestamp against time field
calculate latest minute for each id
check deviation for latest minute only
calculate deviation occurrence / total response times to check 50% threshold, allot moving, not moving flag

Thanks,
Prantika

mikkel1 · January 5, 2025, 12:04am

Amazing @prantika_sinha, thank you so much!

Just making sure I understand everything:

truncDate just “rounds up” the date to nearest minute.
So these three rows:
2024-12-27 10:30:15.123
2024-12-27 10:30:16.456
2024-12-27 10:30:17.789

Are turned into this:
2024-12-27 10:30:00
2024-12-27 10:30:00
2024-12-27 10:30:00

When you count deviation and records, you just test to see if the rows truncated time matches the latest instance of truncated time
When you label moving/non-moving, you just check if more than 50% of the records are over the threshold.

Once again, thank you so much. This was exactly what i was looking for.

mikkel1 · January 5, 2025, 12:38am

Oh and what if i want the last 5 minutes for example. I can see that the last minute in my data is never really a full minute, as I will almost always stop recording data in the middle of a minute.

EDIT - Solved:
If I want to take the last 5 min i just add a dateDiff check.
last_5_min_deviation_count:

count(
    ifelse(
        dateDiff(truncDate('MI', time), latest_minute_by_patient, 'MI') <= 5
        and g_deviation > g_deviation_threshold,
        time,
        NULL
    )
)