How to get a difference of unique strings in different time periods

william.h.wade · July 26, 2024, 9:07pm

I have AWS billing data for my Org imported into a Quicksight dataset.

over 300 AWS accounts worth of data
each row of data represents a single resource billed for a single time period

I want to be able to see when new services are billed for in a month compared to the previous month. The goal is to have a dashboard that can flag when a new service has been turned on. I’ll eventually use parameters and filtering to see this globally, and in individual accounts.

My trouble is figuring out the best method for this.

A simple solution is a pivot table of Billing Period vs Service showing a distinct count of services per period. But this is just a table that shows if a service exists in each billing period month. I can tinker with conditional formatting to highlight changes from 0 to 1, but this is not ideal.

In SQL, I think this could be done by creating a query that generates a table in memory to represent the list of unique service names for each time period.

If I did this manually, I’d get a list of unique services in each consecutive billing period, and output a list of the differences. I’d also like to be able to parameterize the two billing periods to be able to show this difference between non-connected months.

DylanM · July 29, 2024, 9:56pm

Hello @william.h.wade, there are a few things that may help you achieve what you are looking for.

The simplest way to know if a new service has been added would be to figure out the current baseline. Let’s say you currently have 12 services, then you could add a function like distinct count of services to either a KPI or table visual. These are 2 visual options that allow you to set threshold alerts.

Since you know you currently have 12 distinct services, you could set an alert for when that number exceeds 12. This would be a way to not only track the number of services but receive a notification when the number increases. That will likely be the best way to achieve what you are looking for, let me know if this helps!

william.h.wade · July 30, 2024, 1:37pm

The problem here is that I want to see differences between this month and last month, and to be able to see this for any given pair of dates. In other words, my “baseline” is last months count of services.

I know there are ‘aggregate over period’ functions in the custom calculations that I could do this with a simple count of services, but this wouldn’t tell the whole story. Within AWS, there are hundreds of services, and if one account turns on a new service at the same time as another account turns off the last usage of another service, then my count of services does not change, and it goes under the radar. That’s why I was hoping to devise a method to compare the list of distinct services per date period.

DylanM · July 31, 2024, 3:39pm

Hello @william.h.wade, in your data, is there any kind of field like start date and end date that tracks when an AWS service was opened or closed? A field that like would help ensure you are only targeting services that have changed.

william.h.wade · August 1, 2024, 5:31pm

There’s not a start / stop date, BUT here’s what I figured out :

Individual rows contain a single hour’s charge for a single cloud resource.
Columns include amortized_cost, service, and a billing_period date column.

I created a Pivot Table and a few calculated fields to pull this together :

This calculates the oldest date being shown in the pivot chart and is used for filtering logic below

billing_period min :=
min(minOver({billing_period},[],PRE_AGG))

This calculates the cost from the previous date period on the Pivot table

AmC - last billing_period := 
lag(sum({amortized_cost}),[{billing_period} ASC],1,[service])

This calculates the absolute difference between the current period and the last, or outputs a 0 if null

AmC - abs difference to previous period := 
ifelse(
    isNotNull(abs(sum({amortized_cost}) - {AmC - last billing_period})),
    abs(sum({amortized_cost}) - {AmC - last billing_period}),
    0
    )

This messy calculation produces the percentage difference between current and last period
This can be cleaned up for efficiency, but the rules prevent bad behaviors due to 0 and null values, especially on the first period shown on the pivot.

AmC - percent difference := 
ifelse(
    min({billing_period}) = {billing_period min},
    0,
    ({AmC - abs difference to previous period} / {AmC - last billing_period}) <> 0,
    {AmC - abs difference to previous period} / {AmC - last billing_period},
    {AmC - last billing_period} = 0 AND {AmC - abs difference to previous period} <> 0,
    {AmC - abs difference to previous period} / 1,
    isNull({AmC - last billing_period}),
    sum({amortized_cost}) / 1,
    0
    )

Then I have some conditional formatting and sorting to highlight and bubble up outliers. This isn’t the best or cleanest solution to the problem, but it’s our MVP for now.