I’m trying to figure out the best way to join 2 datasets that have different date granularities. I will use an example that is as simple as I can make it.
Let’s say I have one table called Price that is stored in S3. It has 2 columns, Week Ending Date and Price. There is one row for each week.
| Week Ending Date | Price |
| ---------------- | ----- |
| 5/20/2023        | 127   |
| 5/27/2023        | 99    |
| 6/3/2023         | 103   |
Let’s say I have a second table called Pounds that is stored in S3. It has 2 columns, Daily Date and Pounds. There is one row per day.
| Daily Date | Pounds |
| ---------- | ------ |
| 5/16/2023  | 376    |
| 5/17/2023  | 273    |
| 5/18/2023  | 398    |
| 5/19/2023  | 245    |
| 5/20/2023  | 654    |
| 5/21/2023  | 432    |
| 5/22/2023  | 383    |
| 5/23/2023  | 327    |
| 5/24/2023  | 365    |
| 5/25/2023  | 146    |
Note that both of these datasets are being updated in their S3 locations on a daily/weekly cadence. In reality both tables have many more columns, but I wanted to scope it down to a simple example.
In QuickSight, I want to analyze Price per Pound at the weekly level. So I need the price for a particular week (1 row) / the sum of the pounds for a particular week. I’d also want to go higher up and do a monthly, quarterly, and yearly Price per Pound analysis.
How do I create the correct dataset in QuickSight that merges these 2 tables and lets me create the measures I’m looking for?
I would create one dataset for Price. When you convert day to week, QuickSight uses the first day of the week. For that reason, I would calculate Week Starting Date in your Price dataset.
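Outside QuickSight, the week-start arithmetic is easy to sketch. This plain-Python snippet (the function name is mine, not a QuickSight API) derives a Week Starting Date by subtracting six days from the Week Ending Date, assuming a fixed 7-day week:

```python
from datetime import date, timedelta

def week_starting(week_ending: date) -> date:
    # A 7-day week is assumed: the start is six days before the end.
    return week_ending - timedelta(days=6)

print(week_starting(date(2023, 5, 20)))  # 2023-05-14
```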
I would then create another dataset for Pounds and in this dataset create a calculated field.
Week = truncDate('WK', {Daily Date})
I would then join the 2 datasets on the week. I’m not sure about the performance of joining on a date column. If performance becomes an issue, you can create an integer date key in both your Price and Pounds datasets (e.g. 20230903 instead of 9/3/2023) and join on that instead.
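To make the join concrete, here is a plain-Python sketch of the same idea, using toy values from the example tables. The Sunday-based week start mirrors what I would expect from truncDate’s default, but that is an assumption worth verifying against your data; it also shows the yyyymmdd integer-key trick:

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    # Assumes a Sunday-based week; in Python, Monday is weekday() == 0.
    return d - timedelta(days=(d.weekday() + 1) % 7)

def date_key(d: date) -> int:
    # Integer join key, e.g. 2023-09-03 -> 20230903.
    return d.year * 10000 + d.month * 100 + d.day

# Price keyed by week *starting* date (5/14 starts the week ending 5/20).
price = {date_key(date(2023, 5, 14)): 127}
pounds = [(date(2023, 5, 16), 376), (date(2023, 5, 17), 273)]

# Each daily row picks up its week's single price via the integer key.
joined = [(d, lbs, price[date_key(week_start(d))]) for d, lbs in pounds]
print(joined)
```

Note that the weekly price repeats on every daily row, which is exactly the duplication discussed below.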
Because of the join, the same Price value will appear on every day of the week. Depending on what you want to calculate in your analysis, you may need a level-aware calculation (LAC) to handle the duplicates.
If you want to do calculations at the month, quarter or year level, you should be able to do that using the same joined dataset by just changing the granularity in the analysis itself.
Thank you very much for your help! I have attempted to put together my datasets, and now I’m running into issues with getting my calculations correct. I keep hitting LAC and aggregation errors.
Here is what my table of data looks like. I am trying to figure out how to get my “Lbs per CF (H2O)” calculated field to be correct. It should equal the sum of Total Pounds for a given week divided by the H2O Used for that week (single value). What would this calculation look like? What is shown in the table is not correct. The value should be something like 2.2 for the week of 9/2.
Appreciate your help on this! I keep getting stuck, and I guess I can’t wrap my head around these LACs. I have a bar chart with FormattedDateField across the x-axis and my Lbs per CF calculated field as the value. FormattedDateField changes based on whether the user chooses to look at the data at the daily, weekly, monthly, quarterly, or yearly level.
I can’t seem to get the calculation to work at the different levels.
For example, here is what my table looks like using this for Lbs Per CF (H2O). (Note I’ve also tried what I said worked above at the week level)
If I describe it in words, it should be the sum of total pounds for all the rows within the FormattedDateField level divided by the sum of the distinct rows of H2O for the same time period.
What makes your problem more difficult is that your “H2O Used” value is at the week level but you want to calculate “Lbs per CF” at different levels of granularity.
For example, let’s say you want to show monthly Lbs per CF. That means you also have to calculate a monthly H2O Used. How would you do that for the month of August? 65162 + 33578 + 27330 + 36737 + 34259 doesn’t seem right: Jul 21 contributed to the 34259 reading and Sep 1 contributed to the 65162 reading, yet both of those dates fall outside of August.
If your H2O values are at the week level, then it only makes sense to calculate Lbs per CF at the week level. If you want to calculate the ratio at different granularity levels, you need to start with H2O Used per day in your dataset.
Yes you are right. The H2O used reported on 9/2 is really actual data for the end of August. The client I’m working on this for understands this but still wants to analyze this data at weekly, monthly, quarterly, and yearly levels. They are ok with the issue you raised. They cannot report these meter readings at the daily level, only weekly. Is there any way I can accomplish this?
How would you do it for the month of September? If you have only 1 data point, what value would you use as the denominator to calculate the ratio for September? 65162 divided by 7?
Can you try this?
Lbs per CF (H2O) = sum({Total Lbs}) / sum(min({H2O Used}, [{Week Ending}]))
Then put “Week Ending” and “Lbs per CF (H2O)” in your visual. As you change the “Week Ending” granularity to Month, Quarter, and Year in the visual, the calculated value for “Lbs per CF (H2O)” should change accordingly.
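If it helps to see the mechanics, here is a plain-Python sketch of what that LAC-A expression is doing (the row values are hypothetical apart from the two H2O readings quoted earlier): the inner min(...) collapses the duplicated weekly H2O value to one number per week before the outer sum runs.

```python
from collections import defaultdict

# Joined daily rows: (week_ending, total_lbs, h2o_used).
# h2o_used repeats on every row of its week because of the join.
rows = [
    ("9/2", 100, 65162), ("9/2", 120, 65162),
    ("9/9",  90, 33578), ("9/9", 110, 33578),
]

total_lbs = sum(lbs for _, lbs, _ in rows)  # plain sum over all rows

# min({H2O Used}, [{Week Ending}]): one H2O value per week, not per row.
per_week = defaultdict(list)
for wk, _, h2o in rows:
    per_week[wk].append(h2o)
h2o_total = sum(min(vals) for vals in per_week.values())

ratio = total_lbs / h2o_total
```

Summing the raw h2o_used column instead would count each weekly reading once per daily row, which is why the naive version overstates the denominator.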
Thanks for all your help with this. I couldn’t get this to work, so I used AWS Glue DataBrew on my 2 sets of S3 data and did an outer join, so I didn’t end up with the weekly meter reading duplicated for each daily pounds value. Now my Lbs per CF (H2O) calculation is simply sum(Total Pounds) / sum(H2O Used).
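For anyone following along, the reason the plain sums work after the outer join can be sketched like this (values hypothetical): each weekly meter reading lands on exactly one row instead of being repeated, so nothing is double-counted.

```python
# Rows after the outer join: (total_pounds, h2o_used).
rows = [
    (654, None), (432, None), (383, None),  # daily pounds rows, no H2O
    (None, 65162),                          # the single weekly meter row
]

lbs = sum(p for p, _ in rows if p is not None)
h2o = sum(h for _, h in rows if h is not None)
ratio = lbs / h2o  # sum(Total Pounds) / sum(H2O Used)
```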