Line chart by date, comparing the same calculated field (different date range)

Each case has a predicted default rate. I have a chart which shows the count of cases by month, grouped by credit quality, with a line showing the monthly predicted default rate. The one thing this chart doesn’t show is how the overall predicted default rate (portfolio default rate) is affected by the monthly default rate.

I want to create a line chart which, by date, compares the monthly predicted default rate against the portfolio predicted default rate. The problem is that they are both the same field, ‘PD’. I just want one to be based on the month and one to be based on all-time. How can I do this?

Hello @HarveyB-B, how are you calculating your default rate value? It seems like the best way to implement this would be to utilize the Reference Line visual option and make it a calculated line.
ReferenceLineOptions

That should allow you to add those default values to your visual. I’ll link some documentation as well!

So I have looked into this, but I do not want the line to be straight. What we want to see is how the month-by-month PD affects the overall PD over the year.

So I would have 2 lines:
Average Month by Month
Average Overall

I guess you could see it as a running average line? The main idea is that we want to see WHERE the increase/decrease is, so we can see what happened that month and why it dropped/increased.

Scenario (assuming we start in Jan 2024):
Jan : 6% default rate
Feb : 3% default rate
March : 4% default rate

I would want to see 2 lines: one showing 6% for Jan, 3% for Feb, etc., and one showing the overall average: Jan 6%, Feb (6% + 3%) / 2 = 4.5%, March (6% + 3% + 4%) / 3 ≈ 4.3%.

I hope this helps, and I understand that this may not be possible.
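To put that in concrete terms, here is a rough sketch of what I mean in SQL (a hypothetical table holding the example values above, PostgreSQL-style syntax), just to show the kind of output I am after:

-- Sketch only: the example months above as dummy data.
WITH monthly_pd (month_start, pd) AS (
  VALUES (DATE '2024-01-01', 0.06),
         (DATE '2024-02-01', 0.03),
         (DATE '2024-03-01', 0.04)
)
SELECT
  month_start,
  pd AS month_by_month_pd,                           -- 6%, 3%, 4%
  AVG(pd) OVER (ORDER BY month_start) AS overall_pd  -- 6%, 4.5%, ~4.33%
FROM monthly_pd
ORDER BY month_start;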


Hello @HarveyB-B, I believe you should be able to make this work. Do you have calculations already built out in your analysis to handle both of the averages you want to display? I tested this out on my end with data that is admittedly a bit different, but it seems like you could build one line with an avgOver function to get the month-by-month average, and another with a runningAvg function to get the overall average. Then you should be able to add both calculated fields to the Value field well of your visual.

The only thing I am not certain of is whether the runningAvg function will give you the values you are expecting. I would try building it out like this to see what you get:

Overall Average = runningAvg(
  sum({portfolio default rate}), 
  [truncDate('MM', {Date}) ASC], 
  []
)

Here I leave the partition field empty so it continues to reference the full dataset of values. Then maybe your other function would look something like this:

Average Month by Month = avgOver(
  {portfolio default rate},
  [truncDate('MM', {Date})],
  PRE_AGG
)

Play around with those and let me know if you are able to get the result you are expecting or at least close to it. Thank you!

Hello @HarveyB-B, since we have not heard back from you, I will mark my previous response as the solution. If you have any follow-up questions on this topic, please let me know. Thank you!

Hi there, thanks for the reply. I am a bit confused with this, mainly because when I translate it to my QuickSight analysis, I get many errors. What I will do is provide the current fields I am using to work out the month-by-month PD:

avgCreditScore:
avg(creditscoreint)

PD:
1 - (
  exp((avgCreditScore - 600) / 80)
  /
  (1 + exp((avgCreditScore - 600) / 80))
)

And then all I do is add PD to the stacked bar combo chart as the line, which currently shows the month-by-month PD.

So in theory all I need is the runningAvg field, but when I try to use your example I get an error.

What my field is:

runningAvg(
  sum({PD}),
  [truncDate('MM', {processing_date}) ASC],
  []
)

What I get:
Nesting of aggregate functions like {{aggregateFunction1}} and {{aggregateFunction2}} is not allowed.

processing_date is just a date field for when the case was processed, so I see no reason why this shouldn’t work, other than that ‘PD’ is a calculated field, so it may not like it?

Hello @HarveyB-B, I see. If your PD field is a calculated field, then it will not be accepted inside the runningAvg calculation, because that would nest aggregate functions. At this point, my recommendation would be to create your PD field within custom SQL on your dataset. You could also try creating the calculated field at the dataset layer and implementing the running average in the analysis, but I am not certain that would bypass the issue. Custom SQL is the most likely way to resolve this without updating the data upon ingestion or within your data source.

Once that nested aggregation error is resolved, the runningAvg should give you the result you are looking for. I hope this helps!
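For example, a minimal custom SQL sketch, assuming a source table named cases with the creditscoreint and processing_date fields you mentioned (the table and column names are illustrative, so adjust them to match your data source):

-- Sketch only: compute PD per case in the dataset's custom SQL so the
-- analysis-level runningAvg does not need to nest aggregate functions.
SELECT
  processing_date,
  creditscoreint,
  1 - (
    EXP((creditscoreint - 600) / 80.0)   -- 80.0 forces decimal division
    / (1 + EXP((creditscoreint - 600) / 80.0))
  ) AS pd
FROM cases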

Hi there, thanks for the response.

I played around and basically created all of the calculated fields inside the dataset, so I no longer have that issue. The one thing is that it takes the month-by-month averages and averages those out across the year/months to create this ‘Overall Average’, rather than the case-by-case average, which is what I need to show.

How can I adjust this so it gets the ‘Overall PD’, which would be the runningAvg of each case’s PD, as opposed to the running average of the month-by-month values?

Hope this makes sense?

Hello @HarveyB-B, to accomplish that you would want to play around with the sort order field.
You could try converting the truncDate to day, leaving the date field as it is, or utilizing something like an Order ID that specifies each individual case:

Overall PD = runningAvg(
  sum({portfolio default rate}),
  [truncDate('DD', {Date}) ASC],
  []
)

Overall PD = runningAvg(
  sum({portfolio default rate}),
  [{Date} ASC],
  []
)

Overall PD = runningAvg(
  sum({portfolio default rate}),
  [{Order ID} ASC],
  []
)

One of the options above should get you closer to a running average on each distinct predicted default rate.

Hi again, thanks for the response.

So trying these out, it still isn’t providing the number I am looking for, and I am coming across errors as well for one of them.

I think one reason is that it doesn’t take the count into account, but I could be wrong?
For example, month 1 could have 3,000 cases with an average PD of 4.5% for that month. If the next month has only 10 cases with an average of 6.5%, it then brings the overall average to:
(4.5% + 6.5%) / 2 = 5.5%.

So although only 10 cases were included in the second average, the impact looks major, when in reality it is only 10 cases, so the impact should be small.

Edit: I think I need to do the average case by case? So get all the cases for month 1 and average them out. Month 2 should be the sum of PD for each case in months 1 + 2, divided by the count of cases in months 1 + 2, and so on? (With the numbers above, that would give (3,000 × 4.5% + 10 × 6.5%) / 3,010 ≈ 4.51% after month 2, rather than 5.5%.)

Replicating this may be hard (or not possible), but I think it should provide the correct number?

Hello, I have provided an arena with dummy data which correlates to my data. All it contains is the case ID / date created / credit score / borrow amount, and using the credit score I created a calculated field to work out the PD. These should be the data fields used to work out this running average by case, by date.

mockdata

Hello @HarveyB-B, I see. My final thought on this, to achieve your expected solution with QuickSight calculated fields rather than building a custom dataset in SQL, would be to utilize a calculated field that uses sumOver and countOver functions to aggregate the average value for each month while taking the case count into account, and then use that within a runningAvg. This is the most likely way to achieve what you are looking for. I am assuming Casereference identifies each case, so I will use that in the count portion; change that value if needed.

It would look something like this:

Overall Average = runningAvg(
  sumOver(sum({Portfolio Default Rate}), [truncDate('MM', {Order Date})])
    / countOver(count({Casereference}), [truncDate('MM', {Order Date})]),
  [truncDate('MM', {Order Date}) ASC],
  []
)

This was able to run within my QuickSight account without hitting errors. Basically it will determine an average value based on the number of cases per month and then that will be used within the running average. This should get you closer to your expected output.

If you continue to run into issues with this calculation, my final recommendation is to build out the runningAvg function within a SQL query. That will give you more control without running into aggregation issues.
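If it helps, a rough sketch of that SQL approach (Redshift/PostgreSQL-style syntax; the table and column names are assumptions based on the fields mentioned in this thread) could look something like this:

-- Sketch only: month-by-month PD plus a cumulative, case-weighted overall PD.
WITH case_pd AS (
  SELECT
    DATE_TRUNC('month', processing_date) AS month_start,
    1 - (EXP((creditscoreint - 600) / 80.0)
         / (1 + EXP((creditscoreint - 600) / 80.0))) AS pd
  FROM cases
),
monthly AS (
  SELECT month_start, SUM(pd) AS pd_sum, COUNT(*) AS case_count
  FROM case_pd
  GROUP BY month_start
)
SELECT
  month_start,
  pd_sum / case_count AS month_by_month_pd,
  SUM(pd_sum) OVER (ORDER BY month_start)
    / SUM(case_count) OVER (ORDER BY month_start) AS overall_pd
FROM monthly
ORDER BY month_start;

The overall_pd column is the sum of every case's PD up to and including that month, divided by the count of those cases, which matches the case-weighted running average described earlier in this thread.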

I will mark this as the solution and archive this topic. If you still require assistance with this topic, please post a new question in the QuickSight community and link to this topic. That will ensure you are at the top of the priority list for a response from one of our QuickSight experts. Thank you!