Self-joining or self-modifying dataset?

sgadsby · July 17, 2024, 7:33pm

Good day.

I’m new to QuickSight, coming from DOMO, and I’m trying to find documentation or an example showing how to log the daily updates of one dataset to another dataset. I’ve been reading and searching QuickSight docs, but without knowing the proper QuickSight terminology, I’m probably skipping right past the section of the Furnished Manual I should Read. I’d really appreciate a friendly pointer in the right direction.

My situation is that I’m given in QuickSight a dataset that is, effectively, a daily snapshot. The dataset contain rows of items and each item’s current status.

Something like:

ItemId, IsInStock
ItemA, 0
ItemB, 1
ItemC, 0

The dataset is a snapshot, without any historical data in it, but what I need to report is, sure enough, changes over time. In DOMO I created a recursive dataset that joined itself with the daily snapshot, identified snapshot rows with new items or items with changed status, and then appended those identified rows to itself while logging the current date to those rows in a third column.

QuickSight won’t let me join a dataset within itself, and I’m not seeing how to otherwise build a self-modifying dataset. I’d love guidance toward whatever obvious solution I’m overlooking.

My access to QuickSight (and previously DOMO) is through another, non-Amazon service. As a result, I have no ability to use Athena, S3 buckets, or even the CSV/Excel file upload feature of QuickSight itself.

I thank you for reading this and for any help you can provide me.

DylanM · July 17, 2024, 10:20pm

Hello @sgadsby, welcome to the QuickSight community!

I may need a little more information about how the datasets are being created or what resource you are connecting to QuickSight to assist you further, but I can try to lead you in the right direction. If you are able to create 2 different datasets in QuickSight, you could use the join functionality to Inner, Left, Right, or Left join them with a matching unique key.

Alternatively, you could utilize custom SQL in QuickSight to build out a dataset from your datasource/database connection.

There are a few options to manipulate the data you are bringing in. I hope this helps!

David_Wong · July 18, 2024, 1:25am

Hi @sgadsby,

Select the “Use in a new dataset” option for your dataset.

This will create a new child dataset that references your parent dataset.

Click on “Add Data”, select “Dataset” and select the same parent dataset.

Your child dataset now contains a parent dataset that joins with itself. All you have left to do is add your join clauses to finish creating the join.

Hope this helps!

sgadsby · July 18, 2024, 12:57pm

Hi, @DylanM.

The datasets are coming in from an Athena data source. Clicking “Datasets”, “New dataset”, and then my only available data source reveals that much. It displays, “Database name ATHENA”.

sgadsby · July 18, 2024, 2:07pm

Hi, @David_Wong.

But in your example, both sides of that join—both “Business Review” and “Business Review (2)”—bring in the data the Business Review currently holds, right? I need to compile historical information about a dataset that doesn’t keep its own history.

Let’s say on January 1, the Business Review dataset contains three rows:

ItemA, 0
ItemB, 1
ItemC, 0

On January 2, the Business Review dataset still contains three rows, but with some changed data:

ItemA, 0
ItemB, 1
ItemC, 1

On January 3, there’s a new row and data in an existing row has changed again:

ItemA, 0
ItemB, 1
ItemC, 0
ItemD, 1

Each day the data in the Business Review dataset is completely wiped and rewritten. It’s a snapshot, an “always been at war with Eastasia” dataset.

I need to compile from those daily snapshots a Business Review Change History dataset that remembers. A dataset that each day logs items that are new or had their status changed. So, after the Business Review dataset’s January 3 update, my change history dataset should contain something like:

ItemA, 0, Jan 1
ItemB, 1, Jan 1
ItemC, 0, Jan 1
ItemC, 1, Jan 2
ItemC, 0, Jan 3
ItemD, 1, Jan 3

Does that help? I don’t think I can accomplish that by joining Business Review with itself, but I’ll be very grateful if you can show me how I can!

Thanks!

DylanM · July 18, 2024, 4:19pm

Hello @sgadsby, if the datasource is connected to Athena, how are you storing each individual day’s worth of data? Are you partitioning the data into parquet format in S3 or something similar? If Athena is querying data from S3, you should be able to merge the rows for each day within the Athena editor into a view so you can update a single dataset every day in QuickSight.

There really isn’t an alternate way to handle this functionality from the QuickSight console directly. Let me know if this helps!

sgadsby · July 18, 2024, 4:52pm

Thanks, @DylanM, but all of what you’ve described is out of my hands. The SaaS vendor controls all that.
Thank you for trying to help me though! I’ll keep experimenting and maybe, eventually I’ll find some way around the limitations.

David_Wong · July 18, 2024, 5:04pm

@sgadsby
I was just addressing the “self-join” part of your question. As @DylanM explained, there isn’t really a way to keep historical data in your dataset if that’s not what’s coming from your source.

sgadsby · July 18, 2024, 5:28pm

Thanks, @DylanM and @David_Wong. I’ve asked my vendor for help with this.

~~What is community practice? If the answer to a “How to…?” question is “You can’t”, does that “You can’t” count as the “Solution”?~~