I’m new to QuickSight, coming from DOMO, and I’m trying to find documentation or an example showing how to log the daily updates of one dataset to another dataset. I’ve been reading and searching the QuickSight docs, but without knowing the proper QuickSight terminology, I’m probably skipping right past the section of the Furnished Manual I should Read. I’d really appreciate a friendly pointer in the right direction.
My situation is that I’m given a dataset in QuickSight that is, effectively, a daily snapshot. The dataset contains rows of items and each item’s current status.
Something like:
ItemId, IsInStock
ItemA, 0
ItemB, 1
ItemC, 0
The dataset is a snapshot, without any historical data in it, but what I need to report is, sure enough, changes over time. In DOMO I created a recursive dataset that joined itself with the daily snapshot, identified snapshot rows containing new items or items with changed statuses, and then appended those identified rows to itself, logging the current date in a third column.
QuickSight won’t let me join a dataset with itself, and I’m not seeing how to otherwise build a self-appending dataset. I’d love guidance toward whatever obvious solution I’m overlooking.
My access to QuickSight (and previously DOMO) is through another, non-Amazon service. As a result, I have no ability to use Athena, S3 buckets, or even the CSV/Excel file upload feature of QuickSight itself.
Thank you for reading this and for any help you can provide.
Hello @sgadsby, welcome to the QuickSight community!
I may need a little more information about how the datasets are being created, or what resource you are connecting to QuickSight, to assist you further, but I can try to lead you in the right direction. If you are able to create two different datasets in QuickSight, you could use the join functionality to inner, left, right, or full outer join them on a matching unique key.
Alternatively, you could use custom SQL in QuickSight to build out a dataset from your data source/database connection.
There are a few options to manipulate the data you are bringing in. I hope this helps!
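To make the join idea a little more concrete, here is a minimal sketch of the kind of comparison custom SQL could express between two datasets on a unique key. This is illustrative only, not QuickSight itself: it runs against sqlite3 so it is self-contained, and all table and column names are invented for the example.

```python
# Sketch of comparing two datasets with a left join on a unique key
# (sqlite3 stands in for the real data source; names are made up).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yesterday (ItemId TEXT PRIMARY KEY, IsInStock INTEGER);
    CREATE TABLE today     (ItemId TEXT PRIMARY KEY, IsInStock INTEGER);
    INSERT INTO yesterday VALUES ('ItemA', 0), ('ItemB', 1), ('ItemC', 0);
    INSERT INTO today     VALUES ('ItemA', 0), ('ItemB', 1), ('ItemC', 1);
""")

# Left join today's snapshot against yesterday's on the unique key,
# keeping rows that are new (no match) or whose status changed.
changed = conn.execute("""
    SELECT t.ItemId, t.IsInStock
    FROM today t
    LEFT JOIN yesterday y ON y.ItemId = t.ItemId
    WHERE y.ItemId IS NULL OR y.IsInStock <> t.IsInStock
""").fetchall()
print(changed)  # -> [('ItemC', 1)]
```

The catch is that this comparison needs both days’ data to exist at the same time, which is exactly the constraint discussed in the rest of the thread.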
The datasets are coming in from an Athena data source. Clicking “Datasets”, then “New dataset”, and then my only available data source reveals that much: it displays “Database name ATHENA”.
But in your example, both sides of that join (both “Business Review” and “Business Review (2)”) bring in whatever data Business Review currently holds, right? I need to compile historical information about a dataset that doesn’t keep its own history.
Let’s say on January 1, the Business Review dataset contains three rows:
ItemA, 0
ItemB, 1
ItemC, 0
On January 2, the Business Review dataset still contains three rows, but with some changed data:
ItemA, 0
ItemB, 1
ItemC, 1
On January 3, there’s a new row and data in an existing row has changed again:
ItemA, 0
ItemB, 1
ItemC, 0
ItemD, 1
Each day the data in the Business Review dataset is completely wiped and rewritten. It’s a snapshot, an “always been at war with Eastasia” dataset.
I need to compile from those daily snapshots a Business Review Change History dataset that remembers: a dataset that each day logs items that are new or have had their status changed. So, after the Business Review dataset’s January 3 update, my change history dataset should contain something like:
ItemA, 0, Jan 1
ItemB, 1, Jan 1
ItemC, 0, Jan 1
ItemC, 1, Jan 2
ItemC, 0, Jan 3
ItemD, 1, Jan 3
Does that help? I don’t think I can accomplish that by joining Business Review with itself, but I’ll be very grateful if you can show me how I can!
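In case it helps to see the compilation I’m describing spelled out: this is a sketch of what something outside QuickSight would have to run each day and persist, using my January 1–3 example. It’s purely illustrative; the function name is made up.

```python
# Illustrative sketch of compiling a change-history dataset from daily
# snapshots. Not QuickSight code; `log_changes` is an invented name.

def log_changes(history, snapshot, day):
    """Append (item, status, day) rows for items that are new or changed."""
    latest = {}
    for item, status, _ in history:  # history is append-only, in date order
        latest[item] = status
    for item, status in snapshot:
        if latest.get(item) != status:  # new item, or status differs
            history.append((item, status, day))

history = []
log_changes(history, [("ItemA", 0), ("ItemB", 1), ("ItemC", 0)], "Jan 1")
log_changes(history, [("ItemA", 0), ("ItemB", 1), ("ItemC", 1)], "Jan 2")
log_changes(history, [("ItemA", 0), ("ItemB", 1), ("ItemC", 0), ("ItemD", 1)], "Jan 3")

for row in history:
    print(row)
# ('ItemA', 0, 'Jan 1')
# ('ItemB', 1, 'Jan 1')
# ('ItemC', 0, 'Jan 1')
# ('ItemC', 1, 'Jan 2')
# ('ItemC', 0, 'Jan 3')
# ('ItemD', 1, 'Jan 3')
```

The logic itself is trivial; the whole problem is that nothing in my setup gets to keep `history` around from one day to the next.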
Hello @sgadsby, if the data source is connected to Athena, how are you storing each individual day’s worth of data? Are you partitioning the data into Parquet format in S3, or something similar? If Athena is querying data from S3, you should be able to merge the rows for each day within the Athena editor into a view, so you can update a single dataset every day in QuickSight.
There really isn’t an alternate way to handle this functionality from the QuickSight console directly. Let me know if this helps!
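As a rough sketch of what that view-based approach looks like (using sqlite3 as a stand-in for Athena, and invented table names): if each day’s snapshot survived as its own table or date-partitioned S3 location, one view could merge them all with a date column, and QuickSight would only ever need to read that single view.

```python
# Sketch only: sqlite3 stands in for Athena; names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snap_jan1 (ItemId TEXT, IsInStock INTEGER);
    CREATE TABLE snap_jan2 (ItemId TEXT, IsInStock INTEGER);
    INSERT INTO snap_jan1 VALUES ('ItemA', 0), ('ItemB', 1), ('ItemC', 0);
    INSERT INTO snap_jan2 VALUES ('ItemA', 0), ('ItemB', 1), ('ItemC', 1);

    -- One view over every day's table; Athena could do the same over
    -- date-partitioned data in S3.
    CREATE VIEW business_review_history AS
        SELECT ItemId, IsInStock, 'Jan 1' AS SnapshotDate FROM snap_jan1
        UNION ALL
        SELECT ItemId, IsInStock, 'Jan 2' AS SnapshotDate FROM snap_jan2;
""")

rows = conn.execute("SELECT * FROM business_review_history").fetchall()
print(len(rows))  # -> 6
```

This depends entirely on having the per-day data retained somewhere Athena can see, which is the sticking point in this thread.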
Thanks, @DylanM, but all of what you’ve described is out of my hands. The SaaS vendor controls all of that.
Thank you for trying to help me, though! I’ll keep experimenting, and maybe eventually I’ll find some way around the limitations.
@sgadsby
I was just addressing the “self-join” part of your question. As @DylanM explained, there isn’t really a way to keep historical data in your dataset if that’s not what’s coming from your source.