Dataset performance

I have a few questions regarding dataset performance. If my dataset uses SPICE and if I’m joining 2 database tables, is there any difference in performance between selecting the 2 tables from my data source and joining them using the UI, or creating the join in a custom SQL query?

My other question is about adding filters in a dataset. I didn’t think there would be any difference in performance between adding the filters using the “Filters” pane and adding the filters in a WHERE clause in a custom SQL query but I found it made a significant difference in one of my datasets. With the filters added to the “Filters” pane, my dataset took much longer to refresh than with the filters in a WHERE clause. I was wondering why it’s different.

Hi,

There are two ways to you can use, first is that you can create a view in database based on two table joins.

for the second query, you can review your indexes in db for those column using as filter in Query.

I hope both will work for you.

Regards,
Naveed Ali

One other comment about “selecting the 2 tables from my data source and joining them using the UI, or creating the join in a custom SQL query?” If you select the two tables in the UI, then if someone add new fields to either/both of these tables in the data source, then QS will AUTOMATICALLY see those new fields. However, if you use a custom SQL query, QS will never see those new fields, not unless you come back and edit the custom SQL.

Tradeoffs! we use both. We also have datasets where we select tables and use the UI to join to custom SQL queries.

All of this is for direct queries style of dataset. When feeding a Spice dataset, when using the first approach, and someone adds columns to your tables in your datasource, your spice dataset will grow in size even if it retrieves the same number of rows.

Also just to add some more to what has been said - if you create a view, assuming you’re using Redshift, when you modify the view (to add or change data types) you have to republish the view.

I am currently testing out the same and what I see is that the granularity and what visualizations you are trying to build determine how you need to structure the datasets. For example, if your visualization is counting sum of dollars per customer but you want to display customer name in addition to displaying the count of distinct products. Customer Name is on some customer DIM, dollars are on a FACT, line item is on a different FACT. Because QS creates the SQL queries on the fly (for direct query) the SQL it is generating might actually not be efficient. If you are using Redshift, they have the ability to show the SQL requested and you can see there are a lot of temp tables and aggregations and such. You would need to determine from the SQL generated by QS which is likely more performant.

Other than what has been mentioned, I guess what I was really trying to find out is if QuickSight does any kind of performance optimization when we create the dataset by selecting each individual table in QS itself as opposed to writing a custom SQL query. Am I missing out on any performance optimization in QS by writing custom SQL queries?

Hi @David_Wong, I don’t believe there is any difference in performance optimization when using custom SQL queries. However, one thing to keep in mind is that you can refresh data in one dataset and not the other if you join using the QuickSight UI instead of using custom SQL.