This post covers some best practices to reduce the time it takes for a QuickSight Q topic to index.
First, let’s define what a QuickSight Q topic index is. When you create a topic, Amazon QuickSight Q creates, stores, and maintains an index with definitions for data in that topic. The topic index is an index of unique string values for fields included in a topic. Q uses this index to generate correct answers when a question mentions cell values as filters (for example, “sales for Amazon” implies a filter on the value 'Amazon.com, Inc.'), to provide autocomplete suggestions when someone asks a question, and to suggest mappings of terms to columns or data values.
Read more here: Refreshing Amazon QuickSight Q topic indexes - Amazon QuickSight
There are a few elements that contribute to indexing time:
- Size of the dataset in terms of the number of indexable columns (dimensions, not measures) and the number of rows
- Whether row-level security (RLS) is enabled for the dataset
- Type of dataset (SPICE vs. Direct Query)
- Cardinality of the enabled fields
Here are some suggestions or best practices for each element above.
Size of the dataset
Consider creating a copy of your dataset that is curated for the topic’s specific use case. If the topic only covers one area of the business, consider making a copy that removes unnecessary fields and rows. Reducing the time range can also help, if the topic’s business use case allows it. For example, the dataset might contain 4 years of historical data that is used in the dashboard, while the topic is only needed for looking at the last year of data.
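As a rough illustration of the curation described above, here is a minimal Python sketch that trims rows to a recent window and keeps only the fields the topic needs. In practice you would apply the same logic in the dataset's SQL or data prep step; the function names, field names, and sample rows are hypothetical.

```python
from datetime import date, timedelta


def trim_to_recent(rows, date_field, today, days=365):
    """Keep only rows whose date falls within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [row for row in rows if row[date_field] >= cutoff]


def keep_fields(rows, fields):
    """Drop every field that is not listed in `fields`."""
    return [{field: row[field] for field in fields} for row in rows]


# Hypothetical sample data: one old row and one recent row.
rows = [
    {"order_date": date(2021, 3, 1), "product_name": "Widget",
     "internal_note": "x", "sales": 10.0},
    {"order_date": date(2024, 5, 1), "product_name": "Gadget",
     "internal_note": "y", "sales": 20.0},
]

recent = trim_to_recent(rows, "order_date", today=date(2024, 6, 30))
curated = keep_fields(recent, ["order_date", "product_name", "sales"])
```

The curated copy has fewer rows and fewer string fields, so there are fewer unique values to index.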
Row-level Security (RLS)
RLS is a requirement based on the business use case and data security standards. Streamlining your rules is a best practice that helps reduce indexing time. Indexing performance depends on the number of unique values in the main dataset fields that are part of RLS rules, and on the number of columns in the rules table. So if the rules table includes a high-cardinality column such as user, and there are many users, indexing will take longer. Consider adding your users to user groups and setting rules at the highest level that works, such as department or job function.
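To make the grouping suggestion concrete, the sketch below collapses per-user rules into per-group rules. It assumes a rules table keyed by UserName or GroupName, as in QuickSight's user-based rules datasets; the `department` field, the user-to-group mapping, and the helper name are hypothetical, and the groups would still need to exist in QuickSight.

```python
def collapse_rules_to_groups(user_rules, user_to_group):
    """Turn per-user RLS rules into per-group rules, dropping duplicates."""
    seen = set()
    group_rules = []
    for rule in user_rules:
        group = user_to_group[rule["UserName"]]
        key = (group, rule["department"])
        if key not in seen:
            seen.add(key)
            group_rules.append(
                {"GroupName": group, "department": rule["department"]}
            )
    return group_rules


# Hypothetical per-user rules: one row per user.
user_rules = [
    {"UserName": "alice", "department": "Sales"},
    {"UserName": "bob", "department": "Sales"},
    {"UserName": "carol", "department": "Finance"},
    {"UserName": "dave", "department": "Finance"},
]
user_to_group = {
    "alice": "sales-team",
    "bob": "sales-team",
    "carol": "finance-team",
    "dave": "finance-team",
}

group_rules = collapse_rules_to_groups(user_rules, user_to_group)
```

Four user-level rows become two group-level rows here; with thousands of users the reduction in rule rows is correspondingly larger.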
Type of Dataset
We recommend using SPICE instead of Direct Query. For Direct Query, QuickSight does not have any control over how long it takes to extract the data. SPICE has an efficient export implementation and stores data in the robust in-memory engine that is built to serve data more rapidly. Read more here: Importing data into SPICE - Amazon QuickSight
Cardinality of the Fields
As mentioned in the definition at the top, the topic index is an index of unique string values for fields included in a topic. Since Q needs to create and store a copy of each string value for each enabled dimension, one way to reduce indexing time is to consider removing unnecessary high-cardinality fields. For example, if you have a Product Name and Product ID field, but for the topic readers will only want to see Product Name, you can remove Product ID to eliminate the need for every product ID to be indexed. In general, it is a best practice to only include the fields that are relevant to the specific topic use case to minimize lexical overlap. Read more here about best practices: Best practices for enabling business users to answer questions about data using natural language in Amazon QuickSight | AWS Business Intelligence Blog
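One way to find candidates for removal is to count the distinct string values per field on a sample of the data. The following is a minimal sketch under that assumption; the helper names, the threshold, and the sample rows are hypothetical, and it does not reflect how Q builds its index internally.

```python
def string_cardinality(rows):
    """Count distinct string values per field; non-string values are ignored."""
    distinct = {}
    for row in rows:
        for field, value in row.items():
            if isinstance(value, str):
                distinct.setdefault(field, set()).add(value)
    return {field: len(values) for field, values in distinct.items()}


def high_cardinality_fields(rows, threshold):
    """Return the names of string fields with more than `threshold` distinct values."""
    counts = string_cardinality(rows)
    return sorted(field for field, n in counts.items() if n > threshold)


# Hypothetical sample: product_id is unique per row, product_name repeats.
rows = [
    {"product_id": f"P-{i:04d}", "product_name": f"Product {i % 3}",
     "sales": float(i)}
    for i in range(10)
]

counts = string_cardinality(rows)
candidates = high_cardinality_fields(rows, threshold=5)
```

Fields that show up in `candidates` and are not needed by topic readers are good candidates to exclude from the topic.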
Note: If you want both Product Name and Product ID to be available for answering questions, but only need fuzzy matching of cell values on Product Name, you can set disableIndexing = true on Product ID so that Q skips it during indexing. The field is still included in the topic; its values just no longer appear in autocomplete suggestions, and they are no longer fuzzy-matched when mentioned as filters. Read more here: TopicCalculatedField - Amazon QuickSight
You can also achieve this right from the topic authoring page by unchecking the “Show field values in search suggestions” setting.
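If you manage topic definitions programmatically, the setting above can be sketched as a small helper that flips the flag on one column of a topic definition held as a Python dictionary. The key names used here (DataSets, Columns, ColumnName, DisableIndexing) are assumptions modeled on the topic structure in the QuickSight API reference, so verify the exact names and casing there before use; the helper and the sample definition are hypothetical.

```python
import copy


def disable_indexing(topic_definition, column_name):
    """Return a copy of the topic definition with indexing disabled for one column."""
    updated = copy.deepcopy(topic_definition)
    for dataset in updated.get("DataSets", []):
        for column in dataset.get("Columns", []):
            if column.get("ColumnName") == column_name:
                # Assumed flag name; check the API reference for your version.
                column["DisableIndexing"] = True
    return updated


# Hypothetical, heavily simplified topic definition.
topic = {
    "Name": "Product sales",
    "DataSets": [
        {
            "Columns": [
                {"ColumnName": "Product Name"},
                {"ColumnName": "Product ID"},
            ]
        }
    ],
}

updated_topic = disable_indexing(topic, "Product ID")
```

The helper returns a modified copy rather than changing the input, so the original definition can be kept for comparison before the update is submitted.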