The Q topic indexing for one of my datasets has been running for more than 8 hours. The dataset is huge (589 GB), but it is in SPICE.
Is this data size not recommended for a Q topic? I have tried optimizing my dataset by deselecting some columns and by adjusting the "show value fields in suggestions" setting for most fields.
Hi @Sanjay1 and welcome to the QuickSight community!
While a Q topic may accept a dataset of that size, it may not be the most practical approach, because Q builds a topic index to generate answers to your questions. A dataset of that size will take much longer to ingest.
As there’s no published indexing timetable based on dataset size, it’s hard to say what the expected time is. But if the initial indexing is taking that long, later updates to the topic may lead to long loading times as well.
I would suggest that the best practice in this case is most likely to remove the fields that you won’t be using in the Q topic from the dataset before adding it to a topic.
So while creating the dataset for the topic, I should exclude the columns that are not required, then build the dataset and create the topic from it. Is my understanding correct?
Hi @Sanjay1,
Yes, correct. Depending on the number of fields being removed, this should reduce your loading time, although it’s hard to say by how much without testing.
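If the dataset is created through the API rather than the console, the column trimming described above could be sketched roughly like this. This is only a minimal illustration: the table IDs, alias, and column names are made up, and the helper `build_logical_table` is hypothetical. It builds a `LogicalTableMap` entry that uses a `ProjectOperation` transform to keep only the listed columns, which is the shape the QuickSight `CreateDataSet` API expects as far as I know, so please check it against the current API reference before relying on it.

```python
def build_logical_table(physical_table_id, alias, keep_columns):
    """Build one LogicalTableMap entry that keeps only `keep_columns`.

    Hypothetical helper for illustration; it only assembles a dict and
    does not call AWS.
    """
    if not keep_columns:
        raise ValueError("keep_columns must not be empty")
    return {
        "Alias": alias,
        "Source": {"PhysicalTableId": physical_table_id},
        "DataTransforms": [
            # ProjectOperation keeps the listed columns and drops the rest.
            {"ProjectOperation": {"ProjectedColumns": list(keep_columns)}}
        ],
    }


# Example values are assumptions, not taken from the original dataset.
logical_table_map = {
    "sales-logical": build_logical_table(
        physical_table_id="sales-physical",
        alias="sales",
        keep_columns=["order_date", "region", "revenue"],
    )
}
```

The resulting `logical_table_map` would then be passed as the `LogicalTableMap` argument when creating the dataset (for example with boto3's `create_data_set`), so that only the columns needed for the topic end up in SPICE and in the topic index.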
Regarding that feature, I believe it’s been moved to a different spot, and the functionality may be a bit different as well. Once you ask a question, you’ll see a bar underneath where you can edit the fields.
Reducing the number of columns prior to dataset creation did improve the performance.
Additionally, I have a question regarding the incremental loading of our dataset, which occurs either monthly or quarterly. I’m wondering about the impact this might have on the Q topic indexing. Specifically, will the Q topic index refresh after each incremental load?
Hi @Sanjay1,
I believe that you may need to refresh the dataset within the Q topic, which can be done from the topic’s dataset section (you can also prompt the refresh manually from there). Once it is updated, there should not be any additional steps needed to index the data.
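Separately from the console steps, the topic refresh can also be scheduled through the API, which may be useful with monthly or quarterly incremental loads. The sketch below only assembles the request and does not call AWS. The helper `build_topic_refresh_request` and all IDs are hypothetical, and the field names (`IsEnabled`, `BasedOnSpiceSchedule`) reflect my understanding of the `CreateTopicRefreshSchedule` request, so treat them as assumptions to verify against the current documentation.

```python
def build_topic_refresh_request(account_id, topic_id, dataset_arn, follow_spice=True):
    """Assemble a request dict for a topic refresh schedule.

    Hypothetical helper for illustration. With follow_spice=True the
    schedule is meant to follow the dataset's SPICE refresh rather than
    a fixed clock time (assumption based on the API field name).
    """
    return {
        "AwsAccountId": account_id,
        "TopicId": topic_id,
        "DatasetArn": dataset_arn,
        "RefreshSchedule": {
            "IsEnabled": True,
            "BasedOnSpiceSchedule": follow_spice,
        },
    }


# Placeholder identifiers, not real resources.
request = build_topic_refresh_request(
    account_id="111122223333",
    topic_id="example-topic",
    dataset_arn="arn:aws:quicksight:us-east-1:111122223333:dataset/example-dataset",
)
```

With boto3, a dict like this would typically be passed as `client.create_topic_refresh_schedule(**request)` on a QuickSight client. Whether the index then updates after every incremental load is something I would confirm by testing on a small dataset first.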