How to handle 7 unrelated CSV datasets for Natural Language Querying in chat agents in QUICK SUITE?

Hi everyone,

I am building a generative BI solution using Amazon QUICK SUITE chat agents to allow users to ask questions about production data (Demand, Transport, Machine Utilization, etc.).

The Technical Setup:

  • Data Source: 7 distinct CSV files from a Gurobi optimization engine.

  • The Challenge: These tables are completely unrelated. There are no Primary Keys or Foreign Keys to join them.

  • Header Overlap: Some tables share similar header names (e.g., both “Transport” and “Demand” might have a “Date” or “Region” column), but the data points represent different parts of the optimization process and cannot be merged into one flat table.

  • Previous Attempt: We tried Athena, but struggled with the requirement of creating a single unified view/dataset for the chat agent.

Questions for the Community:

  1. Best AWS Service for Staging: Given these are 7 independent CSVs, is it better to keep them in Athena (S3), or move them to RDS or Redshift? Which service allows QuickSight Q to best distinguish between unrelated tables?

  2. Handling No Keys/No Joins: Since we cannot join these tables, can a single QuickSight Q Topic manage 7 independent Datasets simultaneously?

  3. Intent Routing: How does the Q engine decide which table to query when headers are similar (e.g., “Show me demand for Region A” vs “Show me utilization for Region A”)?

  4. Data Prep Best Practices: Are there specific ways to use Synonyms or Field Descriptions in QuickSight to “force” the agent to the right table when no relational schema exists?

We want to ensure that when a user asks about “Machine Utilization,” the agent doesn’t try to pull data from the “Transport” table just because they share a “Date” column.

Any architectural guidance would be greatly appreciated!

Hi @juan.rivera,
This question just came up in another topic as well, feel free to review here:

  • Regarding your Q Topic question: you can create a Topic and link multiple datasets to it, but it’s not always the cleanest functionality, especially when the datasets contain similarly named (‘like’) fields. Topics do have a ‘custom instructions’ section, however, so you can test writing specific rules about how and when each dataset should be considered.
    For instance: when handling a question that refers to ‘utilization’, use the ‘utilization dataset’.
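
If it helps, here is a minimal sketch of how such rules could be drafted consistently for several datasets before pasting them into the custom instructions box. The keyword-to-dataset mapping and the wording of each rule are assumptions for illustration only; nothing here calls a Quick Suite API, and whether the agent follows the rules should be tested.

```python
# Hypothetical mapping of question keywords to dataset names.
# Replace with the names actually used in your Topic.
ROUTING_RULES = {
    "demand": "Demand dataset",
    "transport": "Transport dataset",
    "utilization": "Machine Utilization dataset",
}


def build_custom_instructions(rules):
    """Return one plain-English routing rule per line."""
    lines = []
    for keyword, dataset in sorted(rules.items()):
        lines.append(
            f"When a question refers to '{keyword}', use only the '{dataset}'. "
            "Do not answer it from any other dataset, even if column names "
            "such as Date or Region match."
        )
    return "\n".join(lines)


instructions = build_custom_instructions(ROUTING_RULES)
print(instructions)
```

Generating the text from one mapping keeps the rules uniform, so adding an eighth dataset later means adding a single entry rather than rewording the instructions by hand.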

The new Agentic AI features help in scenarios like this, though, as you can now create spaces, which can store data from multiple data sources, files, and other assets. You can then link a Chat Agent to that space so that it draws answers from the respective data source.

Hi @juan.rivera,
Following up here as it’s been a while since we last heard from you on this thread. Did you have any additional questions, or did the previous response answer them?

Let us know if there is anything further we can assist with in relation to your initial post. If we do not hear back within the next 3 business days, I’ll mark the solution.

Thank you

Hi @Brett, yes, thank you very much. Your response was very helpful in solving our issue. We opted to use 7 different Athena tables and to create 7 different Topics.

Thank you so much for your assistance
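
For anyone landing on this thread later, the separate-tables approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the bucket name, database name, table names, and column lists are hypothetical placeholders (only three of the seven tables are shown), and every column is declared as `string`, which is the usual starting point with the OpenCSVSerde. The script only generates the `CREATE EXTERNAL TABLE` statements; it does not run them.

```python
# Hypothetical placeholders: bucket, database, tables, and columns
# are illustrative, not the real schema from the thread.
BUCKET = "example-bucket"
DATABASE = "optimization_results"

TABLES = {
    "demand": ["date", "region", "demand_units"],
    "transport": ["date", "region", "shipped_units"],
    "machine_utilization": ["date", "machine_id", "utilization_pct"],
}


def athena_ddl(table, columns):
    """Build a CREATE EXTERNAL TABLE statement for one CSV folder."""
    cols = ",\n".join(f"  `{c}` string" for c in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {DATABASE}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        # Each table reads from its own S3 prefix, so the CSVs never mix.
        f"LOCATION 's3://{BUCKET}/{table}/'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


statements = {name: athena_ddl(name, cols) for name, cols in TABLES.items()}
for ddl in statements.values():
    print(ddl, end="\n\n")
```

Keeping each CSV under its own S3 prefix matters because an Athena table's `LOCATION` points at a folder rather than a single file. Each resulting table can then back its own dataset and Topic, so shared headers such as Date or Region never compete within one Topic.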