Optimizing Speed with Custom Chat Agents

Hello. I am currently developing a custom chat agent that answers questions about SPICE datasets using SQL-like logic. Three SPICE datasets are connected to the chat agent through a Quick Space, and the Space also contains a data dictionary for each dataset (three in total). I’m happy with the agent’s accuracy so far, but the speed leaves a lot to be desired, especially since SPICE is supposed to be the faster query engine. Limiting the datasets or reducing the number of documents in the Space could be options for us. Still, it’s unclear to us why the agent takes 30-40 seconds to answer questions against SPICE datasets, given that we eventually plan to connect Redshift and API calls to other websites within the agent’s ecosystem. I would be happy to explain more about our product if anyone is willing to help. Thanks!

Thank you for your feedback, @lucasjacob23. I have flagged this for our experts to look at in the AM. In the meantime, I asked Quick for ideas and got this reply (@brett, could you review and validate this?):

"A few things that may help diagnose and improve the 30-40 second response times:

1. Understand where the latency is coming from

The end-to-end time includes several steps: NL→intent parsing, topic/dataset selection, query generation, SPICE query execution, and response formatting. SPICE itself is typically fast for well-structured queries, so the bottleneck is more likely in the orchestration layers (topic routing, data dictionary lookup, query planning) rather than raw SPICE execution time. If you can, check CloudWatch metrics for your Q Topic to see where time is being spent.
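
A simple way to test the "orchestration, not SPICE" hypothesis from the outside is to time the same question two ways: once as the equivalent SQL run directly against the data, and once as a full round trip through the chat agent. The sketch below only does the bookkeeping; the stage names and the `time.sleep` stand-ins are hypothetical placeholders for whatever calls you can actually time in your setup.

```python
import time
from typing import Callable


def time_stages(stages: list[tuple[str, Callable[[], object]]]) -> dict[str, float]:
    """Run each (name, callable) stage in order and record its wall-clock time in seconds."""
    durations: dict[str, float] = {}
    for name, run in stages:
        start = time.perf_counter()
        run()
        durations[name] = time.perf_counter() - start
    return durations


def slowest_stage(durations: dict[str, float]) -> str:
    """Return the name of the stage with the largest duration."""
    return max(durations, key=lambda name: durations[name])


if __name__ == "__main__":
    # Hypothetical stand-ins: swap each lambda for a real call you can time, such as
    # running the equivalent SQL directly vs. asking the chat agent the same question.
    stages = [
        ("direct_sql_query", lambda: time.sleep(0.05)),
        ("full_agent_round_trip", lambda: time.sleep(0.30)),
    ]
    durations = time_stages(stages)
    for name, seconds in durations.items():
        print(f"{name}: {seconds:.2f}s")
    print("slowest:", slowest_stage(durations))
```

If the direct query returns in a second or two while the agent round trip takes 20+ seconds, most of the time is being spent outside raw SPICE execution.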

2. Data dictionary optimization

Having 3 separate data dictionaries in the Space is useful for accuracy, but the agent has to parse and reason over all of that context on every question. A few tips:

  • Keep descriptions concise and focused on disambiguation (help the agent pick the right dataset quickly)
  • Remove redundant or overly verbose field descriptions that don’t aid in query generation
  • Consider whether all 3 datasets truly need to be candidates for every question — if you can segment by topic, that reduces the search space

3. Dataset structure matters

  • Pre-aggregate where possible — if the agent frequently does GROUP BY on the same dimensions, a summary SPICE dataset will return faster than computing aggregations at query time
  • Minimize wide tables (many columns) — the agent spends time reasoning about which columns to use
  • Ensure your SPICE datasets are refreshed and not in a degraded state
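
To make the pre-aggregation point concrete: if questions usually group by the same dimensions, that GROUP BY can be done once, ahead of time, and the smaller result loaded as its own summary dataset. This is a minimal sketch in plain Python, assuming hypothetical column names (`region`, `month`, `product`, `sales`); in practice the step would more likely be a SQL query or a scheduled data prep job.

```python
from collections import defaultdict


def pre_aggregate(rows: list[dict], dims: list[str], measure: str) -> list[dict]:
    """Sum `measure` for each distinct combination of `dims`, like SQL GROUP BY ... SUM()."""
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    # One summary row per group, in the order each group was first seen.
    return [{**dict(zip(dims, key)), measure: total} for key, total in totals.items()]


if __name__ == "__main__":
    # Hypothetical detail-level rows (region x month x product).
    rows = [
        {"region": "East", "month": "2024-01", "product": "A", "sales": 100.0},
        {"region": "East", "month": "2024-01", "product": "B", "sales": 50.0},
        {"region": "West", "month": "2024-01", "product": "A", "sales": 70.0},
        {"region": "West", "month": "2024-02", "product": "A", "sales": 30.0},
    ]
    summary = pre_aggregate(rows, ["region", "month"], "sales")
    print(f"{len(rows)} detail rows -> {len(summary)} summary rows")
    for r in summary:
        print(r)
```

The agent then reads a table that already has one row per region and month, instead of summing product-level rows at question time.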

4. Setting expectations for Redshift/API expansion

When you add Redshift and external API sources, it helps to think about them in layers:

  • SPICE for high-frequency, known-pattern questions (fastest)
  • Redshift for ad-hoc/complex analytics (expect additional latency from direct query)
  • APIs behind Lambda or similar for real-time external data

A pattern that works well is pre-computing common answers into SPICE from Redshift on a schedule, reserving direct Redshift queries for truly ad-hoc exploration.
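
The layering can be pictured as a routing decision: serve known question patterns from the pre-computed SPICE summary, send real-time questions to the API layer, and fall back to Redshift for everything else. This sketch only illustrates that decision order. It is not how Quick's agent actually routes; `Source`, `route_question`, and the keyword lists are hypothetical, and naive keyword matching stands in for whatever routing the agent or Topic configuration really does.

```python
from enum import Enum


class Source(Enum):
    SPICE_SUMMARY = "spice_summary"  # pre-computed answers for known question patterns
    REDSHIFT = "redshift"            # direct query for ad-hoc or complex analytics
    EXTERNAL_API = "external_api"    # real-time external data behind Lambda or similar


def route_question(question: str, known_patterns: list[str], realtime_keywords: list[str]) -> Source:
    """Pick the cheapest tier that can plausibly answer the question (naive keyword matching)."""
    text = question.lower()
    # Real-time external data can't come from a scheduled snapshot, so check it first.
    if any(keyword.lower() in text for keyword in realtime_keywords):
        return Source.EXTERNAL_API
    # Known, high-frequency patterns are served from the pre-computed SPICE summary.
    if any(pattern.lower() in text for pattern in known_patterns):
        return Source.SPICE_SUMMARY
    # Everything else falls back to a direct (slower) Redshift query.
    return Source.REDSHIFT


if __name__ == "__main__":
    patterns = ["sales by region", "monthly total"]
    realtime = ["current price", "right now"]
    print(route_question("What were sales by region last quarter?", patterns, realtime))
```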

5. Consider splitting into multiple focused Topics

Rather than one agent that routes across all 3 datasets, you could create focused Topics with narrower scope. This reduces the “reasoning overhead” per question and can noticeably improve response time."

Hi @lucasjacob23,

I am glad to hear that the chat agent is able to pull in accurate data from your datasets. To make it faster, one thing you can try is switching the chat agent’s model to ‘Fast’.

If it is not already selected, this could save some time. As Kristin mentioned, ‘optimizing’ your data dictionary could also help.

That said, the chat agent’s speed will only go so far, depending on how your datasets are structured and how large they are, and in your case there are three of them. When you ask the chat agent a question, it goes through a series of steps to understand not only what you are asking but also how best to answer it. If it has to sort through three relatively complex datasets, that takes time even when the data is in SPICE. From what I can tell, the chat agent then runs a small SQL query against your dataset to pull in the data it needs to answer your question. You can save some time by naming the dataset the answer should come from in your prompt, so the chat agent does not have to work out which dataset to use. Beyond that, there is not much we can do about processing speed.

That being said, the chat agents and the models behind them are constantly being improved, so while you get an accurate answer in about 40 seconds now, it may get faster in the future.

Thank you @Kristin and @JacobR for your advice. A couple of things I’ve noticed so far from a week of trying to change the infrastructure:

  • Compared with the Smart or Complex models, the Fast model has not performed well on the questions we’ve been asking, even simple ones. At the end of the day, accuracy matters most, and we have not found Fast to be reliable enough.
  • Focusing a chat agent on a single dataset has reduced latency, but only by a couple of seconds. I notice fewer reasoning events while a question is being answered, but simple questions still take around 20-25 seconds, which is slower than what we see with other LLMs. I know the models behind Quick’s chat agents aren’t public, but I would still like to know whether this latency is a model issue, a SPICE issue, or something else.
  • Our dataset is only around 350,000 rows at an aggregated grain, so even with around 25 fields, we don’t think answering questions in 10-15 seconds instead of the 20-25 seconds we are seeing would be that big a lift (and questions that require further math or lots of filtering and grouping take up to 20 seconds longer).

Thank you for your help thus far, and I hope that we can continue improving the product together for future use through this feedback.

Hi @lucasjacob23,

We appreciate the feedback, and thank you for sharing the additional steps you tested. As I am not an AWS employee, I cannot confirm definitively whether this is a model issue or something else; however, I will do my best to pass the feedback along to the proper team.
As I mentioned above, AWS is continuously working to improve the agent and its capabilities, speed, and accuracy.

To bring additional attention to this matter, I will mark this as a feature request as well.

Fair enough. Thank you for pushing this along and helping us out.