Hi everyone,
I’m running into a confusing performance issue with SPICE dataset refreshes and hoping someone can shed some light on it.
I have two datasets, both sourced from Athena and ingested into SPICE:
- Dataset A — very small (~30KB), takes over 2.5 minutes to refresh
- Dataset B — much larger (~9MB), refreshes in under a minute
I would expect the smaller dataset to refresh faster, but it’s consistently the other way around. Both datasets are in the same AWS region and use the same Athena + S3 architecture.
Has anyone experienced something similar? What could cause a small dataset to take significantly longer to refresh than a much larger one?
Thanks in advance
Hi @Bar_Cohen
Welcome to the Quick Suite community!
The key insight is that SPICE refresh time is primarily driven by the source query execution, not the data transfer into SPICE. The 30KB vs 9MB output size is almost irrelevant compared to what’s happening upstream in Athena.
Athena Query Complexity: Even though Dataset A produces a tiny result, the underlying query might be doing expensive JOINs across multiple tables, using window functions, CTEs, or heavy aggregations, or running custom SQL in Quick Suite rather than a simple table select. Check the query behind each dataset: a small output can come from a very expensive query.
Data Scanned in S3: Athena’s cost and runtime are driven by data scanned, not data returned. Dataset A’s source tables could be:
- Stored as CSV/JSON instead of Parquet/ORC (columnar formats scan far less)
- Unpartitioned or poorly partitioned, so Athena scans the whole table even if only a few rows match
- Much larger underlying tables that get filtered down to 30KB of results
You can check this in the Athena console → Query History and compare the Data scanned values for both datasets’ refresh queries.
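If you’d rather pull these numbers programmatically, here is a minimal sketch. It assumes you already fetched each refresh query with boto3’s `athena.get_query_execution` and kept the `QueryExecution` element; only `Statistics.DataScannedInBytes` is read. The helper names and sample values are hypothetical.

```python
# Sketch: compare "Data scanned" between two Athena refresh queries.
# Assumption: each dict is the "QueryExecution" element returned by boto3's
# athena.get_query_execution(QueryExecutionId=...). Helper names are hypothetical.


def data_scanned_bytes(query_execution: dict) -> int:
    """Bytes scanned, or 0 if Athena did not report the statistic."""
    return query_execution.get("Statistics", {}).get("DataScannedInBytes", 0)


def human_bytes(n: int) -> str:
    """Format a byte count with binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def compare_scanned(a: dict, b: dict) -> str:
    sa, sb = data_scanned_bytes(a), data_scanned_bytes(b)
    ratio = f"{sa / sb:.1f}x" if sb else "n/a"
    return f"A scanned {human_bytes(sa)}, B scanned {human_bytes(sb)} (A/B = {ratio})"


# Hypothetical statistics, for illustration only.
query_a = {"Statistics": {"DataScannedInBytes": 5 * 1024**3}}
query_b = {"Statistics": {"DataScannedInBytes": 512 * 1024**2}}
print(compare_scanned(query_a, query_b))
```

A large A/B ratio here would support the "much larger underlying tables" explanation even when the returned rows are tiny.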
Athena Concurrency & Queueing: If Dataset A’s refresh lands in a busy queue or hits DML concurrency limits in your account, it could sit waiting before execution even starts. Check if the 2.5 minutes includes queue wait time.
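One rough way to answer that from the query’s own statistics is to compare `QueryQueueTimeInMillis` with `TotalExecutionTimeInMillis`. This sketch assumes you have the `Statistics` dict from `athena.get_query_execution`; the helper names and the 50% cutoff are illustrative assumptions, not AWS guidance.

```python
# Sketch: how much of an Athena query's wall time was spent waiting in the queue?
# Assumption: `stats` is the "Statistics" dict inside boto3's
# athena.get_query_execution(...)["QueryExecution"]. The 0.5 threshold is an
# arbitrary illustration, not an AWS guideline.
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    total_ms: int
    queue_ms: int
    engine_ms: int

    @property
    def queue_share(self) -> float:
        """Fraction of total execution time spent queued (0.0 if total is unknown)."""
        return self.queue_ms / self.total_ms if self.total_ms else 0.0


def breakdown(stats: dict) -> TimeBreakdown:
    return TimeBreakdown(
        total_ms=stats.get("TotalExecutionTimeInMillis", 0),
        queue_ms=stats.get("QueryQueueTimeInMillis", 0),
        engine_ms=stats.get("EngineExecutionTimeInMillis", 0),
    )


def queue_dominated(stats: dict, threshold: float = 0.5) -> bool:
    """True when queueing accounts for more than `threshold` of total time."""
    return breakdown(stats).queue_share > threshold


# Hypothetical example: a query that waited 140 s and ran for 9 s.
example = {
    "TotalExecutionTimeInMillis": 150_000,
    "QueryQueueTimeInMillis": 140_000,
    "EngineExecutionTimeInMillis": 9_000,
}
print(f"{breakdown(example).queue_share:.0%} queued; dominated: {queue_dominated(example)}")
```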
Calculated Fields in Quick Suite: If Dataset A has complex calculated fields defined at the Quick Suite dataset level (not in the SQL), those are computed during SPICE ingestion and can add significant time.
S3 File Layout (Many Small Files): If Dataset A’s source data in S3 consists of thousands of tiny files, Athena pays a high per-file overhead for listing and opening each one (the small files problem). Dataset B might have fewer, larger files that are more efficient to scan.
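A quick way to spot this pattern is to summarize the object sizes under each table’s S3 prefix. This is a sketch assuming you collected the sizes from the `Size` field of boto3’s `s3.list_objects_v2` results; the 128 MiB target is a commonly cited rule of thumb used here only as an illustrative cutoff, and the function name is hypothetical.

```python
# Sketch: flag the "many small files" pattern under an S3 prefix.
# Assumption: `sizes` are object sizes in bytes, e.g. the "Size" field of each
# entry in boto3 s3.list_objects_v2(...)["Contents"] (paginate past 1,000 keys).
# The 128 MiB target is a commonly cited rule of thumb, used here only as a cutoff.
from statistics import median

TARGET_FILE_BYTES = 128 * 1024**2


def small_file_report(sizes: list[int], target: int = TARGET_FILE_BYTES) -> dict:
    """Summarize file count, median size, and the share of files below `target`."""
    if not sizes:
        return {"files": 0, "median_bytes": 0, "small_fraction": 0.0}
    small = sum(1 for s in sizes if s < target)
    return {
        "files": len(sizes),
        "median_bytes": median(sizes),
        "small_fraction": small / len(sizes),
    }


# Hypothetical prefixes: 5,000 files of ~20 KB vs 4 files of ~256 MiB.
print(small_file_report([20_000] * 5_000))
print(small_file_report([256 * 1024**2] * 4))
```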
To Diagnose:
- Check Athena Query History: Find the queries triggered by each SPICE refresh and compare execution time and data scanned.
- Run the queries manually in the Athena console to isolate whether the bottleneck is Athena or SPICE ingestion.
- Review S3 data format and ensure both use columnar formats (Parquet/ORC) with proper partitioning.
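For the second step, here is a sketch of the comparison. It assumes you have the SPICE ingestion record (an entry from boto3’s `quicksight.list_ingestions`, which reports `IngestionTimeInSeconds`) and the `Statistics` of the Athena query that refresh issued. Everything not spent in Athena is lumped into one bucket, which is a simplification, and `split_refresh` is a hypothetical helper.

```python
# Sketch: attribute a SPICE refresh's duration to Athena vs. everything else.
# Assumptions: `ingestion` is one entry of boto3's
# quicksight.list_ingestions(...)["Ingestions"] and `athena_stats` is the
# "Statistics" dict of the Athena query that refresh issued. Time not spent in
# Athena is lumped into "spice/other", which is a simplification.


def split_refresh(ingestion: dict, athena_stats: dict) -> dict:
    refresh_s = float(ingestion.get("IngestionTimeInSeconds", 0))
    athena_s = athena_stats.get("TotalExecutionTimeInMillis", 0) / 1000
    other_s = max(refresh_s - athena_s, 0.0)
    return {
        "refresh_s": refresh_s,
        "athena_s": athena_s,
        "other_s": other_s,
        "bottleneck": "athena" if athena_s >= other_s else "spice/other",
    }


# Hypothetical refresh: 150 s total, of which Athena took 140 s.
print(split_refresh({"IngestionTimeInSeconds": 150}, {"TotalExecutionTimeInMillis": 140_000}))
```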
Hi @Xclipse, thanks for the detailed breakdown!
We reviewed all your suggestions, but they don’t seem to apply to our scenario. Both datasets are essentially identical in the characteristics you mentioned:
- Both use CSV, which is a flat file format, so there’s no difference in file format or columnar optimization
- Both have the same table structure, partitioning, and S3 layout
- Both have comparable query complexity and calculated fields
- The underlying data characteristics are the same across the board
So from our perspective, none of the points you raised explain the performance gap between the two datasets.
The only angle we haven’t been able to rule out yet is Athena concurrency and queueing. Could you point us to how we can compare the logs between the two dataset refreshes?
Hi @Bar_Cohen
Thank you for taking the time to verify all of those suggestions. That really helps us narrow down the investigation.
Since the dataset characteristics are identical across the board, let’s focus on the Athena concurrency and queueing angle. Here’s how you can compare the two refreshes:
- In the QuickSight console, go to Datasets → Select the dataset → Refresh tab and note the exact refresh times for both Dataset A and Dataset B.
- In the Athena console, go to the Recent queries tab and filter by those timestamps. For each query, please compare:
  - Queue time: how long the query waited before starting
  - Execution time: how long the query took to run
  - Data scanned: how much data Athena read from S3
If Dataset A shows a much higher queue time, that would point to a concurrency bottleneck.
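To line the two consoles up without eyeballing timestamps, here is a sketch that keeps only the Athena queries submitted inside a refresh’s window. It assumes `ingestion` is an entry from `quicksight.list_ingestions` (using `CreatedTime` and `IngestionTimeInSeconds`) and `executions` are `QueryExecution` dicts, for example from `athena.batch_get_query_execution`. The function name and the 5-second slack are hypothetical choices, and other queries running in the same window will also match, so check the query text too.

```python
# Sketch: find the Athena queries that ran during a given SPICE refresh and
# report queue time, execution time, and data scanned for each.
# Assumptions: `ingestion` is an entry of boto3's quicksight.list_ingestions(...)
# ["Ingestions"] (CreatedTime, IngestionTimeInSeconds), and `executions` are
# "QueryExecution" dicts, e.g. from athena.batch_get_query_execution(...).
# The function name and the 5-second slack are hypothetical choices.
from datetime import datetime, timedelta, timezone


def queries_in_window(ingestion: dict, executions: list[dict], slack_s: float = 5.0) -> list[dict]:
    start = ingestion["CreatedTime"] - timedelta(seconds=slack_s)
    duration = float(ingestion.get("IngestionTimeInSeconds", 0))
    end = ingestion["CreatedTime"] + timedelta(seconds=duration + slack_s)
    rows = []
    for qe in executions:
        submitted = qe["Status"]["SubmissionDateTime"]
        if start <= submitted <= end:
            stats = qe.get("Statistics", {})
            rows.append({
                "id": qe["QueryExecutionId"],
                "queue_ms": stats.get("QueryQueueTimeInMillis", 0),
                "engine_ms": stats.get("EngineExecutionTimeInMillis", 0),
                "scanned_bytes": stats.get("DataScannedInBytes", 0),
            })
    return rows


# Hypothetical data: one query inside the refresh window, one well outside it.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
refresh = {"CreatedTime": t0, "IngestionTimeInSeconds": 60}
execs = [
    {"QueryExecutionId": "q-inside",
     "Status": {"SubmissionDateTime": t0 + timedelta(seconds=2)},
     "Statistics": {"QueryQueueTimeInMillis": 100, "EngineExecutionTimeInMillis": 600}},
    {"QueryExecutionId": "q-outside",
     "Status": {"SubmissionDateTime": t0 + timedelta(minutes=10)}},
]
print(queries_in_window(refresh, execs))
```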
Please refer to the following documentation, which might be helpful.
Hi @Xclipse,
Thank you for continuing to dig into this with me.
Here are the stats for both datasets:
Dataset A: Refresh time ~2:45–3:30 minutes (unstable). Athena runtime: 625ms
Dataset B: Refresh time ~44–47 seconds. Athena runtime: 896ms
I’ve attached the Athena execution stats for both as requested.
The numbers make it pretty clear: both datasets have nearly identical and very fast Athena runtimes (under 1 second each), with negligible queue times (~112ms for A, ~90ms for B). So the concurrency/queueing angle seems ruled out as well.
The real gap is in what happens after Athena returns the data:
Dataset A takes roughly 3.5x to 4.8x as long to refresh as Dataset B despite being about 300x smaller in data size, which points to a SPICE-side issue rather than anything in Athena.
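A quick check of the ratios using only the figures reported above (sizes in decimal units, so 9MB = 9,000KB; refresh times converted to seconds):

```python
# Worked arithmetic using only the numbers reported in this thread.
# Sizes use decimal units (1 MB = 1000 KB); refresh times are in seconds.

refresh_a_s = (2 * 60 + 45, 3 * 60 + 30)  # Dataset A: 165 s to 210 s
refresh_b_s = (44, 47)                    # Dataset B: 44 s to 47 s
athena_a_s = 0.625                        # Dataset A Athena runtime
size_a_kb, size_b_kb = 30, 9 * 1000

# Most favorable pairing for A (A's fastest vs B's slowest) and least favorable.
slowdown_low = refresh_a_s[0] / refresh_b_s[1]
slowdown_high = refresh_a_s[1] / refresh_b_s[0]
size_ratio = size_b_kb / size_a_kb
# Upper bound on Athena's share: divide by A's fastest refresh.
athena_share_a = athena_a_s / refresh_a_s[0]

print(f"A takes {slowdown_low:.1f}x to {slowdown_high:.1f}x as long to refresh as B")
print(f"A is {size_ratio:.0f}x smaller than B")
print(f"Athena accounts for at most {athena_share_a:.1%} of A's refresh time")
```

This prints 3.5x to 4.8x, 300x, and 0.4%, so more than 99% of Dataset A’s refresh time is spent outside Athena.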
Any idea what could cause this kind of overhead on the SPICE ingestion side for a small dataset?
Thanks again!
Hi @Bar_Cohen
Thank you for sharing those details. That’s very helpful.
Since the Athena execution times are nearly identical, the bottleneck is clearly on the SPICE ingestion side. Here are a couple of things worth checking:
Calculated Fields: If Dataset A has complex calculated fields defined at the QuickSight dataset level (not in the SQL), those are computed during SPICE ingestion and can add significant time. Additionally, check if any QuickSight calculated fields are performing type conversions during ingestion (e.g., parsing strings as dates or numbers), as this adds processing time. Check if Dataset A has more field type overrides than Dataset B.
Row-Level Security (RLS): If RLS is configured on Dataset A but not on Dataset B, SPICE applies the security rules during ingestion, which adds overhead.
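Both checks can also be run side by side from the dataset definitions. This sketch assumes each input is the `DataSet` element returned by boto3’s `quicksight.describe_data_set`: calculated fields appear as `CreateColumnsOperation` entries and type overrides as `CastColumnTypeOperation` entries in each logical table’s `DataTransforms`, and RLS is read from `RowLevelPermissionDataSet` / `RowLevelPermissionTagConfiguration`. The helper names are hypothetical, and calculated fields defined at the analysis level will not show up here.

```python
# Sketch: diff two QuickSight dataset definitions for transform counts and RLS.
# Assumption: `a` and `b` are the "DataSet" element of boto3's
# quicksight.describe_data_set(...). Helper names are hypothetical.
from collections import Counter


def transform_counts(dataset: dict) -> Counter:
    """Count DataTransforms by operation type across all logical tables."""
    counts = Counter()
    for table in dataset.get("LogicalTableMap", {}).values():
        for op in table.get("DataTransforms", []):
            counts.update(op.keys())  # each transform dict has one operation key
    return counts


def has_rls(dataset: dict) -> bool:
    """True if dataset-based or tag-based RLS is configured and not disabled."""
    for key in ("RowLevelPermissionDataSet", "RowLevelPermissionTagConfiguration"):
        cfg = dataset.get(key)
        if cfg and cfg.get("Status", "ENABLED") != "DISABLED":
            return True
    return False


def diff_datasets(a: dict, b: dict) -> dict:
    ca, cb = transform_counts(a), transform_counts(b)
    ops = sorted(set(ca) | set(cb))
    return {
        "transform_diffs": {op: (ca[op], cb[op]) for op in ops if ca[op] != cb[op]},
        "rls": (has_rls(a), has_rls(b)),
    }


# Hypothetical definitions: A has two type overrides and RLS, B has neither.
ds_a = {"LogicalTableMap": {"t1": {"DataTransforms": [
    {"CastColumnTypeOperation": {}}, {"CastColumnTypeOperation": {}},
    {"CreateColumnsOperation": {}}]}},
    "RowLevelPermissionDataSet": {"Status": "ENABLED"}}
ds_b = {"LogicalTableMap": {"t1": {"DataTransforms": [{"CreateColumnsOperation": {}}]}}}
print(diff_datasets(ds_a, ds_b))
```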
Could you also check if any of the above apply to Dataset A? If everything matches between the two datasets, this may require a deeper look into the SPICE ingestion internals. In that case, I would recommend filing a case with AWS Support where we can dive into the details and help you further. Here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket to your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf.
Hi @Xclipse,
Thank you so much for your time and for walking through this with me so thoroughly!
We went ahead and checked both points you mentioned (the calculated fields and the RLS settings), and everything is identical between Dataset A and Dataset B. No differences there either.
I’ll go ahead and open an AWS Support case to get a deeper look into the SPICE ingestion internals.
Really appreciate all your help in narrowing this down!