Athena Data Source Query never completes DATA_SOURCE_TIMEOUT - Query Works when ran directly in Athena UI

Direct Query Data sources that connect to Athena no longer seem to work.
The same query functions correctly when run directly in Athena.

The Table in question is extremely large, but it’s partitioned to keep the query size low.

region: eu-west-2
timestamp: 1687171801388
requestId: 10a8a86f-d788-4b1f-8c1a-4640ed103882
sourceErrorCode: DATA_SOURCE_TIMEOUT
sourceErrorMessage: Server execution exceeded 50 seconds, which is calculated based on client TTL and service config`

Investigating the Query that QuickSite applies in the Athena Recent Queries Window, shows the generated query uses:

Limit 0

Which always seems to fail regardless of the inner query. Changing this limit to a set number fixes the query.

Not sure what to do here to resolve this as I can’t seem to influence this applied limit.

1 Like

Hi @OwenA - Welcome to AWS QuickSight and thanks for posting the question. Can you please share the SQL query which is working fine in Athena but timing out in QuickSight. Also give some details about size and number of records in the table. Like you said you have partitions created for the table, can we put some filter condition via custom sql to get desired data to QuickSight for the reporting. Give some details on the business requirement as well. The details will help in understanding the problem and guide you the right solution.

Regards - Sanjeeb

Hi @Sanjeeb2022, appreciate the welcome!

The business requirement is to produce some custom charts base on a specific key we have captured for our client. We are filtering by date, location and the specific key.

They only view the data at the end of a month, so right query responsiveness isn’t really an issue.

The Problem really is the client captures a lot data that they don’t want to run analytics on.

Right now there is:

551,992 objects, at a total size at about 456.5gb.

The date range I am currently running has 16,051 objects at 1.29GB.

Though monthly data often hits 50,000 object by the time the client runs the Analytics.

Here’s an example Query:

WITH cte AS (
SELECT *,
cast(Table1.year||'-'||Table1.month||'-'||Table1.day as date) as partionDate
FROM Table1
WHERE Table1Index='Web'
AND any_match(element_at(Table1.visitorshistoricbasetouchpoints, 1).historictouchpoint_customdata, element -> element.Key = 'landing page' AND regexp_like(element.Value, '(?i)CompanyName'))
)
SELECT * FROM cte WHERE cte.partionDate > date '2023-06-01' LIMIT 50000

This query was previously working fine. Honestly not sure what has changed.

@OwenA ,

QuickSight has the following 2 hard limits :

1/ 50 seconds timeout is occurring during dataset preparation. If Athena takes more than 50 seconds to return the results, you will get the error.
2/ 2 minutes timeout in analysis and dashboard : If Athena SQL query takes more than 2 minutes to return the results, you will get the error ( Data source quotas - Amazon QuickSight ) .

For the error in data preparation, you should still be able to save and publish the dataset.
It might also be possible you would have to further optimize the data for specific reporting requirement : Top 10 Performance Tuning Tips for Amazon Athena | AWS Big Data Blog

Kind regards,
Koushik

1 Like

So that 1/50 sec limit is being hit in this case. But this is what I don’t really understand. The Above Query runs in 12-13secs when run directly in athena.

However it times out when run via quick sight. FYI the actual query that runs when I add the above as a custom query is the following:

/* QuickSight */
SELECT ds.* FROM (WITH cte AS (
SELECT *,
cast(Table1.year||'-'||Table1.month||'-'||Table1.day as date) as partionDate
FROM Table1
WHERE Table1Index='Web'
AND any_match(element_at(Table1.visitorshistoricbasetouchpoints, 1).historictouchpoint_customdata, element -> element.Key = 'landing page' AND regexp_like(element.Value, '(?i)CompanyName'))
)
SELECT * FROM cte WHERE cte.partionDate > date '2023-06-01' LIMIT 50000 ) ds LIMIT 0

Is anything I can do about this? Seems like the above Limit is messing with my partitions for some reason.

Data Preparation : When you click on edit dataset, at that point the data preparation screen would open and a query would be submitted to Athena.

You can also check from the Athena query logs and open the SQL executed and test running it again. The log will also tell you if the query completed.

From what I have tested, the query execution time from QuickSight in the data preparation mode , directly from Athena completed roughly in the same time.

You can test this at your end as well as it is traceable from the logs.

Yeah it’s strange. The query quick sight produces doesn’t seem to scan any data at all.

It’s not erroring either:

Running the query from here as you suggested produces the same effect.

The explain query doesn’t look right either. I’ll see what other logs I can dig up.

Hi @OwenA - What is your partition keys in the table?

Regards - Sanjeeb

@Sanjeeb2022 Year, Month and Day as strings

Hi @OwenA - Is it possible to add the condition Table1.year > ‘2023’ in the inner query and try the execution? This way the data will be restricted at CTE level and possibly improve the performance.

Regards - Sanjeeb

Hey @Sanjeeb2022, It doesn’t seem to make a difference at all.

I’m probably going to have to look into cleaning up the data.

Not sure why the Partition’s are being ignore by athena in this case. Though I don’t believe its Quick Sight’s issue. Wrapping in a sub query with LIMIT 0 shouldn’t break everything underneath.

Hi @OwenA - To hit the partition, ideally we should use = in the where clause so that it can go that partition and take the data, now in your query you have put the condition in cte.partionDate > date ‘2023-06-01’ and there is No folder with this, so essentially it is scanning everything. You can do one thing, change the partition keys from string to int as year, month and date are always integer independently ( possibly create another table for testing) and check the explain plan whether it is hitting the partitions or FTS ( Full table scan).

Like @Koushik_Muthanna mentioned, its not an QuickSight problem as your underline query is taking time and QuickSight is throwing timeout due to exceeded period.

Regards - Sanjeeb

Hi @Sanjeeb2022,

It’s not the Partitions.

The partitions are being used in the cte.partionDate > date ‘2023-06-01’.

The query that produces the partionDate variable uses the partition.

I’ve tried as you say using flat = , and it doesn’t make a difference.

To be honest. I can’t use LIMIT 0 at all in any of my queries against this table.

I think I have found the issue.

We have a struct column. No idea why but this causes the LIMIT 0 query to hang.

Removing it sorts the issue.

@Koushik_Muthanna as @Sanjeeb2022 says, it really doesn’t look like this is a Quick Sight problem.

Seems there is some underlying issue in Athena with how LIMIT 0 is being applied when the columns have nested data types.

I’ll be creating a bespoke table with properly flattened data I expect.

2 Likes

@OwenA , Thanks for the update :slight_smile:

1 Like