Direct Query Data sources that connect to Athena no longer seem to work.
The same query functions correctly when run directly in Athena.
The Table in question is extremely large, but it’s partitioned to keep the query size low.
region: eu-west-2 timestamp: 1687171801388 requestId: 10a8a86f-d788-4b1f-8c1a-4640ed103882 sourceErrorCode: DATA_SOURCE_TIMEOUT sourceErrorMessage: Server execution exceeded 50 seconds, which is calculated based on client TTL and service config`
Investigating the Query that QuickSite applies in the Athena Recent Queries Window, shows the generated query uses:
Limit 0
Which always seems to fail regardless of the inner query. Changing this limit to a set number fixes the query.
Not sure what to do here to resolve this as I can’t seem to influence this applied limit.
Hi @OwenA - Welcome to AWS QuickSight and thanks for posting the question. Can you please share the SQL query which is working fine in Athena but timing out in QuickSight. Also give some details about size and number of records in the table. Like you said you have partitions created for the table, can we put some filter condition via custom sql to get desired data to QuickSight for the reporting. Give some details on the business requirement as well. The details will help in understanding the problem and guide you the right solution.
The business requirement is to produce some custom charts base on a specific key we have captured for our client. We are filtering by date, location and the specific key.
They only view the data at the end of a month, so right query responsiveness isn’t really an issue.
The Problem really is the client captures a lot data that they don’t want to run analytics on.
Right now there is:
551,992 objects, at a total size at about 456.5gb.
The date range I am currently running has 16,051 objects at 1.29GB.
Though monthly data often hits 50,000 object by the time the client runs the Analytics.
Here’s an example Query:
WITH cte AS (
SELECT *,
cast(Table1.year||'-'||Table1.month||'-'||Table1.day as date) as partionDate
FROM Table1
WHERE Table1Index='Web'
AND any_match(element_at(Table1.visitorshistoricbasetouchpoints, 1).historictouchpoint_customdata, element -> element.Key = 'landing page' AND regexp_like(element.Value, '(?i)CompanyName'))
)
SELECT * FROM cte WHERE cte.partionDate > date '2023-06-01' LIMIT 50000
This query was previously working fine. Honestly not sure what has changed.
1/ 50 seconds timeout is occurring during dataset preparation. If Athena takes more than 50 seconds to return the results, you will get the error.
2/ 2 minutes timeout in analysis and dashboard : If Athena SQL query takes more than 2 minutes to return the results, you will get the error ( Data source quotas - Amazon QuickSight ) .
So that 1/50 sec limit is being hit in this case. But this is what I don’t really understand. The Above Query runs in 12-13secs when run directly in athena.
However it times out when run via quick sight. FYI the actual query that runs when I add the above as a custom query is the following:
/* QuickSight */
SELECT ds.* FROM (WITH cte AS (
SELECT *,
cast(Table1.year||'-'||Table1.month||'-'||Table1.day as date) as partionDate
FROM Table1
WHERE Table1Index='Web'
AND any_match(element_at(Table1.visitorshistoricbasetouchpoints, 1).historictouchpoint_customdata, element -> element.Key = 'landing page' AND regexp_like(element.Value, '(?i)CompanyName'))
)
SELECT * FROM cte WHERE cte.partionDate > date '2023-06-01' LIMIT 50000 ) ds LIMIT 0
Is anything I can do about this? Seems like the above Limit is messing with my partitions for some reason.
From what I have tested, the query execution time from QuickSight in the data preparation mode , directly from Athena completed roughly in the same time.
You can test this at your end as well as it is traceable from the logs.
Hi @OwenA - Is it possible to add the condition Table1.year > ‘2023’ in the inner query and try the execution? This way the data will be restricted at CTE level and possibly improve the performance.
Hey @Sanjeeb2022, It doesn’t seem to make a difference at all.
I’m probably going to have to look into cleaning up the data.
Not sure why the Partition’s are being ignore by athena in this case. Though I don’t believe its Quick Sight’s issue. Wrapping in a sub query with LIMIT 0 shouldn’t break everything underneath.
Hi @OwenA - To hit the partition, ideally we should use = in the where clause so that it can go that partition and take the data, now in your query you have put the condition in cte.partionDate > date ‘2023-06-01’ and there is No folder with this, so essentially it is scanning everything. You can do one thing, change the partition keys from string to int as year, month and date are always integer independently ( possibly create another table for testing) and check the explain plan whether it is hitting the partitions or FTS ( Full table scan).
Like @Koushik_Muthanna mentioned, its not an QuickSight problem as your underline query is taking time and QuickSight is throwing timeout due to exceeded period.