QuickSight data refresh overlap issue

I have enabled a scheduled incremental refresh every 1 hour on my datasets.
But if the data is large and a refresh takes more than 1 hour, another scheduled refresh gets triggered in parallel.
Today one dataset took over an hour to re-import its data, and the second refresh that triggered ingested 0 rows.
On a later trigger, rows were ingested again.
Could this be a refresh overlap issue?

Hi @devyameh - Welcome to AWS QuickSight and thanks for posting the question.
It is a good point and we need to engage the QuickSight core team to help on this.

Hi @duncan @ErikG @Max - For an incremental data refresh, if one ingestion is still in progress and runs past the next scheduled incremental window, will the second incremental refresh still run? Should QuickSight wait, or cancel the second incremental refresh? Can you please highlight this issue internally and clarify the intended refresh behaviour?
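In the meantime, one way to confirm whether two scheduled runs actually overlapped is to pull the start/end times from the dataset's ingestion history (e.g. via the QuickSight ListIngestions API) and check the intervals against each other. A minimal sketch of that check, with hypothetical timestamps:

```python
# Sketch: flag scheduled refreshes whose time windows overlap.
# The timestamps below are hypothetical; in practice they would come
# from the dataset's ingestion history (QuickSight ListIngestions).
from datetime import datetime, timedelta

def find_overlaps(runs):
    """runs: list of (start, end) datetimes in any order.
    Returns pairs of original indices whose intervals overlap
    (adjacent runs only, which is enough for a fixed schedule)."""
    ordered = sorted(enumerate(runs), key=lambda item: item[1][0])
    overlaps = []
    for (i, (_, end1)), (j, (start2, _)) in zip(ordered, ordered[1:]):
        if start2 < end1:  # next run started before previous finished
            overlaps.append((i, j))
    return overlaps

# Example: a 1-hour schedule where the first import ran for 90 minutes,
# so the second trigger fired while the first was still ingesting.
t0 = datetime(2023, 1, 1, 0, 0)
runs = [
    (t0, t0 + timedelta(minutes=90)),                       # slow refresh
    (t0 + timedelta(hours=1), t0 + timedelta(minutes=70)),  # next trigger
]
print(find_overlaps(runs))  # → [(0, 1)]
```

If this reports an overlapping pair around the time you saw the 0-row ingestion, that would support the overlap theory.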

Hi @devyameh - Please also look into why the incremental refresh is taking 1 hour. It looks like there may be an issue in the dataset configuration or approach that needs to be analyzed as well. Can you also check the details below?

  1. What is the total volume of data in the incremental refresh?
  2. What is the source of the data? If it is an RDBMS, can you also check the DB performance?
  3. Are you using any custom SQL or many calculated fields in the dataset? Please give more details on the dataset.
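While gathering those details, it can also help to look at the recent ingestion history for the dataset, which records the status, rows ingested, and duration of each refresh. A rough sketch using the boto3 `list_ingestions` call (the account and dataset IDs are placeholders, and this is only an illustration, not an official troubleshooting script):

```python
# Sketch: summarise recent SPICE ingestions for a dataset so runs
# that ingested 0 rows (or ran unusually long) stand out.
# Account/dataset IDs are placeholders; boto3 is only needed for a
# live call, so the import is guarded.
try:
    import boto3  # AWS SDK for Python
except ImportError:
    boto3 = None

def summarize_ingestions(client, account_id, dataset_id, limit=10):
    """Return (ingestion_id, status, rows_ingested, seconds) tuples
    for the most recent ingestions of one dataset."""
    resp = client.list_ingestions(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        MaxResults=limit,
    )
    summary = []
    for ing in resp.get("Ingestions", []):
        summary.append((
            ing["IngestionId"],
            ing["IngestionStatus"],
            ing.get("RowInfo", {}).get("RowsIngested", 0),
            ing.get("IngestionTimeInSeconds", 0),
        ))
    return summary

# Live usage (requires AWS credentials):
#   qs = boto3.client("quicksight", region_name="us-east-1")
#   print(summarize_ingestions(qs, "123456789012", "my-dataset-id"))
```

A run with 0 rows ingested sitting right next to one with an unusually long `IngestionTimeInSeconds` would line up with the overlap you described.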

Hi @David_Wong - Any advice on this?

Regards - Sanjeeb

hi @Sanjeeb2022
2. Source of dataset → AWS Athena
Volume of data →

Basically, the second refresh ingested 0 rows without any change in the data source.
When the data got refreshed again, rows were ingested.

Also, the refresh is taking more time due to the larger amount of data.
Note: this happened only once.

Hi @devyameh - Ok, thanks for the info. Is it possible to check the count in Athena and see the execution time of the SQL? Since you are interested in the incremental data, you can add the filter condition and check the execution time of the SQL with a query like the one below:

WITH temp AS ( <<your SQL with the 1-hour incremental condition>> )
SELECT COUNT(1) FROM temp;

Also, what is your underlying storage? Has any partitioning been applied to your data, and are you saving the data in Parquet format? Please check these aspects as well.

Regards - Sanjeeb

no. of rows → 71663668

no partition
the query is:
, v.verification_vid
, v.source_nid
, v.source_vid
, v.physical_location_id
, v.company_id
, v.equipment_number_or_area
, v.work_area
, v.task
, v.critical_risk_id
, v.critical_risk
, v.critical_control_id
, v.critical_control
, v.verification_section
, v.verification_question_id
, v.delta
, v.verification_question_uuid
, v.verification_question_text
, v.verification_question_text_formatted
, v.verification_question_non_compliance
, v.verification_question_compliance
, v.verification_question_na
, v.verification_question_comments
, v.verification_question_evidence
, v.verification_date
, v.verification_last_updated_date changed
, v.verification_type
, v.verification_mobile_submission
, v.verification_language
, v.verification_latitude
, v.verification_longitude
, v.verification_unplanned_work
, v.verification_energised_work
, v.verification_worker_type
, v.checklist_nid
, v.checklist_version_id
, v.checklist_version_latest
, v.checklist_revision_vid
, v.site_id
, v.site
, v.site_level_1
, v.site_level_2
, v.site_level_3
, v.site_level_4
, v.site_level_5
, v.site_level_1_id
, v.site_level_2_id
, v.site_level_3_id
, v.site_level_4_id
, v.site_level_5_id
, v.corporate_company corporate_group
, v.structure_level_1 product_group_crm
, v.structure_level_2 business_unit_crm
, v.structure_level_3
, v.structure_level_4
, v.structure_level_5
, v.verifier_id
, v.verifier_uid verifier_uuid
, v.verifier
, v.verifier_status
, v.verifiers_structure_level_id
, v.verifiers_structure_level
, v.verifiers_employee_status
, v.coach_id
, v.coach
, v.structure_level_verified
, v.group_type
, v.verification_scheduled
, v.crm_url
, CAST('No SRU mapping found' AS varchar) product_group_sru
, CAST('No SRU mapping found' AS varchar) business_unit_sru
, CAST('No SRU mapping found' AS varchar) standard_reporting_unit_sru
WHERE (v.task_based_verification = 0)

Hi @devyameh - So you have 71M records. Can you please add the 1-hour filter condition, do a CTAS, and check the count as I suggested earlier? We need to understand the Athena execution time. What is the underlying file format?
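For reference, the filtered count could look something like this; the view name and timestamp column here are assumptions based on the query you shared, so adjust them to your actual dataset SQL:

```sql
-- Hypothetical: count only the last hour of data to measure the
-- Athena-side execution time of the incremental slice in isolation.
WITH incremental AS (
    SELECT *
    FROM your_dataset_view  -- placeholder for your actual view/SQL
    WHERE verification_last_updated_date >= date_add('hour', -1, now())
)
SELECT COUNT(1) AS rows_last_hour
FROM incremental;
```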

Regards - Sanjeeb

hello Sanjeeb
Thank you for your time.
Actually, our whole data gets reprocessed every 1 hour,
so even after the 1-hour filter we have 7M records → which leads to 7M rows in Athena.
The number of rows added per hour is up to 2000.

storage is in Parquet

Also, the CTAS runtime for the view is → 15 min 44.82 sec
the view itself → 2-3 mins

Hi @devyameh - Is there a chance that when the data in Athena is refreshed, the refresh in SPICE is happening at the same time? I am a bit confused by the details: when you say all the data is reprocessed again, why is that? With the 1-hour filter, you should expect only around 2000 records, not 7M. Am I missing anything?

Is it possible to do a full refresh rather than an incremental one in QuickSight, since all your data is refreshed every hour? (I am not sure of the business logic here, but just to test whether it works or not.)

I suspect there may be a delay in the data transfer between Athena and QuickSight, but I am not 100% sure. Please do a manual refresh from QuickSight to check the timing; if it is taking long, please raise a ticket with the AWS customer support team so they can do a deep dive into this problem. If the data transfer is what takes the time, we need to see how we can improve the performance.
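For the manual refresh, one way to measure the timing precisely is to start an ingestion through the API and poll it until it reaches a terminal state; the ingestion record reports the duration in seconds. A sketch along those lines, assuming boto3 with valid credentials for a live run (all IDs are placeholders):

```python
# Sketch: trigger a manual SPICE refresh and wait for it to finish,
# returning the final status and reported duration in seconds.
# Account/dataset/ingestion IDs are placeholders.
import time

try:
    import boto3  # AWS SDK; only needed for a live call
except ImportError:
    boto3 = None

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}

def refresh_and_time(client, account_id, dataset_id, ingestion_id,
                     ingestion_type="FULL_REFRESH", poll_seconds=15):
    """Start an ingestion, then poll describe_ingestion until done."""
    client.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
        IngestionType=ingestion_type,
    )
    while True:
        ingestion = client.describe_ingestion(
            AwsAccountId=account_id,
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
        )["Ingestion"]
        if ingestion["IngestionStatus"] in TERMINAL:
            return (ingestion["IngestionStatus"],
                    ingestion.get("IngestionTimeInSeconds", 0))
        time.sleep(poll_seconds)

# Live usage (requires AWS credentials):
#   qs = boto3.client("quicksight", region_name="us-east-1")
#   print(refresh_and_time(qs, "123456789012", "my-dataset-id", "manual-1"))
```

Comparing this timing against the Athena-side CTAS timing should show where the hour is actually being spent.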

Tagging @David_Wong @Max @sagmukhe for their expert advice as well.

Regards - Sanjeeb