Hi there - I’m using multiple datasets that I pulled individually into QuickSight from S3. The data being ingested into QuickSight has already been cleaned up and successfully edited in DataBrew (CSV output).
All the other datasets are pulling into QuickSight fine; the issue is only happening with my largest dataset, ~10 GB compressed (about 30-40 GB uncompressed). I have auto-purchase SPICE capacity on, and I can see that I’ve already purchased about 40 additional GB of SPICE storage.
I tried to remove the dataset completely and start from scratch by generating it out of DataBrew uncompressed, but that also led to the exact same outcome. The data refresh jobs keep hanging, and it seems like they’re hanging toward the end of the job (and at different rows). Nothing useful in CloudWatch or any other logging at all. And the more I try to edit the data, the more refresh jobs initiate and just hang until they fail the next day.
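In case it helps: since the console wasn’t surfacing anything, here’s roughly how I’ve been polling the ingestion status and error details from boto3 (a minimal sketch - the account ID and dataset ID are placeholders for my real values):

```python
import boto3

# Placeholders - replace with your own account and dataset IDs
ACCOUNT_ID = "123456789012"
DATASET_ID = "my-large-dataset-id"

qs = boto3.client("quicksight")

# List the most recent SPICE ingestions for the dataset
resp = qs.list_ingestions(AwsAccountId=ACCOUNT_ID, DataSetId=DATASET_ID, MaxResults=10)

for ing in resp["Ingestions"]:
    print(ing["IngestionId"], ing["IngestionStatus"])
    # ErrorInfo is only present on failed ingestions
    if "ErrorInfo" in ing:
        print("  error:", ing["ErrorInfo"].get("Type"), "-", ing["ErrorInfo"].get("Message"))
    # RowInfo shows how many rows were ingested/dropped before the job stopped
    if "RowInfo" in ing:
        print("  rows:", ing["RowInfo"])
```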
Hi @quickersight - Welcome to the AWS QuickSight community, and thanks for posting the question. To understand what is happening in the background, it is better to raise a ticket with AWS Customer Support and provide the details. I suspect the data volume is very high and that may be what is taking time. Is it possible to create an Athena table and use Athena as the data source for the reporting, rather than reading the files from S3 directly?
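Something along these lines could be a starting point (a rough sketch only - the database, table, columns, and S3 paths are placeholders you would swap for your own, and the column list should match your DataBrew output schema):

```python
import boto3

athena = boto3.client("athena")

# Register the DataBrew CSV output as an external Athena table
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.cleaned_data (
    id string,
    event_date string,
    amount string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/databrew-output/'
TBLPROPERTIES ('skip.header.line.count'='1')
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "my_db"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
```

You could then create the QuickSight dataset from the Athena table instead of the raw S3 files.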
@quickersight - Are there data points (i.e. records with attributes) that are not getting reconciled or validated against the datatypes specified in the SPICE dataset?
That’s what I’m trying to figure out, but there isn’t any logging anywhere to tell me what’s hanging. I tried switching almost all of the fields to string to rule out a datatype issue with the columns (sometimes a column I edit in DataBrew will still show up as a different datatype in QuickSight, but I made sure to correct all of those). I can preview the data fine, I can look at the first and last 5,000 rows fine, and the job is hanging toward the very end after processing millions of rows.
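For anyone curious, this is roughly how I double-checked which types QuickSight actually resolved for the dataset (boto3 sketch; the IDs are placeholders):

```python
import boto3

qs = boto3.client("quicksight")

# Placeholders - swap in your own account and dataset IDs
resp = qs.describe_data_set(AwsAccountId="123456789012", DataSetId="my-large-dataset-id")

# OutputColumns reflects the types QuickSight resolved for the dataset,
# which is what I compared against the DataBrew output schema
for col in resp["DataSet"]["OutputColumns"]:
    print(col["Name"], "->", col["Type"])
```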
I should also note that I ran a data profiling job on the data in DataBrew and that also hung. I tried pointing it both at the edited/cleaned-up data and at the original data source, but got the same issue…
When I point directly at the data source, it lets me query the data, but it errors out when importing into SPICE.
Oh, I should also add that for the data profiling job in DataBrew, I upped the number of nodes to 20 thinking it might be a compute issue, but it still ended up getting stuck.
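For reference, this is roughly the capacity change I made (boto3 sketch; the job name, role, and bucket are placeholders for my actual values):

```python
import boto3

databrew = boto3.client("databrew")

# Name, OutputLocation, and RoleArn are all required by UpdateProfileJob
databrew.update_profile_job(
    Name="my-profile-job",
    MaxCapacity=20,  # max number of compute nodes the job may use
    RoleArn="arn:aws:iam::123456789012:role/DataBrewRole",
    OutputLocation={"Bucket": "my-bucket", "Key": "profile-output/"},
)
```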
Are you able to test with a smaller sample of your data? That will tell you whether the issue is related to the amount of data or something else.
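For example, something like this could cut a sample file to point a test dataset at (a quick pandas sketch, assuming s3fs is installed for `s3://` paths; the paths and row count are placeholders):

```python
import pandas as pd  # reading s3:// paths requires the s3fs package

# Read only the first 500k rows of the full cleaned export (sample size is arbitrary)
sample = pd.read_csv("s3://my-bucket/databrew-output/full_export.csv", nrows=500_000)

# Write the slice back to S3 and create a new QuickSight dataset from it
sample.to_csv("s3://my-bucket/databrew-output/sample_500k.csv", index=False)
```

If the sample imports cleanly, you can bisect toward the row range where the full refresh hangs.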