QuickSight Reporting Directly over an S3 Data Lake

Hi all,

We have a use case and want to evaluate whether QuickSight can be enabled directly over an S3 data lake to give our users access to near-real-time data. Our datasets are large, spanning terabytes and billions of records, and are stored in Parquet and other formats.

We want to understand whether this is feasible. We do not want to use Redshift, Athena, or any other intermediate system, as these would incur additional cost. Please let us know if this is possible today, or if anyone has already solved it.


Hi @PraveenKumar - welcome to the AWS QuickSight community, and thanks for posting the question. Yes, QuickSight can connect to S3 directly for reporting; you have to create a manifest file that points to your files. However, this connector only supports the .csv, .tsv, .clf, and .elf formats (not Parquet). Please see the documentation: Creating a dataset using Amazon S3 files - Amazon QuickSight.
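As a rough illustration, here is a minimal sketch of what that manifest looks like and how you might publish it to S3 with boto3. The bucket, prefix, and key names are placeholders, not anything from your environment:

```python
import json
import boto3

# Hypothetical bucket and prefix names -- replace with your own.
BUCKET = "my-data-lake-bucket"
PREFIX = "exports/daily-csv/"

# QuickSight S3 manifest: points at one or more prefixes of delimited files.
manifest = {
    "fileLocations": [
        {"URIPrefixes": [f"s3://{BUCKET}/{PREFIX}"]}
    ],
    "globalUploadSettings": {
        "format": "CSV",
        "delimiter": ",",
        "textqualifier": "\"",
        "containsHeader": "true",
    },
}

# Upload the manifest next to the data; you then point QuickSight at this
# object when creating the S3 dataset.
s3 = boto3.client("s3")
s3.put_object(
    Bucket=BUCKET,
    Key="manifests/daily-csv-manifest.json",
    Body=json.dumps(manifest, indent=2).encode("utf-8"),
    ContentType="application/json",
)
```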

For other formats such as Parquet, the best approach is to expose the data through Athena (compressed and partitioned) if you are looking for a serverless option. Partitioning reduces the volume of data scanned per query, which improves both performance and cost; see the sketch below.
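For example, a CTAS statement can rewrite raw data as compressed, partitioned Parquet that Athena (and QuickSight on top of it) can query efficiently. The database, table, bucket, and column names below are purely illustrative:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical names -- adjust to your environment. The CTAS rewrites raw data
# as Snappy-compressed Parquet, partitioned by date, so queries scan only the
# partitions they need.
ctas = """
CREATE TABLE analytics.events_curated
WITH (
    format = 'PARQUET',
    write_compression = 'SNAPPY',
    external_location = 's3://my-data-lake-bucket/curated/events/',
    partitioned_by = ARRAY['event_date']
) AS
SELECT user_id, event_type, payload, event_date
FROM analytics.events_raw
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={
        "OutputLocation": "s3://my-data-lake-bucket/athena-results/"
    },
)
```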

Regards - Sanjeeb

Hello @PraveenKumar, I am marking this topic as solved based on the answer provided by @Sanjeeb2022.

Also bear in mind, in terms of cost, that ingesting terabytes or billions of rows into SPICE (which is the only option when you use the S3 source in QuickSight, since there is no query engine such as Athena or Redshift to run queries against) will certainly be more expensive than using Athena, and most likely more expensive than Redshift as well.

The best approach here is to use the modern data architecture on AWS, with S3, Glue, and Athena as its central pieces. You can find more info on this link.
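To make that concrete, a typical first step is a Glue crawler that catalogs the Parquet files in S3 so Athena (and, through it, QuickSight) can query them in place. This is only a sketch; the crawler name, IAM role ARN, database, and S3 path are all hypothetical:

```python
import boto3

glue = boto3.client("glue")

# Hypothetical names -- the crawler infers the schema of the Parquet files
# under the given prefix and registers a table in the Glue Data Catalog.
glue.create_crawler(
    Name="data-lake-events-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="analytics",
    Targets={"S3Targets": [{"Path": "s3://my-data-lake-bucket/curated/events/"}]},
)

glue.start_crawler(Name="data-lake-events-crawler")
```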

Hope it helps.

Happy dashboarding!
