Team, could you please share some insight into how long it takes to load a dataset of around a billion rows, and/or how to optimise the process?

We are going to work with close to 1 billion rows. Incremental updates of the dataset, keyed on a timestamp column, can handle the in-between updates.

Since we need to keep the last 12 months of data available at all times, would QuickSight delete older rows (based on the timestamp column) once the dataset exceeds its 1 billion row quota? Or do we have to run full dataset refreshes to keep only 12 months of data (which we know stays close to, but always below, 1 billion rows)? The table is updated via a Glue job that queries for the last 12 months. If the latter is the case, then we'd face the risk of an Athena timeout on every daily import. Thoughts/solutions/anything we missed?
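For context, here is a minimal sketch of what the daily job's 12-month filter might look like. The table and column names (`events`, `event_ts`) are placeholders, and the 365-day cutoff is an approximation of "12 months":

```python
from datetime import date, timedelta

def last_12_months_query(table: str, ts_column: str, today: date) -> str:
    """Build an Athena (Presto SQL) query that keeps only the trailing 12 months.

    `table` and `ts_column` are hypothetical names; substitute your own.
    """
    cutoff = today - timedelta(days=365)  # approximate 12-month window
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} >= DATE '{cutoff.isoformat()}'"
    )

# The query string the Glue job would submit to Athena each day.
sql = last_12_months_query("events", "event_ts", date(2024, 6, 1))
```

Running this as a full refresh every day is exactly where the Athena timeout concern comes in, since the query scans the whole 12-month window each time.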

Here is more information.

I don’t know exactly what happens if you go over the 1 billion row limit (I’ve never hit it).

I would suggest trying incremental refreshes first, because a full load of 1 billion rows will take a long time.
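On the incremental-refresh suggestion: SPICE datasets support a configurable lookback window on a date column, set through the QuickSight API before triggering incremental ingestions. A minimal sketch, assuming placeholder account/dataset IDs and the hypothetical `event_ts` timestamp column:

```python
ACCOUNT_ID = "123456789012"   # placeholder AWS account ID
DATASET_ID = "my-dataset-id"  # placeholder QuickSight dataset ID

# Incremental refresh only re-imports rows whose timestamp falls inside the
# lookback window, instead of reloading all ~1B rows on every refresh.
refresh_properties = {
    "RefreshConfiguration": {
        "IncrementalRefresh": {
            "LookbackWindow": {
                "ColumnName": "event_ts",  # your timestamped column
                "Size": 1,
                "SizeUnit": "DAY",
            }
        }
    }
}

def configure_incremental_refresh():
    """Apply the refresh properties, then kick off one incremental ingestion."""
    import boto3  # imported here so the config above can be inspected without AWS

    client = boto3.client("quicksight")
    client.put_data_set_refresh_properties(
        AwsAccountId=ACCOUNT_ID,
        DataSetId=DATASET_ID,
        DataSetRefreshProperties=refresh_properties,
    )
    # Incremental ingestions can also be put on a refresh schedule instead.
    client.create_ingestion(
        AwsAccountId=ACCOUNT_ID,
        DataSetId=DATASET_ID,
        IngestionId="daily-incremental-001",
        IngestionType="INCREMENTAL_REFRESH",
    )
```

Note this only shrinks the daily import; it does not answer the row-quota question, since incremental refresh appends within the lookback window rather than evicting rows older than 12 months.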