Combining AWS costs datasets for QuickSight in SPICE

We use QuickSight to analyse our detailed AWS cost data. AWS billing creates the required datasets in S3 with a QuickSight compatible manifest file. Adding this as a dataset all works perfectly.

Each calendar month a new manifest file and new set of files are created on S3. I can create a new dataset in QuickSight for the new month.

So… how can I report on multiple months? I essentially want to do a “union” of each of the SPICE datasets for each month that has been imported.

How can I do this? Many thanks!

The way I had gotten this to work without requiring any additional support from Quicksight was to setup the CUR reports as follows:

It is important to point out that I have selected the option to overwrite files since a new file with monthly data will be written daily.

Following this, add an AWS Athena table with the following definition:

CREATE EXTERNAL TABLE `aws_costs`(
  `identity_line_item_id` string, 
  `identity_time_interval` string, 
  `bill_invoice_id` string, 
  `bill_invoicing_entity` string, 
  `bill_billing_entity` string, 
  `bill_bill_type` string, 
  `bill_payer_account_id` string, 
  `bill_billing_period_start_date` timestamp, 
  `bill_billing_period_end_date` timestamp, 
  `line_item_usage_account_id` string, 
  `line_item_line_item_type` string, 
  `line_item_usage_start_date` timestamp, 
  `line_item_usage_end_date` timestamp, 
  `line_item_product_code` string, 
  `line_item_usage_type` string, 
  `line_item_operation` string, 
  `line_item_availability_zone` string, 
  `line_item_usage_amount` double, 
  `line_item_normalization_factor` double, 
  `line_item_normalized_usage_amount` double, 
  `line_item_currency_code` string, 
  `line_item_unblended_rate` string, 
  `line_item_unblended_cost` double, 
  `line_item_blended_rate` string, 
  `line_item_blended_cost` double, 
  `line_item_line_item_description` string, 
  `line_item_tax_type` string, 
  `line_item_legal_entity` string, 
  `product_product_name` string, 
  `product_purchase_option` string, 
  `product_account_assistance` string, 
  `product_alarm_type` string, 
  `product_architectural_review` string, 
  `product_architecture_support` string, 
  `product_availability` string, 
  `product_availability_zone` string, 
  `product_best_practices` string, 
  `product_capacitystatus` string, 
  `product_case_severityresponse_times` string, 
  `product_category` string, 
  `product_classicnetworkingsupport` string, 
  `product_clock_speed` string, 
  `product_content_type` string, 
  `product_current_generation` string, 
  `product_customer_service_and_communities` string, 
  `product_dedicated_ebs_throughput` string, 
  `product_description` string, 
  `product_disableactivationconfirmationemail` string, 
  `product_durability` string, 
  `product_ecu` string, 
  `product_edition` string, 
  `product_endpoint_type` string, 
  `product_enhanced_networking_supported` string, 
  `product_equivalentondemandsku` string, 
  `product_fee_code` string, 
  `product_fee_description` string, 
  `product_free_query_types` string, 
  `product_from_location` string, 
  `product_from_location_type` string, 
  `product_from_region_code` string, 
  `product_group` string, 
  `product_group_description` string, 
  `product_included_services` string, 
  `product_insightstype` string, 
  `product_instance` string, 
  `product_instance_family` string, 
  `product_instance_type` string, 
  `product_instance_type_family` string, 
  `product_intel_avx2_available` string, 
  `product_intel_avx_available` string, 
  `product_intel_turbo_available` string, 
  `product_launch_support` string, 
  `product_license_model` string, 
  `product_location` string, 
  `product_location_type` string, 
  `product_logs_destination` string, 
  `product_marketoption` string, 
  `product_max_iops_burst_performance` string, 
  `product_max_iopsvolume` string, 
  `product_max_throughputvolume` string, 
  `product_max_volume_size` string, 
  `product_maximum_extended_storage` string, 
  `product_memory` string, 
  `product_message_delivery_frequency` string, 
  `product_message_delivery_order` string, 
  `product_network_performance` string, 
  `product_normalization_size_factor` string, 
  `product_operating_system` string, 
  `product_operation` string, 
  `product_operations_support` string, 
  `product_origin` string, 
  `product_parameter_type` string, 
  `product_physical_processor` string, 
  `product_platousagetype` string, 
  `product_platovolumetype` string, 
  `product_pre_installed_sw` string, 
  `product_proactive_guidance` string, 
  `product_processor_architecture` string, 
  `product_processor_features` string, 
  `product_product_family` string, 
  `product_programmatic_case_management` string, 
  `product_provisioned` string, 
  `product_purchaseterm` string, 
  `product_queue_type` string, 
  `product_recipient` string, 
  `product_region` string, 
  `product_region_code` string, 
  `product_servicecode` string, 
  `product_servicename` string, 
  `product_sku` string, 
  `product_standard_storage_retention_included` string, 
  `product_storage` string, 
  `product_storage_class` string, 
  `product_storage_media` string, 
  `product_subscription_type` string, 
  `product_technical_support` string, 
  `product_tenancy` string, 
  `product_thirdparty_software_support` string, 
  `product_throughput` string, 
  `product_to_location` string, 
  `product_to_location_type` string, 
  `product_to_region_code` string, 
  `product_training` string, 
  `product_transfer_type` string, 
  `product_usagetype` string, 
  `product_vcpu` string, 
  `product_version` string, 
  `product_volume_api_name` string, 
  `product_volume_type` string, 
  `product_vpcnetworkingsupport` string, 
  `product_who_can_open_cases` string, 
  `pricing_lease_contract_length` string, 
  `pricing_offering_class` string, 
  `pricing_purchase_option` string, 
  `pricing_rate_code` string, 
  `pricing_rate_id` string, 
  `pricing_currency` string, 
  `pricing_public_on_demand_cost` double, 
  `pricing_public_on_demand_rate` string, 
  `pricing_term` string, 
  `pricing_unit` string, 
  `reservation_amortized_upfront_cost_for_usage` double, 
  `reservation_amortized_upfront_fee_for_billing_period` double, 
  `reservation_effective_cost` double, 
  `reservation_end_time` string, 
  `reservation_modification_status` string, 
  `reservation_normalized_units_per_reservation` string, 
  `reservation_number_of_reservations` string, 
  `reservation_recurring_fee_for_usage` double, 
  `reservation_start_time` string, 
  `reservation_subscription_id` string, 
  `reservation_total_reserved_normalized_units` string, 
  `reservation_total_reserved_units` string, 
  `reservation_units_per_reservation` string, 
  `reservation_unused_amortized_upfront_fee_for_billing_period` double, 
  `reservation_unused_normalized_unit_quantity` double, 
  `reservation_unused_quantity` double, 
  `reservation_unused_recurring_fee` double, 
  `reservation_upfront_value` double, 
  `savings_plan_total_commitment_to_date` double, 
  `savings_plan_savings_plan_a_r_n` string, 
  `savings_plan_savings_plan_rate` double, 
  `savings_plan_used_commitment` double, 
  `savings_plan_savings_plan_effective_cost` double, 
  `savings_plan_amortized_upfront_commitment_for_billing_period` double, 
  `savings_plan_recurring_commitment_for_billing_period` double, 
  `savings_plan_start_time` string, 
  `savings_plan_end_time` string, 
  `savings_plan_offering_type` string, 
  `savings_plan_payment_option` string, 
  `savings_plan_purchase_term` string, 
  `savings_plan_region` string, 
  `resource_tags_aws_cloudformation_logical_id` string, 
  `resource_tags_aws_cloudformation_stack_name` string, 
  `resource_tags_aws_created_by` string, 
  `resource_tags_user_lambda_service_function_name` string, 
  `resource_tags_user_table_service_name` string)
PARTITIONED BY ( 
  `year` string, 
  `month` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://sample-costs-bucket/daily/Costs/Costs'
TBLPROPERTIES (
  'transient_lastDdlTime'='1653742811')

At this point you still would not be able to see the data because Athena does not yet know about the partitions. But we can pre-add partitions for a number of upcoming months and for all existing months by executing the following queries in Athena:

ALTER TABLE tma1_prd_costs ADD IF NOT EXISTS
  PARTITION (year = '2022', month='7')
  PARTITION (year = '2022', month='8')
  PARTITION (year = '2022', month='9')
...

You would need to pre-add partitions for all months of interest (or else you could look into dynamic ID parititioning for Athena to determine the partitions automatically Dynamic ID partitioning - Amazon Athena … but this is not required here if you add all partitions yourself for the forseeable future :wink: )

Following this, you can go ahead and create an Athena DataSet on QuickSight to query this Athena table. Also set up daily SPICE refresh on this dataset in order to see new data every day.

Some resources that might be useful to you are: Querying Cost and Usage Reports using Amazon Athena - AWS Cost and Usage Reports
and 300 Labs :: AWS Well-Architected Labs

1 Like