
After the initial data movement to Amazon S3, you receive incremental updates from the source database as CSV files using AWS DMS or equivalent tools, where each record has an additional column that indicates whether the change is an insert, update, or delete operation.
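
To make the incremental format concrete, here is a minimal pure-Python sketch of how such change records could be applied to a keyed table. The column names (`Op`, `product_id`, `product_name`, `quantity_available`), the `I`/`U`/`D` codes, and the sample rows are assumptions for illustration only; your CDC tool's actual output may differ.

```python
import csv
import io

# Hypothetical CDC extract: the "Op" column marks Insert, Update, or Delete.
CDC_CSV = """Op,product_id,product_name,quantity_available
I,200,Router,50
U,100,Laptop,18
D,101,Monitor,0
"""


def apply_cdc(table, cdc_rows, key="product_id", op_col="Op"):
    """Apply CDC rows, in file order, to an in-memory table keyed by `key`."""
    for row in cdc_rows:
        op = row[op_col]
        # Drop the operation flag; the rest of the row is the record itself.
        record = {k: v for k, v in row.items() if k != op_col}
        if op in ("I", "U"):
            table[record[key]] = record      # upsert: insert or overwrite
        elif op == "D":
            table.pop(record[key], None)     # delete if present
        else:
            raise ValueError(f"unknown operation code: {op!r}")
    return table


# Assumed state of the table after the initial full load.
inventory = {
    "100": {"product_id": "100", "product_name": "Laptop", "quantity_available": "25"},
    "101": {"product_id": "101", "product_name": "Monitor", "quantity_available": "3"},
}

apply_cdc(inventory, csv.DictReader(io.StringIO(CDC_CSV)))
print(sorted(inventory))
```

This only illustrates the semantics of insert, update, and delete records; in the data lake itself the same effect is achieved with a MERGE against the Iceberg table rather than an in-memory dictionary.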

Let’s assume you have a relational database that has product inventory data, and you want to move it into an S3 data lake on a continuous basis, so that your downstream applications or consumers can use it for analytics.
You can integrate Apache Iceberg JARs into AWS Glue through its AWS Marketplace connector. The connector supports AWS Glue versions 1.0, 2.0, and 3.0, and is free to use. Configuring the connector takes only a few clicks in the user interface. The following steps guide you through the setup process:

1. Navigate to the AWS Marketplace connector page.
2. Choose Continue to Subscribe and then Accept Terms.
3. Choose the AWS Glue version and software version.
4. Choose Usage Instruction, which opens a page that has a link to activate the connector.
5. Create a connection by providing a name and choosing Create connection and activate connector.

You can confirm your new connection on the AWS Glue Studio Connectors page. To use the connector, make sure you add it to your AWS Glue job when you create the job. Later in the implementation steps, we show how to use the connector you just configured.

In this post, we walk you through a solution to implement CDC-based UPSERT or MERGE in an S3 data lake using Apache Iceberg and AWS Glue.
Apache Iceberg is an open table format originally developed at Netflix. It was open-sourced as an Apache project in 2018 and graduated from the incubator in mid-2020. It’s designed to support ACID transactions and UPSERT on petabyte-scale data lakes, and is gaining popularity because of its flexible SQL syntax for CDC-based MERGE, full schema evolution, and hidden partitioning features.
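
As a rough illustration of what a CDC-based MERGE can look like, the following sketch builds a Spark SQL `MERGE INTO` statement as a string. The helper `build_merge_sql`, the table and view names, the column names, and the `Op` flag with a `D` delete code are all hypothetical; the statement shape follows the MERGE syntax that Iceberg's Spark SQL extensions accept, but treat it as a starting point rather than a definitive implementation.

```python
def build_merge_sql(target, source, key, columns, op_col="Op"):
    """Return a MERGE INTO statement that applies CDC rows from `source` to `target`.

    Assumes `source` has one row per key (the latest change) and an operation
    column whose value 'D' marks a delete.
    """
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns if c != key)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        f"ON t.{key} = s.{key}\n"
        # Deletes are checked first so a matched 'D' row is never updated.
        f"WHEN MATCHED AND s.{op_col} = 'D' THEN DELETE\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        # Skip inserting rows that were deleted before they ever reached the target.
        f"WHEN NOT MATCHED AND s.{op_col} <> 'D' THEN "
        f"INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Hypothetical names: an Iceberg table in a Glue-backed catalog and a temp view
# holding the incremental CDC records.
merge_sql = build_merge_sql(
    target="glue_catalog.inventory_db.products",
    source="incremental_input",
    key="product_id",
    columns=["product_id", "product_name", "quantity_available"],
)
print(merge_sql)
```

In a Spark job, the resulting string would typically be passed to `spark.sql(...)`. The sketch assumes the source view has already been reduced to the most recent change per key, because a MERGE generally fails when several source rows match the same target row.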


Apache Hudi integration is already supported with AWS analytics services, and recently AWS Glue, Amazon EMR, and Amazon Athena announced support for Apache Iceberg.
