Presented by:
Sai Srirampur
I was the CEO and co-founder of PeerDB, where we focused on building the world's fastest replication tool for Postgres. Recently, ClickHouse acquired PeerDB to offer a native Postgres CDC integration within ClickHouse. I now work on the product team at ClickHouse, leading all things Postgres, including Postgres CDC integration.
Before PeerDB, I was an early employee at Citus Data and saw it through the Microsoft acquisition. For the past decade, I have been an active member of the Postgres community, helping customers implement Postgres, Citus and PeerDB.
No video of the event yet, sorry!
Every data store is unique, with its own set of features and data modeling characteristics. For example, PostgreSQL has 4 ways to ingest data, 5 ways to read data, 300+ data types and 300+ database configs. Building data movement solutions that scale therefore requires an emphasis on the unique design and capabilities of each data store.
However, most existing data movement tools focus on breadth over quality of connectors. They often fail at scale due to painfully slow syncs, lack of reliability, and lack of features. These challenges are reflected in the number of companies building in-house solutions and maintaining large data engineering teams.
This underscores the need for a first-class data movement tool for Postgres: one that focuses on quality over breadth and is native to Postgres. In this talk, I will do a deep dive into what it takes to build a Postgres-specialized data movement tool.
I will cover the architectural tradeoffs: why choose a peer-to-peer architecture that keeps data stores at the center over a hub-and-spoke one that optimizes for breadth of connectors?
I will then dive deep into Postgres-native optimizations that improve the performance, reliability and richness of data movement:
- Partition a Postgres table using internal tuple identifiers (CTIDs) and snapshot the partitions in parallel to move TBs of data in hours rather than days;
- Preserve data type nativity while moving specialized types such as geospatial, JSONB and ARRAYs to Postgres and non-Postgres targets;
- Reliably manage schema changes on the target by using Relation messages from logical decoding;
- Efficiently replicate TOAST columns without requiring REPLICA IDENTITY FULL.
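As an illustration of the first optimization, here is a minimal sketch of CTID-based partitioning. It is not PeerDB's actual implementation: the names `CtidRange`, `partition_by_ctid` and `snapshot_query` are hypothetical, and it assumes the table's page count has already been measured (for example with `pg_relation_size(...)` divided by the `block_size` setting) under the same snapshot the workers will read from. Each range becomes an independent query that a separate worker can run.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CtidRange:
    """Half-open range of heap pages: [start_page, end_page)."""
    start_page: int
    end_page: int

    def predicate(self) -> str:
        # A tid literal has the form '(page, tuple_index)'. Tuple index 0
        # sorts before every real tuple on a page, so comparing against
        # '(page,0)' selects whole pages.
        return (
            f"ctid >= '({self.start_page},0)'::tid "
            f"AND ctid < '({self.end_page},0)'::tid"
        )


def partition_by_ctid(total_pages: int, num_partitions: int) -> List[CtidRange]:
    """Split total_pages heap pages into at most num_partitions ranges."""
    if total_pages < 0 or num_partitions < 1:
        raise ValueError("total_pages must be >= 0 and num_partitions >= 1")
    if total_pages == 0:
        return []
    step = -(-total_pages // num_partitions)  # ceiling division
    return [
        CtidRange(start, min(start + step, total_pages))
        for start in range(0, total_pages, step)
    ]


def snapshot_query(table: str, ctid_range: CtidRange) -> str:
    # Assumption: `table` is a trusted, already-quoted identifier.
    return f"SELECT * FROM {table} WHERE {ctid_range.predicate()}"


if __name__ == "__main__":
    # Hypothetical table name and page count, for illustration only.
    for r in partition_by_ctid(total_pages=10, num_partitions=4):
        print(snapshot_query("public.events", r))
```

Because the ranges are disjoint and cover every page counted, the workers never read the same row twice. In practice all workers would also need to share one consistent snapshot so that the partitions line up with the starting point of the change stream.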
To wrap up, I will share what needs to go into Postgres upstream to make data movement a first-class citizen.
- Date: 2024 April 19 16:30 PDT
- Duration: 20 min
- Room: San Pedro
- Conference: Postgres Conference 2024
- Language: English
- Track: Dev
- Difficulty: Intermediate