Presented by:


Sai Srirampur

ClickHouse Inc

I was the CEO and co-founder of PeerDB, where we focused on building the world's fastest replication tool for Postgres. Recently, ClickHouse acquired PeerDB to offer a native Postgres CDC integration within ClickHouse. I now work on the product team at ClickHouse, leading all things Postgres, including Postgres CDC integration.

Before PeerDB, I was an early employee at Citus Data and saw it through the Microsoft acquisition. For the past decade, I have been an active member of the Postgres community, helping customers implement Postgres, Citus and PeerDB.

No video of the event yet, sorry!

pg_dump and pg_restore are reliable tools for backing up and restoring Postgres databases. They are widely used for real-world use cases including database migrations, disaster recovery, and more. They offer fine-grained control, allowing selection of specific database objects for backup and restore, a choice of dump formats (plain SQL, or the compressed custom and directory formats), and parallel processing of multiple tables. Furthermore, they guarantee dumping and restoring a consistent snapshot of the database.
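For example, a directory-format dump can back up selected tables with several parallel workers and be restored the same way. A minimal sketch; the table names, paths, and connection strings are illustrative:

    # Directory-format dump (-Fd) with 8 parallel workers (-j); pg_dump
    # parallelizes across tables while keeping a consistent snapshot.
    # Only the two listed tables are dumped (-t).
    pg_dump -Fd -j 8 -t public.events -t public.users \
        -f /backups/mydb "postgres://user@source-host/mydb"

    # Restore the same dump into the target, again with 8 workers.
    pg_restore -j 8 -d "postgres://user@target-host/mydb" /backups/mydb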

However, pg_dump and pg_restore are single-threaded at the level of an individual table: parallelism applies only across tables. This significantly slows down dumping and restoring databases with a star schema (common in real-world apps), where a few large tables sit alongside many small ones. For such databases with over 1 TB of data, pg_dump and pg_restore can take multiple days, resulting in longer downtime during migrations and higher Recovery Time Objectives (RTOs) in DR scenarios.

In this talk, we'll discuss an idea to parallelize pg_dump and pg_restore within individual tables, while still ensuring consistency. It involves creating a Postgres snapshot, using that snapshot to logically partition a large table into CTID ranges, and then dumping/restoring each partition in parallel. As enhancements, a) each partition can be read from the source and written to the target simultaneously, enabling a streaming pg_dump | pg_restore pipeline, and b) by tracking which partitions have been dumped and restored, pg_dump and pg_restore can resume from where they left off after a failure instead of restarting from the beginning.
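A minimal SQL sketch of that mechanism, assuming a hypothetical table named events; it relies on PostgreSQL 14+ for TID comparison operators and efficient TID range scans, and the exported snapshot ID shown is illustrative:

    -- Coordinator session: open a transaction and export its snapshot;
    -- keep this transaction open while the workers run.
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();  -- e.g. returns '00000004-00000002-1'

    -- Derive partition boundaries from the table's block count:
    SELECT pg_relation_size('events')
           / current_setting('block_size')::bigint AS total_blocks;

    -- Each worker session: attach to the same snapshot, then dump its
    -- own CTID (block) range with COPY; other workers take adjacent ranges.
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000004-00000002-1';
    COPY (
        SELECT * FROM events
        WHERE ctid >= '(0,0)'::tid AND ctid < '(250000,0)'::tid
    ) TO STDOUT;
    COMMIT;

Restore is symmetric: each worker COPYs its range into the target, and recording completed ranges (e.g., in a small bookkeeping table) is what makes the run resumable after a failure.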

This idea has already been implemented in PeerDB, an open-source Postgres replication tool. The benefits have been significant: large production databases with over a TB of data can be replicated in just a few hours instead of days. We will share these insights and assess how they could be integrated into upstream Postgres!

Date:
2024 April 19 16:00 PDT
Duration:
20 min
Room:
San Pedro
Conference:
Postgres Conference 2024
Language:
English
Track:
Dev
Difficulty:
Intermediate