{"id":2558,"date":"2023-05-10T04:58:46","date_gmt":"2023-05-10T04:58:46","guid":{"rendered":"https:\/\/osdbtechnologies.com\/?p=2558"},"modified":"2023-05-10T04:58:46","modified_gmt":"2023-05-10T04:58:46","slug":"setting-up-logical_replication-using-pg_dump","status":"publish","type":"post","link":"https:\/\/divaind.com\/ie1\/2023\/05\/10\/setting-up-logical_replication-using-pg_dump\/","title":{"rendered":"Optimizing Initial Sync for Large Databases in RDS PostgreSQL using pg_dump in Logical Replication"},"content":{"rendered":"<p data-renderer-start-pos=\"1456\">PostgreSQL provides a built-in logical replication system based on the publish\/subscribe model, where a publisher publishes data changes to a publication, and a subscriber subscribes to the publication and receives the changes. Logical replication can be configured and managed using SQL commands and functions or third-party tools and libraries.<\/p>\n<p data-renderer-start-pos=\"1810\">The user creates the publication on the source PostgreSQL server and adds the required tables to the publication. Each table needs to have a replica identity (primary key or unique key) to replicate the updates and deletes on the source tables to the destination. Otherwise, the updates and deletes will not be replicated but inserts will get replicated.<\/p>\n<p>When using AWS RDS PostgreSQL or any managed service, it&#8217;s not possible to employ physical backup tools like pg_basebackup to configure logical replication to community PostgreSQL due to the lack of low-level file system access. For large databases in the terabyte range, the initial sync process with logical replication can be time-consuming. However, an efficient solution to optimize the synchronization process is to use the logical backup tool pg_dump. This PostgreSQL utility creates a logical backup of the entire database or selected database objects that can be used for the initial sync of data. If this is the initial configuration of logical replication, during the time of the restoration we can tune the configuration parameters and make the recovery faster.<\/p>\n<h2 id=\"Steps-to-configure-the-logical-replication-using-the-pg_dump-backup\" data-renderer-start-pos=\"2387\"><strong data-renderer-mark=\"true\">Steps to configure the logical replication using the pg_dump backup<\/strong><\/h2>\n<h3 id=\"Enable-logical-replication-on-the-source\" data-renderer-start-pos=\"2456\"><strong data-renderer-mark=\"true\">Enable logical replication on the source<\/strong><\/h3>\n<p data-renderer-start-pos=\"2498\">Make sure the below configuration parameters are set, set the parameters, and restart the cluster.<\/p>\n<div class=\"code-block css-xc5qaq\">\n<div class=\"css-6m0gf\">\n<div class=\"css-6yiv0d\">\n<div role=\"presentation\">\n<div>\n<pre class=\"lang:default decode:true \"> max_replication_slots  | 10\n max_wal_senders        | 10\n track_commit_timestamp | on\n wal_level              | logical<\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 id=\"Create-and-add-tables-to-publication\" data-renderer-start-pos=\"2716\"><strong data-renderer-mark=\"true\">Create and add tables to the publication<\/strong><\/h3>\n<div class=\"code-block css-xc5qaq\">\n<div class=\"css-6m0gf\">\n<div class=\"css-6yiv0d\">\n<div role=\"presentation\">\n<div><\/div>\n<\/div>\n<\/div>\n<\/div>\n<pre class=\"lang:default decode:true \">create publication logical_rep01;\nalter publication logical_rep01 add table employee;\nalter publication logical_rep01 add table dept;<\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<h3 id=\"Create-replication-slot\" data-renderer-start-pos=\"2889\"><strong data-renderer-mark=\"true\">Create replication slot<\/strong><\/h3>\n<p data-renderer-start-pos=\"2914\">Create a replication connection to the database and create a replication slot. This session needs to be created in a different terminal and needs to be in an active state till the logical backup is completed.(I prefer to use screen session )<\/p>\n<div class=\"code-block css-xc5qaq\">\n<div class=\"css-6m0gf\">\n<div class=\"css-6yiv0d\">\n<div role=\"presentation\">\n<pre class=\"lang:default decode:true \">[sql@test (terra.c dev) ~]$ psql -h 10.12.2.1 -U postgres \"dbname=postgres replication=database\"\npsql (12.14, server 12.13)\nSSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES128-GCM-SHA256, bits: 128, compression: off)\nType \"help\" for help.\n\npostgres=&gt; CREATE_REPLICATION_SLOT logical_rep01 LOGICAL pgoutput;\n    slot_name    | consistent_point |    snapshot_name    | output_plugin\n-----------------+------------------+---------------------+---------------\n logical_rep01 | 7\/1F0E3938       | 00000005-00062AB3-1 | pgoutput\n (1 row)<\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 id=\"Backup-the-database\" data-renderer-start-pos=\"3693\"><strong data-renderer-mark=\"true\">Backup the database<\/strong><\/h3>\n<ul class=\"ak-ul\" data-indent-level=\"1\">\n<li>\n<p data-renderer-start-pos=\"3716\">Take a logical backup of the source(primary) database using pg_dump.<\/p>\n<\/li>\n<li>\n<p data-renderer-start-pos=\"3786\">use the directory format without compression to make the backup faster.<\/p>\n<\/li>\n<\/ul>\n<div class=\"code-block css-xc5qaq\">\n<div class=\"css-6m0gf\">\n<div class=\"css-6yiv0d\">\n<div role=\"presentation\">\n<div>\n<pre class=\"lang:default decode:true \">pg_dump -h &lt;hostname&gt; -U postgres -t employee -t dept -Fd -f backup -j 2 --no-publications --no-subscriptions --snapshot=00000005-00062AB3-1 --compress==0 -v postgres<\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<ul class=\"ak-ul\" data-indent-level=\"1\">\n<li>\n<p data-renderer-start-pos=\"4031\"><code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">-Fd<\/code>: This specifies the backup format. <code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">-Fd<\/code> specifies a directory format.<\/p>\n<\/li>\n<li>\n<p data-renderer-start-pos=\"4107\"><code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">-f backup<\/code>: This specifies the name of the backup directory. In this case, the directory will be named &#8220;backup&#8221;.<\/p>\n<\/li>\n<li>\n<p data-renderer-start-pos=\"4222\"><code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">-j 2<\/code>: This specifies the number of parallel jobs to run. In this case, two jobs will run in parallel to speed up the backup process.<\/p>\n<\/li>\n<li>\n<p data-renderer-start-pos=\"4358\"><code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">--no-publications --no-subscriptions<\/code>: This specifies that publications and subscriptions should be excluded from the backup.<\/p>\n<\/li>\n<li>\n<p data-renderer-start-pos=\"4486\"><code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">--snapshot=00000005-00062AB3-1<\/code>: This specifies the snapshot ID to use for the backup. This is useful when using a PostgreSQL cluster with a replication setup.<\/p>\n<\/li>\n<li>\n<p data-renderer-start-pos=\"4648\"><code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">--compress=0<\/code>: This specifies the level of compression to use for the backup. <code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">0<\/code> means no compression.<\/p>\n<\/li>\n<li>\n<p data-renderer-start-pos=\"4752\"><code class=\"code css-z5oxh7\" data-renderer-mark=\"true\">-v postgres<\/code>: This specifies the name of the database to backup. In this case, the database is named &#8220;postgres&#8221;.<\/p>\n<\/li>\n<\/ul>\n<p data-renderer-start-pos=\"4869\"><strong data-renderer-mark=\"true\">Close the session created earlier for the replication slot but save the output.<\/strong><\/p>\n<h3 id=\"Restore-the-backup\" data-renderer-start-pos=\"4950\"><strong data-renderer-mark=\"true\">Restore the backup<\/strong><\/h3>\n<div class=\"code-block css-xc5qaq\">\n<p><span class=\"prismjs css-1vd0zfg\" data-code-lang=\"sql\"><span class=\"prismjs css-1vd0zfg\" data-code-lang=\"sql\"><code class=\"language-sql\"><\/code><\/span><\/span><\/p>\n<pre class=\"lang:default decode:true\">pg_restore -p 5433 -h &lt;hostname&gt; -U postgres -Fd -j 2 -v -d postgres backup<\/pre>\n<\/div>\n<h3 id=\"Create-subscription\" data-renderer-start-pos=\"5047\"><strong data-renderer-mark=\"true\">Create subscription<\/strong><\/h3>\n<p data-renderer-start-pos=\"5068\">Create the subscription in standby which refers to the primary database in disabled state.<\/p>\n<div class=\"code-block css-xc5qaq\">\n<div class=\"css-6m0gf\">\n<div class=\"css-6yiv0d\">\n<div role=\"presentation\">\n<div>\n<pre class=\"lang:pgsql decode:true\">CREATE SUBSCRIPTION logical_sub01 CONNECTION 'host=hostname port=5432 dbname=postgres user=postgres password=*****' PUBLICATION logical_rep01\nWITH (\n  copy_data = false,\n  create_slot = false,\n  enabled = false,\n  connect = true,\n  slot_name = 'logical_rep01'\n);<\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 id=\"Advance-the-replication-origin\" data-renderer-start-pos=\"5425\"><strong data-renderer-mark=\"true\">Advance the replication origin<\/strong><\/h3>\n<p data-renderer-start-pos=\"5457\">Take the external ID and increase the position of the replay to the one captured during the replication slot creation time. This requires two arguments one is the external id which is available in pg_subscription and another one is the consistent point which is in the output of the &#8220;create replication slot&#8221; statement.<\/p>\n<div class=\"code-block css-xc5qaq\">\n<div class=\"css-6m0gf\">\n<div class=\"css-6yiv0d\">\n<div role=\"presentation\">\n<div>\n<pre class=\"lang:default decode:true \">postgres=# SELECT 'pg_'||oid::text AS \"external_id\" FROM pg_subscription WHERE subname = 'logical_sub01';\n external_id\n-------------\n pg_82699\n(1 row)<\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 id=\"Take-the-position-of-the-LSN-from-the-replication-slot-creation-and-make-it-advance-to-start-replication\" data-renderer-start-pos=\"5913\"><strong data-renderer-mark=\"true\">Take the position of the LSN from the replication slot creation and make it advance to start the replication<\/strong>&lt;<\/h3>\n<div class=\"code-block css-xc5qaq\">\n<pre class=\"lang:default decode:true \">postgres=# SELECT pg_replication_origin_advance('pg_82699', '7\/1F0E3938') ;\n pg_replication_origin_advance\n-------------------------------\n\n(1 row)<\/pre>\n<p>&nbsp;<\/p>\n<p><span class=\"prismjs css-1vd0zfg\" data-code-lang=\"sql\"><code class=\"language-sql\"><\/code><\/span><\/p>\n<\/div>\n<h3 id=\"Enable-the-subscription\" data-renderer-start-pos=\"6168\"><strong data-renderer-mark=\"true\">Enable the subscription<\/strong><\/h3>\n<div class=\"code-block css-xc5qaq\">\n<div class=\"css-6m0gf\">\n<div class=\"css-6yiv0d\">\n<div role=\"presentation\">\n<div><\/div>\n<\/div>\n<\/div>\n<\/div>\n<pre class=\"lang:default decode:true\">postgres=# ALTER SUBSCRIPTION logical_sub01 ENABLE; \nALTER SUBSCRIPTION<\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<h3 id=\"Check-the-status-of-the-replication\" data-renderer-start-pos=\"6265\"><strong data-renderer-mark=\"true\">Check the status of the replication<\/strong><\/h3>\n<div class=\"code-block css-xc5qaq\">\n<pre class=\"lang:default decode:true\">select * from pg_stat_replication;<\/pre>\n<\/div>\n<div>We appreciate you taking the time to explore the content above. I hope the information served the reason for which you sought out the blog.<\/div>\n<div>\n<p data-renderer-start-pos=\"12257\"><span class=\"fabric-text-color-mark\" data-renderer-mark=\"true\" data-text-custom-color=\"#0747a6\">Comments on how to make the blog better are indeed very appreciated. If you have any questions, suggestions, or criticism, kindly email us.<\/span><\/p>\n<p data-renderer-start-pos=\"12538\">To be informed about all of our content, subscribe now<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>PostgreSQL provides a built-in logical replication system based on the publish\/subscribe model, where a publisher publishes data changes to a publication, and a subscriber subscribes to the publication and receives the changes. Logical replication can be configured and managed using SQL commands and functions or third-party tools and libraries. The user creates the publication on&hellip;<\/p>\n","protected":false},"author":1,"featured_media":2507,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28],"tags":[166,194,245,262,266,300,305],"class_list":["post-2558","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql","tag-inital-sync","tag-logical-replication","tag-pg_dump","tag-postgres","tag-postgresql","tag-rds-to-community-postgres","tag-replication","category-28","description-off"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/posts\/2558","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/comments?post=2558"}],"version-history":[{"count":0,"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/posts\/2558\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/media\/2507"}],"wp:attachment":[{"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/media?parent=2558"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/categories?post=2558"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/divaind.com\/ie1\/wp-json\/wp\/v2\/tags?post=2558"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}