Our Zero-Downtime MySQL to PostgreSQL Migration

We recently migrated Hyvor Blogs' production data from MySQL to PostgreSQL without any downtime.

Here’s how we came up with a solution to sync data in real time from our old MySQL database to PostgreSQL. First, we tested several tools like pgloader, but pretty much all of them are built for a one-off dump-and-import style migration, which was not possible in our case.

So, we came up with a plan: write a script that connects to the MySQL database, fetches data, and feeds it to the PostgreSQL database. It loops infinitely, checking for newly updated records using the updated_at column. Then, during a loop iteration where there are no new updates, we route the application to the PostgreSQL database.

(Diagram: MySQL to PostgreSQL)

We figured that for this method to work, the following criteria must be met:

  1. All tables must have a primary key ID column. We use this as the identifier to decide whether we should create or update a record in PostgreSQL. In our case, all tables had an id bigint auto-incrementing primary key column.

  2. All tables must have an updated_at column, and the database or the application must be configured to update its value whenever a record is updated (more details below; see the sketch after this list).

  3. The speed at which the script syncs data from MySQL to PostgreSQL must be greater than the speed at which new data comes in.
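
For illustration, here is a minimal sketch of satisfying criteria 1 and 2 at the database level on the MySQL side, assuming a hypothetical posts table that already has an auto-incrementing id primary key (adapt the table name to your schema):

use Illuminate\Support\Facades\DB;

// Hypothetical table 'posts': add an updated_at column that MySQL itself
// keeps current on every UPDATE, so even rows changed outside the
// application are picked up by the sync script.
DB::connection('mysql')->statement(
    'ALTER TABLE posts
        ADD COLUMN updated_at TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP'
);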

Import script

This is roughly the script we wrote in PHP as a Laravel Command, after cleaning up some application-specific code:

<?php

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class MysqlToPgsqlCommand extends Command
{

    public $signature = 'migrate:mysql-pgsql';

    public function handle(): void
    {

        // just a confirmation
        if (!$this->confirm('Are you sure you want to migrate the database?')) {
            return;
        }

        // any tables that we want to skip
        $tablesToSkip = [
            'migrations',
            'cache',
            'cache_locks',
            'failed_jobs',
            'jobs',
        ];

        // get table names from PGSQL
        $tables = DB::select("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'");
        $lastRun = null;

        // we loop forever
        // whenever we see a loop iteration where there are no new records,
        // we switch the application to use the PGSQL database
        while (true) {

            // timestamp of the current loop iteration
            $currentRun = now()->toImmutable();

            // start going through each table
            foreach ($tables as $table) {

                // skip the table
                if (in_array($table->table_name, $tablesToSkip)) {
                    continue;
                }

                $this->info("Migrating table: {$table->table_name}");

                // Get the column names of the current table
                $columns = DB::select(
                    'SELECT column_name FROM information_schema.columns WHERE table_name = ?',
                    [$table->table_name]
                );
                $columnNames = array_map(fn($column) => $column->column_name, $columns);

                // get the data from the MySQL database
                DB::connection('mysql')
                    ->table($table->table_name)
                    // when $lastRun is set (which means we have run the loop before)
                    // we only get records that have been updated since the last run
                    ->when($lastRun, fn($query) => $query->where('updated_at', '>=', $lastRun))
                    // this is important to make sure chunk is working properly
                    ->orderBy('id')
                    // get the data in chunks of 1000
                    ->chunk(1000, function ($rows) use ($table, $columnNames) {

                        // we used a transaction just because it made the inserts faster
                        DB::transaction(function () use ($rows, $table, $columnNames) {

                            // loop through each row
                            foreach ($rows as $row) {
                                $data = (array) $row;

                                // filter data to only include existing columns
                                // (we deleted some columns during a cleanup)
                                $filteredData = array_filter(
                                    $data,
                                    fn($key) => in_array($key, $columnNames),
                                    ARRAY_FILTER_USE_KEY
                                );

                                // update or insert based on the primary key, which is 'id'
                                DB::connection('pgsql')
                                    ->table($table->table_name)
                                    ->updateOrInsert(
                                        ['id' => $data['id']],
                                        $filteredData
                                    );
                            }
                        });

                    });

                // Important: we need to set the sequence to the max id of the table
                // to make sure that the next insert will have the correct id after the migration
                DB::statement("SELECT setval('{$table->table_name}_id_seq', (SELECT MAX(id) FROM {$table->table_name}))");
            }

            $lastRun = clone $currentRun;
            sleep(1);
        }
    }
}
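
You can then start the sync with Artisan and leave it running until you are ready to switch:

php artisan migrate:mysql-pgsql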

We let this script run and do a full sync first, which took about 30 minutes to finish. Then, it kept looping, occasionally syncing newly updated records by checking the updated_at column. When no new updates were coming in, we switched the database to PostgreSQL so that new writes would go there.

Switching the database

In our case, we had set up separate application servers connecting to MySQL and PostgreSQL. At switchover time, we updated our proxy to send requests to the PostgreSQL-connected servers, which was extremely fast.

In our case, we had ‘no-data’ intervals of a couple of seconds, where no new data was coming in, since Hyvor Blogs is a blogging platform that is more read-heavy than write-heavy. But if that is not an option, making the MySQL server read-only before changing the application servers would prevent records from being added or updated, at the expense of a couple of seconds of downtime.
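
If you take the read-only route, a minimal sketch could look like the following, run right before flipping the proxy (it assumes a MySQL user with the SUPER or CONNECTION_ADMIN privilege; note that users holding those privileges can still write unless super_read_only is also enabled):

use Illuminate\Support\Facades\DB;

// Stop regular writes on the old primary just before switching traffic
DB::connection('mysql')->statement('SET GLOBAL read_only = ON');

// ... now point the proxy at the PostgreSQL-connected application servers ...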

Notes

  • Do a full test beforehand with real data. We found several issues that were not obvious on dev.

  • The updated_at column must be updated at the database level (ON UPDATE CURRENT_TIMESTAMP in MySQL) or at the application level whenever a record is updated. However, historical values do not matter here; we only need to make sure that the updated_at column is updated during the migration. We did have a couple of tables without an updated_at column, so we added the column with database-level updates before the migration (as in the sketch under the criteria above).

  • Updating the sequence after inserts (the setval statement near the end of the table loop above) is required if you are using auto-incrementing generated keys.

  • If your tables have foreign keys, you should insert data into the parent tables before the child tables.

  • Test for Unicode (especially multibyte) strings. We messed this up: the charset of Laravel’s connection to the MySQL server was wrong, resulting in broken multibyte strings. We had to run another command to fix those.

  • Another thing to look out for is case sensitivity. MySQL varchar and text comparisons are case-insensitive by default (with the commonly used utf8_general_ci collation), while PostgreSQL text comparisons are case-sensitive. You might need to change your application logic to prepare for this, or use citext (see the sketch below).
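
Here is a minimal sketch of both options, assuming a hypothetical users table with an email column that previously relied on MySQL’s case-insensitive comparisons (creating the citext extension requires sufficient database privileges):

use Illuminate\Support\Facades\DB;

// Option 1: switch the column to citext so comparisons stay case-insensitive
DB::connection('pgsql')->statement('CREATE EXTENSION IF NOT EXISTS citext');
DB::connection('pgsql')->statement('ALTER TABLE users ALTER COLUMN email TYPE citext');

// Option 2: keep the column as text and make individual lookups case-insensitive
$email = 'SomeOne@example.com';
$user = DB::connection('pgsql')
    ->table('users')
    ->whereRaw('email ILIKE ?', [$email])
    ->first();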

Before anyone asks why we moved from MySQL to PostgreSQL: the reason is completely operational. We are planning an on-premise version of Hyvor, and we want all our products to use the same DBMS for ease of operations.
