A common obstacle we face when releasing new features is making production schema changes in MySQL. Many new features require additional columns or indexes. Running an “ALTER TABLE” in MySQL to add the needed columns and indexes locks the table, hanging the application. We need a better solution.

Option 1: Schema Change in Downtime
This is the simplest option. Put the application into downtime and perform the schema change. It requires us to have the application down for the duration of the “ALTER TABLE”. We’ve successfully used this option for smaller tables that can be altered in seconds or minutes. However, for large tables the alter can take hours making it less than desirable.

Option 2: Role Swap
This is the option that we have used in the past to perform schema changes on large tables. There are quite a few steps which make this option error-prone. When the changes are made, a short downtime to swap the application between the master and replica is required.

Here’s a sample of the process we follow to change the roles of the current master “A”, and the current replica “B”.

  1. On B, STOP SLAVE.
  2. On B, SHOW MASTER STATUS. This requires that binary logging is enabled on B.
  3. On B, execute schema changes with ALTER TABLE statements.
  4. On A, CHANGE MASTER TO MASTER_HOST=’B’, MASTER_USER=’...’, MASTER_PASSWORD=’...’, MASTER_LOG_FILE=’...’, MASTER_LOG_POS=nnn; according to the master status reported by B.
  5. On B, START SLAVE.
  6. Let B catch up to A.
  7. Stop applications writing to A, and watch SHOW PROCESSLIST until activity is totally quiet.
  8. On A, SET GLOBAL read_only=1;
  9. On B, STOP SLAVE once you’re sure it has applied all the changes replicated from A.
  10. On B, SET GLOBAL read_only=0;
  11. Reconfigure your applications to write to B instead of A, and restart applications.
  12. START SLAVE on A.
  13. Finally…Have a stiff drink. You need it at this point.

Option 3: pt-online-schema-change
We have recently started using pt-online-schema-change to perform our schema updates without needing to take downtime. It was developed by the folks at Percona. During the online schema change the original table is not locked and will continue to take reads and writes. In very basic terms here is what is happening when you run pt-online-schema-change.

  1. Create new table with same structure as original.
  2. Update schema on new table.
  3. Copy rows in batches from original table.
  4. Move original table out of the way and replace with new table.
  5. Drop old table.

Copying all rows of a table to the new table creates a lot of data that needs to be replicated to your slave. Fortunately pt-online-schema-change will monitor the status of your slave and will pause the data copy process if replication goes too far behind. We have also found that the data copy process can be disk intensive and will impact MySQL’s performance. Even with pt-online-schema-change we still choose to perform schema updates after hours to limit the impact on our applications. There are a number of settings in pt-online-schema-change which you can tune to maximize performance during the schema change. We also thoroughly test each schema change in staging prior to running production.

Next up: MySQL HA and Failover