This article is a translation of the Japanese article published on February 17th, 2022.
This article is part of our "Blog Series of Introduction of Developer Productivity Engineering at Mercari."
Author: @ichirin2501 (Core SRE Team)
In this article, I explain how I was able to overcome some constraints on special replica DBs, and apply DDL online.
We operate all kinds of systems at Mercari. However, the core of the Mercari service is composed of a monolithic system written using PHP, which use MySQL (v5.7) for its databases. Of course, we’ve continued our code freeze (*1) in order to accelerate our migration to microservices, so we generally aren’t developing features on our PHP systems. However, there are situations where we need to make changes, such as when features are developed on dependent microservices. In these cases, we need to perform DDL-related work (such as adding tables or columns) on monolithic databases. DDL is provided as a feature of MySQL. Although executing DDL as is will prevent load adjustment and could block writes for certain changes, there are some online schema change tools (such as pt-online-schema-change ["pt-osc"] or gh-ost that allow you to apply DDL online. However, Mercari has some special replica DBs which can break if we use these tools as is.
DB structure (partial)
Part of the DB structure used from our monolithic PHP system is shown below.
The special replica DB I mentioned earlier is the replica DB with a trigger defined, toward the top of the diagram. A trigger is a process started when some special event occurs in a table. This feature is used to build a replica DB with a different state. We currently use MySQL v5.7 for Mercari, with the binlog format set to MIXED (rather than the default value of ROW). We do this because we have historically done so until now, but also because the trigger defined only on the replica DB will not function using row-based replication (*2), so in order to maintain the special replica DB in the above diagram we cannot change the binlog format to ROW. Although this may, depending on the SQL statement, be deemed unsafe and fallback to ROW format even when is set to MIXED (*3) we do check for unexpected fallbacks in addition to simply avoiding the use of SQL that might cause this.
Why using online schema change tools (pt-osc, gh-ost) can break special replica DBs
When using pt-osc or gh-ost, there is a process that is initially executed that creates an empty table with a different name with DDL applied and copies data from the existing table. The SQL statement issued during this process is implemented using an INSERT … SELECT statement (*4). This is determined by MySQL to be an unsafe SQL statement, and so the binlog format falls back to ROW. Therefore, even if a trigger is defined for a table with another name on the replica DB, the trigger will not run and equivalent data will not be generated, preventing the special replica DB state from being maintained.
Now that we understand MySQL triggers and DB structure constraints, let’s cover how online schema change tools work, so that we can use these tools to apply DDL.
Process flow during gh-ost execution
The Mercari Core SRE Team decided to go with gh-ost, which makes interactive operation simple. I’ll continue my discussion under the assumption that gh-ost is being used.
The following diagram provides a simple overview of what happens when gh-ost is run.
gh-ost uses ROW format binlogs, so we use a ROW format DB for the DB from which gh-ost will retrieve binlogs (gh-ost can optionally change the DB settings if the DB does not use ROW format). We start with one table (called "table"). An empty table with a different name (called "table_gho") with DDL applied is created based on the definition of the original table, and then the existing data is copied (row-copy) from the original table. In the meantime, the original table receives updates from an application, so gh-ost reads the update differences from the binlog and continues to apply them to table_gho. You can find more details in the official gh-ost documentation.
The only time that gh-ost issues an INSERT … SELECT statement (which would cause a fallback to ROW) in binlog_format MIXED would be during the row-copy process. The process used by gh-ost to process update differences is an SQL statement where the actual value from the binlog_format ROW on the other replica is used directly, so the SQL statement itself is not determined to be unsafe and there is no fallback to ROW. In other words, the update differences in binlogs output from the master DB are written in statement format, which satisfies conditions to operate the trigger on the replica. Update differences are processed as SQL statements formatted to prevent an unsafe determination when applied to table_gho by gh-ost, even if an update SQL statement without a LIMIT clause or a UUID() function that would be determined unsafe (*5) is issued from an application. Once the row-copy process from the original table is completed, all that remains is to update differences from the application.
Procedure for using gh-ost to apply DDL
First of all, let’s see how we can replicate processes for copied rows that do not activate triggers. This is maybe unique to Mercari, but the trigger logic for the special replica DB here was pretty simple, so we could reproduce the generated data as long as we could use the original data, and we could resolve this without issue by defining and simultaneously running the same trigger on table_gho. This depends on the trigger that has been defined, so this method probably cannot be used in a lot of cases.
There were three key points involved in executing gh-ost while maintaining the special replica DB:
- The only process by gh-ost which results in binlog falling back to ROW format is the row copy phase
- We can reproduce the data generated by the trigger, as long as we can use the original data
- There is no issue with defining and simultaneously running the same trigger on table_gho
We can implement the final table cut-over for gh-ost at this time, so we can stop replication after the row copy phase, use the table_gho table generated by gh-ost (the original data) to reproduce the data, define a trigger on the table_gho table, and resume replication.
The following diagram shows the moment prior to cut-over by gh-ost:
The following diagram shows the situation after cut-over:
The table labeled "table’" is the renamed table_gho table to which DDL was applied.
Although there’s some amount of work involved here, we’re able to reap the benefits of an online schema change tool while still maintaining the Mercari special replica DB.
Many special structures have certain constraints, and this can present certain risks. There are many reasons why a company might be using a special structure. Although we continue to improve our services every day, we have to remind ourselves that these kinds of decisions were made for a good reason back then.
The Core SRE Team often runs into these so-called legacy systems, and these systems can be truly massive and have plenty of room for improvement. We’re hiring people to help us improve our services. Please contact us if that sounds interesting! We’re looking for interns as well!
- Mercari’s Microservice Migration (Winter 2019) (Japanese only)