How to Understand F1's Schema Change
Background
The DDL paper on F1 serves as the foundation for TiDB’s DDL implementation. There are two main papers on F1: one provides an overview of F1’s DDL, and the other specifically details the schema change method for DDL. I personally believe the second is key and more confusing to me. There is an introduction to the second paper here, which can help in understanding.
Understanding
Online DDL Concept
The DDL discussed here refers to online DDL. The concept of online DDL originates from databases like MySQL, whereas PostgreSQL and similar databases might not support it. This concept is also quite vague; the distinction is whether you need to use exclusive locks during DDL operations to block transactions. Therefore, all databases can perform online DDL; it just depends on whether they’re willing to put in the effort. For traditional businesses where 24/7 availability isn’t a priority, DDL operations can be performed during maintenance times or late at night. Even if a few users are online, at most, they might experience minor delays. However, modern internet businesses are strict about maintenance windows, creating a higher demand for non-blocking DDL, which MySQL, as a quintessential internet database, was first to support. The typical implementation involves creating a copy of the schema table, with operations being sent to both the new and old tables during the transition.
For MySQL’s supported online DDL, see this webpage. Primarily, it’s categorized into operations on indexes and columns. This explains my curiosity about why TiDB’s examples for implementation often involve adding indexes.
F1’s Method
Having worked on something similar to Aurora before, there were many issues with this area. If you’re only performing offline DDL, it doesn’t have to be this complicated. According to F1’s paper, it uses the following series of state changes to accomplish a DDL:
(reorganization)
absent -> delete only -> write only ---------------> public
In this sequence, each node passes through these four states, transitioning to the next state upon receiving a command. The agreed-upon rule is that each state doesn’t persist for more than twice the lease time across all nodes. How is this ensured? Through the following rule: if a node takes too long to move to the next state after receiving the transition command (meaning a state exceeds twice the lease time), it means the node received the command too late and will stop providing services and shut down.
Within these four states, ‘absent’ indicates a state where the node hasn’t received instructions yet, and ‘public’ signifies the completion of the DDL. What about the two middle states? The background link described them as follows:
- A delete-only table or column can be modified only by delete operations.
- A delete-only index can be modified only by delete and update operations. Update operations can delete key-value pairs corresponding to updated index keys, but they cannot create any new ones.
- A write-only column or index can have their key-value pairs modified by insert, delete, and update operations, but none of their pairs can be read by user transactions.
- A write-only constraint is applied for all new insert, delete, and update operations, but it is not guaranteed to hold over all existing data.
Summarized in a table:
delete only | write only |
---|---|
Tables and columns can only be deleted; indexes can be updated (tables not really) | Columns and indexes can be deleted, updated, and inserted |
The first state is rather peculiar, and the second one even more so—“can not read” might have been a better name for it. It’s said that this design allows two concurrent states to behave consistently, specifically pairs like (absent and delete only), (delete only and write only), and (write only and public).
Examples:
Adding index idx to table t, the deployment environment consists of two databases, a and b.
- a enters delete only and completes adding the index. b has yet to receive any instructions.
- Insert operations on a and b: a’s idx is ignored, b is unaware and also ignores.
- Read operations: a and b ignore idx.
- Deletes and updates: a’s idx responds, b ignores.
- a enters write only, b enters delete only, and indexing is completed.
- Insert operations on a and b: a’s idx responds, b ignores (b loses index data?).
- Read operations: a and b ignore idx.
- Deletes and updates: both a and b’s idx respond.
- a enters the public state, b enters the write-only state.
- Insert operations on a and b: both a and b’s idx respond.
- Read operations: a’s idx responds, b ignores.
- Deletes and updates: both a and b’s idx respond.
Removing index idx from table t, with two databases a and b in the deployment environment.
- a enters delete only and completes removing the index. b has yet to receive any instructions.
- Insert operations on a and b: a’s idx is ignored, b has index idx and inserts into the index.
- Read operations: a ignores idx, b uses it.
- Deletes and updates: a’s idx responds (no-op if the index is removed), b responds.
- a enters write only, b enters delete only, and fulfills the command.
- Insert operations on a and b: a’s idx responds (no-op), b ignores.
- Read operations: both a and b ignore idx.
- Deletes and updates: both a’s and b’s idx respond (it’s a no-op as the index is already deleted).
- a enters the public state, b enters write only.
- Insert operations on a and b: both a and b respond (b no-op?).
- Read operations: a’s idx responds, b ignores.
- Deletes and updates: both a and b’s idx respond (b no-op?).