How to Understand F1's Schema Change

Background

The DDL paper on F1 serves as the foundation for TiDB’s DDL implementation. There are two main papers on F1: one provides an overview of F1’s DDL, and the other specifically details the schema change method for DDL. I personally believe the second is key and more confusing to me. There is an introduction to the second paper here, which can help in understanding.

Understanding

Online DDL Concept

The DDL discussed here refers to online DDL. The concept of online DDL originates from databases like MySQL, whereas PostgreSQL and similar databases might not support it. This concept is also quite vague; the distinction is whether you need to use exclusive locks during DDL operations to block transactions. Therefore, all databases can perform online DDL; it just depends on whether they’re willing to put in the effort. For traditional businesses where 24/7 availability isn’t a priority, DDL operations can be performed during maintenance times or late at night. Even if a few users are online, at most, they might experience minor delays. However, modern internet businesses are strict about maintenance windows, creating a higher demand for non-blocking DDL, which MySQL, as a quintessential internet database, was first to support. The typical implementation involves creating a copy of the schema table, with operations being sent to both the new and old tables during the transition.

For MySQL’s supported online DDL, see this webpage. Primarily, it’s categorized into operations on indexes and columns. This explains my curiosity about why TiDB’s examples for implementation often involve adding indexes.

F1’s Method

Having worked on something similar to Aurora before, there were many issues with this area. If you’re only performing offline DDL, it doesn’t have to be this complicated. According to F1’s paper, it uses the following series of state changes to accomplish a DDL:

                                    (reorganization)
absent -> delete only -> write only ---------------> public

In this sequence, each node passes through these four states, transitioning to the next state upon receiving a command. The agreed-upon rule is that each state doesn’t persist for more than twice the lease time across all nodes. How is this ensured? Through the following rule: if a node takes too long to move to the next state after receiving the transition command (meaning a state exceeds twice the lease time), it means the node received the command too late and will stop providing services and shut down.

Within these four states, ‘absent’ indicates a state where the node hasn’t received instructions yet, and ‘public’ signifies the completion of the DDL. What about the two middle states? The background link described them as follows:

Summarized in a table:

delete only write only
Tables and columns can only be deleted; indexes can be updated (tables not really) Columns and indexes can be deleted, updated, and inserted

The first state is rather peculiar, and the second one even more so—“can not read” might have been a better name for it. It’s said that this design allows two concurrent states to behave consistently, specifically pairs like (absent and delete only), (delete only and write only), and (write only and public).

Examples:

Adding index idx to table t, the deployment environment consists of two databases, a and b.

  1. a enters delete only and completes adding the index. b has yet to receive any instructions.
    • Insert operations on a and b: a’s idx is ignored, b is unaware and also ignores.
    • Read operations: a and b ignore idx.
    • Deletes and updates: a’s idx responds, b ignores.
  2. a enters write only, b enters delete only, and indexing is completed.
    • Insert operations on a and b: a’s idx responds, b ignores (b loses index data?).
    • Read operations: a and b ignore idx.
    • Deletes and updates: both a and b’s idx respond.
  3. a enters the public state, b enters the write-only state.
    • Insert operations on a and b: both a and b’s idx respond.
    • Read operations: a’s idx responds, b ignores.
    • Deletes and updates: both a and b’s idx respond.

Removing index idx from table t, with two databases a and b in the deployment environment.

  1. a enters delete only and completes removing the index. b has yet to receive any instructions.
    • Insert operations on a and b: a’s idx is ignored, b has index idx and inserts into the index.
    • Read operations: a ignores idx, b uses it.
    • Deletes and updates: a’s idx responds (no-op if the index is removed), b responds.
  2. a enters write only, b enters delete only, and fulfills the command.
    • Insert operations on a and b: a’s idx responds (no-op), b ignores.
    • Read operations: both a and b ignore idx.
    • Deletes and updates: both a’s and b’s idx respond (it’s a no-op as the index is already deleted).
  3. a enters the public state, b enters write only.
    • Insert operations on a and b: both a and b respond (b no-op?).
    • Read operations: a’s idx responds, b ignores.
    • Deletes and updates: both a and b’s idx respond (b no-op?).

Comments

comments powered by Disqus