OceanBase Internals: Transactions, Replay, SQL Engine, and Unit Placement

Why These Paths Matter: OceanBase targets high availability and scalability in a shared-nothing cluster. The core engineering challenge is to make four critical subsystems work together with predictable latency and correctness: write transactions must be durable, replicated, and committed efficiently; tablet replay must recover state quickly and safely; the SQL path from parse to execution must optimize well while respecting multi-tenant constraints; and unit placement must map tenants to physical resources without fragmentation. This article focuses on motivation, design, implementation highlights, and tradeoffs, using concrete code entry points from the OceanBase codebase. ...

January 17, 2026 · 8 min · 1517 words · Jack Yu

How to Compare Data Consistency between MySQL and PostgreSQL

Background: Recently, I ran into a case where a user wanted to synchronize data from PostgreSQL to TiDB (which is compatible with the MySQL protocol) and wanted to verify that the data was consistent after synchronization. I hadn't dealt with this kind of issue before, so I did some research. Typically, to verify data consistency, you compute a checksum on both sides and compare the results. TiDB (MySQL) Side: To verify a specific table, the following SQL is used: ...

May 9, 2021 · 3 min · 461 words · Jack Yu
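The "compute a checksum on both sides and compare" idea from the consistency post above can be sketched in Go. This is a minimal illustration, not the SQL the post uses: it assumes both databases have already rendered every value (NULLs, decimals, timestamps, collations) to identical strings, which in practice is the hard part. The function names `rowHash` and `tableChecksum` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// rowHash hashes one row. Each column value is length-prefixed so that
// ("ab", "c") and ("a", "bc") produce different byte streams.
// Assumption: both databases render every value to the same string first.
func rowHash(row []string) uint64 {
	h := fnv.New64a()
	for _, col := range row {
		h.Write([]byte(strconv.Itoa(len(col)) + ":"))
		h.Write([]byte(col))
	}
	return h.Sum64()
}

// tableChecksum sums row hashes modulo 2^64. Addition is commutative, so the
// result does not depend on row order; unlike XOR, two identical rows do not
// cancel each other out. The row count is returned as an extra check.
func tableChecksum(rows [][]string) (sum uint64, count int) {
	for _, r := range rows {
		sum += rowHash(r)
	}
	return sum, len(rows)
}

func main() {
	tidbRows := [][]string{{"1", "alice"}, {"2", "bob"}}
	pgRows := [][]string{{"2", "bob"}, {"1", "alice"}} // same data, different order
	s1, n1 := tableChecksum(tidbRows)
	s2, n2 := tableChecksum(pgRows)
	fmt.Println(s1 == s2 && n1 == n2) // true
}
```

An order-independent combiner matters here because neither database guarantees row order without an explicit ORDER BY, and sorting identically across two engines with different collations is itself error-prone.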

How to Read TiDB Source Code (Part 5)

When using TiDB, you may occasionally encounter exceptions such as the “Lost connection to MySQL server during query” error. This error means the connection between the client and the database was closed without the user requesting it. The reasons for disconnection vary. This article analyzes some common TiDB errors from the perspective of exception handling and the code paths involved. Some problems are not errors at all but performance issues caused by slow execution, so the second half of this article also introduces common tools for tracking performance. ...

September 8, 2020 · 5 min · 892 words · Jack Yu

How to Read TiDB Source Code (Part 4)

This article introduces some key functions in TiDB and explains how to interpret its logs. Key Functions: Which functions count as "key" varies from person to person, so the content of this section is subjective. execute: The execute function is the path that every statement sent over the text protocol goes through, and it nicely shows the stages of SQL handling. ParseSQL parses the SQL; the actual implementation lives in the parser, where SQL is parsed according to the rules introduced in the second article. Note that the parsed SQL may be a single statement or multiple statements, because TiDB supports the multi-statement feature, which allows several SQL statements to be executed at once. ...

July 31, 2020 · 7 min · 1397 words · Jack Yu
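The key-functions post above notes that one SQL string may parse into several statements. TiDB's parser handles this with its grammar; the toy splitter below only illustrates the idea that one input yields a list of statements. It is a hypothetical helper, not TiDB code, and it deliberately ignores comments, backticks, and backslash escapes that a real parser must handle.

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements splits a multi-statement string on ';' while ignoring
// semicolons inside single- or double-quoted literals. Toy only: comments,
// backticks, and backslash escapes are not handled.
func splitStatements(sql string) []string {
	var stmts []string
	var cur strings.Builder
	var quote rune // 0 when not inside a quoted literal
	for _, r := range sql {
		switch {
		case quote != 0:
			if r == quote {
				quote = 0 // closing quote
			}
		case r == '\'' || r == '"':
			quote = r // opening quote
		case r == ';':
			if s := strings.TrimSpace(cur.String()); s != "" {
				stmts = append(stmts, s)
			}
			cur.Reset()
			continue // do not keep the separator itself
		}
		cur.WriteRune(r)
	}
	// A trailing statement without a final ';' is still a statement.
	if s := strings.TrimSpace(cur.String()); s != "" {
		stmts = append(stmts, s)
	}
	return stmts
}

func main() {
	for _, s := range splitStatements("SELECT 1; INSERT INTO t VALUES ('a;b'); SELECT 2;") {
		fmt.Println(s)
	}
	// SELECT 1
	// INSERT INTO t VALUES ('a;b')
	// SELECT 2
}
```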

How to Read TiDB Source Code (Part 3)

In the previous article, we introduced how to look up syntax and configuration. In this article, we discuss how to view system variables, including their default values and scopes, and how to monitor metrics. System Variables: TiDB's system variable names are defined in tidb_vars.go. This file also contains some default values, but the variables are actually assembled in defaultSysVars, a large struct array that defines the scope, name, and default value of every variable in TiDB. Besides TiDB's own system variables, it also includes MySQL's system variables for compatibility. ...

July 28, 2020 · 8 min · 1648 words · Jack Yu
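The system-variables post above describes defaultSysVars as a struct array holding each variable's scope, name, and default value. The Go sketch below mimics that shape so the role of the scope bits is easy to see. The type names, constants, and entries are hypothetical simplifications for illustration, not TiDB's actual definitions; consult the real source for actual defaults.

```go
package main

import "fmt"

// ScopeFlag is a bit set saying where a variable may be changed.
// Hypothetical simplification for illustration, not TiDB's actual type.
type ScopeFlag uint8

const (
	ScopeNone    ScopeFlag = 0      // read-only
	ScopeGlobal  ScopeFlag = 1 << 0 // SET GLOBAL allowed
	ScopeSession ScopeFlag = 1 << 1 // SET SESSION allowed
)

// SysVar is a hypothetical, trimmed-down system variable entry.
type SysVar struct {
	Scope ScopeFlag
	Name  string
	Value string // default value
}

// exampleSysVars mimics the shape of a defaultSysVars-style table.
// The entries are illustrative only.
var exampleSysVars = []*SysVar{
	{ScopeGlobal | ScopeSession, "autocommit", "ON"},
	{ScopeNone, "version_comment", "example build"},
	{ScopeSession, "example_session_var", "0"},
}

// lookup does a linear scan; a real table would typically be indexed by name.
func lookup(name string) (*SysVar, bool) {
	for _, v := range exampleSysVars {
		if v.Name == name {
			return v, true
		}
	}
	return nil, false
}

// canSet reports whether v may be changed at the given scope.
func canSet(v *SysVar, scope ScopeFlag) bool { return v.Scope&scope != 0 }

func main() {
	v, _ := lookup("autocommit")
	fmt.Println(v.Value, canSet(v, ScopeGlobal), canSet(v, ScopeSession)) // ON true true
}
```

Encoding scope as bit flags lets one entry be both global and session while keeping the read-only case (no bits set) trivial to check.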

How to Read TiDB Source Code (Part 2)

Continuing from the previous article, where we learned how to set up a code-reading environment and where to start reading, this part introduces ways to navigate the code for some common needs. How to Check the Support Level of a Syntax: There are usually two methods: checking through the parser repo, or checking directly within the TiDB repo. Both methods require the environment set up in the previous article; if you haven't tried that yet, give it a go. ...

July 12, 2020 · 4 min · 791 words · Jack Yu

How to Read TiDB Source Code (Part 1)

Background: There are many articles on reading TiDB's source code, often referred to as the “Twenty-Four Chapters Scriptures”. However, these introductions typically proceed from a macro to a micro perspective. This series instead tries to introduce TiDB's source code from an easier angle. Our goals are to enable readers to start reading TiDB's code themselves, rather than understanding it passively through pre-written articles, and to provide some common examples of digging into code details, such as examining the scope of a variable. After all, teaching people to fish is better than giving them fish. While the code changes often, the methods remain mostly unchanged. ...

July 6, 2020 · 6 min · 1247 words · Jack Yu

How TiDB Implements the INSERT Statement

In a previous article, “TiDB Source Code Reading Series (4): Overview of the INSERT Statement”, we introduced the general flow of the INSERT statement. Why write a separate article for INSERT? In TiDB, simply inserting a row is the simplest and most common case, but things get more complex once the INSERT statement specifies how to behave on a conflict, such as a Unique Key violation: should it return an error, skip the current row, or overwrite the existing data? This article therefore digs deeper into the INSERT statement. ...

July 11, 2018 · 11 min · 2221 words · Jack Yu

How to Test CockroachDB Performance Using Benchmarksql

Why Test TPC-C: First, TPC-C is the de facto standard OLTP benchmark. It is a specification rather than a tool, and any database can publish results under it, so there is no dispute over which testing tool was used. Second, TPC-C is closer to real-world scenarios because it defines a transaction model. That model mixes high-frequency simple transactions with low-frequency inventory queries, so it tests the database more comprehensively and realistically. ...

July 6, 2018 · 3 min · 603 words · Jack Yu

How to Test CockroachDB Performance Using Sysbench

Compiling Sysbench with pgsql Support: CockroachDB speaks the PostgreSQL protocol, so to test it with Sysbench you need to enable Sysbench's pg support. Sysbench already supports the pg protocol, but it is not enabled by default at compile time. You can enable it with the following command: ./configure --with-pgsql Before that, of course, you need to download the Sysbench source code and install the PostgreSQL header files required for compilation (for example, with yum via sudo). ...

June 11, 2018 · 3 min · 600 words · Jack Yu