<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Distributed-Systems on Mini Fish</title>
    <link>https://blog.minifish.org/tags/distributed-systems/</link>
    <description>Recent content in Distributed-Systems on Mini Fish</description>
    <image>
      <title>Mini Fish</title>
      <url>https://blog.minifish.org/android-chrome-512x512.png</url>
      <link>https://blog.minifish.org/android-chrome-512x512.png</link>
    </image>
    <generator>Hugo -- 0.154.5</generator>
    <language>en-US</language>
    <copyright>Mini Fish 2014-present. Licensed under CC-BY-NC</copyright>
    <lastBuildDate>Sat, 17 Jan 2026 10:00:00 +0800</lastBuildDate>
    <atom:link href="https://blog.minifish.org/tags/distributed-systems/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>OceanBase Internals: Transactions, Replay, SQL Engine, and Unit Placement</title>
      <link>https://blog.minifish.org/posts/oceanbase-internals-transaction-replay-sql-unit-placement/</link>
      <pubDate>Sat, 17 Jan 2026 10:00:00 +0800</pubDate>
      <guid>https://blog.minifish.org/posts/oceanbase-internals-transaction-replay-sql-unit-placement/</guid>
      <description>&lt;h2 id=&#34;why-these-paths-matter&#34;&gt;Why These Paths Matter&lt;/h2&gt;
&lt;p&gt;OceanBase targets high availability and scalability in a shared-nothing cluster. The core engineering challenge is to make four critical subsystems work together with predictable latency and correctness:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Write transactions must be durable, replicated, and efficiently committed.&lt;/li&gt;
&lt;li&gt;Tablet replay must recover state quickly and safely.&lt;/li&gt;
&lt;li&gt;The SQL pipeline, from parse to execution, must optimize well while respecting multi-tenant constraints.&lt;/li&gt;
&lt;li&gt;Unit placement must map tenants to physical resources without fragmentation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This article focuses on motivation, design, implementation highlights, and tradeoffs, using concrete code entry points from the OceanBase codebase.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<h2 id="why-these-paths-matter">Why These Paths Matter</h2>
<p>OceanBase targets high availability and scalability in a shared-nothing cluster. The core engineering challenge is to make four critical subsystems work together with predictable latency and correctness:</p>
<ul>
<li>Write transactions must be durable, replicated, and efficiently committed.</li>
<li>Tablet replay must recover state quickly and safely.</li>
<li>The SQL pipeline, from parse to execution, must optimize well while respecting multi-tenant constraints.</li>
<li>Unit placement must map tenants to physical resources without fragmentation.</li>
</ul>
<p>This article focuses on motivation, design, implementation highlights, and tradeoffs, using concrete code entry points from the OceanBase codebase.</p>
<h2 id="system-architecture">System Architecture</h2>
<p>OceanBase adopts a shared-nothing architecture where each node is equal and runs its own SQL engine, storage engine, and transaction engine. Understanding the overall architecture is essential before diving into implementation details.</p>
<h3 id="cluster-zone-and-node-organization">Cluster, Zone, and Node Organization</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">graph TB
    subgraph Cluster[&#34;OceanBase Cluster&#34;]
        subgraph Z1[&#34;Zone 1&#34;]
            N1[&#34;OBServer Node 1&#34;]
        end
        subgraph Z2[&#34;Zone 2&#34;]
            N2[&#34;OBServer Node 2&#34;]
        end
        subgraph Z3[&#34;Zone 3&#34;]
            N3[&#34;OBServer Node 3&#34;]
        end
    end
    
    subgraph Proxy[&#34;obproxy Layer&#34;]
        P1[&#34;obproxy 1&#34;]
        P2[&#34;obproxy 2&#34;]
    end
    
    Client[&#34;Client Applications&#34;] --&gt; Proxy
    Proxy --&gt; N1
    Proxy --&gt; N2
    Proxy --&gt; N3
</code></pre><p><strong>Key Concepts:</strong></p>
<ul>
<li><strong>Cluster</strong>: A collection of nodes working together</li>
<li><strong>Zone</strong>: Logical availability zones for high availability and disaster recovery</li>
<li><strong>OBServer</strong>: Service process on each node handling SQL, storage, and transactions</li>
<li><strong>obproxy</strong>: Stateless proxy layer routing SQL requests to appropriate OBServer nodes</li>
</ul>
<h3 id="data-organization-partition-tablet-and-log-stream">Data Organization: Partition, Tablet, and Log Stream</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">graph TB
    subgraph Table[&#34;Table&#34;]
        P1[&#34;Partition 1&#34;]
        P2[&#34;Partition 2&#34;]
        P3[&#34;Partition 3&#34;]
    end
    
    subgraph LS1[&#34;Log Stream 1&#34;]
        T1[&#34;Tablet 1&#34;]
        T2[&#34;Tablet 2&#34;]
    end
    
    subgraph LS2[&#34;Log Stream 2&#34;]
        T3[&#34;Tablet 3&#34;]
    end
    
    subgraph LS3[&#34;Log Stream 3&#34;]
        T4[&#34;Tablet 4&#34;]
    end
    
    P1 --&gt; T1
    P1 --&gt; T2
    P2 --&gt; T3
    P3 --&gt; T4
    
    T1 --&gt; LS1
    T2 --&gt; LS1
    T3 --&gt; LS2
    T4 --&gt; LS3
</code></pre><p><strong>Key Concepts:</strong></p>
<ul>
<li><strong>Partition</strong>: Logical shard of a table (hash, range, list partitioning)</li>
<li><strong>Tablet</strong>: Physical storage object storing ordered data records for a partition</li>
<li><strong>Log Stream (LS)</strong>: Replication unit using Multi-Paxos for data consistency</li>
<li><strong>Replication</strong>: Replication operates at log-stream granularity: each log stream has multiple replicas across zones, with one leader accepting writes, and changes are replicated across zones via the Multi-Paxos protocol.</li>
</ul>
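<p>To make the replication rule concrete, here is a minimal sketch, not OceanBase code, of how a Multi-Paxos-style log stream decides a log entry is committed: an entry is durable once a majority of replicas across zones have acknowledged it. All names here are hypothetical.</p>

```python
# Illustrative sketch only: a Multi-Paxos-style log stream commits an
# entry once a majority of replicas have acknowledged it.

def majority(replica_count):
    """Smallest number of acks that forms a quorum."""
    return replica_count // 2 + 1

def is_committed(acks, replica_count):
    """An entry is committed once a majority of replicas acked it."""
    return len(acks) >= majority(replica_count)

# Three replicas, one per zone: the leader plus one follower suffice.
print(is_committed({"zone1", "zone2"}, 3))  # True: 2 of 3 is a majority
print(is_committed({"zone1"}, 3))           # False: a single ack is not
```

<p>This is why a three-zone deployment keeps accepting writes with one zone down: the remaining two replicas still form a quorum.</p>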
<h3 id="multi-tenant-resource-model">Multi-Tenant Resource Model</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">graph TB
    subgraph Tenant[&#34;Tenant&#34;]
        T1[&#34;Tenant 1 MySQL Mode&#34;]
        T2[&#34;Tenant 2 Oracle Mode&#34;]
        T3[&#34;System Tenant&#34;]
    end
    
    subgraph Pool[&#34;Resource Pool&#34;]
        RP1[&#34;Pool 1&#34;]
        RP2[&#34;Pool 2&#34;]
        RP3[&#34;Pool 3&#34;]
    end
    
    subgraph Unit[&#34;Resource Unit&#34;]
        U1[&#34;Unit 1 CPU Memory Disk&#34;]
        U2[&#34;Unit 2 CPU Memory Disk&#34;]
        U3[&#34;Unit 3 CPU Memory Disk&#34;]
    end
    
    subgraph Server[&#34;Physical Server&#34;]
        S1[&#34;Server 1&#34;]
        S2[&#34;Server 2&#34;]
        S3[&#34;Server 3&#34;]
    end
    
    T1 --&gt; RP1
    T2 --&gt; RP2
    T3 --&gt; RP3
    
    RP1 --&gt; U1
    RP2 --&gt; U2
    RP3 --&gt; U3
    
    U1 --&gt; S1
    U2 --&gt; S2
    U3 --&gt; S3
</code></pre><p><strong>Key Concepts:</strong></p>
<ul>
<li><strong>Tenant</strong>: Isolated database instance (MySQL or Oracle compatibility)</li>
<li><strong>Resource Pool</strong>: Groups resource units for a tenant across zones</li>
<li><strong>Resource Unit</strong>: Virtual container with CPU, memory, and disk resources</li>
<li><strong>Unit Placement</strong>: Rootserver schedules units to physical servers based on resource constraints</li>
</ul>
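<p>The tenant, pool, and unit relationship above can be modeled minimally as follows. This is an illustrative data model only; the names do not match OceanBase&rsquo;s actual schema or structures.</p>

```python
# Minimal model of the multi-tenant resource hierarchy. All names are
# illustrative, not OceanBase's actual schema.

unit_config = {"name": "config_4c8g", "cpu": 4, "memory_gb": 8}

resource_pool = {
    "name": "pool_tenant1",
    "unit_config": unit_config["name"],
    "unit_count": 1,          # units per zone
    "zones": ["zone1", "zone2", "zone3"],
}

tenant = {"name": "tenant1", "pools": [resource_pool["name"]]}

# One unit is materialized per zone per unit_count.
units = [
    {"pool": resource_pool["name"], "zone": z, "config": unit_config["name"]}
    for z in resource_pool["zones"]
    for _ in range(resource_pool["unit_count"])
]
print(len(units))  # 3: one unit in each of the three zones
```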
<h3 id="layered-architecture">Layered Architecture</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">graph TB
    subgraph App[&#34;Application Layer&#34;]
        Client[&#34;Client Applications&#34;]
    end
    
    subgraph Proxy[&#34;Proxy Layer&#34;]
        ODP[&#34;obproxy Router&#34;]
    end
    
    subgraph OBServer[&#34;OBServer Layer&#34;]
        subgraph Node[&#34;OBServer Node&#34;]
            SQL[&#34;SQL Engine&#34;]
            TX[&#34;Transaction Engine&#34;]
            ST[&#34;Storage Engine&#34;]
        end
    end
    
    subgraph Storage[&#34;Storage Layer&#34;]
        subgraph LS[&#34;Log Stream&#34;]
            Tablet[&#34;Tablet&#34;]
            Memtable[&#34;Memtable&#34;]
            SSTable[&#34;SSTable&#34;]
        end
        Palf[&#34;Paxos Log Service&#34;]
    end
    
    subgraph Resource[&#34;Resource Layer&#34;]
        Tenant[&#34;Tenant&#34;]
        Unit[&#34;Resource Unit&#34;]
        Pool[&#34;Resource Pool&#34;]
        RS[&#34;Rootserver&#34;]
    end
    
    Client --&gt; ODP
    ODP --&gt; SQL
    SQL --&gt; TX
    TX --&gt; ST
    ST --&gt; LS
    LS --&gt; Palf
    Tenant --&gt; Pool
    Pool --&gt; Unit
    RS --&gt; Unit
    RS --&gt; Node
</code></pre><p><strong>Key Concepts:</strong></p>
<ul>
<li><strong>SQL Engine</strong>: Parses, optimizes, and executes SQL statements</li>
<li><strong>Transaction Engine</strong>: Manages transaction lifecycle, commit protocols, and consistency</li>
<li><strong>Storage Engine</strong>: Handles data organization, memtables, and SSTables</li>
<li><strong>Log Service</strong>: Provides Paxos-based replication and durability</li>
<li><strong>Rootserver</strong>: Manages cluster metadata, resource scheduling, and placement</li>
</ul>
<h2 id="design-overview">Design Overview</h2>
<p>At a high level, each node runs a full SQL engine, storage engine, and transaction engine. Data is organized into tablets, which belong to log streams. Log streams replicate changes using the Paxos-based log service. Tenants slice resources via unit configurations and pools, while rootserver components place those units on servers.</p>
<p>The following sections walk through each path with the relevant implementation anchors.</p>
<h2 id="architecture-diagrams">Architecture Diagrams</h2>
<h3 id="transaction-write-path">Transaction Write Path</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">flowchart LR
  subgraph A[&#34;Transaction Write Path&#34;]
    C[&#34;Client&#34;] --&gt; S[&#34;SQL Engine&#34;]
    S --&gt; T[&#34;Transaction Context&#34;]
    T --&gt; M[&#34;Memtable Write&#34;]
    M --&gt; R[&#34;Redo Buffer&#34;]
    R --&gt; L[&#34;Log Service&#34;]
    L --&gt; P[&#34;Replicated Log&#34;]
    P --&gt; K[&#34;Commit Result&#34;]
  end
</code></pre><h3 id="tablet-replay-path">Tablet Replay Path</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">flowchart LR
  subgraph B[&#34;Tablet Replay Path&#34;]
    L[&#34;Log Service&#34;] --&gt; RS[&#34;Replay Service&#34;]
    RS --&gt; E[&#34;Tablet Replay Executor&#34;]
    E --&gt; LS[&#34;Log Stream&#34;]
    LS --&gt; TB[&#34;Tablet&#34;]
    TB --&gt; CK[&#34;Replay Checks&#34;]
    CK --&gt; AP[&#34;Apply Operation&#34;]
    AP --&gt; ST[&#34;Updated Tablet State&#34;]
  end
</code></pre><h3 id="sql-compile-and-execute">SQL Compile and Execute</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">flowchart LR
  subgraph C[&#34;SQL Compile and Execute&#34;]
    Q[&#34;SQL Text&#34;] --&gt; P[&#34;Parser&#34;]
    P --&gt; R[&#34;Resolver&#34;]
    R --&gt; W[&#34;Rewriter&#34;]
    W --&gt; O[&#34;Optimizer&#34;]
    O --&gt; LP[&#34;Logical Plan&#34;]
    LP --&gt; CG[&#34;Code Generator&#34;]
    CG --&gt; PP[&#34;Physical Plan&#34;]
    PP --&gt; EX[&#34;Executor&#34;]
  end
</code></pre><h3 id="unit-placement">Unit Placement</h3>
<pre tabindex="0"><code class="language-mermaid" data-lang="mermaid">flowchart LR
  subgraph D[&#34;Unit Placement&#34;]
    UC[&#34;Unit Config&#34;] --&gt; RP[&#34;Resource Pool&#34;]
    RP --&gt; PS[&#34;Placement Strategy&#34;]
    PS --&gt; CS[&#34;Candidate Servers&#34;]
    CS --&gt; CH[&#34;Chosen Server&#34;]
    CH --&gt; UN[&#34;Unit Instance&#34;]
  end
</code></pre><h2 id="write-transaction-from-memtable-to-replicated-log">Write Transaction: From Memtable to Replicated Log</h2>
<h3 id="motivation">Motivation</h3>
<p>A write transaction must be both fast and durable. OceanBase uses memtables for in-memory writes, and a log stream for redo replication. The design must allow low-latency commit while supporting parallel redo submission and multi-participant (2PC) transactions.</p>
<h3 id="design-sketch">Design Sketch</h3>
<ul>
<li>Each transaction is represented by a per-LS context (<code>ObPartTransCtx</code>).</li>
<li>Redo is flushed based on pressure or explicit triggers.</li>
<li>Commit chooses one-phase or two-phase based on participants.</li>
<li>Logs are submitted via a log adapter backed by logservice.</li>
</ul>
<h3 id="implementation-highlights">Implementation Highlights</h3>
<ul>
<li>Transaction context lifecycle and commit logic are in <code>src/storage/tx/ob_trans_part_ctx.cpp</code>.</li>
<li>Redo submission is driven by <code>submit_redo_after_write</code>, which switches between serial and parallel logging based on thresholds.</li>
<li>Commit decides between one-phase and two-phase commit depending on participant count.</li>
<li>The log writer (<code>ObTxLSLogWriter</code>) submits serialized logs via <code>ObITxLogAdapter</code>, which is wired to logservice (<code>ObLogHandler</code>).</li>
</ul>
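<p>As a rough illustration of the decisions described above, here is a sketch under assumed names, not the actual <code>ObPartTransCtx</code> interface: the commit protocol is chosen by participant count, and redo is flushed under buffer pressure or an explicit trigger.</p>

```python
# Hedged sketch of the commit-protocol choice and redo-flush trigger;
# function and parameter names are hypothetical, not real OceanBase APIs.

def choose_commit_protocol(participants):
    """One-phase commit for a single participant, two-phase otherwise."""
    return "1PC" if len(participants) == 1 else "2PC"

def should_flush_redo(pending_redo_bytes, threshold_bytes, explicit_trigger):
    """Redo is flushed under buffer pressure or an explicit trigger."""
    return explicit_trigger or pending_redo_bytes >= threshold_bytes

print(choose_commit_protocol(["ls1"]))         # 1PC
print(choose_commit_protocol(["ls1", "ls2"]))  # 2PC
print(should_flush_redo(2 * 1024 * 1024, 1024 * 1024, False))  # True
```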
<h3 id="tradeoffs">Tradeoffs</h3>
<ul>
<li><strong>Serial vs parallel redo</strong>: Serial logging is simpler and cheaper for small transactions, but parallel logging reduces tail latency for large transactions at the cost of more coordination.</li>
<li><strong>1PC vs 2PC</strong>: 1PC is fast for single-participant transactions; 2PC is required for distributed consistency but increases coordination overhead.</li>
<li><strong>In-memory batching vs durability</strong>: Larger batching improves throughput but can delay durability and increase replay time.</li>
</ul>
<h2 id="tablet-replay-reconstructing-state-safely">Tablet Replay: Reconstructing State Safely</h2>
<h3 id="motivation-1">Motivation</h3>
<p>Recovery needs to be deterministic and safe: the system must replay logs to reconstruct tablet state without violating invariants or applying obsolete data.</p>
<h3 id="design-sketch-1">Design Sketch</h3>
<ul>
<li>Logservice schedules replay tasks per log stream.</li>
<li>Tablet replay executor fetches the LS, locates the tablet, validates replay conditions, and applies the log.</li>
<li>Specialized replay executors handle different log types (e.g., schema updates, split operations).</li>
</ul>
<h3 id="implementation-highlights-1">Implementation Highlights</h3>
<ul>
<li>Replay orchestration lives in <code>src/logservice/replayservice/ob_log_replay_service.cpp</code>.</li>
<li>Tablet replay logic is in <code>src/logservice/replayservice/ob_tablet_replay_executor.cpp</code>.</li>
<li>Specific tablet operations are applied in dedicated executors, such as <code>ObTabletServiceClogReplayExecutor</code> in <code>src/storage/tablet/ob_tablet_service_clog_replay_executor.cpp</code>.</li>
</ul>
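<p>The replay flow described above can be sketched as a guarded apply: locate the log stream, locate the tablet, skip logs the tablet has already covered, then apply. This is illustrative only; the names do not match the real executors.</p>

```python
# Illustrative sketch of tablet replay. All names are hypothetical.

def replay_log(ls_map, ls_id, tablet_id, log_scn, apply_fn):
    """Apply one log entry to a tablet, skipping obsolete or orphan logs."""
    ls = ls_map.get(ls_id)
    if ls is None:
        return "skip: log stream not found"
    tablet = ls["tablets"].get(tablet_id)
    if tablet is None:
        return "skip: tablet not found"
    if tablet["clog_checkpoint_scn"] >= log_scn:
        return "skip: log older than tablet checkpoint"
    apply_fn(tablet)
    tablet["clog_checkpoint_scn"] = log_scn
    return "applied"

ls_map = {1: {"tablets": {200: {"clog_checkpoint_scn": 10, "rows": []}}}}
print(replay_log(ls_map, 1, 200, 12, lambda t: t["rows"].append("r")))  # applied
print(replay_log(ls_map, 1, 200, 5, lambda t: t["rows"].append("r")))   # skip: log older than tablet checkpoint
```

<p>The checkpoint guard is what makes replay idempotent: re-delivering an already-applied log is a no-op rather than a double apply.</p>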
<h3 id="tradeoffs-1">Tradeoffs</h3>
<ul>
<li><strong>Strictness vs throughput</strong>: Replay barriers enforce ordering for correctness but can reduce parallelism.</li>
<li><strong>Tablet existence checks</strong>: Allowing missing tablets can speed recovery but requires careful validation to avoid partial state.</li>
<li><strong>MDS synchronization</strong>: Metadata state updates improve correctness but add contention via locks.</li>
</ul>
<h2 id="sql-parse-to-execute-compile-pipeline-for-performance">SQL Parse to Execute: Compile Pipeline for Performance</h2>
<h3 id="motivation-2">Motivation</h3>
<p>OceanBase supports MySQL and Oracle compatibility with rich SQL features. The compile pipeline must be fast, cache-friendly, and yield efficient execution plans.</p>
<h3 id="design-sketch-2">Design Sketch</h3>
<ul>
<li>SQL text enters the engine via <code>ObSql::stmt_query</code>.</li>
<li>Parsing produces a parse tree.</li>
<li>Resolution turns the parse tree into a typed statement tree.</li>
<li>Rewrite and optimization generate a logical plan.</li>
<li>Code generation produces a physical plan and execution context.</li>
</ul>
<h3 id="implementation-highlights-2">Implementation Highlights</h3>
<ul>
<li>Entry and query handling: <code>src/sql/ob_sql.cpp</code> (<code>stmt_query</code>, <code>handle_text_query</code>).</li>
<li>Resolver: <code>ObResolver</code> in <code>src/sql/resolver/ob_resolver.h</code>.</li>
<li>Transform and optimize: <code>ObSql::transform_stmt</code> and <code>ObSql::optimize_stmt</code> in <code>src/sql/ob_sql.cpp</code>.</li>
<li>Code generation: <code>ObSql::code_generate</code> in <code>src/sql/ob_sql.cpp</code>.</li>
</ul>
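<p>The stages above compose into a strictly linear pipeline. A minimal sketch, with hypothetical names rather than the <code>ObSql</code> API, just to make the ordering explicit:</p>

```python
# Illustrative sketch of the compile pipeline as a chain of stages.
# Stage names mirror the prose; none of these are real ObSql functions.

def compile_sql(sql_text):
    """Run the compile pipeline and return a nested plan description."""
    parse_tree = ("parse", sql_text)
    stmt = ("resolve", parse_tree)
    rewritten = ("rewrite", stmt)
    logical_plan = ("optimize", rewritten)
    physical_plan = ("codegen", logical_plan)
    return physical_plan

# Unwind the nesting to list the stages, outermost (last-run) first.
plan = compile_sql("SELECT 1")
stages = []
node = plan
while isinstance(node, tuple):
    stages.append(node[0])
    node = node[1]
print(stages)  # ['codegen', 'optimize', 'rewrite', 'resolve', 'parse']
```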
<h3 id="tradeoffs-2">Tradeoffs</h3>
<ul>
<li><strong>Plan cache vs compile accuracy</strong>: Plan caching reduces latency but may reuse suboptimal plans under changing data distributions.</li>
<li><strong>Rewrite aggressiveness</strong>: More transformations can yield better plans but increase compile cost.</li>
<li><strong>JIT and rich formats</strong>: Faster execution for some workloads, but added complexity and memory pressure.</li>
</ul>
<h2 id="unit-placement-scheduling-tenant-resources">Unit Placement: Scheduling Tenant Resources</h2>
<h3 id="motivation-3">Motivation</h3>
<p>Multi-tenancy requires predictable isolation and efficient resource utilization. Unit placement must respect CPU, memory, and disk constraints while minimizing fragmentation.</p>
<h3 id="design-sketch-3">Design Sketch</h3>
<ul>
<li>Unit config defines resource demands.</li>
<li>Resource pool groups units by tenant and zone.</li>
<li>Placement strategy scores candidate servers to pick a host for each unit.</li>
</ul>
<h3 id="implementation-highlights-3">Implementation Highlights</h3>
<ul>
<li>Resource types and pools: <code>src/share/unit/ob_unit_config.h</code>, <code>src/share/unit/ob_resource_pool.h</code>, <code>src/share/unit/ob_unit_info.h</code>.</li>
<li>Placement policy: <code>src/rootserver/ob_unit_placement_strategy.cpp</code> uses a weighted dot-product of remaining resources to choose a server.</li>
<li>Orchestration: <code>src/rootserver/ob_unit_manager.cpp</code> handles creation, alteration, and migration of units and pools.</li>
</ul>
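<p>The weighted dot-product policy mentioned above can be sketched as follows. This is a simplified model: the weights and field names are assumptions for illustration, not the rootserver&rsquo;s actual configuration. Each feasible candidate is scored on its normalized remaining capacity after hosting the unit, and the highest-scoring server wins.</p>

```python
# Simplified sketch of dot-product unit placement. Weights and field
# names are assumptions, not OceanBase's actual values.

WEIGHTS = {"cpu": 0.5, "memory": 0.4, "disk": 0.1}  # assumed weights

def score(server, demand):
    """Weighted dot-product of normalized remaining resources after
    placing the unit. Returns None if the demand does not fit."""
    remaining = {}
    for res, need in demand.items():
        if need > server["free"][res]:
            return None  # does not fit on this server
        remaining[res] = (server["free"][res] - need) / server["capacity"][res]
    return sum(WEIGHTS[res] * remaining[res] for res in remaining)

def place_unit(servers, demand):
    """Pick the feasible server with the highest score."""
    scored = [(score(s, demand), s["name"]) for s in servers]
    feasible = [(sc, name) for sc, name in scored if sc is not None]
    return max(feasible)[1] if feasible else None

servers = [
    {"name": "s1", "capacity": {"cpu": 16, "memory": 64, "disk": 500},
     "free": {"cpu": 4, "memory": 32, "disk": 400}},
    {"name": "s2", "capacity": {"cpu": 16, "memory": 64, "disk": 500},
     "free": {"cpu": 12, "memory": 48, "disk": 300}},
]
print(place_unit(servers, {"cpu": 4, "memory": 16, "disk": 100}))  # s2
```

<p>Note how the scoring prefers the server that stays least depleted across all dimensions, which is exactly the anti-fragmentation behavior the tradeoff discussion below is about.</p>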
<h3 id="tradeoffs-3">Tradeoffs</h3>
<ul>
<li><strong>Greedy placement vs global optimality</strong>: Dot-product scoring is efficient and practical but may not be globally optimal.</li>
<li><strong>Capacity normalization</strong>: Assuming comparable server capacities simplifies scoring but may bias placement in heterogeneous clusters.</li>
<li><strong>Latency vs stability</strong>: Fast placement decisions can lead to more churn; conservative placement improves stability but can reduce utilization.</li>
</ul>
<h2 id="closing-thoughts">Closing Thoughts</h2>
<p>These four paths demonstrate how OceanBase balances correctness, performance, and operability. The code structure follows clear separation of responsibilities: transaction logic is in <code>storage/tx</code>, replication and replay are in <code>logservice</code>, SQL compilation is in <code>sql</code>, and scheduling is in <code>rootserver</code> and <code>share/unit</code>. The tradeoffs are explicit and largely encoded in thresholds and policies, which makes tuning feasible without invasive rewrites.</p>
<p>If you are extending OceanBase, start with the entry points highlighted above and follow the call chains into the relevant subsystem. It is the fastest way to build a mental model grounded in the actual implementation.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Understanding the CAP Theorem</title>
      <link>https://blog.minifish.org/posts/understanding-the-cap-theorem/</link>
      <pubDate>Wed, 20 Dec 2017 22:21:06 +0800</pubDate>
      <guid>https://blog.minifish.org/posts/understanding-the-cap-theorem/</guid>
      <description>&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;The CAP theorem has become one of the most talked-about theorems in recent years; whenever distributed systems are discussed, CAP is inevitably mentioned. However, I feel that I have not thoroughly understood it, so I wanted to write a blog post to record my understanding. I will update the content as I gain new insights.&lt;/p&gt;
&lt;h2 id=&#34;understanding&#34;&gt;Understanding&lt;/h2&gt;
&lt;p&gt;I read the first part of this &lt;a href=&#34;https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45855.pdf&#34;&gt;paper&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The CAP theorem [Bre12] says that you can only have two of the three desirable properties of:&lt;/p&gt;</description>
      <content:encoded><![CDATA[<h2 id="background">Background</h2>
<p>The CAP theorem has become one of the most talked-about theorems in recent years; whenever distributed systems are discussed, CAP is inevitably mentioned. However, I feel that I have not thoroughly understood it, so I wanted to write a blog post to record my understanding. I will update the content as I gain new insights.</p>
<h2 id="understanding">Understanding</h2>
<p>I read the first part of this <a href="https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45855.pdf">paper</a>.</p>
<blockquote>
<p>The CAP theorem [Bre12] says that you can only have two of the three desirable properties of:</p>
<ul>
<li>C: Consistency, which we can think of as serializability for this discussion;</li>
<li>A: 100% availability, for both reads and updates;</li>
<li>P: tolerance to network partitions.</li>
</ul>
<p>This leads to three kinds of systems: CA, CP and AP, based on what letter you leave out.</p>
</blockquote>
<p>Let me share my understanding, using a network composed of three machines (x, y, and z) as an example:</p>
<ul>
<li>
<p><strong>C (Consistency)</strong>: The three machines appear as one. Create, read, update, and delete operations against any one machine should always be consistent. That is, if you read data from x and then read from y, the results are the same; if you write data to x and then read from y, you should also see the newly written data. Wikipedia also specifically notes that it is acceptable for data just written to x to become readable from y only after a short delay (eventual consistency).</p>
</li>
<li>
<p><strong>A (Availability)</strong>: The three machines, as a whole, must always be readable and writable, even if some of them fail.</p>
</li>
<li>
<p><strong>P (Partition Tolerance)</strong>: Even if the network between x, y, and z is broken, the machines can still provide service. Losing <strong>P</strong> means that, during a partition, some machine cannot or refuses to provide service; it is neither readable nor writable.</p>
</li>
</ul>
<p>Here, <strong>C</strong> is the easiest to understand. The concepts of <strong>A</strong> and <strong>P</strong> are somewhat vague and easy to confuse.</p>
<p>Now let&rsquo;s discuss the three combinations:</p>
<p>If the network between x, y, and z is disconnected:</p>
<ul>
<li>
<p><strong>CA</strong>: Ensure data consistency (<strong>C</strong>). When x writes data, y can read it (<strong>C</strong>). Allow the system to continue providing services—even if only x and y are operational—ensuring it is readable and writable (<strong>A</strong>). We can only tolerate z not providing service; it cannot read or write, nor return incorrect data (losing <strong>P</strong>).</p>
</li>
<li>
<p><strong>CP</strong>: Ensure data consistency (<strong>C</strong>). Allow all three machines to provide services (even if only for reads) (<strong>P</strong>). We can only tolerate that x, y, and z cannot write (losing <strong>A</strong>).</p>
</li>
<li>
<p><strong>AP</strong>: Allow all three machines to write (<strong>A</strong>). Allow all three machines to provide services (even read-only service counts) (<strong>P</strong>). We can only tolerate that the data written by x and y does not reach z; z will return data inconsistent with x and y (losing <strong>C</strong>).</p>
</li>
</ul>
<p><strong>CA</strong> is exemplified by Paxos/Raft: they are majority protocols that sacrifice <strong>P</strong>, since nodes on the minority side fall completely silent. <strong>CP</strong> describes a read-only system: if a system is read-only, a network partition hardly matters, so its tolerance to partitions is effectively unlimited. <strong>AP</strong> suits systems that only append and never update: inserts only, no deletes or updates. Once the partition heals, merging the results together lets the system function again.</p>
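<p>The append-only AP case can be made concrete with a tiny sketch: during a partition each machine keeps accepting inserts, and afterwards the results are merged by set union. This is a grow-only-set idea, purely illustrative and not tied to any particular system.</p>

```python
# Sketch of the append-only AP scenario: x, y, z accept inserts
# independently during a partition, then merge by set union afterwards.

x, y, z = {"a"}, {"a"}, {"a"}  # replicas start consistent

# Partition: x and y can still talk to each other, z is isolated.
x.add("b"); y.add("b")  # a write reaches x and y
z.add("c")              # an independent write on the isolated side

print(z == x)           # False: C is lost during the partition

# Partition heals: union-merge restores a single consistent value.
merged = x | y | z
x = y = z = merged
print(sorted(x))        # ['a', 'b', 'c']
```

<p>Union works here precisely because there are no deletes or updates to reconcile; with either of those, merging would need far more machinery.</p>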
]]></content:encoded>
    </item>
  </channel>
</rss>
