Comparison with TLA+

The previous part of this book focused on model checking runnable actor systems. Stateright is also able to model check higher level designs via "abstract models," much like a traditional model checker. This chapter compares two abstract models of the two-phase commit protocol: one written in a language called TLA+ and the other written in Rust. The chapter exists to assist those who are already familiar with TLA+, so feel free to skip if you are not familiar with TLA+.

Attribution

The TLA+ model comes from the paper "Consensus on transaction commit" by Jim Gray and Leslie Lamport and is used in accordance with the ACM's Software Copyright Notice. It has been adapted slightly for this book. Here are the copyright details from the paper:

Copyright 2005 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.

Citation:

Jim Gray and Leslie Lamport. 2006. Consensus on transaction commit. ACM Trans. Database Syst. 31, 1 (March 2006), 133–160. DOI:https://doi.org/10.1145/1132863.1132867

Lecture 6 of Leslie Lamport's TLA+ Video Course is recommended for an additional overview of the specification.

Unique Benefits of Each

Before getting to the code, it is valuable to highlight that while the functionality of TLC (the model checker for TLA+) and Stateright overlap to some degree, each has unique benefits. Enumerating some of these can assist with deciding when to use each solution.

Unique benefits of TLC/TLA+:

  • Brevity: TLA+ is more concise than Rust.
  • State Reduction: TLC supports symmetry reduction.
  • Features: TLC supports arbitrarily complex temporal properties including fairness.
  • Features: TLC supports refinement mapping between models.

Unique benefits of Stateright/Rust:

  • Additional Verification: With Stateright your model and final implementation are encouraged to share code. This eliminates a possible failure mode whereby a model and it's resulting production system implementation deviate.
  • Reuse and Extensibility: Rust has a far larger library ecosystem and is arguably more amenable to reuse than TLA+. For example, Stateright's code for defining a system's operational semantics are not built into the model checker and could be provided as an external library. The same can be said about the included register semantics, linearizability tester, and actor model to name a few other examples. More generally the entire Rust crate registry (with tens of thousands of libraries) is at your disposal. In contrast, the pattern for reuse in TLA+ is copy-paste-modify, and the number of reusable modules is relatively small.
  • Checker Performance: Stateright tends to be faster and offers additional optimization possibilities such as replacing a set with a bit vector for example. While TLC allows you to override modules with Java implementations, doing so is relatively cumbersome and rarely used.
  • Final Implementation Performance: As a more auxiliary benefit, Stateright can serve as a stress test for your final implementation, identifying regressions and also facilitating performance investigation.
  • Features: Stateright offers "sometimes" properties that serve to sanity check that expected outcomes are possible. These are less powerful than temporal properties but serve a slightly different purpose because examples of these properties being met are included in the checker discoveries. You can simulate these in TLC by introducing false "invariants," but they need to be commented out and periodically run, which is more cumbersome.
  • Features: Stateright Explorer allows you to interactively browse a model's state space and also lets you jump between discoveries (whether they are property violations or instances of "sometimes" properties).

With those out of the way, let's move on to the code comparison.

Code Comparison

Both models are parameterized by a collection of "resource managers." The TLA+ spec does not specify a type, but a set is expected. The Stateright spec maps each resource manager to an integer in the range 0..N (0 to N-1 inclusive).

CONSTANT RM    
type R = usize; // RM in 0..N

#[derive(Clone)]
struct TwoPhaseSys { pub rms: Range<R> }

Next we define variables. These are global in the TLA+ spec, and their type constraints are indicated later in the spec via an invariant. In Stateright these have a statically defined type.

VARIABLES
  rmState,
  tmState,
  tmPrepared,
  msgs           
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct TwoPhaseState {
    rm_state: Vec<RmState>,
    tm_state: TmState,
    tm_prepared: Vec<bool>,
    msgs: BTreeSet<Message>,
}

Types in the TLA+ spec are conveyed via an invariant that is passed to its model checker, TLC.

Message ==
  [type : {"Prepared"}, rm : RM]  \cup  [type : {"Commit", "Abort"}]
   
TypeOK ==  
  /\ rmState \in [RM -> {"working", "prepared", "committed", "aborted"}]
  /\ tmState \in {"init", "committed", "aborted"}
  /\ tmPrepared \subseteq RM
  /\ msgs \subseteq Message
#[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
enum Message { Prepared { rm: R }, Commit, Abort }

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
enum RmState { Working, Prepared, Committed, Aborted }

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
enum TmState { Init, Committed, Aborted }

TLA+ leverages temporal logic to convey the specification, while Stateright requires that a trait is implemented. One other distinct aspect is that Stateright actions are also types.

Spec == Init
     /\ [][Next]_<<rmState, tmState, tmPrepared, msgs>>
#[derive(Clone, Debug)]
enum Action {
    TmRcvPrepared(R),
    TmCommit,
    TmAbort,
    RmPrepare(R),
    RmChooseToAbort(R),
    RmRcvCommitMsg(R),
    RmRcvAbortMsg(R),
}

impl Model for TwoPhaseSys {
    type State = TwoPhaseState;
    type Action = Action;
    ...
}

The initial aggregate state is one where

  • the resource mangers are ready to start a transaction;
  • the sole transaction manager is ready to start a transaction;
  • the transaction manager's view of each resource manager indicates that none have indicated that they are preparing a transaction;
  • and the network is an empty set.

Note that set semantics provide an ideal model for a network in this case because they capture the fact that networks rarely make guarantees about message order. Those guarantees must be imposed by additional protocols. And keep in mind that often those protocols only provide guarantees under limited scopes (e.g. TCP only orders messages within the lifetime of that connection, so a transient network partition can still cause message redelivery with TCP).

Init ==   
  /\ rmState = [rm \in RM |-> "working"]
  /\ tmState = "init"
  /\ tmPrepared   = {}
  /\ msgs = {}
fn init_states(&self) -> Vec<Self::State> {
    vec![TwoPhaseState {
        rm_state: self.rms.clone().map(|_| RmState::Working).collect(),
        tm_state: TmState::Init,
        tm_prepared: self.rms.clone().map(|_| false).collect(),
        msgs: Default::default(),
    }]
}

Now we get to the most interesting part of the model, the state transitions (which Stateright calls actions). TLA+ requires each transition relation to precede the aggregate next state relation (Next in this case). Each action serves two roles: (1) it defines its own preconditions and (2) it definies the subsequent state change (along with the unchanged states). In Stateright it is more idiomatic (and performant) to distinguish between the preconditions (in fn actions...) and state change (in fn next_state...).

TMCommit ==
  /\ tmState = "init"
  /\ tmPrepared = RM
  /\ tmState' = "committed"
  /\ msgs' = msgs \cup {[type |-> "Commit"]}
  /\ UNCHANGED <<rmState, tmPrepared>>

TMAbort ==
  /\ tmState = "init"
  /\ tmState' = "aborted"
  /\ msgs' = msgs \cup {[type |-> "Abort"]}
  /\ UNCHANGED <<rmState, tmPrepared>>

TMRcvPrepared(rm) ==
  /\ tmState = "init"
  /\ [type |-> "Prepared", rm |-> rm] \in msgs
  /\ tmPrepared' = tmPrepared \cup {rm}
  /\ UNCHANGED <<rmState, tmState, msgs>>

RMPrepare(rm) == 
  /\ rmState[rm] = "working"
  /\ rmState' = [rmState EXCEPT ![rm] = "prepared"]
  /\ msgs' = msgs \cup {[type |-> "Prepared", rm |-> rm]}
  /\ UNCHANGED <<tmState, tmPrepared>>
  
RMChooseToAbort(rm) ==
  /\ rmState[rm] = "working"
  /\ rmState' = [rmState EXCEPT ![rm] = "aborted"]
  /\ UNCHANGED <<tmState, tmPrepared, msgs>>

RMRcvCommitMsg(rm) ==
  /\ [type |-> "Commit"] \in msgs
  /\ rmState' = [rmState EXCEPT ![rm] = "committed"]
  /\ UNCHANGED <<tmState, tmPrepared, msgs>>

RMRcvAbortMsg(rm) ==
  /\ [type |-> "Abort"] \in msgs
  /\ rmState' = [rmState EXCEPT ![rm] = "aborted"]
  /\ UNCHANGED <<tmState, tmPrepared, msgs>>

Next ==
  \/ TMCommit \/ TMAbort
  \/ \E rm \in RM : 
       TMRcvPrepared(rm) \/ RMPrepare(rm) \/ RMChooseToAbort(rm)
         \/ RMRcvCommitMsg(rm) \/ RMRcvAbortMsg(rm)
fn actions(&self, state: &Self::State, actions: &mut Vec<Self::Action>) {
    if state.tm_state == TmState::Init
            && state.tm_prepared.iter().all(|p| *p) {
        actions.push(Action::TmCommit);
    }
    if state.tm_state == TmState::Init {
        actions.push(Action::TmAbort);
    }
    for rm in self.rms.clone() {
        if state.tm_state == TmState::Init
                && state.msgs.contains(&Message::Prepared { rm }) {
            actions.push(Action::TmRcvPrepared(rm));
        }
        if state.rm_state.get(rm) == Some(&RmState::Working) {
            actions.push(Action::RmPrepare(rm));
        }
        if state.rm_state.get(rm) == Some(&RmState::Working) {
            actions.push(Action::RmChooseToAbort(rm));
        }
        if state.msgs.contains(&Message::Commit) {
            actions.push(Action::RmRcvCommitMsg(rm));
        }
        if state.msgs.contains(&Message::Abort) {
            actions.push(Action::RmRcvAbortMsg(rm));
        }
    }
}

fn next_state(&self, last_state: &Self::State, action: Self::Action)
        -> Option<Self::State> {
    let mut state = last_state.clone();
    match action {
        Action::TmRcvPrepared(rm) => {
            state.tm_prepared[rm] = true;
        }
        Action::TmCommit => {
            state.tm_state = TmState::Committed;
            state.msgs.insert(Message::Commit);
        }
        Action::TmAbort => {
            state.tm_state = TmState::Aborted;
            state.msgs.insert(Message::Abort);
        },
        Action::RmPrepare(rm) => {
            state.rm_state[rm] = RmState::Prepared;
            state.msgs.insert(Message::Prepared { rm });
        },
        Action::RmChooseToAbort(rm) => {
            state.rm_state[rm] = RmState::Aborted;
        }
        Action::RmRcvCommitMsg(rm) => {
            state.rm_state[rm] = RmState::Committed;
        }
        Action::RmRcvAbortMsg(rm) => {
            state.rm_state[rm] = RmState::Aborted;
        }
    }
    Some(state)
}

Then we get to the sole property: if a resource manager reaches a final commit/abort state, then no other resource manager can disagree with that decision.

Consistent ==
  \A rm1, rm2 \in RM : ~ /\ rmState[rm1] = "aborted"
                         /\ rmState[rm2] = "committed"
fn properties(&self) -> Vec<Property<Self>> {
    vec![
        Property::<Self>::always("consistent", |_, state| {
           !state.rm_state.iter().any(|s1|
                state.rm_state.iter().any(|s2|
                    s1 == &RmState::Aborted && s2 == &RmState::Committed))
        }),
    ]
}

Performance Comparison

Now we need to configure the model. For TLC, this is done via a special "CFG" file, while for Stateright you simply introduce a Rust test. Rust also requires a Cargo.toml file.

SPECIFICATION Spec
INVARIANT TypeOK
INVARIANT Consistent
CONSTANTS
  RM = {0,1,2,3,4,5,6}
#[cfg(test)]
#[test]
fn can_model_2pc() {
    use stateright::Checker;
    TwoPhaseSys { rms: 0..7 }.checker()
        .threads(num_cpus::get()).spawn_dfs().join()
    	.assert_properties();
}
[package]
name = "comparison-with-tlaplus"
version = "0.1.0"
edition = "2018"

[dependencies]
num_cpus = "1"
stateright = "0.30"

TLA+ can be run from the command line using a tool such as tla-bin or tlacli, while the Stateright test is run using cargo test --release. The example below first calls build --tests to avoid timing dependency compilation but then revises the file timestamp to include compilation time relevant to the development iteration cycle.

$ tlc -workers auto TwoPhase.tla  
$ export RUSTFLAGS='-C target-cpu=native'  
$ cargo build --tests --release
$ touch src/lib.rs
$ time cargo test --release

Stateright is generally faster, and the speedup tends to increase with larger state spaces. A comparison of model checking times on the author's laptop follows.

plot of checking time (y) against state count (x)

#RMStatesTLCStaterightSpeedup
7296,4483 s1.697 s1.8X
81,745,40812 s2.566 s4.7X
910,340,35290 s8.902 s10.1X
1061,515,776674 s54.709 s12.3X