(Replying to PARENT post)
This tweet from Rich provides the missing puzzle pieces:
"The log is not a btree - used for durability, not query. Separate indexes combine memory with batch-updated storage." https://twitter.com/richhickey/status/420910382538948608
Restating in my own words:
The transaction log is essentially just a stream of asserts and retracts with metadata. That's used for durability first and echoed to clients/peers and the indexing engine. The indexing engine asynchronously batch updates indexes (which probably are b-trees, but have no need to be streaming append-only style in this context). The peers query a joined dataset of the latest consistent indexes plus the set of unindexed transactions.
> As append-only B tree is persistent.
You're right, it can be. However, the popular append-only b-tree systems (such as CouchDB) are actually append-only B+-tree systems. That "+" means the leaves of the tree are intrusively linked together, and so are not persistent. You can not "fork" such trees cheaply.
> losing information is a specific feature that was added to git
(Replying to PARENT post)
And there is something amusing about using git as an example of how there is no technical argument against losing information. Because, losing information is a specific feature that was added to git. (Look up shallow clones.) I can accept that there is no technical reason to have this, actually. But there are pragmatic reasons to have it. Which is basically the point of this article.