Bitcoin in Bloom: How IBLTs Allow Bitcoin to Scale

Bitcoin Chief Scientist Gavin Andresen believes a new data structure, Invertible Bloom Lookup Tables (IBLTs), could realign the interests of miners with the rest of the network. Currently, Bitcoin miners have an incentive to keep blocks as small as possible. The smaller the block, the faster it spreads across the payment network. The faster a block spreads, the sooner the network accepts it. Two valid blocks mined by different operators at the same time enter a race to see which block can reach the most nodes the fastest. The first block to reach the majority of the network wins. The other becomes an orphaned block. With 25 bitcoins on the line, why should pool operators include transactions, which slow propagation time, in blocks at all?

Let’s illustrate the issue: In the year 2034, MondoHash Corp mines block #1,523,292. Block size limits have since increased to 1TB. Real estate, contracts and property ownership are being managed by the blockchain. MondoHash’s block is only almost full, containing nearly 1TB of transaction data. At the same time, CryptoFunk mines the same block. CryptoFunk does not include the 1TB of transaction data in its blocks. Their block is a few KB in size (10^9 smaller). If otherwise left unchanged from today, the Bitcoin network would receive the CryptoFunk block long before MondoHash’s block. Similar to how your computer downloads an image faster than it would download a Blu-ray movie.

Hypothetically, if it took every full node one hour to download MondoHash’s block while only a fraction of a second for CryptoFunk’s block there wouldn’t need to be a tie at all. By serving the network, MondoHash is handicapped. Their block propagation is extra time other miners get to keep mining in hopes that they find a block themselves. And, if successful, those miners could submit a much smaller block that lacks transactions and beats MondoHash – even much later than the original correct block.

This scenario is one of the often cited, esoteric, intentionally vague and dystopic futures critics use – “Bitcoin doesn’t scale.”

But that is years and many technological advances away – too many to account. Even at this time the Bitcoin network is nowhere near its capacity. This scenario is still unlikely to occur, even in the future. Bitcoin’s greatest minds are at work on the “scalability” issue.

Then Who Cares?

How does this affect us now? Well, Bitcoin allows transactions without fees – fees are like a tip added to the block reward for the miners. Many transactions meet those requirements. It’s possible (though not feasible) that some or all miners could begin ignoring transactions without fees in order to save precious bits and seconds to secure the block reward. Even transactions that include fees may not provide enough incentive to be included in the block – why risk the entire block reward for significantly smaller transaction fees?

When you broadcast your transaction to the network, it waits in a node’s transaction pool for the next block. The mining software will include X KB worth of data from the transaction pool in the block – sorted by mining fees of course. A transaction included in a block receives its first confirmation. The block, then gets forwarded to the miner’s known nodes. Those nodes forward to their known nodes and this repeats until the transaction fully propagates the network.

Possibly, competition between large miners could become so intense that they begin ignoring transactions in favor of a faster block confirmation. Our transactions would still exist, albeit indefinitely in limbo. They would have zero confirmations until a miner finally chose to include the transaction or it expires (very unlikely). Empty blocks are mined. In fact, most of the early blocks had no transactions around the time Satoshi was mining. For now, though, the number of miners and transactions are not a problem. As mentioned, the issue is in Bitcoin scaling and the critics have a point.

Scalability

Scalability is how well a system accommodates the increased demand for its services. As a payment network Bitcoin can practically support 7 transactions per second (tps) – sort of. For comparison, Visa averages 2,000 tps, peaking at 47,000. For Bitcoin to process the world’s transactions, it will need to surpass Visa eventually.

I say Bitcoin sort of handles 7 tps because transactions are stored in a mem pool, not written to blocks as they broadcast. The current maximum block size is 1MB. Using the average block time of 10 minutes we discover the formula for how many transactions Bitcoin can support. Here’s the back of the napkin math:

Average Transaction Size = AvgTxSize Max Transactions = MaxTx = (1MB / AvgTxSize) Max Transactions per Second = MaxTPS = MaxTx / 600s If AvgTxSize = 250 Bytes then MaxTx = 4000 and MaxTPS = 6.66

The maximum transactions a block can handle = the block size divided by the average transaction size. Transaction size can vary, but I won’t explain every reason here. Just note that the size of a Tx increases depending on the number of inputs/outputs.

Satoshi added the 1MB block size to prevent DOS attacks. It would be possible to spam the network with absurdly large blocks. Miners would be stuck endlessly verifying bad block and transaction data. This limit can be changed any time. For skeptics claiming we must do so soon, Bitcoin only uses about 10% – 20% of it’s MAXTPS.

Typically, it’s cheaper to add new hardware to the system than performance tune software. You could spend months looking at your website’s code to try to increase the maximum number of visitors you can support. Or, you could just pay another $10 a month for a second server and maybe double capacity. For Bitcoin, it’s different. Adding new nodes to the network just increases the propagation time.

Additionally, the Bitcoin network is decentralized and upgrading the physical wires is impractical – the forces who control that are, or will be, at odds with Bitcoin in the future. So increasing hardware is not an option unless there are some Bit Angels pushing the development of the Internet 2.0.

The IBLT Proposal

Remember those pools where are transactions are kept in limbo until a miner decides to place them into a block? Gavin Andresen thinks we can use these pools to reduce overhead on the network. In his published proposed approach, he says:

If memory pools were perfectly synchronized and all miners had exactly the same policy for choosing which transactions to include in their blocks and in what order to include them, then miners could just announce new blocks as the fixed 80-byte block header and the coinbase transaction; there would be no need to include any transaction data at all, it could be reconstructed from their peers’ memory pools.

Back to our illustration: The year is 2034 (again) MondoHash mines a block. Instead of sending 1TB of data with a block, it sends the block header (Identifying Block Info) and the coinbase transaction (Claiming the Block Reward) and some metadata about included transactions. Effectively, this is very similar to what Cryptofunk was doing in the previous example.

Currently, miners can choose what data goes in the block. Under Gavin’s proposal, all miners would have synchronized transaction pools. Once synchronized the block data would not need to be sent along with a block – transactions already exist in the synced pool. Miner’s will broadcast block outlines or framework. Network nodes follow the framework, properly filling the block with transactions.

There is another tool in programming called Big O Notation. It is used to rate software computational speed on the “worst-case” basis. Two programmers could write different code to achieve a similar end, but their implementation could be completely different. Big O notation is one metric that can measure the amount of work the computer does for each bit of code.

For example, we can use Big O Notation to compare the current block propagation time with the proposed (IBLT) block propagation time. It gets much more complex than this but here are some examples of ratings from fastest to slowest

O(1) – Constant Time

Opening a dictionary to the correct word, every time the first time.
A function that evaluates and returns True/False

O(log(n)) – Logarithmic Time

Eliminate half the possible dictionary words with each guess.
Binary search

O(n) – Linear Time

Finding a word in the dictionary one word at a time.
ArrayList.ToString

O(n^2) – Quadratic Time

Checking every item in inventory for each item on the inventory list.
Nested Loops

O(n^n) – Fail

Microsoft is hiring
There is a fantastic explanation here if you would like to know more.

Miners’ current choice is between O(1), including no transactions, and O(n), including n transactions. What “n” depends on how many transactions they include; one, all, just the ones with fees, etc. Gavin’s proposal would remove the choice and put all miners on O(1) time for propagating their blocks.

Mining would change in the following ways:

The canonical ordering of transactions in blocks

Start with the set of transactions in the block (including the coinbase transaction)
Sort them in ascending order, using the smallest (previous-transaction-hash, index) of each transaction’s inputs as the sort key. Note that the coinbase transaction will be first since its previous input/index is all zero.
Add the first transaction on the sorted list that does not depend on a transaction later in the list to the block (and remove it from the sorted list).
Continue adding transactions until the sorted list is empty.

A similar policy of selecting those pool transactions go into blocks

Transactions are selected based on priority and fee-per-kilobyte
Sending the block header, transaction data size, and size of high priority transactions for new blocks is sufficient for the IBLT to reconcile the data.

Peers send IBLTs containing transaction data to one another, synchronizing pools

For me, this is the quintessential spirit of Bitcoin. Instead of targeting the “bad” actor who is behaving rationally within the system, the rules are changed to realign the interests of all actors, and everyone benefits from the same advantage.

Easy right? Now all that’s left to do is synchronize every full Bitcoin node on the planet – just wait until we have to deal with block propagation from Mars (thanks, Elon). Synchronizing datasets between nodes is the proposed implementation for IBLTs.

Invertible Bloom Lookup Tables

So what the Hell is an IBLT?

An IBLT is a Data Structure. In programming, we use a data structure to define a piece of code. In video games, you have a data structure for NPCs, for example. Instead of programming hundreds of Goombas for Super Mario you write just one and copy it over and over.

In Bitcoin, a transaction is a data structure. A Block is a data structure that mostly contains other data structures – transactions. If you have experience in Object-Oriented Programming, this should all be familiar. If you are not a programmer think of data structures as gears in a machine. They work with other data structures to produce working software.

IBLTs store data in key-value pairs. For Bitcoin, values need to be fixed. Using 48 bits from the transactions data hash and a 16-bit sequence number gives a 64-bit key. Keys have a .1% chance of collision (two unique transactions with the same key) – even at one million transactions a block. In the event that an actual collision occurs the transaction will simply wait for the next block.

Fantastic but what’s the point? Well, IBLTs can be used to reconcile sets of data. In the MondoHash example, 1TB can fit approximately 4,000,000 transactions. Instead of sending all of those transactions with a block, peers will compress all their transaction data into a much smaller IBLT. Peers will pass IBLTs back and forth, comparing their transaction records. When a block is mined, transaction pools should be nearly synchronized. Thus, sending transaction data with the block information is redundant.

Suppose Alice wants to reconcile her transaction pool with Bob:

1. Alice’s pool, P1, has more transactions than Bob’s, P2. 2. Alice creates an IBLT for P1, IBLT-1. The transaction data hash/sequence salt is our key. The table stores the remaining serialized transaction data. 3. Alice sends IBLT-1 to Bob. 4. Bob deletes transactions from IBLT-1 that match transactions in his pool. 5. Bob requests the remaining transactions from the network and adds them to his transaction pool

Proponents, Opponents, and Components

Bitcoin will not eclipse Visa as a payment processor overnight. As mentioned above, transactions can increase by a power of ten before the maximum block size begins delaying transactions – so the fire hasn’t broken out yet. There is still a healthy discussion taking place among developers as to which of the available solutions will scale the best.

Below I am including links to other possible solutions as well as feedback and discussions that have taken place over the past months concerning scaling, and the problems, IBLTs hope to address.

Matt Corallo is working on a high-speed backbone relay network for Bitcoin.
“For those who have been using this to get faster relays to/from the Network, you may have noticed some instability recently. This is because the nodes were all being upgraded to use some new relaying code which should cut down on duplicate transaction relaying in blocks, improving relay speed within the network and to nodes which run new clients which use the same relaying technique. Essentially instead of relaying entire blocks, nodes keep a rolling window of recently-seen transactions and skip those when relaying blocks.”

Several devs have discussed relaying just the block header information
“Bitcoin uses a multi-hop broadcast to propagate transactions and blocks through the network to update the ledger replicas. We then use the gathered information to verify the conjecture that the propagation delay in the network is the primary cause for blockchain forks. Blockchain forks should be avoided as they are symptomatic for inconsistencies among the replicas in the network. We then show what can be achieved by pushing the current protocol to its limit with unilateral changes to the client’s behavior.”

Information Propagation in the Bitcoin Network

Gregory Maxwell expresses his ideas

Gregory Maxwell comparing the Matt Corallo Relay Network to IBLTs

Peter Todd voices his opinions here
“Unfortunately IBLT only works for scalability if all the participants are “honest” and run the algorithm as-is, rather than try to maximize revenue. The problem is still large miners: if I’m a big miner like GHash.IO, I can earn more revenue than my competitors by including transactions that they don’t yet have, which pushes their orphan rate up much more than it does mine. The miner will take a hit on subsidy profit temporarily, but their competitors take a much bigger hit.(1) In the future if transaction fees dominate revenue even that won’t be true: you’ll earn more money outright because you get a higher % of the transactions.”

Here is a snippet of the devs discussing Bitcoin’s scalability on the Bitcoin IRC

Images from Shutterstock.

About the Author

Alex Gorale

[email protected]

X.com