Editor's Note: This article comes from Chain News (ChainNews, ID: chainnewscom). The author is Wang Jiaping, executive director at Sinovation Ventures. Published by Odaily with authorization.
Counting from Satoshi Nakamoto's paper "Bitcoin: A Peer-to-Peer Electronic Cash System," published in November 2008, Bitcoin is about to reach its tenth anniversary. Over the past ten years, Bitcoin and the blockchain technology behind it have flourished, and in the name of decentralization they carry great momentum and ambition to change the entire online digital world.
However, ambition is one thing; the thriving blockchain technology, especially in the field of public chains, still has a bottleneck that has yet to be broken through: at the scale and volume of today's digital world, no online system can carry even a single Internet-scale application without large-capacity, high-throughput infrastructure.
This is a worldwide problem, and the smartest scholars and developers are trying to solve it. I worked at Microsoft for many years as a researcher at Microsoft Research, focusing on distributed systems; after leaving Microsoft, I became the executive director in charge of blockchain and artificial intelligence investment at Sinovation Ventures. Years of research experience in distributed systems, along with experience evaluating many public chain projects as a blockchain investor, have taught me that achieving a high-performance design in a completely decentralized system is an extremely demanding and challenging task.
I have seen a great deal of discussion in the industry about the performance bottlenecks of public blockchains and how to solve them. Some of it is insightful and instructive, but there are also many fallacies, as well as specious opinions fabricated to promote particular projects, which risk misleading the discussion. In many in-depth exchanges, top scholars, developers, and investors in the industry encouraged me to share my views. After careful consideration, I decided to write down some of my humble opinions on this topic, both to set down my own thinking and, I hope, to start a discussion with more colleagues interested in the subject.
Don't just focus on performance bottlenecks and ignore capacity bottlenecks

Let me state one of my conclusions first: in the current situation, where finance-like applications are the mainstream scenario, the most important performance bottleneck of a blockchain (Chain of Blocks) system is the broadcast delay of block data. It is essentially limited by the bandwidth and communication latency of the Internet, and it directly restricts throughput (TPS).

No matter what consensus algorithm the system uses — proof-of-work (PoW), proof-of-stake (PoS), Byzantine fault tolerance (BFT), or delegated proof-of-stake (DPoS) — before the next block can be issued, the previous block must reach a certain degree of synchronization across the entire network. This constrains how large each block can be and how high the block frequency can be, and within this paradigm the problem has no solution. Please note that the blockchain system discussed here refers specifically to the Chain of Blocks system, which is characterized by guaranteeing that the system eventually converges to a single linked-list structure, with only the blocks on that chain being confirmed. The counterexample is the Graph of Blocks system, such as IOTA, which adopts a DAG structure.
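To make the constraint concrete, here is a rough back-of-envelope sketch in Python of how block size, block interval, and broadcast delay bound throughput; all numbers are illustrative assumptions of my own, not measurements from any particular chain.

```python
# Back-of-envelope throughput estimate for a Chain of Blocks system.
# All numbers below are illustrative assumptions, not measured values.

block_size_bytes = 1_000_000        # assumed block size: ~1 MB
avg_tx_size_bytes = 250             # assumed average transaction size
block_interval_s = 600              # assumed block interval: 10 minutes
broadcast_delay_s = 10              # assumed time to reach most full nodes

# The block interval must stay well above the broadcast delay,
# otherwise blocks are produced before the previous one has propagated.
assert block_interval_s > broadcast_delay_s

txs_per_block = block_size_bytes // avg_tx_size_bytes
tps = txs_per_block / block_interval_s
print(f"~{txs_per_block} txs per block, ~{tps:.1f} TPS")
# Raising TPS means bigger blocks or shorter intervals -- and both
# push against the broadcast delay imposed by the physical network.
```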
Assuming that the bandwidth and latency of the physical network can be ignored — for example EOS, which relies on high-speed data-center links — the second bottleneck of the system is its limited ledger capacity. It is essentially limited by the memory capacity of a single full node, and it directly restricts how many user addresses and how many DApps the chain can carry. No matter what consensus algorithm is used, as long as any user may be involved in transaction verification/execution at any time, a single full node must keep the state of every user and every DApp in the entire network in memory at all times, so that transaction verification can access it in real time. All current mainstream Chain of Blocks systems, including the Bitcoin blockchain, Ethereum, and EOS, have this problem, and likewise, this problem has no solution within this paradigm. Multi-level cache database technology such as RocksDB can relax the limitation slightly, so that only active users are bounded by memory while the total user base is bounded by hard-disk capacity, but this does not fundamentally solve the problem.
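As a rough illustration (again using assumed per-entry sizes, not measured figures), the resident ledger state grows linearly with the number of accounts and contracts:

```python
# Rough estimate of resident ledger state, under assumed per-entry sizes.
accounts = 50_000_000                 # assumed number of user addresses
bytes_per_account_state = 100         # assumed balance/nonce/index overhead
contracts = 1_000_000                 # assumed number of DApp contracts
bytes_per_contract_state = 10_000     # assumed average contract storage

total_bytes = (accounts * bytes_per_account_state
               + contracts * bytes_per_contract_state)
print(f"~{total_bytes / 2**30:.1f} GiB must stay hot for real-time verification")
# Growing the user base 100x grows this footprint roughly 100x,
# regardless of which consensus algorithm is used.
```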
The issue of capacity has received far less attention than throughput. The reason is simple: because the throughput shortcoming has not been resolved, the capacity issue has been masked by it. Keep in mind that once throughput is greatly improved, the capacity problem surfaces immediately: on a high-throughput system, if the number of users it can carry does not grow accordingly, that high throughput simply cannot be put to use. A typical example is EOS.
When EOS solved the throughput problem at the cost of giving up decentralization, the capacity problem immediately became prominent. EOS then packaged the ledger-capacity bottleneck as a scarce resource and tokenized it as EOS RAM. Of course, besides memory, the CPU of a single full node can also become a capacity bottleneck, so it was tokenized as well, as EOS CPU. In finance-like application scenarios, however, the computational complexity is usually very low, so memory remains the main bottleneck.

Consensus algorithms cannot actually solve the performance and capacity bottlenecks. Trying to improve a Chain of Blocks system by adopting an unconventional consensus algorithm will, in essence, not substantially improve system performance.
In short, solving the two bottlenecks mentioned above requires ingenuity in the design of the distributed system itself. This is related to consensus algorithms and cryptography, but consensus algorithms and cryptography are not the essential starting point.
Performance Bottleneck: What a Block Producer is Doing
First of all, a block-producing node is also a full node: it receives the confirmed blocks and unconfirmed transactions of the whole network, constructs the chain, continuously maintains the latest state of the ledger, and then seizes the opportunity to try to append a new block to the end of the chain. No matter which consensus algorithm is used, it goes through the following steps:
In the first step, based on the latest state of the ledger, it selects a number of verified, legal transactions from the set of unconfirmed transactions and constructs a new block.

In the second step, it competes for, or stands as a candidate for, the right to produce this new block. At this stage, there is a high probability that it will be interrupted because the ledger state is updated — that is, another node has successfully produced a block — in which case it returns to the first step.

In the third step, after obtaining the right to produce the block, it broadcasts the new block to the whole network, updates the ledger state, and returns to the first step.

The core difference between consensus algorithms lies in how the second step — the competition or candidacy for the right to produce a block — is carried out. But regardless of the algorithm, there is a contradiction: if each block is relatively large and can contain more transactions, a relatively long block interval is needed to ensure that the block is fully disseminated across the entire network before the next block is generated.
If propagation is insufficient, this manifests as a higher fork rate and more invalid blocks in PoW and PoS systems, and as a higher failure rate (blocks that cannot obtain 2/3 of the votes) in BFT systems.
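The three steps above can be summarized in a small, consensus-agnostic sketch; every name in it is a hypothetical placeholder, not an API of any real client, and the "competition" is stubbed out with a coin flip purely for illustration.

```python
import random

# Hypothetical, self-contained sketch of the generic block-producer loop.
# All names are placeholders, not APIs of any real client.

def producer_loop(chain: list, mempool: list, rounds: int = 3) -> None:
    for _ in range(rounds):
        # Step 1: build a candidate block from pending transactions,
        # based on the latest ledger state (here: just the chain tip).
        txs, parent = mempool[:10], len(chain)
        candidate = {"height": parent + 1, "txs": txs}

        # Step 2: compete for the right to produce this block
        # (PoW nonce search, PoS/BFT candidacy, ...). With some
        # probability another node wins first and we restart at step 1.
        if random.random() < 0.5:
            chain.append({"height": parent + 1, "txs": []})  # another node's block
            continue

        # Step 3: broadcast the new block and update the local ledger state
        # (broadcasting is omitted here; we only update local state).
        chain.append(candidate)
        del mempool[:10]

chain, mempool = [], [f"tx{i}" for i in range(100)]
producer_loop(chain, mempool)
print(len(chain), "blocks,", len(mempool), "txs pending")
```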
Proof-of-Work and Proof-of-Stake

PoW sets a Hash Target and requires that the hash value be smaller than that target — for example, by treating the 256-bit hash value as a large integer. The hash must be computed over the new block data together with a Nonce. Any node that finds a Nonce whose corresponding hash satisfies the Hash Target obtains the right to produce the block. Since the Nonce can only be found by random exhaustive search, this competition turns into a competition of hash computing power. PoS schemes such as Peercoin are a variant of PoW that introduce a mechanism of consuming Coin Age to raise the Hash Target, so that the competition for block production rights can be partially replaced by the duration and amount of digital currency held.

As can be seen, the biggest advantage of the PoW mechanism is that it uses a simple algorithm to assign block production rights randomly and in a completely permissionless way. No coordination or communication is needed between competing nodes, and the system can easily support any number of nodes competing to produce blocks — excellent decentralization characteristics.
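A minimal sketch of the nonce search described above might look like the following; the target is a deliberately easy, arbitrary illustration, and real clients use double SHA-256 over a specific header layout with far harder targets.

```python
import hashlib
import os

def mine(block_data: bytes, hash_target: int):
    """Randomly probe nonces until sha256(block_data || nonce) < hash_target.

    Only an illustration of the mechanism, not any real client's mining code.
    """
    while True:
        nonce = os.urandom(8)
        digest = hashlib.sha256(block_data + nonce).digest()
        if int.from_bytes(digest, "big") < hash_target:
            return nonce, digest

# Example: an easy target requiring roughly 16 leading zero bits.
easy_target = 1 << (256 - 16)
nonce, digest = mine(b"example block data", easy_target)
print(nonce.hex(), digest.hex())
```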
Precisely because block production rights are assigned randomly and without any coordination between nodes, this algorithm leads to a contradiction between block broadcast delay and block interval. When the block interval is short, another miner may produce a different new block at the same height before a new block has been fully broadcast across the network — a so-called fork. In that case, one of the blocks will eventually be orphaned and discarded. The probability of this happening must not be too high; otherwise it significantly lowers the original 51% computing-power attack threshold (Selfish Mining), and in extreme cases the forks may never converge to a stable chain.
The block broadcast delay is mainly determined by the block size and the bandwidth between nodes across the network. In the current Internet environment, it takes roughly 10 seconds for a block to reach more than 90% of the nodes. In the Bitcoin network, therefore, the roughly 10-minute block interval makes the probability of a fork extremely low — throughout the first half of 2018 there were only two forks. In the Ethereum network, the roughly 15-second block interval keeps the fork probability at around 10%, even though its blocks are much smaller than Bitcoin's. Note that the PoW block interval is a statistical quantity: actual intervals vary widely, with an expected value of 10 minutes. This is not caused by fluctuations in the network's total computing power, but by the fact that the search for a Nonce is a random probing process — which is why many mining pools publish their own "luck" curves.
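One common way to reason about this trade-off is to model block production as a Poisson process, under which the fork probability is roughly 1 − e^(−d/T) for broadcast delay d and block interval T. The sketch below uses that approximation with assumed delay values; it is not meant to reproduce the exact fork counts quoted above (real networks use fast relay paths that shorten the effective delay considerably).

```python
import math

def fork_probability(broadcast_delay_s: float, block_interval_s: float) -> float:
    """Approximate fork probability under a Poisson block-arrival model.

    Estimates the chance that another block is found while the current
    block is still propagating; the inputs below are assumed values.
    """
    return 1.0 - math.exp(-broadcast_delay_s / block_interval_s)

# Illustrative comparison: a long interval tolerates slow propagation,
# while a short interval demands very fast propagation to keep forks rare.
print(fork_probability(broadcast_delay_s=10, block_interval_s=600))   # ~1.7%
print(fork_probability(broadcast_delay_s=1.5, block_interval_s=15))   # ~9.5%
```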
For the Bitcoin network, the 10-minute block interval is actually a very conservative margin in today's Internet environment — bear in mind that it was proposed ten years ago. This is what makes simply enlarging the block size a feasible scaling approach, but because of the fundamental contradiction with block broadcast delay, this improvement is only effective up to a point. It is also worth mentioning the GHOST protocol.
The protocol gives a new criterion for deciding which fork to accept when a fork occurs: it replaces the longest-chain rule originally proposed by Satoshi Nakamoto with the rule of following the subtree containing the most computing power. The two criteria are completely equivalent when the fork probability is very low, but when it is relatively high — such as Ethereum's roughly 10% fork rate — the GHOST protocol can mitigate Selfish Mining and improve security. However, whether or not the GHOST protocol is adopted, it does not substantially help the performance of a public chain. The computing power of PoW has nothing to do with the performance of the blockchain system, and no software or hardware that accelerates the hash algorithm will improve the system's throughput per unit time.
This is why the hash power of the entire Bitcoin network has grown by trillions of times while its throughput has always stayed around 7 TPS. In addition, because the total energy a mine will consume is essentially determined when the mine is established, whenever more energy-efficient mining technology or equipment appears, the competition for computing power drives all miners to adopt it, which in the end merely pushes up the mining difficulty of the whole network. Therefore, the actual total energy consumption, at the macro level, depends only on the coin price, the electricity price, and confidence in investing in digital currency, and has nothing to do with mining efficiency.
Byzantine Fault Tolerance (BFT)

A Byzantine fault-tolerant consensus algorithm uses a random algorithm to determine the producer of each block, based on digital-currency addresses on the ledger rather than IP addresses, and the nodes standing as candidates do not need to compete with each other. The new block is verified, signed, and voted on by all members of a committee — a group of verifiers — then broadcast to the whole network, after which the next round of block production starts.
Unlike PoW, BFT block candidacy is a collaborative process that involves at least O(n^2) communication complexity, whereas PoW requires no communication at all during block competition. The BFT-style collaboration does not produce forks and does not consume scarce resources such as computing power or Coin Age. However, because this collaboration involves a considerable amount of data communication, the candidacy, verification, and signing cannot be carried out across the whole network. This is why BFT-style algorithms always involve a committee-construction process: verification and signing happen only within a small group, and everyone else simply trusts the committee.
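A quick illustration of why the O(n^2) communication cost forces a committee; the validator counts below are arbitrary examples.

```python
# Why BFT needs a committee: all-to-all voting costs O(n^2) messages.
# The node counts below are arbitrary illustrations.
for n in (100, 1_000, 10_000, 100_000):
    messages_per_round = n * (n - 1)   # every voter exchanges votes with every other
    print(f"{n:>7} validators -> ~{messages_per_round:,} messages per block")
# At whole-network scale the per-block message count explodes, so the vote
# is restricted to a small committee and the rest of the network only
# verifies the committee's signatures.
```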
Many recent BFT-based public chain projects, such as Algorand, have done a lot of work on how to select this committee safely and fairly, although that work is not directly related to improving system performance. The votes in BFT algorithms are usually weighted to prevent Sybil attacks, and this weight is mostly tied to the participants' stake, which is similar in spirit to PoS; many people now call this kind of BFT voting algorithm a PoS algorithm. In fact, BFT-style consensus algorithms and the PoS algorithm as originally proposed (e.g., Peercoin) are fundamentally different mechanisms.
As mentioned above, the process by which different BFT algorithms select block producers and committee members has little to do with system performance. As with PoW/PoS, throughput still depends on the size of each block and the block production cycle. In a BFT system, if each block is to be relatively large, the block production period must be relatively long, so that the newly produced block and its committee signature data have a high probability of propagating fully within the committee. In theory, the committee is much smaller than the whole network, so the broadcast delay in a BFT algorithm should be smaller than in a PoW/PoS network of the same overall size. That is true, but the broadcast delay under the Gossip protocol is proportional to the logarithm of the network size rather than linear in it, so the delay is not that much smaller. In addition, BFT algorithms rely on extra security measures such as periodic global synchronization, so that in practice, BFT-style algorithms do not have much of a performance advantage over PoW/PoS systems.
No matter which algorithm, performance cannot be greatly improved

Moreover, the time needed for full propagation under the Gossip protocol grows linearly with the amount of data propagated and logarithmically with the number of nodes, so BFT does not gain much in propagation delay. The result is that no matter which algorithm is used, there is an irreconcilable contradiction between block size and block interval, and performance cannot be greatly improved.
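The logarithmic scaling is easy to see from a simple gossip fan-out model; the fan-out and per-hop latency below are assumed values used only for illustration.

```python
import math

def gossip_rounds(nodes: int, fanout: int = 8) -> int:
    """Rounds for a rumor to reach all nodes under a fixed fan-out.

    Simplified model: coverage multiplies by the fan-out each round, so
    the round count grows with log(nodes). Fan-out of 8 is an assumption.
    """
    return math.ceil(math.log(nodes, fanout))

per_hop_delay_s = 0.5  # assumed per-round relay + verification latency
for n in (30, 1_000, 10_000):          # e.g. a small committee vs. a whole network
    rounds = gossip_rounds(n)
    print(f"{n:>6} nodes: ~{rounds} rounds, ~{rounds * per_hop_delay_s:.1f} s")
# Shrinking from 10,000 nodes to a 30-node committee saves only a couple
# of rounds -- the logarithm keeps the gap small.
```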
Capacity bottleneck: What is a full node that does not produce blocks doing?

In a single-chain Chain of Blocks system, there are roughly three types of nodes: full nodes that produce blocks, full nodes that do not produce blocks, and lightweight nodes.
Whether or not it produces blocks, a full node verifies and relays (broadcasts) new blocks and unconfirmed transactions. This broadcast work accounts for the main network traffic and disk I/O load; for Ethereum geth, at only a dozen or so TPS, this traffic is about 1.5 Mbps. To verify new blocks and unconfirmed transactions in real time, the entire user ledger and all smart-contract state must reside in memory, which accounts for the main memory overhead — at Ethereum's current scale, nearly 4 GB. Every full node must bear this load; a PoW mining node or a PoS validating node does additional work on top of it. The price of this load buys secure, complete decentralization: no full node needs to trust any other node in advance, and no full node has the ability to deceive other full nodes.

The value of an ordinary full node lies in two things: relaying and broadcasting legal data, and maintaining the latest state of the whole network's ledger for users or lightweight nodes to query. A lightweight node, such as a mobile wallet, neither verifies nor relays block data or unconfirmed transactions. It relies on and trusts one or more pre-configured full nodes, through which it obtains the state of a specific user — an account balance, say — and initiates transfer transactions. Lightweight nodes themselves cannot verify the authenticity of information; they are more like terminals of the blockchain world.

For a single-chain Chain of Blocks system, increasing the system's throughput (TPS) by 100 times would require about 150 Mbps of communication bandwidth, and expanding the user base by 100 times would require about 400 GB of memory. At that point, most ordinary servers on the Internet could no longer run a full node. And if full nodes can only be operated by professional mining farms, so that ordinary people cannot independently deploy one, the entire system degenerates into a centralized cloud service deployed in multiple locations, vulnerable to attacks and bans.
Therefore, these two bottlenecks need to be solved not only for block producing nodes, but also for ordinary full nodes.
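The 100x figures above follow from scaling the per-node load linearly; this sketch simply takes the ~1.5 Mbps and ~4 GB baseline figures quoted earlier as given.

```python
# Linear scaling of the per-full-node load, using the baseline figures
# quoted above for an Ethereum-like node (~1.5 Mbps traffic, ~4 GB state).
baseline_traffic_mbps = 1.5
baseline_state_gb = 4

throughput_factor = 100   # 100x more TPS
user_factor = 100         # 100x more users/addresses

print(f"traffic: ~{baseline_traffic_mbps * throughput_factor:.0f} Mbps")  # ~150 Mbps
print(f"memory:  ~{baseline_state_gb * user_factor:.0f} GB")              # ~400 GB
# Both figures exceed what an ordinary Internet server can sustain, which
# is why the load cannot simply be scaled up on every full node.
```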
Why not change our thinking and find a new way out
We have already seen that, in the current single-chain Chain of Blocks system, the performance bottleneck and the capacity bottleneck are difficult to improve substantially — the capacity bottleneck in particular. This is the origin of the so-called blockchain impossible triangle. Yet throughout the history of computer technology, only one design paradigm for large capacity and high throughput has achieved large-scale success:
Scale-Out (horizontal scaling). For example, a GPU uses thousands of cores of ordinary individual performance working in parallel to achieve computing performance several orders of magnitude beyond a CPU, even though the semiconductor technology the GPU relies on is not fundamentally different from that of a CPU chip. Likewise, today's online cloud service systems use thousands or even tens of thousands of servers of ordinary performance working in parallel to support large-capacity, high-throughput online services.
Here I might as well imagine boldly:
Perhaps a large-capacity, high-throughput blockchain system will take a similar approach: letting thousands of homogeneous single-chain instances work together in parallel, dividing the workload of the entire network among them, to achieve large capacity and high throughput overall.
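A toy sketch of what dividing the workload might look like — deterministically assigning each account (and therefore its transactions and state) to one of many parallel chain instances. The hashing rule here is my own illustrative choice, not a description of any specific project.

```python
import hashlib

NUM_CHAINS = 1024  # assumed number of homogeneous single-chain instances

def chain_for(address: str) -> int:
    """Map an account address to one parallel chain instance.

    A toy, illustrative partitioning rule: hash the address and take it
    modulo the number of chains, so each chain holds ~1/NUM_CHAINS of the
    state and processes ~1/NUM_CHAINS of the transactions.
    """
    digest = hashlib.sha256(address.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHAINS

print(chain_for("0xabc..."), chain_for("0xdef..."))
# Per-chain load stays roughly constant as chains are added, so overall
# capacity and throughput grow with the number of instances -- the open
# questions are cross-chain transactions and per-chain security.
```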