
Performance Test of the Hyperledger Fabric

Tags:

performance

blockchain

load-testing

hyperledger-fabric

While trying to reproduce the performance that the IBM team reported in their article "Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains", I ran into some problems and errors. I have collected all the useful information and want to share it with the HF community. I also have a couple of questions for the Fabric developers about its performance.

Target description

A Hyperledger Fabric v1.1.0 network deployed using Cello on four c5.9xlarge (36 vCPU) AWS instances:

{
    fabric001: {
      cas: [],
      peers: ["anchor@peer1st.main"],
      orderers: ["orderer1st.orderer"],
      zookeepers: ["zookeeper1st"],
      kafkas: ["kafka1st"]
    },
    fabric002: {
      cas: [],
      peers: ["worker@peer2nd.main"],
      orderers: ["orderer2nd.orderer"],
      zookeepers: ["zookeeper2nd"],
      kafkas: ["kafka2nd"]
    },
    fabric003: {
      cas: [],
      peers: ["worker@peer3rd.main"],
      orderers: ["orderer3rd.orderer"],
      zookeepers: ["zookeeper3rd"],
      kafkas: ["kafka3rd"]
    },
    fabric004: {
      cas: ["ca1st.main"],
      peers: [],
      orderers: ["orderer4th.orderer"],
      zookeepers: ["zookeeper4th"],
      kafkas: ["kafka4th"]
    }
}

TLS is disabled.

Fabric channel configuration (all other parameters are left at their defaults):

BatchTimeout: 1s
BatchSize:
    MaxMessageCount: 500
    AbsoluteMaxBytes: 200 MB
    PreferredMaxBytes: 50 MB
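
To see how these settings interact with the load levels tested, here is a rough sketch of how often the orderer would cut a block at a steady transaction rate. It assumes that transactions are small enough that the byte limits (AbsoluteMaxBytes, PreferredMaxBytes) never trigger, and that arrivals are perfectly even; the function names are mine, not part of Fabric.

```python
def blocks_per_second(tx_rate, max_message_count=500, batch_timeout_s=1.0):
    """Estimate the block cut rate at a steady arrival rate (tx/s).

    Assumption: byte-size limits are never hit, so a block is cut either when
    MaxMessageCount transactions are collected or when BatchTimeout expires.
    """
    if tx_rate <= 0:
        return 0.0
    # Time needed to collect a full batch at this arrival rate.
    fill_time_s = max_message_count / tx_rate
    # Whichever condition is met first cuts the block.
    return 1.0 / min(fill_time_s, batch_timeout_s)


def txs_per_block(tx_rate, max_message_count=500, batch_timeout_s=1.0):
    """Average number of transactions per block under the same assumptions."""
    rate = blocks_per_second(tx_rate, max_message_count, batch_timeout_s)
    return tx_rate / rate if rate else 0.0


if __name__ == "__main__":
    for rate in (250, 600, 850, 1100):
        print(rate, blocks_per_second(rate), txs_per_block(rate))
```

Under these assumptions, any load above 500 tx/s produces full 500-transaction blocks, and a lower load is cut by the 1 s timeout.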

I ran tests with both CouchDB and LevelDB as the state database, using the official Fabcar chaincode (Golang implementation). I created a simple Node.js app that interacts with the Fabric network through the SDK and exposes an HTTP API for load tests. The app is stateless and can be scaled easily. For load testing I use Yandex.Tank. I performed two kinds of high-load tests: query (requests via peer001 to the Fabric state when the blockchain is empty) and insert (transactions written to the blockchain).

Results

CouchDB as a state database

Query results: https://overload.yandex.net/101153. Latency starts to increase at ~1100 rps, yet the Fabric instance is not heavily loaded and has plenty of free resources. The figure at https://i.stack.imgur.com/s01B4.png shows CPU and memory usage of the Fabric network containers on instance fabric001 during the test; 100% CPU usage means one fully loaded vCPU. peer001 also prints many similar error logs (only a small part of the output; I can share the rest if needed): https://gist.github.com/krabradosty/9780cacc92fcdeaa0c36377a91727ade
Insert results: https://overload.yandex.net/101217. Latency degrades very quickly from ~600 rps; below that it also degrades, but slowly. CPU and memory usage of the fabric003 containers is shown in the figure at https://i.stack.imgur.com/gURGz.png. The peer again prints many error logs (partial output): https://gist.github.com/krabradosty/3810151b8e101d8279cc705aef22863e

Based on this, I conclude that the Fabric peer has problems with its CouchDB connection under load.

My questions: Does the Fabric community know about this bug? Are there plans to fix it?

LevelDB as a state database

Query results: https://overload.yandex.net/102035. CPU and memory usage of the fabric001 containers is shown in the figure at https://i.stack.imgur.com/Cp4nw.png. There are no errors from the blockchain; I only see latency degradation.
Insert results: https://overload.yandex.net/102040. CPU and memory usage of the fabric001 containers is shown in the figure at https://i.stack.imgur.com/DwXiV.png. Aggressive latency degradation starts at ~850 rps. There are no errors from the blockchain.

My questions: What causes this latency degradation? Why can't I reach the 3500 rps that IBM reports in their article? What plans does the Fabric community have for improving performance?


asked May 14 '18 15:05

Dmitry Pugachev

1 Answer

Fabric is a queueing system. Under high load, the waiting time, and hence the transaction latency, grows sharply as the arrival rate approaches capacity (a basic property of queues). However, with goleveldb we should get at least 2000 tps with low latency.
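
As a rough illustration of the queueing effect, here is a minimal sketch using the textbook M/M/1 model, where the mean time in the system is 1 / (service rate - arrival rate). Treating Fabric as a single M/M/1 queue is my simplifying assumption (it is really a pipeline of several stages), and the 1000 tps capacity below is a made-up number, not a measurement.

```python
def mm1_latency(arrival_rate, service_rate):
    """Mean time in system (seconds) for an M/M/1 queue.

    arrival_rate and service_rate are in requests per second. Once the
    arrival rate reaches the service rate the queue grows without bound.
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    capacity = 1000.0  # hypothetical service rate, for illustration only
    for load in (500, 800, 900, 950, 990):
        latency_ms = mm1_latency(load, capacity) * 1000
        print(f"{load} rps -> {latency_ms:.1f} ms")
```

In this model the latency stays nearly flat at moderate load and then climbs steeply in the last few percent before saturation, which resembles the shape of the curves in the load test reports.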

From the CPU utilization plot, it looks like only 16 of the 36 vCPUs are fully utilized. What value is set for validatorPoolSize in core.yaml on each peer? Try setting it to a value equal to or less than the block size and check whether the throughput increases.

Performance will vary depending on:

1. workload (fabcar vs fabcoin),
2. disk (HDD vs SSD, local vs network attached),
3. load generator (CLI vs SDK),
4. load generation method (open system vs closed system vs some distribution), and
5. network bandwidth (at least 1.6 Gbps for 2700 tps).
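
The bandwidth figure in the last item can be turned into a per-transaction budget with simple arithmetic. The sketch below only divides the two numbers quoted above; it assumes the traffic scales linearly with throughput, which may not hold for a different network layout or transaction size, and the function names are mine.

```python
def bytes_per_tx(bandwidth_gbps, tps):
    """Network bytes available per transaction at a given bandwidth and rate."""
    return bandwidth_gbps * 1e9 / 8 / tps


def required_gbps(tps, tx_bytes):
    """Bandwidth (Gbps) needed for `tps` if each transaction moves `tx_bytes`."""
    return tps * tx_bytes * 8 / 1e9


if __name__ == "__main__":
    budget = bytes_per_tx(1.6, 2700)  # figures quoted in the list above
    print(f"about {budget / 1000:.0f} kB of total traffic per transaction")
    # Same per-transaction footprint applied to the ~850 rps seen with LevelDB:
    print(f"{required_gbps(850, budget):.2f} Gbps at 850 tps")
```

This counts all traffic caused by a transaction (proposals, endorsements, ordering, block delivery to every peer), not just the size of the transaction itself.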

Also, make sure that the load generator itself is not the bottleneck. It would help to break the latency down further into endorsement latency, ordering latency, and commit latency, and to collect other resource utilization metrics, such as network and disk, so that the bottleneck can be identified more easily.
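
One way to get such a breakdown is to record a timestamp at each phase boundary for every transaction and average the differences. The sketch below assumes the client (or the logs) can supply four timestamps per transaction; the TxTimes fields are hypothetical names, and how each timestamp is captured is left open.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TxTimes:
    """Hypothetical per-transaction timestamps, in seconds."""
    submitted: float  # proposal sent to the endorsing peers
    endorsed: float   # all endorsements received
    ordered: float    # transaction included in a block by the orderer
    committed: float  # commit event received from the peer


PHASES = ("endorsement", "ordering", "commit")


def phase_latencies(t):
    """Split one transaction's end-to-end latency into three phases."""
    return {
        "endorsement": t.endorsed - t.submitted,
        "ordering": t.ordered - t.endorsed,
        "commit": t.committed - t.ordered,
    }


def mean_breakdown(samples):
    """Mean latency per phase over a list of TxTimes."""
    per_tx = [phase_latencies(t) for t in samples]
    return {phase: mean(p[phase] for p in per_tx) for phase in PHASES}


if __name__ == "__main__":
    samples = [
        TxTimes(0.0, 0.25, 0.75, 1.0),
        TxTimes(0.0, 0.75, 1.25, 2.0),
    ]
    print(mean_breakdown(samples))
```

Comparing the three means at increasing load shows which phase starts to dominate first, which narrows down where to look for the bottleneck.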

You can refer to our technical paper, "Performance Benchmarking and Optimizing Hyperledger Fabric", in which we conducted a comprehensive empirical study. With LevelDB, we should get at least 2000 tps with low latency.


answered Oct 19 '22 03:10

senthil nathan
