Attempt to achieve high throughput in Hyperledger Fabric network

The Hyperledger community, in the article "Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains", reports that Fabric achieves end-to-end throughput of more than 3500 transactions per second in certain popular deployment configurations. I'm trying to reach this result in my project, but I'm far from it. Here I report my first load-testing results and invite you to join the investigation into how to achieve high throughput with Hyperledger Fabric and Composer.

Project description

We are building a high-load service that uses Hyperledger Fabric. Our backend consists of an HF blockchain network, several Node.js microservices that communicate with the blockchain via Hyperledger Composer, and a message broker for communication between the microservices.

Hyperledger Fabric v1.1. Hyperledger Composer v0.19.0

Fabric network (deployed with Cello):

{
    fabric001: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer1st.orderer"],
      zookeepers: ["zookeeper1st"],
      kafkas: ["kafka1st"]
    },
    fabric002: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer2nd.orderer"],
      zookeepers: ["zookeeper2nd"],
      kafkas: ["kafka2nd"]
    },
    fabric003: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer3rd.orderer"],
      zookeepers: ["zookeeper3rd"],
      kafkas: ["kafka3rd"]
    },
    fabric004: {
      cas: ["ca1st.main"],
      peers: [],
      orderers: ["orderer4th.orderer"],
      zookeepers: ["zookeeper4th"],
      kafkas: ["kafka4th"]
    }
}

fabric001-004 are AWS EC2 instances of type t2.xlarge. Initially I used m5.4xlarge, but it is expensive, and CPU usage stayed low even when Fabric started to fail.

Fabric config:

BatchTimeout: 0.2s
BatchSize:
    MaxMessageCount: 10
    AbsoluteMaxBytes: 98 MB
    PreferredMaxBytes: 512 KB

TLS disabled.
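
To put the batch settings in perspective, here is a rough back-of-the-envelope sketch, not a measurement. It assumes the orderer cuts a block as soon as MaxMessageCount transactions are queued (or BatchTimeout expires, whichever comes first) and ignores the byte-size limits. The function name `minBlocksPerSecond` is made up for illustration.

```javascript
// Rough estimate only: assumes a block is cut once MaxMessageCount
// transactions are queued, and ignores AbsoluteMaxBytes / PreferredMaxBytes.
function minBlocksPerSecond(targetTps, maxMessageCount) {
  // Each block carries at most maxMessageCount transactions,
  // so this is the minimum block rate needed to sustain targetTps.
  return Math.ceil(targetTps / maxMessageCount);
}

console.log(minBlocksPerSecond(3500, 10)); // 350
```

Under these assumptions, sustaining 3500 transactions per second with MaxMessageCount: 10 would require at least 350 blocks per second. This applies to submitted transactions only; queries are served by the peer and do not pass through the orderer.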

If required, I can run new tests with any configuration.

Load testing

First of all, I decided to test query requests against the ledger state (CouchDB). The blockchain is empty: only system data and a few participants. Direct queries to the open CouchDB port are very fast (~150 ms). My microservice connects to Fabric by establishing a persistent connection for an existing identity. Requests take ~500 ms in our system without high load; half of that time is spent in the message broker (AWS SQS is really slow). For load testing I use Yandex.Tank. Load proceeds smoothly, with no increase in latency, up to ~70 requests per second. Beyond that, latency degrades and at some point the chaincode starts returning error messages. You can see the test results here:
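
As a rough sanity check on these numbers, Little's law (average requests in flight = arrival rate × average latency) gives an idea of how many requests are being processed concurrently just before latency starts to degrade. This is only a sketch: it assumes a steady state and uses the ~70 requests per second and ~500 ms figures from above. The function name is illustrative.

```javascript
// Little's law: L = lambda * W (steady state assumed).
function inFlightRequests(requestsPerSecond, latencyMs) {
  // Convert latency to seconds so the units cancel out.
  return requestsPerSecond * (latencyMs / 1000);
}

console.log(inFlightRequests(70, 500)); // 35
```

So roughly 35 requests are in flight end to end at ~70 requests per second. If about half of the latency is spent in the message broker, only around half of those would be inside the Fabric path at any moment.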

TEST RESULTS

There are two types of error messages that I received during iterations of load tests:

[Hyperledger-Composer] undefined:HLFQueryHandler :queryChaincode() query payload returned an error: Error: 2 UNKNOWN: error executing chaincode: failed to execute transaction: timeout expired while executing transaction

HLFQueryHandler :queryChaincode() query payload returned an error: Error: 2 UNKNOWN: error executing chaincode: transaction returned with failure: Error: The current identity, with the name 'txBuilder' and the identifier '5606acbada327a8ef33134e601f990076872b31a3dda5ec0a983e04915d16007', has not been registered

The chaincode container does not restart by itself, but from this point on it does not work well. Sometimes I can't ping it, sometimes I can, but either way the latency is terrible. Only restarting the peer container helps. (As a reminder, requests to the ledger go through a single peer because of Composer; that's not good, but it's not the point of my investigation.) The second error is really strange, because this is the only identity I use and it works until the chaincode starts to fail. It also works again after I restart the peer.
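
When counting failures across load-test iterations, it can help to bucket the two error types separately. Below is a minimal sketch that classifies an error message by matching substrings taken from the two messages above. The function names and category labels are hypothetical, and none of this is part of Composer or the Fabric SDK.

```javascript
// Classify an error message into one of the two types seen above.
// The labels ("timeout", "identity", "other") are made up for this sketch.
function classifyError(message) {
  if (message.includes("timeout expired while executing transaction")) {
    return "timeout";
  }
  if (message.includes("has not been registered")) {
    return "identity";
  }
  return "other";
}

// Tally error types from a list of error messages.
function tallyErrors(messages) {
  const counts = { timeout: 0, identity: 0, other: 0 };
  for (const m of messages) {
    counts[classifyError(m)] += 1;
  }
  return counts;
}
```

Tracking the counts per iteration would show whether the identity error only ever appears after the first timeouts, which is what the behaviour described above suggests.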

Under load, the peer, chaincode and CouchDB containers use the most CPU (as expected). I'm in the middle of configuring a monitoring system for my blockchain network and will soon be able to share more information.

Any thoughts?

UPDATE #1

I was advised to use c*-type AWS instances for deploying Fabric. I chose c5.4xlarge (16 vCPU) for my tests. I also changed the Fabric config slightly:

BatchTimeout: 1s
BatchSize:
    MaxMessageCount: 20
    AbsoluteMaxBytes: 98 MB
    PreferredMaxBytes: 512 KB

I performed the same test and, to my regret, I got the same result:

TEST RESULTS

The figure below plots the CPU usage of the containers during the test, which lasted 1 minute:

CPU load of fabric001 instance

Total CPU usage peaked at ~30%, so the cause of the latency degradation lies elsewhere.

UPDATE #2

As the performance results were very poor, I decided to continue my tests with pure Fabric, without any unnecessary intermediate components: just the Fabric network and the Node.js SDK. See the new report here.
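
For a test that drives Fabric directly from Node.js, a small closed-loop load generator is often enough to get first numbers. The sketch below is an assumption about how such a harness could look, not the code used for the report: `queryFn` is a placeholder for whatever SDK call is being measured, and `runLoad`, `concurrency` and `totalRequests` are illustrative names.

```javascript
// Closed-loop load generator: keeps `concurrency` requests in flight until
// `totalRequests` have completed. `queryFn` is a placeholder async function
// standing in for the real SDK call being measured.
async function runLoad(queryFn, { concurrency, totalRequests }) {
  const latenciesMs = [];
  let started = 0;
  let errors = 0;

  async function worker() {
    // The check and increment run synchronously, so workers never overshoot.
    while (started < totalRequests) {
      started += 1;
      const t0 = process.hrtime.bigint();
      try {
        await queryFn();
      } catch (err) {
        errors += 1; // failed requests are counted and still timed
      }
      latenciesMs.push(Number(process.hrtime.bigint() - t0) / 1e6);
    }
  }

  const begin = process.hrtime.bigint();
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  const elapsedSec = Number(process.hrtime.bigint() - begin) / 1e9;

  latenciesMs.sort((a, b) => a - b);
  const p95Index = Math.min(latenciesMs.length - 1, Math.floor(latenciesMs.length * 0.95));
  return {
    completed: latenciesMs.length,
    errors,
    rps: latenciesMs.length / elapsedSec,
    p95Ms: latenciesMs[p95Index],
  };
}
```

Usage would look like `runLoad(() => myQuery(), { concurrency: 35, totalRequests: 1000 })`, where `myQuery` is a hypothetical wrapper around the SDK query call. Because this generator caps the number of in-flight requests rather than fixing the request rate, its numbers are not directly comparable with a requests-per-second ramp.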

asked Apr 17 '18 10:04

Dmitry Pugachev

1 Answer

I ran a similar test on a similar kind of setup and achieved about 220 RPS, using 8 peer nodes in a single org. With a second org, this performance would surely drop. I used the high performance chaincode provided with the Fabric samples. I'm not sure how they managed to get 3500 RPS.

answered Oct 06 '22 00:10

Ashish Mishra

Tags:

load-testing

hyperledger-fabric

hyperledger-composer