 

Fully utilizing HW accelerator

We would like to use OpenSSL to handle all of our SSL communication (both client and server side), and to offload the heavy cryptographic calculations to a HW acceleration card.

We noticed that the OpenSSL 'speed' test calls the cryptographic functions directly (e.g. RSA_sign/decrypt). To fully utilize the HW capacity, multiple threads were needed (up to 128), which keep the card loaded with requests so that it is never idle.

We would like to use the high-level OpenSSL API for handling SSL connections (e.g. SSL_connect/read/write/accept), but this API doesn't expose the point where the actual cryptographic operation is done. For example, when calling SSL_connect, we don't know where the RSA operations happen, and we don't know in advance which calls will lead to heavy cryptographic calculations, so we cannot route only those calls to the accelerator.

Questions:

  1. How can we use the high-level API while still fully utilizing the HW accelerator? Should we use multiple threads?
  2. Is there a 'standard' way of doing this? (An implementation example would help.)
  3. (Answered in UPDATE) Are you familiar with Intel's asynchronous OpenSSL? It seems that they were trying to solve this exact issue, but we cannot find the actual code or usage examples.

UPDATE

  1. From Accelerating OpenSSL* Using Intel® QuickAssist Technology you can see that Intel also mentions using multiple threads/processes:

    The standard release of OpenSSL is serial in nature, meaning it handles one connection within one context. From the point of view of cryptographic operations, the release is based on a synchronous/ blocking programming model. A major limitation is throughput can be scaled higher only by adding more threads (i.e., processes) to take advantage of core parallelization, but this will also increase context management overhead.

  2. Intel's OpenSSL branch has finally been found here. More information is in the PDF contained here.

    It looks like Intel changed the way the OpenSSL ENGINE works: it posts work to the driver and returns immediately, and the corresponding result then has to be polled for.

    If you use a different SSL accelerator, then the corresponding OpenSSL ENGINE would have to be modified too.
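The post-and-poll model described in item 2 can be illustrated with a small simulation. This is only a sketch: `PollingAccelerator`, `submit`, `poll` and `sign_all` are hypothetical names invented for this example and are not part of OpenSSL or of Intel's branch, and the "card" is simulated with timers rather than real hardware.

```python
import time
from collections import deque


class PollingAccelerator:
    """Hypothetical non-blocking device: submit() returns at once, poll() yields finished work."""

    def __init__(self, latency):
        self._latency = latency
        self._inflight = deque()  # (ready_time, request_id, payload), in submission order

    def submit(self, request_id, payload):
        # Post the work and return immediately; nothing is computed on the caller's thread.
        self._inflight.append((time.monotonic() + self._latency, request_id, payload))

    def poll(self):
        # Return the results whose simulated processing time has elapsed.
        done = []
        now = time.monotonic()
        while self._inflight and self._inflight[0][0] <= now:
            _, request_id, payload = self._inflight.popleft()
            done.append((request_id, b"sig:" + payload))
        return done


def sign_all(card, payloads):
    """Keep every request in flight from one thread, then collect the results by polling."""
    for request_id, payload in enumerate(payloads):
        card.submit(request_id, payload)
    results = {}
    while len(results) < len(payloads):
        for request_id, signature in card.poll():
            results[request_id] = signature
        time.sleep(0.001)  # a real event loop would service other connections here
    return [results[i] for i in range(len(payloads))]
```

The point of the design is that a single thread can keep many requests outstanding, so the device stays busy without one blocked thread per operation.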

dimba asked Oct 07 '15 14:10




1 Answer

According to Interpreting openssl speed output for rsa with multi option, -multi doesn't "parallelize" the work; it just runs multiple benchmarks in parallel.

So, the load on your HW card will essentially be limited by how much work is available at the moment (note that in industry in general, a planned capacity load of 80% is traditionally considered optimal, to leave headroom for load spikes). Of course, running multiple server threads/processes will give you the same effect as running multiple benchmarks.
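To see why several blocking callers are needed to keep a synchronous device loaded, here is a minimal simulation. It is a sketch under stated assumptions: `FakeAccelerator`, its `rsa_sign` method and the `run` helper are hypothetical stand-ins, the "card" is modelled as a fixed number of engines with a fixed latency, and no real OpenSSL or driver calls are made.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeAccelerator:
    """Hypothetical stand-in for a HW card with a fixed number of parallel engines."""

    def __init__(self, engines, latency):
        self._slots = threading.Semaphore(engines)  # engines that can work at once
        self._latency = latency                     # seconds per operation

    def rsa_sign(self, message):
        # Blocks the calling thread until the "card" finishes, like a synchronous call.
        with self._slots:
            time.sleep(self._latency)
        return b"sig:" + message


def run(card, threads, requests):
    """Issue `requests` blocking operations from `threads` workers; return elapsed seconds."""
    messages = [b"m%d" % i for i in range(requests)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(card.rsa_sign, messages))
    assert len(results) == requests
    return time.perf_counter() - start


if __name__ == "__main__":
    card = FakeAccelerator(engines=8, latency=0.01)
    for n in (1, 8, 32):
        print(f"{n:>2} threads: {run(card, n, 64):.3f} s")
```

In this model, elapsed time drops as threads are added until the number of threads reaches the number of engines; beyond that, extra threads only queue up, which matches the observation that the card's load is bounded by the work available to it.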

OpenSSL supports multiple threads provided that you give it callbacks to lock shared data. For multiple processes, it warns against reusing state inherited from the parent process.

That's it for scaling vertically. For scaling horizontally:

  • OpenSSL supports asynchronous I/O through asynchronous BIOs
  • but its elemental crypto operations and internal ENGINE calls are synchronous, and changing this would require a logic overhaul
  • private efforts to make them operate asynchronously have met severe criticism due to major design flaws

Intel announced an "Asynchronous OpenSSL" project (08.2014) for use with its hardware, but the linked white paper gives few details about its implementation or development state. One developer published some related code (10.2015), noting that it's "stable enough to get an overview".

ivan_pozdeev answered Sep 18 '22 06:09
