
Chunking big datasets in PyRFC. Possible?

Is there a way to do "chunking" of big results into several smaller parts with SAP-RFC?

According to these links it seems like you need to implement chunking yourself :-(

  • https://archive.sap.com/discussions/thread/1416684
  • https://github.com/SAP/PyRFC/issues/20

I would like to avoid this, and I hope that there is a way to let the SAP-RFC library do the chunking.

Use case:

The result is 100k rows. I would like to fetch 1k rows at a time until all rows have been received.

I guess it does not matter much, but I will use PyRFC for my code.

guettli asked Apr 06 '18 11:04


2 Answers

According to this issue #60, sap-rfc can't do chunking. You need to make several smaller RFC calls.

That's sad. I guess there are several hundred dirty homegrown chunking solutions in proprietary closed-source code, all solving the same problem over and over again.

guettli answered Nov 20 '22 09:11


The RFC library can't do much here: it just makes a request and then receives the response from the R/3 system. So if the R/3 system returns 100k rows, the library will receive these 100k rows; if the R/3 system returns only a chunk of these rows, the library will receive only that chunk.

In order to do chunking (or "paging"), the two sides (the external program and the ABAP code that gets called) have to cooperate in some way. This is not something a generic library could do.

RFC basically follows the "request-response" pattern, and if you want smaller pieces of response data, then the client has to make multiple requests and the server has to return only a part of the "overall data" for each of these requests.
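To illustrate the "multiple requests" idea, here is a minimal sketch of a client-side paging loop in Python. It assumes the called function module accepts an offset and a row limit; the standard RFC_READ_TABLE is used as the example because it has ROWSKIPS and ROWCOUNT parameters and returns its rows in a DATA table. The helper name `iter_chunks` is hypothetical, and `call` is expected to be something like the `call` method of a PyRFC `Connection`. For your own ABAP function module you would have to add equivalent paging parameters on the server side.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_chunks(
    call: Callable[..., Dict[str, Any]],
    table: str,
    chunk_size: int = 1000,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield the rows of `table` in chunks of at most `chunk_size` rows.

    `call` is any callable with the shape call(function_name, **params),
    for example the `call` method of a PyRFC Connection (assumption).
    """
    if chunk_size <= 0:
        # ROWCOUNT=0 means "no limit" in RFC_READ_TABLE, which would defeat paging.
        raise ValueError("chunk_size must be positive")

    skip = 0
    while True:
        # One request per chunk: the server skips `skip` rows and
        # returns at most `chunk_size` rows.
        result = call(
            "RFC_READ_TABLE",
            QUERY_TABLE=table,
            ROWSKIPS=skip,
            ROWCOUNT=chunk_size,
        )
        rows = result.get("DATA", [])
        if not rows:
            break
        yield rows
        if len(rows) < chunk_size:
            # A short chunk means the end of the result was reached.
            break
        skip += len(rows)
```

With PyRFC this could be used as `for rows in iter_chunks(conn.call, "MARA"): ...`, where `conn` is an open `Connection`. Note that offset-based paging like this does not give a consistent snapshot: if the table changes between two calls, rows can be skipped or returned twice.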

Edit: I have also read your issue #60 now, and if your main concern is performance, then perhaps you would be better off using the C/C++ NW RFC Library directly instead of a Python wrapper?

I'm not familiar with how Python works, but if it is somewhat similar to Java/JNI, then I expect that you have a total of two copies of all the data in memory: first the RFC library receives the data from the wire and stores it on the C heap, and then some C <-> Python interop layer needs to copy that data over to the Python virtual machine?! If that is the case, you could already save 50% of memory consumption by writing your extractor program in C/C++.

Lanzelot answered Nov 20 '22 09:11