
sp_execute_external_script Python In Memory Variable for Faster Process

Is there a way to keep a variable in memory (like a global variable) so that I don't have to call pickle.loads every time I execute a script with sp_execute_external_script?

I have a Python script that processes data using a preprocessed matrix. Script A computes the matrix once and saves it to a table.

--Script A
DECLARE @matrix VARBINARY(MAX)
EXECUTE sp_execute_external_script @language = N'Python'
  , @script = N'
import pickle
...
# Serialize the matrix to bytes for the VARBINARY(MAX) output parameter
matrix = pickle.dumps(processed_matrix)
'
  , @input_data_1 = N'SOME SELECT QUERY'
  , @params = N'@matrix VARBINARY(MAX) OUTPUT'
  , @matrix = @matrix OUTPUT

DELETE FROM MatrixTable
INSERT INTO MatrixTable(matrix) VALUES(@matrix)

Then I pass the matrix as a parameter every time I run script B.

--Script B
DECLARE @matrix VARBINARY(MAX)
SELECT @matrix = matrix
FROM MatrixTable

EXECUTE sp_execute_external_script @language = N'Python'
  , @script = N'
import pickle
# Deserialize the bytes passed in through the @matrix parameter
preprocessed_matrix = pickle.loads(matrix)
...
'
  , @input_data_1 = N'SOME SELECT QUERY'
  , @params = N'@matrix VARBINARY(MAX)'
  , @matrix = @matrix

The matrix is computed only once but loaded many times, so it would be great if script A could run when the server starts and keep the resulting matrix in SQL Server memory, where script B could access it without saving it to and loading it from a table. Is there a way to store the matrix in memory so that I don't need to save it to a table and load it with pickle, to make this faster?

Viki Theolorado asked May 30 '26 06:05



1 Answer

Do you really need to pickle the matrix and save it this way?

I would just convert the matrix to a pandas DataFrame and store it in a SQL table. That way, reads can be served from SQL Server's cached memory. Treat it as an ordinary table that you reload whenever the matrix changes.

Depending on how big your data is, this should be the best approach. Remember that SQL Server stores data in 8 KB pages, so storing a large LOB such as VARBINARY(MAX) in a single column means SQL Server has to split the data across multiple pages.

Storing the matrix row by row in a SQL table is the preferred way to do this in SQL Server, which is built and optimized for it.
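To get a feel for the page-splitting point, here is a rough sketch that estimates how many 8 KB pages a pickled blob would need at minimum. The helper name `min_pages_for_blob` and the sample matrix are made up for illustration, and the estimate ignores page headers and LOB storage overhead, so the real number would be somewhat higher.

```python
import math
import pickle

PAGE_SIZE_BYTES = 8192  # SQL Server page size (8 KB)


def min_pages_for_blob(blob: bytes, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Lower bound on the pages needed to hold `blob`.

    Assumption: every byte of the page is usable. Real pages reserve
    space for headers, so treat the result as a floor, not an exact count.
    """
    return math.ceil(len(blob) / page_size)


# Hypothetical stand-in for the asker's processed_matrix.
processed_matrix = [[0.0] * 500 for _ in range(500)]
blob = pickle.dumps(processed_matrix)

print(f"pickled size: {len(blob)} bytes")
print(f"at least {min_pages_for_blob(blob)} pages")
```

If the estimate comes out at only a handful of pages, the split itself is unlikely to be what makes Script B slow.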
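As a minimal sketch of the row-by-row idea, assuming the matrix is a 2-D NumPy array: the column names (`row_idx`, `c0`, `c1`, ...) and both helper functions are hypothetical, not something from the question. The explicit `row_idx` column is there because a SQL table has no inherent row order, so the matrix rows need a key to be put back in sequence.

```python
import numpy as np
import pandas as pd


def matrix_to_frame(matrix: np.ndarray) -> pd.DataFrame:
    """One DataFrame row per matrix row, plus a row_idx key column."""
    columns = [f"c{j}" for j in range(matrix.shape[1])]  # hypothetical names
    frame = pd.DataFrame(matrix, columns=columns)
    frame.insert(0, "row_idx", np.arange(matrix.shape[0]))
    return frame


def frame_to_matrix(frame: pd.DataFrame) -> np.ndarray:
    """Rebuild the matrix, restoring row order from row_idx."""
    ordered = frame.sort_values("row_idx")
    return ordered.drop(columns="row_idx").to_numpy()


# Small stand-in for the asker's processed_matrix.
processed_matrix = np.arange(6, dtype=float).reshape(2, 3)
frame = matrix_to_frame(processed_matrix)
restored = frame_to_matrix(frame)
```

In Script A, a frame like this could presumably be assigned to OutputDataSet and captured with INSERT ... EXECUTE instead of going through pickle.dumps. Whether that ends up faster than the VARBINARY(MAX) round trip depends on the matrix size, so it is worth measuring both.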

Feodot Bogdan answered Jun 01 '26 19:06
