Using:

import pandas as pd

dd = {'ID': ['H576', 'H577', 'H578', 'H600', 'H700'],
      'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC', 'DDDDDD', 'EEEEEEE']}
df = pd.DataFrame(dd)
Before pandas 0.25, the code below worked:
set: redisConn.set("key", df.to_msgpack(compress='zlib'))
get: pd.read_msgpack(redisConn.get("key"))
Now there are deprecation warnings:
FutureWarning: to_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.

FutureWarning: The read_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.
How does pyarrow work? And how do I get pyarrow objects into and back out of Redis?

Reference: How to set/get pandas.DataFrame to/from Redis?
From the PyArrow documentation: to interface with pandas, PyArrow provides various conversion routines to consume pandas structures and convert back to them. While pandas uses NumPy as a backend, it has enough peculiarities (such as a different type system, and support for null values) that this is a separate topic from NumPy Integration.
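For illustration, here is a minimal round trip between the two libraries, using the question's sample data (pa.Table.from_pandas and Table.to_pandas are the core conversion routines):

import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'ID': ['H576', 'H577', 'H578'],
                   'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC']})

# pandas -> Arrow: each column becomes an Arrow array with an Arrow type
table = pa.Table.from_pandas(df)

# Arrow -> pandas: rebuilds the DataFrame, restoring the index from the
# metadata that from_pandas stored alongside the data
df_back = table.to_pandas()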
Here's a full example of using pyarrow to serialize a pandas DataFrame and store it in Redis:
apt-get install python3 python3-pip redis-server
pip3 install pandas pyarrow redis
and then in Python:
import pandas as pd
import pyarrow as pa
import redis

df = pd.DataFrame({'A': [1, 2, 3]})
r = redis.Redis(host='localhost', port=6379, db=0)

# serialize the DataFrame with pyarrow and store the raw bytes in Redis
context = pa.default_serialization_context()
r.set("key", context.serialize(df).to_buffer().to_pybytes())

# fetch the bytes back and deserialize them into a DataFrame
context.deserialize(r.get("key"))
   A
0  1
1  2
2  3
I just submitted PR 28494 to pandas to include this pyarrow example in the docs.
Here is how I do it now that default_serialization_context is deprecated; things are a bit simpler:
import pyarrow as pa
import redis

pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)

def storeInRedis(alias, df):
    # serialize the DataFrame with pyarrow and store the raw bytes under the alias
    df_compressed = pa.serialize(df).to_buffer().to_pybytes()
    res = r.set(alias, df_compressed)
    if res:
        print(f'{alias} cached')

def loadFromRedis(alias):
    # r.get returns None for a missing key, which makes deserialization fail
    data = r.get(alias)
    try:
        return pa.deserialize(data)
    except (TypeError, pa.ArrowInvalid):
        print("No data")

storeInRedis('locations', locdf)  # locdf is your existing DataFrame
loadFromRedis('locations')
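Note that pa.serialize / pa.deserialize were themselves deprecated in pyarrow 2.0 (and dropped in later releases) in favor of the Arrow IPC format. A minimal sketch of the same Redis pattern using the IPC helpers pa.ipc.serialize_pandas and pa.ipc.deserialize_pandas:

import pandas as pd
import pyarrow as pa
import redis

r = redis.Redis(host='localhost', port=6379, db=0)
df = pd.DataFrame({'A': [1, 2, 3]})

# write the DataFrame into an Arrow IPC stream and store the raw bytes
buf = pa.ipc.serialize_pandas(df)
r.set('key', buf.to_pybytes())

# read the bytes back from Redis and rebuild the DataFrame
df_restored = pa.ipc.deserialize_pandas(r.get('key'))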
If you would like to compress the data in Redis, you can use pandas' built-in support for Parquet and gzip:
import io

import pandas as pd
import redis

REDIS_HOST = 'localhost'  # placeholder; point these at your Redis instance
REDIS_PORT = 6379

def openRedisCon():
    pool = redis.ConnectionPool(host=REDIS_HOST, port=REDIS_PORT, db=0)
    r = redis.Redis(connection_pool=pool)
    return r

def storeDFInRedis(alias, df):
    """Store the dataframe object in Redis
    """
    buffer = io.BytesIO()
    df.to_parquet(buffer, compression='gzip')
    buffer.seek(0)  # rewind to the beginning after writing
    r = openRedisCon()
    res = r.set(alias, buffer.read())

def loadDFFromRedis(alias, useStale: bool = False):
    """Load the named key from Redis into a DataFrame and return the DF object
    """
    r = openRedisCon()
    try:
        buffer = io.BytesIO(r.get(alias))
        buffer.seek(0)
        df = pd.read_parquet(buffer)
        return df
    except Exception:  # missing key or unreadable payload
        return None
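Usage is then symmetrical; assuming an existing DataFrame df:

storeDFInRedis('locations', df)
df_cached = loadDFFromRedis('locations')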