Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting pandas.DataFrame to bytes

I need convert the data stored in a pandas.DataFrame into a byte string where each column can have a separate data type (integer or floating point). Here is a simple set of data:

df = pd.DataFrame([ 10, 15, 20], dtype='u1', columns=['a'])
df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')

and df looks something like this:

    a            b                  c
0   10  18446744073709551615    1.324000e+10
1   15  230498234019            3.141590e+00
2   20  32094812309             2.341341e+02

The DataFrame knows about the types of each column df.dtypes so I'd like to do something like this:

data_to_pack = [tuple(record) for _, record in df.iterrows()]
data_array = np.array(data_to_pack, dtype=zip(df.columns, df.dtypes))
data_bytes = data_array.tostring()

This typically works fine but in this case (due to the maximum value stored in df['b'][0]. The second line above converting the array of tuples to an np.array with a given set of types causes the following error:

OverflowError: Python int too large to convert to C long

The error results (I believe) in the first line which extracts the record as a Series with a single data type (defaults to float64) and the representation chosen in float64 for the maximum uint64 value is not directly convertible back to uint64.

1) Since the DataFrame already knows the types of each column is there a way to get around creating a row of tuples for input into the typed numpy.array constructor? Or is there a better way than outlined above to preserve the type information in such a conversion?

2) Is there a way to go directly from DataFrame to a byte string representing the data using the type information for each column.

like image 270
Paul Joireman Avatar asked Jan 07 '16 23:01

Paul Joireman


People also ask

How do you convert to bytes in Python?

We can use the built-in Bytes class in Python to convert a string to bytes: simply pass the string as the first input of the constructor of the Bytes class and then pass the encoding as the second argument. Printing the object shows a user-friendly textual representation, but the data contained in it is​ in bytes.

How do you convert a string to a byte?

We can use String class getBytes() method to encode the string into a sequence of bytes using the platform's default charset. This method is overloaded and we can also pass Charset as argument.

How do you convert numbers to bytes?

Basically, to do a bit to byte conversion, you take an 8 bit binary number and form it into groups of 4 bits (nibbles). You then translate each nibble into a hexadecimal number (a 2 hex digit byte) using this table. You then multiply the left digit by 16 and add the result to the first digit.


2 Answers

You can use df.to_records() to convert your dataframe to a numpy recarray, then call .tostring() to convert this to a string of bytes:

rec = df.to_records(index=False)

print(repr(rec))
# rec.array([(10, 18446744073709551615, 13240000000.0), (15, 230498234019, 3.14159),
#  (20, 32094812309, 234.1341)], 
#           dtype=[('a', '|u1'), ('b', '<u8'), ('c', '<f8')])

s = rec.tostring()
rec2 = np.fromstring(s, rec.dtype)

print(np.all(rec2 == rec))
# True
like image 50
ali_m Avatar answered Oct 23 '22 19:10

ali_m


import pandas as pd

df = pd.DataFrame([ 10, 15, 20], dtype='u1', columns=['a'])
df_byte = df.to_json().encode()
print(df_byte)
like image 43
Zeno Avatar answered Oct 23 '22 18:10

Zeno