
Hash each row of pandas dataframe column using apply

I'm trying to hash each value of the column ORIG in a pandas DataFrame (Python 3.6) with the following algorithm:

import base64, hashlib
HK_ORIG = base64.b64encode(hashlib.sha1(str(df.ORIG).encode("UTF-8")).digest())

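However, the code above does not hash each value of the column: str(df.ORIG) converts the entire Series (index, values and the Name/dtype footer) into a single string, so it produces one hash for the whole column. To hash each value of ORIG individually, I believe I need to use the apply function, but unfortunately I haven't managed to get it working.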
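To see the difference, here is a minimal sketch on a small hypothetical DataFrame (the real ORIG values are not shown in the question). The helper b64_sha1 is a hypothetical name introduced only for this illustration, and the per-row list comprehension is just a quick way to contrast the two results, not the recommended approach.

```python
import base64
import hashlib

import pandas as pd


def b64_sha1(data: bytes) -> bytes:
    # Hypothetical helper: base64-encoded SHA-1 digest of raw bytes.
    return base64.b64encode(hashlib.sha1(data).digest())


# Hypothetical sample data; the question does not show the real ORIG values.
df = pd.DataFrame({"ORIG": ["alice", "bob", "carol"]})

# str(df.ORIG) is the printed form of the whole Series (index, values and
# the Name/dtype footer), so this yields ONE hash for the entire column.
whole_column = b64_sha1(str(df.ORIG).encode("UTF-8"))

# Hashing each value on its own yields one hash per row instead.
per_row = [b64_sha1(str(v).encode("UTF-8")) for v in df.ORIG]

print(str(df.ORIG))
print(whole_column)
print(len(per_row))  # 3
```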

I imagine it looking something like the following code (which does not work as written):

df["HK_ORIG"] = str(df['ORIG']).encode("UTF-8").apply(hashlib.sha1)

I'm very much looking forward to your answers. Many thanks in advance!

C.Tomas asked Jul 04 '18 18:07

1 Answer

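You can either define a named function and apply it, or apply a lambda function. In either case, do as much of the processing as possible within the DataFrame (here, the string conversion and UTF-8 encoding via .astype(str) and .str.encode), and apply only the hashing step to each value.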

A lambda-based solution:

import base64
import hashlib
df['HK_ORIG'] = df['ORIG'].astype(str).str.encode('UTF-8')\
          .apply(lambda x: base64.b64encode(hashlib.sha1(x).digest()))
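As a minimal end-to-end sketch of the lambda version, assuming a small hypothetical DataFrame with a string column ORIG, the result can be stored in the new column HK_ORIG the question asks for. Note that base64.b64encode returns bytes, so each cell of HK_ORIG holds a bytes object.

```python
import base64
import hashlib

import pandas as pd

# Hypothetical sample data standing in for the real ORIG column.
df = pd.DataFrame({"ORIG": ["abc", "hello", "abc"]})

# Vectorized steps first: convert every value to str, then encode to UTF-8 bytes.
encoded = df["ORIG"].astype(str).str.encode("UTF-8")

# Per-value step: SHA-1 digest of the bytes, then base64-encode it (result is bytes).
df["HK_ORIG"] = encoded.apply(lambda x: base64.b64encode(hashlib.sha1(x).digest()))

print(df)
```

Identical inputs map to identical hashes: both "abc" rows get b'qZk+NkcGgWq6PiVxeFDCbJzQ2J0=', the base64 form of the standard SHA-1 test vector for "abc".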

A named function solution:

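import base64
import hashlib
def hashme(x):
    # x is a UTF-8 encoded bytes value; return its base64-encoded SHA-1 digest
    return base64.b64encode(hashlib.sha1(x).digest())
df['HK_ORIG'] = df['ORIG'].astype(str).str.encode('UTF-8')\
          .apply(hashme)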
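A small sketch comparing the two forms, assuming a hypothetical column with mixed value types (object dtype). Because .astype(str) runs first, non-string values are hashed via their string form, so the integer 1 and the string "1" end up with the same hash; the named function and the lambda produce identical results.

```python
import base64
import hashlib

import pandas as pd


def hashme(x):
    # x is a UTF-8 encoded bytes value; return its base64-encoded SHA-1 digest.
    return base64.b64encode(hashlib.sha1(x).digest())


# Hypothetical mixed-type data: an int, a str, a float and another str.
df = pd.DataFrame({"ORIG": [1, "1", 2.5, "x"]})

# astype(str) turns the values into "1", "1", "2.5", "x" before encoding.
encoded = df["ORIG"].astype(str).str.encode("UTF-8")
named = encoded.apply(hashme)
via_lambda = encoded.apply(lambda x: base64.b64encode(hashlib.sha1(x).digest()))

# Both forms give the same Series.
print(named.equals(via_lambda))  # True
# The integer 1 and the string "1" become the same text, so they share a hash.
print(named[0] == named[1])  # True
```

The named function is easier to reuse and test on its own; the lambda keeps everything in one expression. Functionally they are the same.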
DYZ answered Oct 23 '22 05:10