I'm trying to hash each value of a python 3.6 pandas dataframe column with the following algorithm on the dataframe-column ORIG:
HK_ORIG = base64.b64encode(hashlib.sha1(str(df.ORIG).encode("UTF-8")).digest())
However, the above mentioned code does not hash each value of the column, so, in order to hash each value of the df-column ORIG, I need to use the apply function. Unfortunatelly, I don't seem to be good enough to get this done.
I imagine it to look like the following code:
df["HK_ORIG"] = str(df['ORIG']).encode("UTF-8")).apply(hashlib.sha1)
I'm looking very much forward to your answers! Many thanks in advance!
Selecting rows and columns simultaneouslyiloc and loc indexers to select rows and columns simultaneously. The rows and column values may be scalar values, lists, slice objects or boolean.
The query function seams more efficient than the loc function. DF2: 2K records x 6 columns. The loc function seams much more efficient than the query function.
generate hash SHA512 on concatenated value and put to new column. put hashed value to defined Destination DataFrame as destinationdf where column name is start with Hash_ combine with all columns in column list (Column name will be Hash_IDSalt in this case)
You can either create a named function and apply it - or apply a lambda function. In either case, do as much processing as possible withing the dataframe.
A lambda-based solution:
df['ORIG'].astype(str).str.encode('UTF-8')\
.apply(lambda x: base64.b64encode(hashlib.sha1(x).digest()))
A named function solution:
def hashme(x):
return base64.b64encode(hashlib.sha1(x).digest())
df['ORIG'].astype(str).str.encode('UTF-8')\
.apply(hashme)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With