I have asked a similar question in R about creating a hash value for each row of data. I know that I can use something like hashlib.md5(b'Hello World').hexdigest()
to hash a string, but how about a row in a DataFrame?
I have drafted my code as below:
import hashlib
for index, row in course_staff_df.iterrows():
    # hashlib needs bytes, so encode the stringified values before hashing
    temp_df.loc[index, 'hash'] = hashlib.md5(str(row[['cola', 'colb']].values).encode()).hexdigest()
This doesn't seem very Pythonic to me; is there a better solution?
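For reference, a loop-free sketch of the same MD5-per-row idea, assuming cola and colb are the columns to hash in course_staff_df and that temp_df shares its index (both names come from the draft above; the '|' separator is an arbitrary choice):

import hashlib
# Stringify and concatenate the selected columns row by row, then MD5 the result
row_strings = course_staff_df[['cola', 'colb']].astype(str).apply('|'.join, axis=1)
temp_df['hash'] = row_strings.map(lambda s: hashlib.md5(s.encode()).hexdigest())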
Another approach: generate a SHA-512 hash of the concatenated column values and put it in a new column. The hashed value goes into a defined destination DataFrame, destinationdf, where the column name starts with Hash_ followed by all the columns in the column list (so the column name would be Hash_IDSalt in this case).
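A minimal sketch of that approach; the source DataFrame and the ['ID', 'Salt'] column list are illustrative, and only destinationdf, the Hash_ prefix and the resulting Hash_IDSalt name come from the description above:

import hashlib
import pandas as pd

sourcedf = pd.DataFrame({'ID': [1, 2], 'Salt': ['x', 'y']})   # illustrative input
columns = ['ID', 'Salt']

destinationdf = sourcedf.copy()
hash_col = 'Hash_' + ''.join(columns)                         # -> Hash_IDSalt
destinationdf[hash_col] = (
    sourcedf[columns]
    .astype(str)
    .apply(''.join, axis=1)                                   # concatenate values per row
    .map(lambda s: hashlib.sha512(s.encode()).hexdigest())    # SHA-512 of the concatenation
)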
Or simply:
df.apply(lambda x: hash(tuple(x)), axis=1)
As an example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3,5))
print(df)
df.apply(lambda x: hash(tuple(x)), axis=1)
0 1 2 3 4
0 0.728046 0.542013 0.672425 0.374253 0.718211
1 0.875581 0.512513 0.826147 0.748880 0.835621
2 0.451142 0.178005 0.002384 0.060760 0.098650
0 5024405147753823273
1 -798936807792898628
2 -8745618293760919309
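One caveat worth adding (not part of the answer above): Python's built-in hash() is randomized per interpreter session for strings, so row hashes computed this way are only stable within a single run when the rows contain text. pandas ships a deterministic row hasher, pd.util.hash_pandas_object; a small sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 5))
# One stable uint64 hash per row, independent of the interpreter's hash seed
row_hashes = pd.util.hash_pandas_object(df, index=False)
print(row_hashes)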