
Create a hash value for each row of a pandas DataFrame using selected columns

I have asked a similar question for R about creating a hash value for each row of data. I know that I can use something like hashlib.md5(b'Hello World').hexdigest() to hash a string, but how do I hash a row in a dataframe?

update 01

I have drafted my code as below:

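import hashlib

for index, row in course_staff_df.iterrows():
    # md5 needs bytes, so encode the string form of the selected values
    temp_df.loc[index, 'hash'] = hashlib.md5(str(row[['cola', 'colb']].values).encode('utf-8')).hexdigest()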

This doesn't seem very Pythonic to me. Is there a better solution?

lokheart asked Sep 10 '14 03:09




1 Answer

Or simply:

df.apply(lambda x: hash(tuple(x)), axis = 1)

As an example:

import pandas as pd
import numpy as np

# Three rows of random floats; each row is hashed as a tuple of its values
df = pd.DataFrame(np.random.rand(3, 5))
print(df)
df.apply(lambda x: hash(tuple(x)), axis=1)

          0         1         2         3         4
0  0.728046  0.542013  0.672425  0.374253  0.718211
1  0.875581  0.512513  0.826147  0.748880  0.835621
2  0.451142  0.178005  0.002384  0.060760  0.098650

0    5024405147753823273
1    -798936807792898628
2   -8745618293760919309
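
To restrict the hash to selected columns, as the question asks, the same apply pattern can be run on a column subset. Note that the built-in hash() is randomized per process for strings (unless PYTHONHASHSEED is set), so it may not suit hashes that need to be stored or compared across runs. The following is only a sketch under assumptions: the sample data and the md5_row helper are made up for illustration, 'cola' and 'colb' are the column names from the question, and the "\x1f" separator is an arbitrary choice.

```python
import hashlib
import pandas as pd

# Hypothetical sample data; only 'cola' and 'colb' come from the question.
df = pd.DataFrame({
    "cola": ["a", "b", "a"],
    "colb": [1, 2, 1],
    "colc": [0.1, 0.2, 0.3],
})
cols = ["cola", "colb"]

def md5_row(row):
    # Join the string form of each selected value with a separator
    # (an assumption: "\x1f" does not occur in the data), then hash
    # the UTF-8 bytes so the digest is the same in every session.
    joined = "\x1f".join(str(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Row-wise MD5 over the selected columns only.
df["hash"] = df[cols].apply(md5_row, axis=1)

# Vectorized alternative: pandas' own hashing utility returns one
# uint64 per row; index=False leaves the index out of the hash.
df["hash64"] = pd.util.hash_pandas_object(df[cols], index=False)

print(df)
```

Rows with identical values in the selected columns (the first and third rows here) get the same hash, whatever colc contains. The md5 variant still loops in Python, so for large frames the hash_pandas_object variant is likely the faster of the two.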
cwharland answered Oct 19 '22 16:10