In Python, I am trying to find the quickest way to hash each value in a pandas data frame.
I know any string can be hashed using:
hash('a string')
But how do I apply this function on each element of a pandas data frame?
This may be a very simple thing to do, but I have just started using Python.
As of pandas 0.20.1, you can use the little-known (and poorly documented) hash_pandas_object, which was recently made public in pandas.util. It returns one hash value for each row of the DataFrame (and works on Series etc. too).
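A minimal sketch of that call, assuming a small made-up DataFrame (the column names and values are only for illustration). Passing index=False leaves the index out of the hash, so only the cell values contribute:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": ["asds", "asdds", "asdsadsdas"], "b": [1, 2, 3]})

# One uint64 hash per row; index=False keeps the index out of the hash
row_hashes = pd.util.hash_pandas_object(df, index=False)

# The same function on a Series gives one hash per element
col_hashes = pd.util.hash_pandas_object(df["a"], index=False)

print(row_hashes)
print(col_hashes)
```

The result is a Series aligned with the original index, so it can be assigned straight back as a new column.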
Looping over a DataFrame with a for statement. You can loop over a pandas DataFrame column by column or row by row.
Using a DataFrame as an example: the items() method (called iteritems() in older pandas, and removed under that name in pandas 2.0) yields one (column name, Series) tuple per column.
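As a rough sketch of how the column loop could be combined with the built-in hash() from the question: the example below uses items(), the current name for iteritems() (which was removed in pandas 2.0), and the DataFrame contents and the name hashed are made up for illustration.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": ["asds", "asdds"], "b": ["x", "y"]})

# Walk the columns as (name, Series) pairs and hash every element
hashed = pd.DataFrame(index=df.index)
for name, column in df.items():
    hashed[name] = column.map(hash)

print(hashed)
```

Note that Python salts hash() for strings per process unless PYTHONHASHSEED is set, so these values are not stable between runs. The pandas hashing utilities are the better fit when the hashes need to be reproducible.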
Your dataset can commonly contain sensitive data in one or more columns, for example user IDs, patient numbers, or license numbers. Here I share how to create a new column containing hashed strings based on the clear-text strings of another column of a pandas DataFrame.
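One way this could look, as a sketch only: the snippet does not say which hash function it uses, so SHA-256 from the standard hashlib module is an assumption here, and the column names user_id and user_id_hash and the helper sha256_hex are hypothetical.

```python
import hashlib

import pandas as pd

# Hypothetical example data with a sensitive clear-text column
df = pd.DataFrame({"user_id": ["u-1001", "u-1002", "u-1001"]})


def sha256_hex(value):
    """Return the SHA-256 hex digest of a value's string form."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# New column with the hashed strings; equal inputs give equal digests
df["user_id_hash"] = df["user_id"].map(sha256_hex)

print(df)
```

Unlike the built-in hash(), these digests are the same on every run, so they can be used to join or deduplicate across files.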
Pandas also has a function to apply a hash function on an array or column:
import pandas as pd
df = pd.DataFrame({'a': ['asds', 'asdds', 'asdsadsdas']})
# hash_array takes a NumPy array and returns one uint64 hash per element
df["hash"] = pd.util.hash_array(df["a"].to_numpy())