
Shuffle one column in pandas dataframe

How does one shuffle only one column of data in pandas?

I have a DataFrame with production data that I want to load onto dev for testing. However, the data contains personally identifiable information, so I want to shuffle those columns.

Columns: FirstName LastName Birthdate SSN OtherData

The original dataframe is created by read_csv, and I want to copy the data into a second dataframe for SQL loading, but with first name, last name, and SSN shuffled. I would have expected to be able to do this:

if devprod == 'prod':
    #do not shuffle data
    df1['HS_FIRST_NAME'] = df[4]
    df1['HS_LAST_NAME'] = df[6]
    df1['HS_SSN'] = df[8]
else:
    df1['HS_FIRST_NAME'] = np.random.shuffle(df[4])
    df1['HS_LAST_NAME'] = np.random.shuffle(df[6])
    df1['HS_SSN'] = np.random.shuffle(df[8])

However, when I try that I get the following error:

A value is trying to be set on a copy of a slice from a DataFrame

Arlo Guthrie asked Jan 02 '19 15:01


2 Answers

The immediate error is a symptom of using an inadvisable approach when working with dataframes.

np.random.shuffle works in-place and returns None, so assigning to the output of np.random.shuffle will not work. In fact, in-place operations are rarely required, and often yield no material benefits.
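The difference between the two NumPy functions can be seen on a small array. This is a minimal sketch; the array contents are made up for illustration and are not from the question.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

# np.random.shuffle reorders the array it is given and returns None,
# so assigning its result to a column stores None, not shuffled data.
result = np.random.shuffle(values)
print(result)  # None

# np.random.permutation leaves its input untouched and returns a shuffled copy.
original = np.array([10, 20, 30, 40, 50])
shuffled = np.random.permutation(original)
print(original.tolist())          # still in the original order
print(sorted(shuffled.tolist()))  # same elements as the input
```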

Here, for example, you can use np.random.permutation, which returns a shuffled copy instead of modifying its input, and pass it NumPy arrays via pd.Series.values rather than series:

if devprod == 'prod':
    #do not shuffle data
    df1['HS_FIRST_NAME'] = df[4]
    df1['HS_LAST_NAME'] = df[6]
    df1['HS_SSN'] = df[8]
else:
    df1['HS_FIRST_NAME'] = np.random.permutation(df[4].values)
    df1['HS_LAST_NAME'] = np.random.permutation(df[6].values)
    df1['HS_SSN'] = np.random.permutation(df[8].values)
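For a version that runs on its own, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: shuffle_columns is a hypothetical name, the sample frame and its values are invented, and it uses NumPy's Generator API (np.random.default_rng) so the shuffle can be seeded for repeatable test loads.

```python
import numpy as np
import pandas as pd


def shuffle_columns(frame, columns, seed=None):
    """Return a copy of frame with each listed column shuffled independently.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    rng = np.random.default_rng(seed)
    out = frame.copy()
    for col in columns:
        # permutation returns a new shuffled array; to_numpy() drops the
        # index, so the assignment below is positional.
        out[col] = rng.permutation(frame[col].to_numpy())
    return out


# Made-up stand-in for the production data described in the question.
df = pd.DataFrame({
    "FirstName": ["Ann", "Bob", "Cat", "Dan"],
    "LastName": ["Ames", "Byrd", "Cole", "Dunn"],
    "SSN": ["001", "002", "003", "004"],
    "OtherData": [1, 2, 3, 4],
})

masked = shuffle_columns(df, ["FirstName", "LastName", "SSN"], seed=0)
print(masked)
```

Each column gets its own permutation, so the first name, last name, and SSN on a given row no longer belong together, while the unlisted column and the source frame are left unchanged.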
jpp answered Oct 06 '22 02:10


Sampling the whole column with frac=1 returns it in random order, which also appears to do the job:

df1['HS_FIRST_NAME'] = df[4].sample(frac=1).values
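The trailing .values matters when the target dataframe already has an index. A small sketch, using an invented single-column frame and a fixed random_state chosen only for repeatability:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cat", "Dan", "Eve"]})

# sample(frac=1) returns every row in random order, keeping the index labels.
shuffled = df["name"].sample(frac=1, random_state=1)

# Assigning the Series aligns on index labels, so the original order comes back.
df["aligned"] = shuffled

# Assigning the bare array is positional, so the shuffled order is kept.
df["positional"] = shuffled.values

print(df)
```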
jeremy_rutman answered Oct 06 '22 03:10