
Pandas assigning random string to each group as new column

We have a dataframe like

Out[90]: 
   customer_id                 created_at
0     11492288 2017-03-15 10:20:18.280437
1      8953727 2017-03-16 12:51:00.145629
2     11492288 2017-03-15 10:20:18.284974
3     11473213 2017-03-09 14:15:22.712369
4      9526296 2017-03-14 18:56:04.665410
5      9526296 2017-03-14 18:56:04.662082

I would like to add a new column that assigns a random 8-character string to each customer_id group, so that every row with the same customer_id gets the same string.

For example the output would then look like

Out[90]: 
   customer_id                 created_at     code
0     11492288 2017-03-15 10:20:18.280437 nKAILfyV
1      8953727 2017-03-16 12:51:00.145629 785Vsw0b
2     11492288 2017-03-15 10:20:18.284974 nKAILfyV
3     11473213 2017-03-09 14:15:22.712369 dk6JXq3u
4      9526296 2017-03-14 18:56:04.665410 1WESdAsD
5      9526296 2017-03-14 18:56:04.662082 1WESdAsD

I am used to R and dplyr, and it is super easy to write this transformation using them. I am looking for something similar in Pandas to this:

library(dplyr)
library(stringi)

df %>%
  group_by(customer_id) %>%
  mutate(code = stri_rand_strings(1, 8))

I can figure out the random-string part myself; I am just curious how pandas groupby works in this case.

Thanks!

asked Sep 07 '17 21:09 by user4505419



1 Answer

The pandas counterpart of R's group_by + mutate is groupby + transform:

df['code'] = df.groupby('customer_id').transform(lambda x: pd.util.testing.rands_array(8, 1))
df
Out[314]: 
   customer_id  created_at      code
0     11492288  2017-03-15  L6Odf65d
1      8953727  2017-03-16  fwLpgLnt
2     11492288  2017-03-15  L6Odf65d
3     11473213  2017-03-09  AuSUPnJ9
4      9526296  2017-03-14  U1AiLyx0
5      9526296  2017-03-14  U1AiLyx0

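EDIT (from cᴏʟᴅsᴘᴇᴇᴅ): select the customer_id column before calling transform, so the result is a single Series rather than a DataFrame:

df['code'] = df.groupby('customer_id').customer_id.transform(lambda x: pd.util.testing.rands_array(8, 1))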
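
Note that pd.util.testing is an internal testing module that later pandas releases deprecated, so the one-liner above may not run on a current install. Below is a minimal sketch of the same groupby + transform idea that uses only the standard library for the random part. The helper name random_code is hypothetical, and the sample frame simply reuses the customer_id values from the question.

```python
import random
import string

import pandas as pd

ALPHABET = string.ascii_letters + string.digits


def random_code(length=8):
    # Hypothetical helper: one random alphanumeric string of the given length.
    return "".join(random.choices(ALPHABET, k=length))


# Sample data: the customer_id values from the question.
df = pd.DataFrame(
    {"customer_id": [11492288, 8953727, 11492288, 11473213, 9526296, 9526296]}
)

# The lambda returns one scalar per group; transform broadcasts that
# scalar to every row belonging to the group.
df["code"] = df.groupby("customer_id")["customer_id"].transform(
    lambda group: random_code()
)
print(df)
```

Because the strings are random, two different groups could in principle draw the same code; with 8 alphanumeric characters this is very unlikely, but check for duplicates if uniqueness matters.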

Also, a possible improvement to your R code:

Match=data.frame(A=unique(df$customer_id),B=replicate(length(unique(df$customer_id)), stri_rand_strings(1, 8)))
df$Code=Match$B[match(df$customer_id,Match$A)]
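
The lookup-table idea in the R snippet above carries over to pandas as well. As a rough sketch (not the answerer's code; the names alphabet and lookup are illustrative), build one code per distinct customer_id and then map it back onto the rows:

```python
import random
import string

import pandas as pd

# Sample data: the customer_id values from the question.
df = pd.DataFrame(
    {"customer_id": [11492288, 8953727, 11492288, 11473213, 9526296, 9526296]}
)

alphabet = string.ascii_letters + string.digits

# One random 8-character code per distinct customer_id
# (this plays the role of the Match table in the R snippet).
lookup = {
    cid: "".join(random.choices(alphabet, k=8))
    for cid in df["customer_id"].unique()
}

# map plays the role of R's match(): each row picks up the code for its id.
df["code"] = df["customer_id"].map(lookup)
print(df)
```

This version generates the codes exactly once per distinct id and keeps the id-to-code mapping around in case it is needed later.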

answered Oct 05 '22 23:10 by BENY