
Replace and duplicate string with a specific max count in pandas

I have a dataset, df, in which a sequence repeats X times. I would like to replace certain letters in this sequence and then repeat the result up to a given maximum count.

Data

xy_pod  xy_pod  xy_pod  xy_pod
xy_pod  xy_pod  xy_pod  xy_pod
xy_pod  xy_pod  xy_pod  xy_pod

These are the other letters I would like to substitute for the 'xy' portion:

   aa
   vee
   lee

Desired

xy_pod  xy_pod  xy_pod  xy_pod
xy_pod  xy_pod  xy_pod  xy_pod
xy_pod  xy_pod  xy_pod  xy_pod



aa_pod  aa_pod  aa_pod  aa_pod
aa_pod  aa_pod  aa_pod  aa_pod
aa_pod  aa_pod  aa_pod  aa_pod
    

vee_pod vee_pod vee_pod vee_pod
vee_pod vee_pod vee_pod vee_pod
vee_pod vee_pod vee_pod vee_pod


lee_pod lee_pod lee_pod lee_pod
lee_pod lee_pod lee_pod lee_pod
lee_pod lee_pod lee_pod lee_pod

Doing

df.replace(xy_pod, aa_pod, 12)
df.replace(aa_pod, vee_pod, 12)   
df.replace(vee_pod, lee_pod, 12)

This is very similar to the find-and-replace feature in Excel. However, I am not sure how to specify the number of repetitions I want. Also, how can I do this for multiple sequences so that I do not have to call the function once for every new entry? Is there a more efficient way to do this?

Any suggestions or advice are appreciated.

Lynn asked May 20 '26 03:05


2 Answers

Try this:

pd.concat([df]+[df.stack().str.replace('xy', i).unstack() for i in ['aa','vee', 'lee']])

Output:

         0        1        2        3
0   xy_pod   xy_pod   xy_pod   xy_pod
1   xy_pod   xy_pod   xy_pod   xy_pod
2   xy_pod   xy_pod   xy_pod   xy_pod
0   aa_pod   aa_pod   aa_pod   aa_pod
1   aa_pod   aa_pod   aa_pod   aa_pod
2   aa_pod   aa_pod   aa_pod   aa_pod
0  vee_pod  vee_pod  vee_pod  vee_pod
1  vee_pod  vee_pod  vee_pod  vee_pod
2  vee_pod  vee_pod  vee_pod  vee_pod
0  lee_pod  lee_pod  lee_pod  lee_pod
1  lee_pod  lee_pod  lee_pod  lee_pod
2  lee_pod  lee_pod  lee_pod  lee_pod
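
The same idea can be wrapped in a small helper so that new prefixes only need to be added to a list. This is a minimal sketch, not part of the answer above: the function name repeat_with_prefixes and its parameters are hypothetical, the sample frame is rebuilt from the question's data, and it assumes the text to replace always sits at the start of each cell. It uses DataFrame.replace with regex=True instead of stack/unstack, and ignore_index=True to get a continuous row index.

```python
import re

import pandas as pd


def repeat_with_prefixes(df, old, new_prefixes, keep_original=True):
    """Stack one copy of df per prefix, with `old` swapped for that prefix.

    Hypothetical helper for illustration only.
    """
    # Anchor the pattern so `old` is only matched at the start of a cell.
    pattern = "^" + re.escape(old)
    blocks = [df] if keep_original else []
    for prefix in new_prefixes:
        # With regex=True, only the matched substring is replaced.
        blocks.append(df.replace(pattern, prefix, regex=True))
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 2.
    return pd.concat(blocks, ignore_index=True)


# Sample data rebuilt from the question: 3 rows x 4 columns of 'xy_pod'.
df = pd.DataFrame([["xy_pod"] * 4] * 3)
out = repeat_with_prefixes(df, "xy", ["aa", "vee", "lee"])
print(out)
```

Each block keeps the shape of df, so the "count" per prefix is simply the number of cells in the original frame (12 here).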
Scott Boston answered May 21 '26 18:05


To replace only values that repeat exactly 12 times in a row, stack the frame first and then find the length of each run:

s = df.stack()
find_count = s.groupby(s.shift().ne(s).cumsum()).transform('count')
n = 12
out = s[find_count==n].replace({'xy':'aa'},regex=True).combine_first(s).unstack()
out
Out[227]: 
        0       1       2       3
0  aa_pod  aa_pod  aa_pod  aa_pod
1  aa_pod  aa_pod  aa_pod  aa_pod
2  aa_pod  aa_pod  aa_pod  aa_pod
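
To see what the shift/ne/cumsum step computes, here is a small sketch on a made-up Series (the values and the names run_id and run_length are assumptions for illustration, not from the answer). Each change in value starts a new run id, and the grouped count gives the length of the run each element belongs to.

```python
import pandas as pd

# Toy data: a run of two 'xy_pod', one 'ab_pod', then a single 'xy_pod'.
s = pd.Series(["xy_pod", "xy_pod", "ab_pod", "xy_pod"])

# True wherever the value differs from the previous one; cumsum labels the runs.
run_id = s.shift().ne(s).cumsum()
# Length of the run that each element belongs to.
run_length = s.groupby(run_id).transform("count")

print(run_id.tolist())      # [1, 1, 2, 3]
print(run_length.tolist())  # [2, 2, 1, 1]

# Replace 'xy' only inside runs of exactly n consecutive equal values.
n = 2
replaced = s.str.replace("xy", "aa", regex=False)
out = s.where(run_length != n, replaced)
print(out.tolist())  # ['aa_pod', 'aa_pod', 'ab_pod', 'xy_pod']
```

The last 'xy_pod' is left alone because its run has length 1, which is the same filtering the n = 12 check performs on the stacked frame.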
BENY answered May 21 '26 16:05


