Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas manipulating strings in two columns

Tags:

string

pandas

Here is my dataframe:

           A                B
0   asdf|afsdf|fasd    sdsd|wer|wer
1   sdfsdf             sdfsdff
2   sdf|s              sdfsde|sdf

I would like to form column C which would contain concatenated values from column A and column B up to first |, and if | is not present, then it will simply concatenate two columns. Also while concatenating I would like to insert -- here is how column C should look like:

         C
0   asdf--sdsd
1   sdfsdf--sdfsdff
2   sdf--sdfsde

I can go through every single row with df.loc and get what I need, but it is slow and I wonder if there is faster way of doing it.

like image 357
user1700890 Avatar asked Feb 05 '23 23:02

user1700890


1 Answers

There may be a more fancy way, but this works:

In [1]: import pandas as pd

In [2]: d = {'A': ('asdf|afsdf|fasd', 'sdfsdf', 'sdf|s'),
             'B': ('sdsd|wer|wer', 'sdfsdff', 'sdfsde|sdf')}

In [3]: data = pd.DataFrame(d)

In [4]: data['C'] = data['A'].str.split('|').str.get(0) + "--" + data['B'].str.split('|').str.get(0)

In [5]: data
Out[5]: 
                 A             B                 C
0  asdf|afsdf|fasd  sdsd|wer|wer       asdf--sdsd
1           sdfsdf       sdfsdff  sdfsdf--sdfsdff
2            sdf|s    sdfsde|sdf      sdf--sdfsde

References on working with strings in pandas can be found in their documentation.

like image 182
Michael Gecht Avatar answered Feb 13 '23 17:02

Michael Gecht