Here is my dataframe:
                 A             B
0  asdf|afsdf|fasd  sdsd|wer|wer
1           sdfsdf       sdfsdff
2            sdf|s    sdfsde|sdf
I would like to form column C, which would contain the concatenated values from column A and column B, each taken up to the first |. If | is not present, it should simply concatenate the two columns. Also, while concatenating, I would like to insert -- between the two values.
Here is how column C should look:
                 C
0       asdf--sdsd
1  sdfsdf--sdfsdff
2      sdf--sdfsde
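To make the rule concrete, here is a plain-Python sketch of it for a single pair of values. The helper name `combine_first_fields` is hypothetical and not from the question; it is roughly what a row-by-row loop would compute for each row, which is why a vectorized pandas approach is preferable on large frames.

```python
# Hypothetical helper illustrating the rule; not from the original question.
def combine_first_fields(a: str, b: str, sep: str = "--") -> str:
    """Join the part of each value before the first '|' with `sep`.

    If a value contains no '|', the whole value is used.
    """
    # split("|", 1) stops after the first '|'; index 0 is the leading field
    return a.split("|", 1)[0] + sep + b.split("|", 1)[0]


pairs = [
    ("asdf|afsdf|fasd", "sdsd|wer|wer"),
    ("sdfsdf", "sdfsdff"),
    ("sdf|s", "sdfsde|sdf"),
]
for a, b in pairs:
    print(combine_first_fields(a, b))
# asdf--sdsd
# sdfsdf--sdfsdff
# sdf--sdfsde
```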
I can iterate over every row with df.loc and get what I need, but it is slow, and I wonder whether there is a faster way of doing it.
There may be a fancier way, but this works:
In [1]: import pandas as pd
In [2]: d = {'A': ('asdf|afsdf|fasd', 'sdfsdf', 'sdf|s'),
   ...:      'B': ('sdsd|wer|wer', 'sdfsdff', 'sdfsde|sdf')}
In [3]: data = pd.DataFrame(d)
In [4]: data['C'] = data['A'].str.split('|').str.get(0) + "--" + data['B'].str.split('|').str.get(0)
In [5]: data
Out[5]:
                 A             B                C
0  asdf|afsdf|fasd  sdsd|wer|wer       asdf--sdsd
1           sdfsdf       sdfsdff  sdfsdf--sdfsdff
2            sdf|s    sdfsde|sdf      sdf--sdfsde
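A small variation, not part of the original answer: passing `n=1` to `str.split` stops after the first `|`, so the remaining fields are never split apart, and `.str[0]` is shorthand for `.str.get(0)`. This is a sketch assuming a reasonably recent pandas; the result should match the output above.

```python
import pandas as pd

data = pd.DataFrame({
    "A": ["asdf|afsdf|fasd", "sdfsdf", "sdf|s"],
    "B": ["sdsd|wer|wer", "sdfsdff", "sdfsde|sdf"],
})

# n=1 splits at most once, so only the text before the first '|' is separated out.
# A single-character pattern such as '|' is treated as a literal, not a regex.
first_a = data["A"].str.split("|", n=1).str[0]
first_b = data["B"].str.split("|", n=1).str[0]

# Vectorized string concatenation; values without '|' pass through whole.
data["C"] = first_a + "--" + first_b
print(data["C"].tolist())
# ['asdf--sdsd', 'sdfsdf--sdfsdff', 'sdf--sdfsde']
```

Both versions avoid the per-row Python loop; limiting the split mainly saves work when the cells contain many `|`-separated fields.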
References on working with strings in pandas can be found in the pandas documentation.