I have a pandas DataFrame df with a column construct_name:
    construct_name
    aaaa_t1_2
    cccc_t4_10
    bbbb_g3_3
and so on. I want to split each name at the underscore and store the first element (aaaa, cccc, etc.) in another column called name.
Expected output:
    construct_name  name
    aaaa_t1_2       aaaa
    cccc_t4_10      cccc
and so on.
I tried the following:
    df['construct_name'].map(lambda row: row.split("_"))
and it gives me a list for each row, like
    [aaaa, t1, 2]
    [cccc, t4, 10]
and so on. But when I do
    df['construct_name'].map(lambda row: row.split("_"))[0]
to get the first element of each list, I get an error. Can you suggest a fix? Thanks.
Use the str.split() method with maxsplit set to 1 to split a string and get the first element, e.g. my_str.split('_', 1)[0]. With maxsplit set to 1, split() performs at most one split, so the result is the part before the first underscore followed by the rest of the string.
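As a rough illustration of the maxsplit behaviour described above, here is a minimal sketch in plain Python. The variable names and the sample value (borrowed from the question) are only for illustration.

```python
my_str = "aaaa_t1_2"

# Split on every underscore: three pieces.
all_parts = my_str.split("_")

# Split at most once (maxsplit=1): the prefix plus the untouched remainder.
first_only = my_str.split("_", 1)

# Either list starts with the same prefix.
prefix = first_only[0]

print(all_parts)   # ['aaaa', 't1', '2']
print(first_only)  # ['aaaa', 't1_2']
print(prefix)      # aaaa
```

Limiting the number of splits does not change the first element; it only avoids splitting the remainder when that part is not needed.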
To split a string by underscore in Python, pass the underscore character "_" as the delimiter to the split() method. It returns a list of the substrings that lie between the occurrences of "_".
Splitting on a specific substring: passing an optional argument, as in .split('x'), splits a string on the substring 'x'. Without an argument, .split() splits on runs of whitespace.
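The difference between the whitespace default and an explicit delimiter can be sketched as follows. The sample strings are made up for illustration.

```python
spaced = "aaaa  t1\t2"

# No argument: split on runs of whitespace (spaces, tabs, newlines),
# so the double space does not produce an empty string.
by_whitespace = spaced.split()

# Explicit delimiter: split on each occurrence of "_".
by_underscore = "aaaa_t1_2".split("_")

# With an explicit delimiter, adjacent delimiters yield an empty string.
with_empty = "a__b".split("_")

print(by_whitespace)  # ['aaaa', 't1', '2']
print(by_underscore)  # ['aaaa', 't1', '2']
print(with_empty)     # ['a', '', 'b']
```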
Just use the vectorised str method split and use integer indexing on the list to get the first element:
In [228]: df['first'] = df['construct_name'].str.split('_').str[0]
          df
Out[228]:
  construct_name first
0      aaaa_t1_2  aaaa
1     cccc_t4_10  cccc
2      bbbb_g3_3  bbbb
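For completeness, here is a self-contained sketch of the same idea, assuming the sample data from the question and the column name name from the expected output. It also shows a variant of the original map attempt: in the question, the trailing [0] was applied to the Series returned by map rather than to each list, so moving the index inside the lambda is one way to get the intended result. The extra column names name_via_map and name_n1 are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({"construct_name": ["aaaa_t1_2", "cccc_t4_10", "bbbb_g3_3"]})

# Vectorised: split each string, then take element 0 of each resulting list.
df["name"] = df["construct_name"].str.split("_").str[0]

# map variant: index inside the lambda so [0] applies to each list,
# not to the Series that map returns.
df["name_via_map"] = df["construct_name"].map(lambda s: s.split("_")[0])

# n=1 limits pandas to one split per string, like maxsplit=1 in str.split.
df["name_n1"] = df["construct_name"].str.split("_", n=1).str[0]

print(df)
```

All three columns should hold the same values; the .str accessor version additionally passes missing values through as NaN, whereas the lambda would raise on them.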