
How to remove everything after the last occurrence of a character in a DataFrame?

I have a DataFrame DF that looks like this (this is a sample):

    EQ1                    EQ2                       EQ3
0   Apple.fruit            Oranage.eatable.fruit     NaN
1   Pear.eatable.fruit     Banana.fruit              NaN
2   Orange.fruit           Tomato.eatable            Potato.eatable.vegetable
3   Kiwi.eatable           Pear.fruit                Cabbage.vegetable
<And so on. It is a large DataFrame>

I would like to remove everything AFTER the LAST occurrence of the dot (.) in every element of DF and save the result under a different name, say df_temp.
Desired output:

    EQ1              EQ2                 EQ3
0   Apple            Oranage.eatable     NaN
1   Pear.eatable     Banana              NaN
2   Orange           Tomato              Potato.eatable
3   Kiwi             Pear                Cabbage
<And so on>

This is what I tried: df_temp = ".".join(DF.split(".")[:-1])
Unfortunately, this works only on a single string, not on a DataFrame, because a DataFrame has no split method. Do I have to tweak this line a bit to achieve what I want? Someone please help!

controlfreak asked Feb 06 '23 22:02


1 Answer

You could do:

df_temp = DF.apply(lambda x: x.str.split('.').str[:-1].str.join('.'))

Output:

            EQ1              EQ2             EQ3
0         Apple  Oranage.eatable             NaN
1  Pear.eatable           Banana             NaN
2        Orange           Tomato  Potato.eatable
3          Kiwi             Pear         Cabbage   

See the pandas string method docs for split, slicing, and join on the .str accessor.
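One caveat with the split/slice/join chain: a value that contains no dot at all (for example a plain "Apple") comes back as an empty string, because dropping the last piece leaves nothing to join. If that matters for your data, here is a minimal sketch of an alternative using str.rsplit with n=1, which splits only at the last dot and keeps the left part. The sample frame is rebuilt from the question, and the helper name strip_last_segment is just a hypothetical name for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the real DF is assumed to be larger).
DF = pd.DataFrame({
    "EQ1": ["Apple.fruit", "Pear.eatable.fruit", "Orange.fruit", "Kiwi.eatable"],
    "EQ2": ["Oranage.eatable.fruit", "Banana.fruit", "Tomato.eatable", "Pear.fruit"],
    "EQ3": [np.nan, np.nan, "Potato.eatable.vegetable", "Cabbage.vegetable"],
})


def strip_last_segment(col: pd.Series) -> pd.Series:
    """Drop everything from the last '.' onward in each string of a column.

    rsplit(".", n=1) splits once, starting from the right, so element 0 is
    the text before the last dot. Strings without a dot are returned
    unchanged, and NaN stays NaN.
    """
    return col.str.rsplit(".", n=1).str[0]


# apply() calls the helper once per column; DF itself is left untouched.
df_temp = DF.apply(strip_last_segment)
print(df_temp)
```

This assumes every column holds strings (or NaN). If some columns are numeric, the .str accessor will raise an error, so you would need to restrict the apply to the string columns first.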

Mr.F answered Feb 09 '23 12:02