Split a column in pandas dataframe based on dot

I went through similar questions but could not solve my problem. Part of my dataframe looks like this:

     Index Character           Top 10 by edits            Top 10 by added text
780    NaN   Viradha  David G Brault · 8 (40%)  David G Brault · 1,915 (81.4%)
781    NaN   Viradha         Wiki-uk · 4 (20%)       Risingstar12 · 213 (9.1%)
782    NaN   Viradha  Rich Farmbrough · 1 (5%)         Woohookitty · 44 (1.9%)
783    NaN   Viradha      Woohookitty · 1 (5%)           World8115 · 41 (1.7%)
784    NaN   Viradha        World8115 · 1 (5%)     Rich Farmbrough · 33 (1.4%)
785    NaN   Viradha    141.213.55.83 · 1 (5%)            SmackBot · 31 (1.3%)
786    NaN   Viradha     Omnipaedista · 1 (5%)      Citation bot 1 · 27 (1.1%)
787    NaN   Viradha      Jayarathina · 1 (5%)        Omnipaedista · 20 (0.9%)
788    NaN   Viradha     Risingstar12 · 1 (5%)             Wiki-uk · 17 (0.7%)
789    NaN   Viradha   203.142.46.153 · 1 (5%)      203.142.46.153 · 11 (0.5%)

Now I want to split the two columns "Top 10 by edits" and "Top 10 by added text" by matching the dot in between ("space-dot-space"). To split the first column, I tried:

s = df["Top 10 by edits"].str.split(" . ", n = 1, expand = True)

df["Top 10 by edits"]  = s[0]
df["Edits contribution"] = s[1]

However, this results in the following dataframe:

     Index Character  Top 10 by edits            Top 10 by added text Edits contribution
780    NaN   Viradha            David  David G Brault · 1,915 (81.4%)   Brault · 8 (40%)
781    NaN   Viradha          Wiki-uk       Risingstar12 · 213 (9.1%)            4 (20%)
782    NaN   Viradha  Rich Farmbrough         Woohookitty · 44 (1.9%)             1 (5%)
783    NaN   Viradha      Woohookitty           World8115 · 41 (1.7%)             1 (5%)
784    NaN   Viradha        World8115     Rich Farmbrough · 33 (1.4%)             1 (5%)
785    NaN   Viradha    141.213.55.83            SmackBot · 31 (1.3%)             1 (5%)
786    NaN   Viradha     Omnipaedista      Citation bot 1 · 27 (1.1%)             1 (5%)
787    NaN   Viradha      Jayarathina        Omnipaedista · 20 (0.9%)             1 (5%)
788    NaN   Viradha     Risingstar12             Wiki-uk · 17 (0.7%)             1 (5%)
789    NaN   Viradha   203.142.46.153      203.142.46.153 · 11 (0.5%)             1 (5%)

As can be seen, the first row is not split at the dot. I also tried \. and r" . ", but neither gives the result I need. What exactly is wrong? Thanks in advance.
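
One way to see what goes wrong is to inspect the first cell with Python's re module. This is a minimal sketch under one assumption: it mimics, on a single string copied from the frame above, what Series.str.split does with a multi-character pattern (it treats the pattern as a regular expression). It also identifies the separator character with unicodedata.

```python
import re
import unicodedata

# First cell of the "Top 10 by edits" column, copied from the frame above.
cell = "David G Brault · 8 (40%)"

# Identify the separator: it is U+00B7 MIDDLE DOT, not an ASCII period (U+002E).
sep = cell.split()[3]
print(sep, hex(ord(sep)), unicodedata.name(sep))  # · 0xb7 MIDDLE DOT

# As a regular expression, " . " means "space, any character, space",
# so its first match in this cell is " G ", not " · ".
print(repr(re.search(" . ", cell).group()))  # ' G '
print(re.split(" . ", cell, maxsplit=1))     # ['David', 'Brault · 8 (40%)']
```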

asked May 19 '26 at 02:05 by Peaceful



1 Answer

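The dot in both "Top 10" columns is not a period (.) but a middle dot (·, U+00B7), whereas your code tries to split on a period. On top of that, Series.str.split treats a pattern longer than one character as a regular expression, and in a regular expression . matches any character, so " . " means "space, any character, space". In the first row that pattern first matches " G " in "David G Brault", which is why the row was split there; the other rows only split correctly because " · " also fits that pattern, and \. (a literal period) never matches the middle dot at all. Split on " · " instead, or change the separator in the data to match your pattern.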
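
A minimal sketch of the fix, assuming a small frame built from a few rows of the question's data. It splits both columns on the literal separator " · ". The regex=False keyword assumes pandas 1.4 or later; the new column name "Added text contribution" is a hypothetical choice for illustration.

```python
import pandas as pd

# A few rows copied from the frame in the question.
df = pd.DataFrame({
    "Character": ["Viradha"] * 3,
    "Top 10 by edits": [
        "David G Brault · 8 (40%)",
        "Wiki-uk · 4 (20%)",
        "141.213.55.83 · 1 (5%)",
    ],
    "Top 10 by added text": [
        "David G Brault · 1,915 (81.4%)",
        "Risingstar12 · 213 (9.1%)",
        "Citation bot 1 · 27 (1.1%)",
    ],
})

SEP = " \u00b7 "  # space, MIDDLE DOT (U+00B7), space

# regex=False makes pandas split on the literal separator, not a pattern.
edits = df["Top 10 by edits"].str.split(SEP, n=1, expand=True, regex=False)
df["Top 10 by edits"] = edits[0]
df["Edits contribution"] = edits[1]

# Same treatment for the second column (new column name is hypothetical).
added = df["Top 10 by added text"].str.split(SEP, n=1, expand=True, regex=False)
df["Top 10 by added text"] = added[0]
df["Added text contribution"] = added[1]

print(df)
```

Because " · " contains no regex metacharacters, dropping regex=False gives the same result on pandas versions before 1.4, where that keyword does not exist.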

answered May 20 '26 at 15:05 by maf164



