
Splitting a column of strings and counting the number of words with pandas [duplicate]

id   string   
0    31672;0           
1    31965;0
2    0;78464
3      51462
4    31931;0

Hi, I have this table. I would like to split each value in the string column on ';' and store the number of resulting words in a new column. The final table should look like this:

 id   string   word_count
0    31672;0    2       
1    31965;0    2
2    0;78464    2
3      51462    1
4    31931;0    2

It would be nice if someone knows how to do this with Python.

al1991 asked Jan 29 '23 23:01



1 Answer

Option 1
The basic solution splits each string with str.split and takes the length of each resulting list with str.len:

df['word_count'] = df['string'].str.split(';').str.len()
df

     string  word_count
id                     
0   31672;0           2
1   31965;0           2
2   0;78464           2
3     51462           1
4   31931;0           2
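
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the table from the question is built by hand with 'id' as the index; the question does not say how the data is actually loaded, so that part is a guess.

```python
import pandas as pd

# Sample data copied from the question; using 'id' as the index is an assumption.
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# Split each value on ';' into a list, then take the length of each list.
df["word_count"] = df["string"].str.split(";").str.len()
print(df)
```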

Option 2
The clever solution (more efficient, since no intermediate lists are built) counts the separators with str.count and adds 1:

df['word_count'] = df['string'].str.count(';') + 1
df

     string  word_count
id                     
0   31672;0           2
1   31965;0           2
2   0;78464           2
3     51462           1
4   31931;0           2

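Caveat: this assigns a word count of 1 even to an empty string. With ';' as the separator, option 1 does the same, because splitting '' yields ['']. If empty strings should count as 0, handle them separately (for example, by counting only the non-empty pieces).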
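
To see how the two options behave on empty and missing values, here is a small sketch. The sample series is hypothetical, not from the question. Treating a missing value as 0 words in the last column is my own choice; adjust it if you would rather keep NaN.

```python
import pandas as pd

# Hypothetical edge cases: a normal value, a single word, an empty string, a missing value.
s = pd.Series(["31672;0", "51462", "", None])

via_split = s.str.split(";").str.len()   # option 1
via_count = s.str.count(";") + 1         # option 2

# Count only non-empty pieces, so "" yields 0.
# Missing values are not lists after the split and are treated as 0 words here.
non_empty = s.str.split(";").apply(
    lambda parts: sum(1 for p in parts if p) if isinstance(parts, list) else 0
)

print(pd.DataFrame({"split": via_split, "count": via_count, "non_empty": non_empty}))
```

Both options leave the missing value as missing. Only the last column distinguishes an empty string from a single word.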


If you want each word in its own column, a quick and simple way is to convert the splits to a list of lists with tolist, load them into a new DataFrame, and concatenate that DataFrame with the original using concat:

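# Reuse df's index so the new columns align with the original rows.
v = pd.DataFrame(df['string'].str.split(';').tolist(), index=df.index)\
        .rename(columns=lambda x: x + 1)\
        .add_prefix('string_')

# axis must be passed by keyword; the positional form was removed in pandas 2.0.
pd.concat([df, v], axis=1)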

     string  word_count string_1 string_2
id                                       
0   31672;0           2    31672        0
1   31965;0           2    31965        0
2   0;78464           2        0    78464
3     51462           1    51462     None
4   31931;0           2    31931        0
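
As an alternative to the tolist route, str.split accepts expand=True, which returns the pieces as a DataFrame directly. This is a sketch of that variant rather than the method used above. The sample frame is rebuilt by hand from the question, and the names parts and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; using 'id' as the index is an assumption.
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# expand=True returns a DataFrame with one column per piece, aligned on df's index.
parts = df["string"].str.split(";", expand=True)

# Rename the integer columns 0, 1, ... to string_1, string_2, ...
parts.columns = [f"string_{i + 1}" for i in parts.columns]

# join aligns on the index, so no manual index handling is needed.
result = df.join(parts)
print(result)
```

Rows with fewer pieces than the widest row are padded with missing values, as in the output above.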
cs95 answered Feb 02 '23 08:02
