id string
0 31672;0
1 31965;0
2 0;78464
3 51462
4 31931;0
Hi, I have this table. I would like to split the string column by ';' and store the number of resulting pieces in a new column. The final table should look like this:
id string word_count
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
It would be nice if someone knows how to do this with Python.
Option 1
The basic solution, using str.split + str.len:
df['word_count'] = df['string'].str.split(';').str.len()
df
string word_count
id
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
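For anyone who wants to reproduce Option 1 from scratch, here is a minimal, self-contained sketch. How the frame is built (a dict passed to pd.DataFrame, with id as the index) is an assumption, since the question does not show how the table was created.

```python
import pandas as pd

# Assumed reconstruction of the table from the question, with `id` as the index
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# Split each value on ';' into a list, then take the length of each list
df["word_count"] = df["string"].str.split(";").str.len()
print(df)
```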
Option 2
The clever (more efficient, less memory-consuming) solution, using str.count:
df['word_count'] = df['string'].str.count(';') + 1
df
string word_count
id
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
Caveat: this ascribes a word count of 1 even to an empty string. Note that option 1 does the same, because ''.split(';') returns [''], so empty strings need separate handling if they should count as 0.
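To see how the two options behave on edge cases, the following sketch runs both on an empty string and a missing value. The sample Series is made up for illustration, and the zero_for_empty line is just one possible way (an assumption, not part of the answer itself) to force empty strings to a count of 0.

```python
import pandas as pd

# Made-up sample: two normal values, an empty string, and a missing value
s = pd.Series(["31672;0", "51462", "", None])

by_split = s.str.split(";").str.len()  # Option 1
by_count = s.str.count(";") + 1        # Option 2

# Illustrative fix: keep the count where the string is non-empty, else use 0
# (missing values stay missing)
zero_for_empty = by_count.where(s != "", 0)

print(pd.DataFrame({"split": by_split, "count": by_count, "fixed": zero_for_empty}))
```

In this sketch both options give the empty string a count of 1, and both leave the missing value as missing.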
If you want each word in its own column, there's a quick and simple way: use tolist to load the splits into a new dataframe, then join that dataframe to the original using concat:
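# Build one column per split piece, keeping df's index so concat aligns the rows
v = pd.DataFrame(df['string'].str.split(';').tolist(), index=df.index)\
    .rename(columns=lambda x: x + 1)\
    .add_prefix('string_')
pd.concat([df, v], axis=1)  # axis must be passed by keyword in pandas >= 2.0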
string word_count string_1 string_2
id
0 31672;0 2 31672 0
1 31965;0 2 31965 0
2 0;78464 2 0 78464
3 51462 1 51462 None
4 31931;0 2 31931 0
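As an alternative to the tolist approach, pandas can also expand the splits into columns directly through the expand=True argument of str.split. This is a different technique from the one shown above, sketched here on an assumed reconstruction of the question's table.

```python
import pandas as pd

# Assumed reconstruction of the table from the question, with `id` as the index
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# expand=True returns a DataFrame with one column per piece (0, 1, ...),
# aligned to df's index; rows with fewer pieces are padded with missing values
v = (
    df["string"].str.split(";", expand=True)
    .rename(columns=lambda x: x + 1)
    .add_prefix("string_")
)

result = df.join(v)
print(result)
```

Because the expanded frame keeps the original index, df.join lines the rows up without any extra index handling.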