I have a pandas DataFrame with a column of words and a column of integers (0 or 1). All words from a zero (the first row, or the row after a 1) up to and including the next 1 should be joined and put into a 2D array.
Let me explain:
Consider this pandas dataframe:
import pandas as pd
df = pd.DataFrame(columns=['Text','Selection_Values'])
df["Text"] = ["Hi", "this is", "just", "a", "single", "sentence.", "This", "is another one."]
df["Selection_Values"] = [0,0,0,0,0,1,0,1]
print(df)
This is the example dataset:
              Text  Selection_Values
0               Hi                 0
1          this is                 0
2             just                 0
3                a                 0
4           single                 0
5        sentence.                 1
6             This                 0
7  is another one.                 1
The expected result should be:
[["Hi this is just a single sentence."],["This is another one."]]
Do you have any idea how to go about this?
This is what I have done so far:
result = []
s = ""
for i in range(len(df["Text"])):
    s += df["Text"][i] + " "
    if df["Selection_Values"][i] == 1:
        result.append([s])
        s = ""
It works:
[['Hi this is just a single sentence. '], ['This is another one. ']]
...but it might not be the best method. It does not make use of the pandas framework at all.
Using shift + ' '.join. This assumes, of course, that every sentence has a closing 1 and there are no hanging sentences.
g = df['Selection_Values'].shift().eq(1).cumsum()
df['Text'].groupby(g).agg(' '.join).tolist()
['Hi this is just a single sentence.', 'This is another one.']
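To see why this works, it may help to look at the intermediate group labels. The sketch below rebuilds the example frame, shows the labels produced by shift().eq(1).cumsum(), and wraps each joined sentence in its own list to match the 2D shape asked for in the question. The names group_labels, nested, closed and nested_closed are only illustrative, and the last step (dropping a trailing fragment that has no closing 1) is an assumption about how one might want to treat hanging sentences, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["Hi", "this is", "just", "a", "single", "sentence.", "This", "is another one."],
    "Selection_Values": [0, 0, 0, 0, 0, 1, 0, 1],
})

# shift() moves the flags down by one row, so a row is marked True when the
# PREVIOUS row closed a sentence. The first row becomes NaN, which is not 1.
# cumsum() then turns those markers into a running group number.
g = df["Selection_Values"].shift().eq(1).cumsum()
group_labels = g.tolist()
print(group_labels)  # [0, 0, 0, 0, 0, 0, 1, 1]

# Join the words of each group, then wrap each sentence in its own list.
sentences = df["Text"].groupby(g).agg(" ".join)
nested = [[s] for s in sentences]
print(nested)  # [['Hi this is just a single sentence.'], ['This is another one.']]

# Optional (an assumption, not from the answer): keep only groups whose last
# row carries a closing 1, so a hanging fragment at the end is dropped.
closed = df["Selection_Values"].groupby(g).last().eq(1)
nested_closed = [[s] for s in sentences[closed]]
```

Without the optional filter, any words after the final 1 end up in a group of their own and are returned as if they were a complete sentence, which is the case the assumption above warns about.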