I have a function get_tags which returns a list of labels corresponding to a text:
def get_tags(text):
    # Do some analysis and return a list of tags
    return tags
E.g., get_tags(text1) returns ['a', 'b', 'c'], while get_tags(text2) returns ['a', 'b'].
I also have a pandas DataFrame df with columns [text, a, b, c, d, e, f] and 500,000 rows. For each row, I want to set to 1 the columns that correspond to the tags of that row's text. Right now, I am executing:
for i in range(len(df)):
    df.loc[i, get_tags(df.loc[i, "text"])] = 1
This is painfully slow. I could parallelize it with joblib, but before that I want to find the most efficient way to achieve this.
Before execution, df looks like this:
                       text  a  b  c  d  e  f
0  text having a, b, c tags  0  0  0  0  0  0
1     text having a, c tags  0  0  0  0  0  0
2  text having a, b, f tags  0  0  0  0  0  0
After the execution, it should look like this:
                       text  a  b  c  d  e  f
0  text having a, b, c tags  1  1  1  0  0  0
1     text having a, c tags  1  0  1  0  0  0
2  text having a, b, f tags  1  1  0  0  0  1
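For reference, the setup above can be reproduced on the three sample rows with the sketch below. The body of get_tags here is a hypothetical stand-in (it simply picks out known single-letter tags from the text), since the real analysis is not shown; TAG_COLUMNS is likewise a name introduced only for this example.

```python
import pandas as pd

# Assumed tag columns, taken from the column list in the question.
TAG_COLUMNS = ["a", "b", "c", "d", "e", "f"]


def get_tags(text):
    # Hypothetical stand-in for the real analysis: keep every
    # whitespace-separated token that is one of the known tags.
    tokens = [tok.strip(",") for tok in text.split()]
    return [tok for tok in tokens if tok in TAG_COLUMNS]


df = pd.DataFrame(
    {
        "text": [
            "text having a, b, c tags",
            "text having a, c tags",
            "text having a, b, f tags",
        ]
    }
)
# Start with every tag column set to 0.
for col in TAG_COLUMNS:
    df[col] = 0

# The row-by-row loop from the question: one .loc assignment per row.
for i in range(len(df)):
    df.loc[i, get_tags(df.loc[i, "text"])] = 1

print(df)
```

Each iteration performs a separate label-based lookup and assignment, which is a plausible reason the loop gets slow at 500,000 rows.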
Assuming df is your raw DataFrame, you can also use MultiLabelBinarizer from sklearn.preprocessing.
Before execution, df is:
--------------------------
| | text | labels |
--------------------------
| 0 | A | a, b, c |
--------------------------
| 1 | B | a, c |
--------------------------
| 2 | C | a, b, f |
--------------------------
Then do as follows:
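import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
# Pass the classes explicitly so that unused tags (d, e) still get a column
mlb = MultiLabelBinarizer(classes=list('abcdef'))
# Split on commas and strip whitespace so that ' b' does not become its own class
mlb_result = mlb.fit_transform([[t.strip() for t in str(s).split(',')] for s in df['labels']])
df_final = pd.concat([df['text'], pd.DataFrame(mlb_result, columns=list(mlb.classes_), index=df.index)], axis=1)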
After execution, df_final is:
------------------------------------
| | text | a | b | c | d | e | f |
------------------------------------
| 0 | A | 1 | 1 | 1 | 0 | 0 | 0 |
------------------------------------
| 1 | B | 1 | 0 | 1 | 0 | 0 | 0 |
------------------------------------
| 2 | C | 1 | 1 | 0 | 0 | 0 | 1 |
------------------------------------
df_final will be what you want.
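If there is no labels column yet, the same idea can be applied directly to the output of get_tags. The following is a minimal sketch under that assumption, not a definitive implementation: the get_tags body is a hypothetical stub standing in for the real analysis, and TAG_COLUMNS and tag_lists are names introduced only for this example.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Assumed tag columns, taken from the column list in the question.
TAG_COLUMNS = ["a", "b", "c", "d", "e", "f"]


def get_tags(text):
    # Hypothetical stub: keep every token that is one of the known tags.
    tokens = [tok.strip(",") for tok in text.split()]
    return [tok for tok in tokens if tok in TAG_COLUMNS]


df = pd.DataFrame(
    {
        "text": [
            "text having a, b, c tags",
            "text having a, c tags",
            "text having a, b, f tags",
        ]
    }
)

# Call get_tags once per row and collect the results as a list of lists.
tag_lists = [get_tags(text) for text in df["text"]]

# Fixing the classes keeps the column order and includes unused tags.
mlb = MultiLabelBinarizer(classes=TAG_COLUMNS)
indicators = mlb.fit_transform(tag_lists)

# Build all indicator columns at once instead of assigning row by row.
df_final = pd.concat(
    [df[["text"]], pd.DataFrame(indicators, columns=mlb.classes_, index=df.index)],
    axis=1,
)
print(df_final)
```

This replaces 500,000 separate .loc assignments with a single matrix construction. The calls to get_tags themselves still run once per row, so if that function dominates the runtime, parallelizing it (for example with joblib) may still be worthwhile.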