Pandas extractall() - return a list, not a MultiIndex?

I have a question that I suspect has been asked before in a different form. If so, please point me to the original.

I am playing with the Pandas extractall() method, and I don't like that it returns a DataFrame with a MultiIndex (original index -> 'match' index), with every match listed under match 0, match 1, match 2, and so on. I would prefer a DataFrame with a single-level index, where multiple regex matches (if any) are returned as a list in a single cell. Is that possible at the moment?

Here's a visualization of what I have in mind:

Current output:

                   X
index    match
  0        0      thank
  1        0      thank
           1      thanks
           2      thanking
  2        0      thanked
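
For reference, here is a minimal sketch that reproduces output of this shape. The input Series `s` and the regex pattern are assumptions made for illustration; the actual data and pattern are not shown in this post.

```python
import pandas as pd

# Hypothetical input: the real data and pattern are not given in the question.
s = pd.Series(["thank you", "thank thanks thanking", "thanked"])

# One named capture group "X"; extractall returns one row per match,
# indexed by (original index, match number).
matches = s.str.extractall(r"(?P<X>thank\w*)")
print(matches)
```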

Desired output:

          X
index
  0      thank
  1      [thank, thanks, thanking]
  2      thanked

I'll be grateful for any suggestions.

Greem666 asked Jan 16 '18 05:01


1 Answer

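Let's try grouping on the first index level (the original index) and collecting each group's matches into a list, where df is the result of extractall():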

df.groupby(level=0)['X'].apply(list)

Output:

0                      [thank]
1    [thank, thanks, thanking]
2                    [thanked]
Name: X, dtype: object
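
A self-contained sketch of the same idea, end to end. The input Series `s` and the regex are assumptions for illustration, since the question does not show them. The last step is an optional extra, not part of the one-liner above: it unwraps single-element lists so that rows with one match hold a plain string, which is closer to the desired output in the question.

```python
import pandas as pd

# Hypothetical input and pattern (not given in the question).
s = pd.Series(["thank you", "thank thanks thanking", "thanked"])
df = s.str.extractall(r"(?P<X>thank\w*)")

# Group on the outer index level (the original index) and
# collect each group's matches into a Python list.
as_lists = df.groupby(level=0)["X"].apply(list)
print(as_lists)

# Optional: keep a bare string where there was exactly one match.
unwrapped = as_lists.apply(lambda m: m[0] if len(m) == 1 else m)
print(unwrapped)
```

Note that extractall() drops rows with no match at all, so they will be missing from the grouped result too. If you need them back, as_lists.reindex(s.index) should restore them as NaN.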
Scott Boston answered Sep 19 '22 14:09