
Split Pandas Dataframe Column According To a Value

I searched and couldn't find a question like mine. If one exists and I missed it, please let me know so I can delete this post.

I'm stuck on a problem: splitting a pandas DataFrame into separate DataFrames (df) by a value.

I have a dataset in a text file, which I load into a pandas DataFrame that has only one column. The dataset contains several sets of information, and a certain value marks the end of each set. You can see a sample below:

The Sample Input

In [8]: df
Out[8]: 
  var1
0    a
1    b
2    c
3    d
4    endValue
5    h
6    f
7    b
8    w
9    endValue

So I want to split this df into separate DataFrames. I couldn't find a way to do that, but I'm sure there must be an easy one. The format I show in the sample output may not be the right one, so if you have a better idea I'd love to see it. Thank you for your help.

The sample output I'd like

  var1
{[0    a
1    b
2    c
3    d
4    endValue]},
{[0    h
1    f
2    b
3    w
4    endValue]}

aysebilgegunduz asked Apr 27 '20 07:04




2 Answers

You could check where var1 equals endValue, take the cumulative sum, and shift it by one row so that each endValue row stays with the set it closes. Use the result as a custom grouper, then group by it and build a dictionary from the result:

d = dict(tuple(df.groupby(df.var1.eq('endValue').cumsum().shift(fill_value=0.))))

Or for a list of dataframes (effectively indexed in the same way):

l = [v for _,v in df.groupby(df.var1.eq('endValue').cumsum().shift(fill_value=0.))]

print(l[0])

       var1
0         a
1         b
2         c
3         d
4  endValue
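
To make the grouper easier to follow, here is a minimal sketch that spells out the intermediate steps on the sample data from the question. The names is_end, counter, key and parts are only illustrative, and the reset_index call is an optional extra, added on the assumption that each piece should be renumbered from 0 as in the requested output.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"var1": ["a", "b", "c", "d", "endValue",
                            "h", "f", "b", "w", "endValue"]})

is_end = df["var1"].eq("endValue")   # True on the marker rows
counter = is_end.cumsum()            # increments on each marker row
key = counter.shift(fill_value=0)    # delay by one row so the marker stays with its own set

# Inspect the intermediate columns side by side.
print(pd.DataFrame({"var1": df["var1"], "is_end": is_end,
                    "cumsum": counter, "key": key}))

# One DataFrame per set, each renumbered from 0 (optional step).
parts = [g.reset_index(drop=True) for _, g in df.groupby(key)]
print(parts[1])
```

Without the shift, each endValue row would get the same key as the rows that follow it, so it would open the next set instead of closing the current one.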
yatu answered Nov 15 '22 03:11


Another idea, which requires unique index values: keep the index value only on rows that match endValue, replace it with NaN everywhere else, and backfill. Then loop over the groupby object to build a list of DataFrames:

g = df.index.to_series().where(df['var1'].eq('endValue')).bfill()
dfs = [a for i, a in df.groupby(g, sort=False)]
print(dfs)
[       var1
0         a
1         b
2         c
3         d
4  endValue,        var1
5         h
6         f
7         b
8         w
9  endValue]
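
The same approach can be sketched step by step on the question's sample data, to show what the grouping key looks like before and after backfilling. The names idx and marked are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"var1": ["a", "b", "c", "d", "endValue",
                            "h", "f", "b", "w", "endValue"]})

idx = df.index.to_series()
# Keep the index label only on marker rows; every other row becomes NaN.
marked = idx.where(df["var1"].eq("endValue"))
# Backfill so each row carries the label of the next marker at or after it.
g = marked.bfill()
print(pd.DataFrame({"var1": df["var1"], "marked": marked, "g": g}))

# sort=False keeps the groups in their order of appearance.
dfs = [part for _, part in df.groupby(g, sort=False)]
print(len(dfs))
```

One difference worth knowing: rows that come after the last endValue have nothing to backfill from, so their key stays NaN and groupby drops them by default. With the cumsum-based key, such trailing rows would form a final group of their own.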
jezrael answered Nov 15 '22 02:11