Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas split DataFrame by column value

I have DataFrame with column Sales.

How can I split it into 2 based on Sales value?

First DataFrame will have data with 'Sales' < s and second with 'Sales' >= s

like image 970
146 percent Russian Avatar asked Nov 16 '15 18:11

146 percent Russian


People also ask

How do you split a DataFrame based on the value of a column?

In the above example, the data frame 'df' is split into 2 parts 'df1' and 'df2' on the basis of values of column 'Weight'. Method 2: Using Dataframe. groupby(). This method is used to split the data into groups based on some criteria.

How do I slice values in a column in pandas?

By using pandas. DataFrame. loc[] you can slice columns by names or labels. To slice the columns, the syntax is df.

How do I separate DataFrame in pandas?

We can use the iloc() function to slice DataFrames into smaller DataFrames. The iloc() function allows us to access elements based on the index of rows and columns. Using this function, we can split a DataFrame based on rows or columns.

How do you divide data frames?

div() method divides element-wise division of one pandas DataFrame by another. DataFrame elements can be divided by a pandas series or by a Python sequence as well. Calling div() on a DataFrame instance is equivalent to invoking the division operator (/).


2 Answers

You can use boolean indexing:

df = pd.DataFrame({'Sales':[10,20,30,40,50], 'A':[3,4,7,6,1]}) print (df)    A  Sales 0  3     10 1  4     20 2  7     30 3  6     40 4  1     50  s = 30  df1 = df[df['Sales'] >= s] print (df1)    A  Sales 2  7     30 3  6     40 4  1     50  df2 = df[df['Sales'] < s] print (df2)    A  Sales 0  3     10 1  4     20 

It's also possible to invert mask by ~:

mask = df['Sales'] >= s df1 = df[mask] df2 = df[~mask] print (df1)    A  Sales 2  7     30 3  6     40 4  1     50  print (df2)    A  Sales 0  3     10 1  4     20 

print (mask) 0    False 1    False 2     True 3     True 4     True Name: Sales, dtype: bool  print (~mask) 0     True 1     True 2    False 3    False 4    False Name: Sales, dtype: bool 
like image 181
jezrael Avatar answered Nov 07 '22 08:11

jezrael


Using groupby you could split into two dataframes like

In [1047]: df1, df2 = [x for _, x in df.groupby(df['Sales'] < 30)]  In [1048]: df1 Out[1048]:    A  Sales 2  7     30 3  6     40 4  1     50  In [1049]: df2 Out[1049]:    A  Sales 0  3     10 1  4     20 
like image 45
Zero Avatar answered Nov 07 '22 07:11

Zero