 

Create a new dataframe based on rows with a certain value

I have a large dataframe of transactions that I want to split into two smaller dataframes based on the "Type" column. If "Type" is "S", the entire row should go into the "cust_sell" dataframe; if "Type" is "P", it should go into the "cust_buy" dataframe. I am using a for loop, but it only adds the index value, not the row. Any help is appreciated!

from win32com.shell import shell, shellcon
import os
import pandas as pd

# Path to the workbook in the user's Documents folder (Windows only)
docs = shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)
filename = os.path.join(docs, 'MSRB T-1_test.xlsx')
wb = pd.read_excel(filename, sheet_name='T1-20062017', index_col=0, header=0)
cust_buy = []
cust_sell = []

# Create a list of customer buys and sells separately
for i in wb.index:
    if wb['Type'][i] == 'S':
        cust_sell.append([i])   # appends only the index label i, not the row
    elif wb['Type'][i] == 'P':
        cust_buy.append([i])
Tom asked Jun 23 '18 18:06


People also ask

How do you create a DataFrame from a value?

To create a DataFrame, first import pandas, then call the pandas.DataFrame() constructor. Its first parameter, data, holds the values that fill the table (for example a dict of lists or a list of rows); optional parameters such as index and columns set the row and column labels.
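
A minimal sketch of the constructor call described above. The "Amount" column and the "t1"/"t2"/"t3" row labels are made-up illustration data, not part of the question's spreadsheet.

```python
import pandas as pd

# Build a DataFrame from a dict: keys become column names and each list
# becomes that column's values (all lists must have the same length).
df = pd.DataFrame(
    {"Type": ["S", "P", "S"], "Amount": [100, 250, 75]},  # hypothetical data
    index=["t1", "t2", "t3"],  # optional row labels
)
print(df)
```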

Can we create DataFrame from scalar values?

When creating a DataFrame you may encounter the error "ValueError: If using all scalar values, you must pass an index". It occurs because pandas expects the data values to be list-like (for example lists, or a dict of lists); from scalars alone it cannot infer how many rows to create. Passing an explicit index, or wrapping each scalar in a list, resolves it.
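
A small sketch reproducing the error and the two usual fixes. The column names and values are hypothetical; the exact wording of the error message may vary slightly between pandas versions.

```python
import pandas as pd

# Only scalar values and no index: pandas cannot tell how many rows to create.
raised = False
try:
    pd.DataFrame({"Type": "S", "Amount": 100})
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Fix 1: pass an explicit index; each scalar is repeated for every row label.
df_indexed = pd.DataFrame({"Type": "S", "Amount": 100}, index=[0, 1])

# Fix 2: wrap each scalar in a list so the values are list-like.
df_listed = pd.DataFrame({"Type": ["S"], "Amount": [100]})
```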

How do I create a new DataFrame in Pandas with specific columns?

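You can create a new DataFrame with additional columns by using the DataFrame.assign() method. assign() adds the new columns and returns a new object (a copy) containing them along with all of the original columns; the original DataFrame is left unchanged. To keep only specific existing columns, select them with a list of column names, e.g. df[['Type', 'Dummy']].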
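
A short sketch of both operations. The "Price", "Qty" and "Total" columns are hypothetical, chosen only to illustrate assign() followed by column selection.

```python
import pandas as pd

df = pd.DataFrame({"Type": ["S", "P"], "Price": [10.0, 20.0], "Qty": [3, 5]})

# assign() returns a NEW DataFrame with an extra column; df itself is unchanged.
with_total = df.assign(Total=df["Price"] * df["Qty"])

# To keep only certain existing columns, index with a list of their names.
subset = with_total[["Type", "Total"]]
print(subset)
```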


1 Answer

You do not need a loop; boolean indexing in pandas does this directly.

Assuming your dataframe looks like this:

import pandas as pd  

mainDf = pd.DataFrame()
mainDf['Type'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
mainDf['Dummy'] = [1, 2, 3, 4, 5, 6, 7, 8]

To create a dataframe for each of the S and P types, filter on the Type column:

cust_sell = mainDf[mainDf.Type == 'S']
cust_buy = mainDf[mainDf.Type == 'P']
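
What the two lines above do, spelled out step by step: the comparison produces a boolean Series, and indexing with it keeps only the True rows. The .copy() call is an optional addition, not part of the original answer; it gives independent DataFrames so that later edits to them should not raise pandas' SettingWithCopyWarning (newer pandas versions with copy-on-write may not emit that warning at all).

```python
import pandas as pd

mainDf = pd.DataFrame({
    "Type": ["S", "S", "S", "P", "P", "S", "P", "S"],
    "Dummy": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Comparing a column to a scalar yields a boolean Series aligned to the index.
is_sell = mainDf["Type"] == "S"

# Indexing with that Series keeps only the rows where it is True;
# .copy() detaches each result from mainDf.
cust_sell = mainDf.loc[is_sell].copy()
cust_buy = mainDf.loc[mainDf["Type"] == "P"].copy()
```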

cust_sell output:

  Type  Dummy
0    S      1
1    S      2
2    S      3
5    S      6
7    S      8

cust_buy output:

  Type  Dummy
3    P      4
4    P      5
6    P      7
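
For comparison, the question's loop can be made to work by storing each row (mainDf.loc[i]) instead of the label i, then building a DataFrame from the collected rows. This is a sketch using the same sample data; it runs a Python-level loop and is slower than the boolean mask on large frames.

```python
import pandas as pd

mainDf = pd.DataFrame({
    "Type": ["S", "S", "S", "P", "P", "S", "P", "S"],
    "Dummy": [1, 2, 3, 4, 5, 6, 7, 8],
})

sell_rows = []
buy_rows = []
for i in mainDf.index:
    row = mainDf.loc[i]  # the whole row as a Series, not just the label i
    if row["Type"] == "S":
        sell_rows.append(row)
    elif row["Type"] == "P":
        buy_rows.append(row)

# A list of row Series becomes a DataFrame; each Series' name (its original
# index label) becomes the row label. Rows of mixed types come back as
# object dtype, so infer_objects() restores numeric column dtypes.
cust_sell_loop = pd.DataFrame(sell_rows).infer_objects()
cust_buy_loop = pd.DataFrame(buy_rows).infer_objects()
```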
Ankur Sinha answered Sep 30 '22 10:09