
How to create a new data frame based on conditions from another data frame

Just getting into Python, so hopefully I'm not asking a stupid question here...

I have a pandas dataframe named "df_complete" with, let's say, 100 rows, containing columns named "type", "writer", "status", "col a", and "col c". I want to create a new dataframe named "temp_df" whose contents are based on conditions applied to the values in "df_complete".

temp_df = pandas.DataFrame()

if ((df_complete['type'] == 'NDD') & (df_complete['writer'] == 'Mary') & (df_complete['status'] != '7')):
    temp_df['col A'] = df_complete['col a']
    temp_df['col B'] = 'good'
    temp_df['col C'] = df_complete['col c']

However, when I do this, I get the following error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I read this thread and changed my "and" to "&": Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I also read this thread, which says to put every comparison in parentheses: comparing dtyped [float64] array with a scalar of type [bool] in Pandas DataFrame

But the error is still present. What is causing this, and how can I fix it?

Follow-up question: how can I obtain the index values of the rows that met the condition?

alwaysaskingquestions asked Nov 06 '16 22:11




1 Answer

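An if statement needs a single True or False, but your condition evaluates to a boolean Series with one value per row. That is why the ValueError remains even after switching to & and adding parentheses. I think you need boolean indexing with loc instead, selecting only the columns col a and col c: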

temp_df = df_complete.loc[(df_complete['type'] == 'NDD') & 
                         (df_complete['writer'] == 'Mary') & 
                         (df_complete['status'] != '7'), ['col a','col c']]
# rename columns
temp_df = temp_df.rename(columns={'col a': 'col A', 'col c': 'col C'})
# add new column
temp_df['col B'] = 'good'
# reorder columns
temp_df = temp_df[['col A', 'col B', 'col C']]
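To see why the original if raises, here is a minimal sketch. The tiny frame and the names mask, raised and selected are made up for illustration and are not part of the question's data. The combined comparison yields one boolean per row, so Python cannot reduce it to a single truth value, whereas loc accepts it as a row filter.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame({'type': ['NDD', 'NT'],
                   'writer': ['Mary', 'John'],
                   'status': ['4', '7']})

# Each comparison is parenthesised because & binds tighter than == and !=.
# The result is a boolean Series with one value per row.
mask = (df['type'] == 'NDD') & (df['writer'] == 'Mary') & (df['status'] != '7')

try:
    if mask:  # asks for the truth value of a whole Series
        pass
    raised = False
except ValueError:
    raised = True

# Passing the mask to .loc filters rows instead of branching on it.
selected = df.loc[mask]
```

If a single yes/no answer really is wanted (for example, "does any row match?"), mask.any() or mask.all() gives one, as the error message suggests.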

Sample:

import pandas as pd

df_complete = pd.DataFrame({'type':  ['NDD','NDD','NT'],
                            'writer':['Mary','Mary','John'],
                            'status':['4','5','6'],
                            'col a': [1,3,5],
                            'col b': [5,3,6],
                            'col c': [7,4,3]}, index=[3,4,5])

print (df_complete)
   col a  col b  col c status type writer
3      1      5      7      4  NDD   Mary
4      3      3      4      5  NDD   Mary
5      5      6      3      6   NT   John

temp_df = df_complete.loc[(df_complete['type'] == 'NDD') & 
                         (df_complete['writer'] == 'Mary') & 
                         (df_complete['status'] != '7'), ['col a','col c']]

print (temp_df)  
   col a  col c
3      1      7
4      3      4

# rename columns
temp_df = temp_df.rename(columns={'col a': 'col A', 'col c': 'col C'})
# add new column
temp_df['col B'] = 'good'
# reorder columns
temp_df = temp_df[['col A', 'col B', 'col C']]
print (temp_df)  
   col A col B  col C
3      1  good      7
4      3  good      4
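For the follow-up question, one possible approach (a sketch, not the only way) is to keep the condition in a variable and use it to index df_complete.index. The names mask, matching_index and same_labels are introduced here for illustration. Because loc keeps the original row labels, the filtered frame carries the same index values.

```python
import pandas as pd

df_complete = pd.DataFrame({'type':  ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'status': ['4', '5', '6'],
                            'col a': [1, 3, 5],
                            'col b': [5, 3, 6],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])

# Same condition as above, stored once so it can be reused.
mask = ((df_complete['type'] == 'NDD') &
        (df_complete['writer'] == 'Mary') &
        (df_complete['status'] != '7'))

# Index labels of the rows that met the condition.
matching_index = df_complete.index[mask].tolist()

# Filtering with .loc preserves the labels, so the result has the same index.
temp_df = df_complete.loc[mask, ['col a', 'col c']]
same_labels = temp_df.index.tolist()
```

With the sample data, both lists contain the labels 3 and 4, matching the index column in the printed output above.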

jezrael answered Oct 28 '22 10:10