Creating a derived field based on df value comparison in python pandas

I have two dataframes: one is a data source dataframe and the other is a reference dataframe. I want to add a column to df1 based on a comparison between the two.

df1 - data source

No | Name
213344 | Apple
242342 | Orange
234234 | Pineapple

df2 - reference table

RGE_FROM | RGE_TO | Value
2100 | 2190 | Sweet
2200 | 2322 | Bitter
2400 | 5000 | Neutral

Final result: if the first 4 characters of df1.No fall within the range df2.RGE_FROM to df2.RGE_TO, take df2.Value for the derived column df1.DESC. Otherwise, leave it blank.

No | Name | DESC
213344 | Apple | Sweet
242342 | Orange | Neutral
234234 | Pineapple | 

Any help is appreciated! Thank you!

yuzu asked May 04 '21 13:05

1 Answer

We can create an IntervalIndex from the columns RGE_FROM and RGE_TO and set it as the index of df2, so that the Value column becomes a mapping series keyed by interval. Then slice the first four characters of the column No, convert them to integers, and use Series.map to look each one up in the mapping series. Numbers that fall in no interval map to NaN.

import pandas as pd
# closed='both' makes each range include both RGE_FROM and RGE_TO
i = pd.IntervalIndex.from_arrays(df2['RGE_FROM'], df2['RGE_TO'], closed='both')
df1['Value'] = df1['No'].astype(str).str[:4].astype(int).map(df2.set_index(i)['Value'])

       No       Name    Value
0  213344      Apple    Sweet
1  242342     Orange  Neutral
2  234234  Pineapple      NaN
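
For reference, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds df1 and df2 from the tables in the question, names the new column DESC as the question asks, and replaces NaN with an empty string to get the "blank" result. The variable names prefix and lookup are only illustrative, and the sketch assumes the ranges in df2 do not overlap.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "No": [213344, 242342, 234234],
    "Name": ["Apple", "Orange", "Pineapple"],
})
df2 = pd.DataFrame({
    "RGE_FROM": [2100, 2200, 2400],
    "RGE_TO": [2190, 2322, 5000],
    "Value": ["Sweet", "Bitter", "Neutral"],
})

# One closed interval [RGE_FROM, RGE_TO] per reference row.
intervals = pd.IntervalIndex.from_arrays(
    df2["RGE_FROM"], df2["RGE_TO"], closed="both"
)

# Mapping series: interval -> Value.
lookup = df2.set_index(intervals)["Value"]

# First four digits of No, as integers.
prefix = df1["No"].astype(str).str[:4].astype(int)

# Look up each prefix; prefixes outside every interval become NaN,
# which is then replaced by an empty string (an assumption about "blank").
df1["DESC"] = prefix.map(lookup).fillna("")

print(df1)
```

If the ranges in df2 could overlap, the interval lookup would raise an error, so it may be worth checking intervals.is_overlapping first.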
Shubham Sharma answered Oct 16 '22 18:10