
Creating a derived field based on df value comparison in python pandas

Tags: python-3.x, pandas, dataframe, calculated-columns

I have two DataFrames: one is a data source and the other is a reference table. I want to add a column to df1 based on a comparison between the two.

df1 - data source

No | Name
213344 | Apple
242342 | Orange
234234 | Pineapple

df2 - reference table

RGE_FROM | RGE_TO | Value
2100 | 2190 | Sweet
2200 | 2322 | Bitter
2400 | 5000 | Neutral

Expected result: if the first four characters of df1.No fall within the range df2.RGE_FROM to df2.RGE_TO, take df2.Value for the derived column df1.DESC; otherwise leave it blank.

No | Name | DESC
213344 | Apple | Sweet
242342 | Orange | Neutral
234234 | Pineapple |

Any help is appreciated! Thank you!


asked May 04 '21 13:05

yuzu


1 Answer

We can create an IntervalIndex from the columns RGE_FROM and RGE_TO and set it as the index of the Value column to build a mapping Series. Then slice the first four characters of the No column, convert them to integers, and use Series.map to substitute the values from the mapping Series. Prefixes that fall in no interval come back as NaN.

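import pandas as pd
# Closed intervals [RGE_FROM, RGE_TO]; they become the index of the lookup Series
i = pd.IntervalIndex.from_arrays(df2['RGE_FROM'], df2['RGE_TO'], closed='both')
df1['Value'] = df1['No'].astype(str).str[:4].astype(int).map(df2.set_index(i)['Value'])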

       No       Name    Value
0  213344      Apple    Sweet
1  242342     Orange  Neutral
2  234234  Pineapple      NaN
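
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same IntervalIndex plus Series.map approach, using the sample data from the question. The variable names `intervals`, `lookup` and `prefix` are only illustrative. It assumes the ranges in df2 do not overlap and that every No has at least four digits. It also writes to a column named DESC and fills unmatched rows with an empty string, since the question asks for a blank rather than NaN.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "No": [213344, 242342, 234234],
    "Name": ["Apple", "Orange", "Pineapple"],
})
df2 = pd.DataFrame({
    "RGE_FROM": [2100, 2200, 2400],
    "RGE_TO": [2190, 2322, 5000],
    "Value": ["Sweet", "Bitter", "Neutral"],
})

# Lookup Series: index = closed intervals [RGE_FROM, RGE_TO], values = Value.
# Assumption: the ranges do not overlap.
intervals = pd.IntervalIndex.from_arrays(
    df2["RGE_FROM"], df2["RGE_TO"], closed="both"
)
lookup = pd.Series(df2["Value"].to_numpy(), index=intervals)

# First four digits of No as integers.
# Assumption: every No has at least four digits.
prefix = df1["No"].astype(str).str[:4].astype(int)

# Prefixes outside every interval map to NaN; fillna("") turns them into blanks.
df1["DESC"] = prefix.map(lookup).fillna("")
print(df1)
```

With this data, 2133 falls in 2100 to 2190 (Sweet), 2423 falls in 2400 to 5000 (Neutral), and 2342 sits in the gap between 2322 and 2400, so Pineapple gets a blank. Drop the `.fillna("")` if you would rather keep NaN for unmatched rows.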

answered Oct 16 '22 18:10

Shubham Sharma
