How to extract a keyword (string) from a column in a pandas DataFrame in Python

Tags:

list

python-3.x

pandas

dataframe

keyword

I have a dataframe df and it looks like this:

         id                        Type                        agent_id  created_at
0       44525   Stunning 6 bedroom villa in New Delhi               184  2018-03-09
1       44859   Villa for sale in Amritsar                          182  2017-02-19
2       45465   House in Faridabad                                  154  2017-04-17
3       50685   5 Hectre land near New Delhi                        113  2017-09-01
4      130728   Duplex in Mumbai                                    157  2017-02-07
5      130856   Large plot with fantastic views in Mumbai           137  2018-01-16
6      130857   Modern Design Penthouse in Bangalore                199  2017-03-24

I'm trying to clean this tabular data by extracting keywords from the Type column and building a new dataframe with the resulting columns. These are the keyword lists:

Apartment  = ['apartment', 'penthouse', 'duplex']
House      = ['house', 'villa', 'country estate']
Plot       = ['plot', 'land']
Location   = ['New Delhi','Mumbai','Bangalore','Amritsar']

So the desired dataframe should look like this:

         id      Type        Location    agent_id  created_at
0       44525   House       New Delhi         184  2018-03-09
1       44859   House        Amritsar         182  2017-02-19
2       45465   House       Faridabad         154  2017-04-17
3       50685   Plot        New Delhi         113  2017-09-01
4      130728   Apartment      Mumbai         157  2017-02-07
5      130856   Plot           Mumbai         137  2018-01-16
6      130857   Apartment   Bangalore         199  2017-03-24

So far I've tried this:

import pandas as pd
df = pd.read_csv('test_data.csv')

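# I can extract these keywords one by one using for loops, but how
# can I do this in pandas with the fewest possible lines of code?

# The column is named 'Type'; Series.items() replaces the removed iteritems()
for index, value in df['Type'].items():
    for keyword in Apartment:
        # Lowercase the title so 'Duplex' matches the keyword 'duplex'
        if keyword in value.lower():
            print(keyword)

df_new = pd.DataFrame(df['id'])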

Can someone tell me how to solve this?

asked Jan 30 '19 12:01

astroluv

1 Answer

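First, create the Location column with str.extract, joining the location names with | (regex OR). Titles that contain none of the listed locations, such as the Faridabad row, get NaN: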

pat = '|'.join(r"\b{}\b".format(x) for x in Location)
df['Location'] = df['Type'].str.extract('('+ pat + ')', expand=False)
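To try this step without the CSV file, here is a minimal self-contained sketch that uses three titles copied from the question. The re.escape call is my own addition, not part of the answer above; it is only a precaution in case a location name ever contains a regex metacharacter.

```python
import re
import pandas as pd

# Three sample titles copied from the question's data
df = pd.DataFrame({"Type": [
    "Stunning 6 bedroom villa in New Delhi",
    "Villa for sale in Amritsar",
    "House in Faridabad",
]})
Location = ["New Delhi", "Mumbai", "Bangalore", "Amritsar"]

# \b anchors each name at word boundaries; re.escape (an added precaution)
# neutralises any regex metacharacters inside a name
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in Location)

# The capture group is what str.extract returns; expand=False yields a Series
df["Location"] = df["Type"].str.extract("(" + pat + ")", expand=False)

print(df["Location"].tolist())
# ['New Delhi', 'Amritsar', nan]
```

The last row stays NaN because Faridabad is not in the Location list; adding it to the list would fill that value in.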

Then build a dictionary from the other lists, invert it so that each keyword maps to its category, and in a loop overwrite Type wherever str.contains (with parameter case=False) matches the keyword:

d = {'Apartment' : Apartment,
     'House' : House,
     'Plot' : Plot}

d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

for k, v in d1.items():
    df.loc[df['Type'].str.contains(k, case=False), 'Type'] = v

print (df)
       id       Type  agent_id  created_at   Location
0   44525      House       184  2018-03-09  New Delhi
1   44859      House       182  2017-02-19   Amritsar
2   45465      House       154  2017-04-17        NaN
3   50685       Plot       113  2017-09-01  New Delhi
4  130728  Apartment       157  2017-02-07     Mumbai
5  130856       Plot       137  2018-01-16     Mumbai
6  130857  Apartment       199  2017-03-24  Bangalore
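If the loop feels heavy, one possible alternative (a sketch, not the method from the answer above) is to extract the first matching keyword in a single pass and map it to its category. It assumes the same lists and the inverted dictionary d1 built the same way; the variable name keyword is just an illustrative choice.

```python
import re
import pandas as pd

# Titles copied from the question's data
df = pd.DataFrame({"Type": [
    "Stunning 6 bedroom villa in New Delhi",
    "Villa for sale in Amritsar",
    "House in Faridabad",
    "5 Hectre land near New Delhi",
    "Duplex in Mumbai",
    "Large plot with fantastic views in Mumbai",
    "Modern Design Penthouse in Bangalore",
]})

Apartment = ["apartment", "penthouse", "duplex"]
House = ["house", "villa", "country estate"]
Plot = ["plot", "land"]

d = {"Apartment": Apartment, "House": House, "Plot": Plot}
# Invert: each keyword points to its category name
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# One alternation of all keywords, matched case-insensitively
pat = "(" + "|".join(re.escape(k) for k in d1) + ")"
keyword = df["Type"].str.extract(pat, flags=re.IGNORECASE, expand=False)

# Normalise the matched text to lowercase, then look up its category
df["Type"] = keyword.str.lower().map(d1)

print(df["Type"].tolist())
# ['House', 'House', 'House', 'Plot', 'Apartment', 'Plot', 'Apartment']
```

One difference in behaviour to be aware of: a title that matches no keyword becomes NaN here, whereas the loop above leaves the original text untouched.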

answered Oct 23 '22 15:10

jezrael
