
Pandas: replace values in strings

I have a DataFrame, and I am trying to replace the values in one of its columns using a lookup from another DataFrame.

I use:

df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])

But I get an error:

File "C:/Users/����� �����������/Desktop/projects/find_time_before_buy/graph (2).py", line 36, in <module>
df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])
 File "C:\Python27\lib\site-packages\pandas\core\series.py", line 2101, in map
indexer = arg.index.get_indexer(values)
 File "C:\Python27\lib\site-packages\pandas\indexes\base.py", line 2082, in get_indexer
   raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

What should I change? Here is what search_term looks like in df:

729948                               None  
729949                               None  
729950                               None  
729951  пансионат джемете отдых 2016 цены  
729952                               None  
729953                               None  
729954                               купить телефон  
729955                               None  
729956                               вк  
729957                               None  
729958                               яндекс  

And rep_term looks like

search_term code_action
авито   6
вк  9
яндекс  12
мтс 7
связной 8
ситилинк    8
ldevyataykina asked Apr 26 '26 20:04



1 Answer

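The problem is duplicate values in the search_term column of the rep_term DataFrame. After set_index('search_term'), the lookup Series has a non-unique index, and map cannot use it.

I can reproduce it with a small example: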

import pandas as pd

df = pd.DataFrame({'search_term':[1,2,3]})

print (df)
   search_term
0            1
1            2
2            3

Here the value 1 in search_term has two different values in code_action:

rep_term = pd.DataFrame({'search_term':[1,2,1], 'code_action':['ss','dd','gg']})
print (rep_term)
  code_action  search_term
0          ss            1
1          dd            2
2          gg            1


df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])
print (df)
#InvalidIndexError: Reindexing only valid with uniquely valued Index objects
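
A quick way to confirm this diagnosis before calling map is to check whether the lookup index is unique. This is a minimal sketch on the same toy data; the names lookup, counts and dup_keys are only for illustration:

```python
import pandas as pd

rep_term = pd.DataFrame({'search_term': [1, 2, 1],
                         'code_action': ['ss', 'dd', 'gg']})

# Series used as the mapping: index = search_term, values = code_action
lookup = rep_term.set_index('search_term')['code_action']

# map() with a Series argument needs a unique index on that Series
print(lookup.index.is_unique)

# Count how often each key occurs; any count above 1 marks a duplicate key
counts = rep_term['search_term'].value_counts()
dup_keys = sorted(counts[counts > 1].index.tolist())
print(dup_keys)
```

If is_unique is False, the keys in dup_keys are the ones that need to be cleaned up first.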

So first identify the rows with duplicated values using duplicated:

print (rep_term[rep_term.duplicated(subset=['search_term'], keep=False)])
  code_action  search_term
0          ss            1
2          gg            1

Then you can drop the duplicates, keeping either the first or the last value for each search_term, with drop_duplicates:

rep_term1 = rep_term.drop_duplicates(subset=['search_term'], keep='first')
print (rep_term1)
  code_action  search_term
0          ss            1
1          dd            2

rep_term2 = rep_term.drop_duplicates(subset=['search_term'], keep='last')
print (rep_term2)
  code_action  search_term
1          dd            2
2          gg            1
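
Once the duplicates are gone, the original map call should work. This sketch uses the same toy data and keeps the first code for each term; which occurrence is the right one to keep is an assumption that depends on your data:

```python
import pandas as pd

df = pd.DataFrame({'search_term': [1, 2, 3]})
rep_term = pd.DataFrame({'search_term': [1, 2, 1],
                         'code_action': ['ss', 'dd', 'gg']})

# Keep one row per search_term so the lookup index is unique
rep_term1 = rep_term.drop_duplicates(subset=['search_term'], keep='first')

# Same map call as in the question, now against the de-duplicated lookup
df['term_code'] = df['search_term'].map(
    rep_term1.set_index('search_term')['code_action'])
print(df)
```

Values of search_term that have no entry in rep_term (here, 3) get NaN in term_code.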
jezrael answered Apr 28 '26 09:04
