
Replace words in DataFrame strings with corresponding words from another DataFrame in pandas

Tags: python, pandas

I have a DataFrame df which has one column:

     List
 0   What are you trying to achieve
 1   What is your purpose right here
 2   When students don’t have a proper foundation
 3   I am going to DESCRIBE a sunset

I have another DataFrame, df2, which has two columns:

    original       correct
0     are          were
1     sunset       sunrise
2     I            we
3     right        correct
4     is           was

I want to replace every word in df that occurs in the original column of df2 with the corresponding word in the correct column, and store the new strings in another DataFrame, df_new.

Is this possible without loops or explicit iteration, using only plain pandas?

That is, df_new should contain:

     List
 0   What were you trying to achieve
 1   What was your purpose correct here
 2   When students don’t have a proper foundation
 3   we am going to DESCRIBE a sunrise

Also, this is just a test example. My df might contain millions of rows of strings, and so might df2. What would be the most efficient approach?

Shubham R asked Feb 09 '17 08:02



1 Answer

One of many possible solutions:

In [371]: boundary = r'\b'
     ...:
     ...: # wrap each word in \b so that only whole words are replaced
     ...: df.List.replace((boundary + df2.original + boundary).values.tolist(),
     ...:                 df2.correct.values.tolist(),
     ...:                 regex=True)
     ...:
Out[371]:
0                  What were you trying to achieve
1               What was your purpose correct here
2     When students don’t have a proper foundation
3                we am going to DESCRIBE a sunrise
Name: List, dtype: object
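
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frames from the question, applies the list-based replace shown above, and then tries a variant that is not part of the original approach: a single compiled alternation pattern with a dictionary lookup. The names mapping, pattern, via_replace and df_new's construction are illustrative assumptions. The variant scans each string once instead of once per word, which may help when df2 is large, but that is untested here and worth timing on real data.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"List": [
    "What are you trying to achieve",
    "What is your purpose right here",
    "When students don’t have a proper foundation",
    "I am going to DESCRIBE a sunset",
]})
df2 = pd.DataFrame({
    "original": ["are", "sunset", "I", "right", "is"],
    "correct": ["were", "sunrise", "we", "correct", "was"],
})

# Approach from the answer: one whole-word regex per entry in df2.
boundary = r"\b"
via_replace = df["List"].replace(
    (boundary + df2["original"] + boundary).tolist(),
    df2["correct"].tolist(),
    regex=True,
)

# Illustrative variant: one alternation pattern plus a dict lookup.
# re.escape guards against regex metacharacters in the word list.
mapping = dict(zip(df2["original"], df2["correct"]))
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b")

df_new = pd.DataFrame({
    "List": df["List"].str.replace(
        pattern, lambda m: mapping[m.group(0)], regex=True
    )
})

print(df_new)
```

One behavioural difference to keep in mind: the single-pattern variant replaces each word at most once, whereas the list-based replace may chain substitutions if a word in correct also appears in original. On the sample data both give the same result.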
MaxU - stop WAR against UA answered Oct 05 '22 22:10