I have the following two dataframes:
>df1<-data.frame(A=c(0,0,0),B=c(0,201,0),C=c(0,467,0))
A B C
1 0 0 0
2 0 201 467
3 0 0 0
>df2<-data.frame(A=c(201,467),B=c('abc','def'))
A B
1 201 abc
2 467 def
I would like to replace each value in df1 with the "B" value from df2 whose "A" value matches it (and NA where nothing matches), creating a dataframe that looks like this:
A B C
1 NA NA NA
2 NA abc def
3 NA NA NA
I can accomplish this on a column by column basis using the following code:
>df2$B[match(df1$B,df2$A)]
Unfortunately, I am working with a massive dataset and would therefore prefer to match all of the columns at once. Any help would be much appreciated.
In pandas, the values of one DataFrame can be replaced with values from another DataFrame using the DataFrame.replace() method. It accepts a regex, string, list, Series, number, or dictionary as the value to replace.
Suppose that you want to replace multiple values with multiple new values in a single DataFrame column. In that case, you can use this template: df['column name'] = df['column name'].replace(['1st old value','2nd old value',...],['1st new value','2nd new value',...])
You can update a PySpark DataFrame column using withColumn(), select(), or sql(). Because DataFrames are distributed, immutable collections, you cannot change column values in place; instead, withColumn() and the other approaches return a new DataFrame with the updated values.
You can do:
df1[] <- setNames(df2$B, df2$A)[as.character(unlist(df1))]
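To spell out what that one-liner does, here is a minimal sketch using the data from the question. The names `lookup`, `result` and `result2` are only illustrative, and `stringsAsFactors = FALSE` is my own addition so that df2$B is a plain character vector on older R versions too. It assumes the key values convert cleanly to strings with as.character() (whole numbers or text); non-integer numeric keys may not match reliably this way.

```r
df1 <- data.frame(A = c(0, 0, 0), B = c(0, 201, 0), C = c(0, 467, 0))
df2 <- data.frame(A = c(201, 467), B = c("abc", "def"),
                  stringsAsFactors = FALSE)

# Named lookup vector: names are the keys (df2$A), values are the labels (df2$B)
lookup <- setNames(df2$B, df2$A)

# unlist() flattens df1 column by column; indexing the lookup by name returns
# NA for any key that is not present. Assigning into `result[]` keeps the
# original shape and column names of the data frame.
result <- df1
result[] <- lookup[as.character(unlist(df1))]

# Alternative sketch: apply the match() approach from the question to every
# column at once with lapply()
result2 <- df1
result2[] <- lapply(df1, function(col) df2$B[match(col, df2$A)])

print(result)
```

Both variants fill every column in a single assignment, so there is no need to loop over the columns by hand. The match() version avoids the conversion to character, which may be preferable if the keys are not whole numbers.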