I have a data frame with various columns. Some of the values in some columns contain double quotes, and I want to remove them. For example:
ID name value1 value2
"1 x a,"b,"c x"
"2 y d,"r" z"
I want it to look like this:
ID name value1 value2
1 x a,b,c x
2 y d,r z
In pandas, you can use DataFrame.select_dtypes to select the string columns and then apply str.strip to them.
In R, you can use either the base function gsub() or str_replace() from the stringr package to remove characters from a string. Note that str_replace() only replaces the first match in each string; str_replace_all() replaces every match, as gsub() does.
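As a minimal sketch of the gsub() approach on a plain character vector (the vector x below is a made-up example, not the asker's data frame), something like this should work. Passing fixed = TRUE is my own choice here so the quote is matched literally rather than as a regular expression:

x <- c("a,\"b,\"c", "d,\"r\"")   # hypothetical example values
# Replace every double quote with an empty string
cleaned <- gsub("\"", "", x, fixed = TRUE)
print(cleaned)
# [1] "a,b,c" "d,r"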
I would use lapply to loop over the columns and then replace the " using gsub.
df1[] <- lapply(df1, gsub, pattern='"', replacement='')
df1
# ID name value1 value2
#1 1 x a,b,c x
#2 2 y d,r z
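If needed, the column classes can then be changed with type.convert:
# as.is = TRUE keeps strings as character instead of converting them to factors
df1[] <- lapply(df1, type.convert, as.is = TRUE)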
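To see what the type conversion does after the quotes are gone, here is a small self-contained sketch. The two-column data frame demo is a hypothetical cut-down version of the example data, and as.is = TRUE is my assumption about the behavior you want (character columns stay character):

demo <- data.frame(ID = c("\"1", "\"2"), name = c("x", "y"),
                   stringsAsFactors = FALSE)
# Strip the double quotes from every column
demo[] <- lapply(demo, gsub, pattern = "\"", replacement = "", fixed = TRUE)
# Let R pick a suitable type for each column
demo[] <- lapply(demo, type.convert, as.is = TRUE)
str(demo)

After this, ID should be an integer column and name should still be character.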
df1 <- structure(list(ID = c("\"1", "\"2"), name = c("x", "y"),
value1 = c("a,\"b,\"c",
"d,\"r\""), value2 = c("x\"", "z\"")), .Names = c("ID", "name",
"value1", "value2"), class = "data.frame", row.names = c(NA, -2L))
One option would be to use apply() along with the gsub() function to remove all double quotation marks:
df <- data.frame(ID=c("\"1", "\"2"),
name=c("x", "y"),
value1=c("a,\"b,\"c", "d,\"r\""),
value2=c("x\"", "z\""))
# apply() runs the function over each column (MARGIN = 2)
df <- data.frame(apply(df, 2, function(x) {
  gsub("\"", "", x)
}))
> df
ID name value1 value2
1 1 x a,b,c x
2 2 y d,r z
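One thing to keep in mind: apply() converts the data frame to a matrix first, so if the data frame also has numeric columns they come back as character. A possible variant, assuming you only want to touch the character columns, is sketched below (mixed and its score column are hypothetical and only there to show that numeric columns are left alone):

mixed <- data.frame(ID = c("\"1", "\"2"), score = c(1.5, 2.5),
                    stringsAsFactors = FALSE)
# Find the character columns
is_chr <- vapply(mixed, is.character, logical(1))
# Remove the double quotes from those columns only
mixed[is_chr] <- lapply(mixed[is_chr], function(x) gsub("\"", "", x, fixed = TRUE))
str(mixed)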