
String replace character with backslash and double quote in a column of an R dataframe

I have this dataframe:

df <- data.frame(
  ID = c("1", "2", "3"),
  option_json = c(
    '{"thickness":"0.031 inches","tensile strength":"600 lb","size":"0.5 Inches x 7200 Feet"}',
    '{"thickness":"0.031 inches","tensile strength":"600 lb","size":"0.5 Inches x 7200 Feet"}',
    '{"tensile strength":"600 lb","color":"Black","size":"0.5 Inches x 7200 Feet"}'
  )
)

  ID                                                                              option_json
1  1 {"thickness":"0.031 inches","tensile strength":"600 lb","size":"0.5 Inches x 7200 Feet"}
2  2 {"thickness":"0.031 inches","tensile strength":"600 lb","size":"0.5 Inches x 7200 Feet"}
3  3            {"tensile strength":"600 lb","color":"Black","size":"0.5 Inches x 7200 Feet"}

I want this dataframe:

  ID                                                                option_json
1  1 {"thickness":"0.031\"","tensile strength":"600 lb","size":"0.5\" x 7200'"}
2  2 {"thickness":"0.031\"","tensile strength":"600 lb","size":"0.5\" x 7200'"}
3  3       {"tensile strength":"600 lb","color":"Black","size":"0.5\" x 7200'"}

I tried using str_replace and gsub to replace " inches" with \", but I keep getting a double backslash in front of the double quote. I'm not sure how to make the replacement with just a single backslash.

Chris asked Sep 18 '25 02:09


1 Answer

I think R always prints a literal backslash as two backslashes, because print() shows strings in their escaped form. The doubled backslash is only notation telling you that it should be read as the character "\" and not as an escape character; the string itself contains a single backslash.
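A minimal base R sketch of the difference between the printed form and the stored contents. The variable `x` is a made-up one-element example, not part of the question's data:

```r
# Hypothetical example string: the literal below stores the 5 characters 0.5\"
x <- "0.5\\\""

print(x)       # escaped form, backslash shown doubled: [1] "0.5\\\""
writeLines(x)  # raw contents, single backslash:        0.5\"
nchar(x)       # 5, i.e. the characters 0 . 5 \ "
```

`writeLines()` (or `cat()`) writes the characters as they are stored, while `print()` re-escapes them so the output could be pasted back into R as a string literal.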

To confirm this, save your dataframe to a text file; you will see that there is actually only one backslash in the string.

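library(dplyr)  # provides mutate() and the %>% pipe

# '\\\\"' is the regex replacement for a literal backslash followed by a double quote
df <- df %>% mutate(option_json = gsub(" inches", '\\\\"', option_json, ignore.case = TRUE) %>% 
                      gsub(" Feet", "'", ., ignore.case = TRUE))

# quote = FALSE writes the strings raw, without R's escaping
write.table(df, "df.tsv", quote = FALSE, row.names = FALSE)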

Output copied from "df.tsv":

ID option_json
1 {"thickness":"0.031\"","tensile strength":"600 lb","size":"0.5\" x 7200'"}
2 {"thickness":"0.031\"","tensile strength":"600 lb","size":"0.5\" x 7200'"}
3 {"tensile strength":"600 lb","color":"Black","size":"0.5\" x 7200'"}

Try printing the "option_json" column.

You can see that every double quote " character is preceded by an escape character \, and that \\ is used to indicate a single \ character.

print(df$option_json)
[1] "{\"thickness\":\"0.031\\\"\",\"tensile strength\":\"600 lb\",\"size\":\"0.5\\\" x 7200'\"}"
[2] "{\"thickness\":\"0.031\\\"\",\"tensile strength\":\"600 lb\",\"size\":\"0.5\\\" x 7200'\"}"
[3] "{\"tensile strength\":\"600 lb\",\"color\":\"Black\",\"size\":\"0.5\\\" x 7200'\"}"    
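If writing a file feels heavy, the same check can be sketched in base R alone. This assumes the first row of the question's data, rebuilt here as `json` so the snippet runs on its own; `out` and `n_backslashes` are illustrative names:

```r
# First row of the question's option_json column, copied so the snippet is self-contained
json <- '{"thickness":"0.031 inches","tensile strength":"600 lb","size":"0.5 Inches x 7200 Feet"}'

# Same two substitutions as in the answer, without dplyr
out <- gsub(" inches", '\\\\"', json, ignore.case = TRUE)
out <- gsub(" Feet", "'", out, ignore.case = TRUE)

# Raw contents, one backslash before each inserted quote
writeLines(out)
# {"thickness":"0.031\"","tensile strength":"600 lb","size":"0.5\" x 7200'"}

# Count the literal backslashes actually stored in the string
n_backslashes <- lengths(regmatches(out, gregexpr("\\", out, fixed = TRUE)))
n_backslashes  # 2, one per replaced " inches"
```

Counting the stored backslashes gives a check that does not depend on how the console chooses to display the string.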
benson23 answered Sep 19 '25 18:09