I have a table called LOAN containing a column named RATE, in which the observations are given as percentages, for example 14.49%. How can I format the table so that the % is removed from every entry in RATE and I can use the plot function on it? I tried using strsplit:
strsplit(LOAN$RATE,"%")
but got the error "non-character argument".
To remove a character from an R data frame column, we can use the gsub function, which replaces the character with an empty string. For example, if we have a data frame called df that contains a character column x whose values each contain the string ID, then it can be removed by using the command gsub("ID","",as.
To remove rows with NA in R we can use the na.omit() and drop_na() (tidyr) functions.
Using str.replace(), we can replace a specific character. To remove that character, replace it with an empty string. The str.replace() method replaces all occurrences of the specified character.
Use the str_replace() function from the stringr package to replace part of a column string with another string in an R data frame.
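As a rough illustration of the stringr route mentioned above, here is a minimal sketch. It assumes the stringr package is installed, and the vector rate is a made-up stand-in for the RATE column, not the asker's real data.

```r
library(stringr)

# Hypothetical sample values standing in for LOAN$RATE
rate <- c("16%", "24.5%", "27.81%")

# fixed("%") matches a literal percent sign; str_replace() replaces the
# first match in each string, which is enough for a single trailing "%"
rate_num <- as.numeric(str_replace(rate, fixed("%"), ""))

print(rate_num)
```

If a value could contain more than one "%", str_replace_all() would be the safer choice, since str_replace() only touches the first match in each string.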
This can be achieved using the mutate verb from the tidyverse package, which in my opinion is more readable.
So, to exemplify this, I create a dataset called LOAN with a focus on the RATE column to mimic the problem above.
library(tidyverse)
LOAN <- data.frame("SN" = 1:4, "Age" = c(21,47,68,33),
"Name" = c("John", "Dora", "Ali", "Marvin"),
"RATE" = c('16%', "24.5%", "27.81%", "22.11%"),
stringsAsFactors = FALSE)
head(LOAN)
  SN Age   Name   RATE
1  1  21   John    16%
2  2  47   Dora  24.5%
3  3  68    Ali 27.81%
4  4  33 Marvin 22.11%
In what follows, mutate allows one to alter the column content, gsub does the desired substitution (of % with "") and as.numeric() converts the RATE column to numeric values, keeping the data cleaning flow easy to follow.
LOAN <- LOAN %>% mutate(RATE = as.numeric(gsub("%", "", RATE)))
head(LOAN)
  SN Age   Name  RATE
1  1  21   John 16.00
2  2  47   Dora 24.50
3  3  68    Ali 27.81
4  4  33 Marvin 22.11
Items that appear to be character when printed but for which R thinks otherwise are generally objects of class factor. I'm also guessing that you are not going to be happy with the list output that strsplit will return. Try:
gsub("%", "", as.character(LOAN$RATE))
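To get from there to something plot can use, the result still needs converting to numeric. A minimal base R sketch, assuming RATE really is stored as a factor (the three sample values are made up for illustration):

```r
# Hypothetical stand-in for the asker's table, with RATE stored as a factor
LOAN <- data.frame(RATE = factor(c("14.49%", "16%", "24.5%")))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers that are printed
codes <- as.numeric(LOAN$RATE)

# Convert to character first, strip the literal "%", then convert to numeric
LOAN$RATE <- as.numeric(gsub("%", "", as.character(LOAN$RATE), fixed = TRUE))

str(LOAN$RATE)
```

Once RATE is numeric like this, plot(LOAN$RATE) should accept it. The fixed = TRUE argument just tells gsub to treat "%" as a literal string rather than a regular expression.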
Factors which appear numeric can be a source of confusion as well:
> factor("14.9%")
[1] 14.9%
Levels: 14.9%
> as.character(factor("14.9%"))
[1] "14.9%"
> gsub("%", "", as.character(factor("14.9%")) )
[1] "14.9"
This is especially confusing since print.data.frame removes the quotes:
> data.frame(z=factor("14.9%"), zz=factor(14.9))
      z   zz
1 14.9% 14.9
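Since the printed table hides the difference, one way to see what R actually thinks each column is would be to ask for the classes directly. A small sketch reusing the same toy data frame (the name df is only for illustration):

```r
# Same toy data frame as above; both columns are factors
df <- data.frame(z = factor("14.9%"), zz = factor(14.9))

# class of each column, returned as a named character vector
col_classes <- sapply(df, class)
print(col_classes)

# str() shows the class alongside the first few values
str(df)
```

Running the same check on LOAN before cleaning would show whether RATE is a factor or a character column.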