
How to read csv file in R where some values contain the percent symbol (%)

Tags: r, csv

Is there a clean/automatic way to convert CSV values formatted as percents (with a trailing % symbol) to numeric values in R?

Here is some example data:

actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%

Which can be read using:

junk = read.csv("Example.csv")

But all of the % columns are read as strings and converted to factors:

> str(junk)
 'data.frame':  4 obs. of  3 variables:
 $ actual       : num  2.15 0.917 7.941 4.964
 $ simulated    : num  8.607 8.027 0.215 3.524
 $ percent.error: Factor w/ 4 levels "-300%","-775%",..: 1 2 4 3

but I would like them to be numeric values.

Is there an additional parameter for read.csv? Is there an easy way to post-process the affected columns to convert them to numeric values? Other solutions?

Note: of course in this example I could simply recompute the values, but in my real application with a larger data file this is not practical.

Bryan P asked Jan 02 '14 22:01



1 Answer

There is no "percentage" type in R. So you need to do some post-processing:

DF <- read.table(text="actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%", sep=",", header=TRUE)

# Strip the "%" sign, convert to numeric, and rescale from percent to a fraction
DF[, 3] <- as.numeric(gsub("%", "", DF[, 3])) / 100

#  actual simulated percent.error
#1 2.1496    8.6066         -3.00
#2 0.9170    8.0266         -7.75
#3 7.9406    0.2152          0.97
#4 4.9637    3.5237          0.29
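
For a larger file with several percent columns, the same post-processing can be wrapped in a small function. The following is only a sketch: `percent_to_numeric` and `is_percent` are hypothetical helper names, not part of base R, and the sketch assumes a column should be converted only when every non-missing value looks like a plain number followed by `%`.

```r
# Hypothetical helper (not from base R): convert every column whose
# non-missing values all look like "<number>%" into numeric fractions.
percent_to_numeric <- function(df) {
  # TRUE when all non-missing entries match an optional minus sign,
  # digits/decimal point, and a trailing "%"
  is_percent <- function(x) {
    x <- as.character(x)
    x <- x[!is.na(x)]
    length(x) > 0 && all(grepl("^\\s*-?[0-9.]+\\s*%\\s*$", x))
  }
  for (col in names(df)) {
    if (is_percent(df[[col]])) {
      # as.character() also handles columns that were read as factors
      stripped <- sub("%", "", as.character(df[[col]]), fixed = TRUE)
      df[[col]] <- as.numeric(stripped) / 100
    }
  }
  df
}

# Same example data as in the question
DF <- read.csv(text = "actual,simulated,percent error
2.1496,8.6066,-300%
0.9170,8.0266,-775%
7.9406,0.2152,97%
4.9637,3.5237,29%", stringsAsFactors = FALSE)

DF <- percent_to_numeric(DF)
str(DF)
```

Columns that are already numeric, or that mix percent strings with other text, are left untouched, so the function can be applied to the whole data frame without listing column positions.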
Roland answered Sep 18 '22 10:09