
Read csv file in R with currency column as numeric

I'm trying to read into R a csv file that contains information on political contributions. From what I understand, columns are imported as factors by default, but I need the amount column ('CTRIB_AMT' in the dataset) to be imported as numeric so I can run a variety of functions that wouldn't work on factors. The column is formatted as currency with a "$" prefix.

I used a simple read command to import the file initially:

contribs <- read.csv('path/to/file')

And then tried to convert the CTRIB_AMT from currency to numeric:

as.numeric(as.character(sub("$","",contribs$CTRIB_AMT, fixed=TRUE)))

But that didn't work. The functions I'm trying to use for the CTRIB_AMT columns are:

vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)

See related question here.

Any thoughts on how to import the file so the column is numeric from the start, or how to convert it after importing?

tchaymore asked Sep 07 '11 17:09



2 Answers

I'm not sure how to read it in directly, but you can modify it once it's in:

> A <- read.csv("~/Desktop/data.csv")
> A
  id   desc price
1  0  apple $1.00
2  1 banana $2.25
3  2 grapes $1.97
> A$price <- as.numeric(sub("\\$","", A$price))
> A
  id   desc price
1  0  apple  1.00
2  1 banana  2.25
3  2 grapes  1.97
> str(A)
'data.frame':   3 obs. of  3 variables:
 $ id   : int  0 1 2
 $ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
 $ price: num  1 2.25 1.97

I think it might just have been a missing escape in your sub. In a regular expression, $ matches the end of the string, while \$ matches a literal dollar sign. Inside an R string you then have to escape the backslash itself, which gives "\\$".
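
For what it's worth, here is a minimal sketch of those patterns side by side, using a made-up vector rather than the asker's data. The character class `[$,]` is an addition that is not in the answer above: it is one way to also drop thousands separators, which as.numeric() would otherwise turn into NA. Note that with fixed = TRUE the "$" is already treated literally, so if the original attempt still failed, commas or not assigning the result back to the column are plausible culprits, though that is only a guess.

```r
# Hypothetical sample values; not taken from the original dataset
amt <- c("$1.00", "$2.25", "$1,234.50")

# An unescaped "$" is the end-of-string anchor, so the strings come back unchanged
sub("$", "", amt)

# An escaped regex and fixed = TRUE both remove the literal dollar sign
no_dollar_regex <- sub("\\$", "", amt)
no_dollar_fixed <- sub("$", "", amt, fixed = TRUE)

# The comma still blocks conversion: the third value becomes NA
suppressWarnings(as.numeric(no_dollar_regex))

# Stripping "$" and "," together with a character class handles all three values
amt_num <- as.numeric(gsub("[$,]", "", amt))
amt_num
```

The last line can be assigned straight back to the data frame column, for example A$price <- as.numeric(gsub("[$,]", "", A$price)).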

Zach answered Oct 16 '22 14:10


Taking advantage of the powerful parsers the readr package offers out of the box:

my_parser <- function(col) {
  # Try first with parse_number that handles currencies automatically quite well
  res <- suppressWarnings(readr::parse_number(col))
  if (is.null(attr(res, "problems", exact = TRUE))) {
    res
  } else {
    # If parse_number fails, fall back on parse_guess
    readr::parse_guess(col)
    # Alternatively, we could simply return col without further parsing attempt
  }
}

library(dplyr)

name <- c('john','carl', 'hank')
salary <- c('$23,456.33','$45,677.43','$76,234.88')
emp_data <- data.frame(name,salary)

emp_data %>% 
  mutate(foo = "USD13.4",
         bar = "£37") %>% 
  mutate_all(my_parser)

#   name   salary  foo bar
# 1 john 23456.33 13.4  37
# 2 carl 45677.43 13.4  37
# 3 hank 76234.88 13.4  37
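
If the goal is to get the column in as numeric at import time, one option along the same lines is to let readr do the parsing while reading. The sketch below assumes the readr package is installed and uses a small temporary file with a hypothetical NAME column standing in for the real contributions data; only CTRIB_AMT comes from the question.

```r
library(readr)

# Hypothetical two-row file standing in for the real contributions data
path <- tempfile(fileext = ".csv")
writeLines(c("NAME,CTRIB_AMT",
             "Smith,\"$1,250.00\"",
             "Jones,$75.50"), path)

# col_number() ignores non-numeric characters such as "$" and "," while parsing
contribs <- read_csv(path, col_types = cols(
  NAME      = col_character(),
  CTRIB_AMT = col_number()
))

# The column is numeric, so the tapply() call from the question runs on it
sums <- tapply(contribs$CTRIB_AMT, contribs$CTRIB_AMT, sum)
```

Columns not named in cols() are still guessed by readr, so only the currency column needs to be spelled out.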

Aurèle answered Oct 16 '22 14:10