
Reading csv file, having numbers and strings in one column

Tags: r, csv

I am importing a three-column CSV file. Each entry in the final column is either an integer or a string in quotation marks.

Here are a series of example entries:

1,4,"m"
1,5,20
1,6,"Canada"
1,7,4
1,8,5

When I import this using read.csv, the entries in the final column are all turned into factors.

How can I set it up such that these are read as integers and strings?

Thank you!

asked Oct 30 '11 19:10 by evt


2 Answers

This is not possible, since a given vector can only have a single mode (e.g. character, numeric, or logical).

However, you could split the vector into two separate vectors, one with numeric values and the second with character values:

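vec <- c("m", 20, "Canada", 4, 5)  # c() coerces every element to character

# Strings that are not numbers become NA (R warns about the coercion)
vnum <- as.numeric(vec)
# Keep the original string wherever the numeric conversion failed
vchar <- ifelse(is.na(vnum), vec, NA)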

vnum
[1] NA 20 NA  4  5

vchar
[1] "m"      NA       "Canada" NA       NA      
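
The same split can be applied to the CSV from the question rather than to a hand-built vector. This is only a minimal sketch under a few assumptions: the data are passed inline through the `text` argument of read.csv (a file path would be used in practice), and the column names V3_num and V3_chr are made up for illustration.

```r
# Inline copy of the sample rows from the question
csv_text <- '1,4,"m"
1,5,20
1,6,"Canada"
1,7,4
1,8,5'

# Read the third column as character so nothing is turned into a factor
dat <- read.csv(text = csv_text, header = FALSE,
                colClasses = c("integer", "integer", "character"))

# Entries that are not integers become NA; the coercion warning is expected
dat$V3_num <- suppressWarnings(as.integer(dat$V3))
# Keep the original string only where the integer conversion failed
dat$V3_chr <- ifelse(is.na(dat$V3_num), dat$V3, NA)
```

Setting colClasses explicitly keeps the result independent of the stringsAsFactors setting, at the cost of hard-coding the number of columns.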
answered Oct 12 '22 15:10 by Andrie


EDIT: Despite the OP's decision to accept this answer, @Andrie's answer is the preferred solution. My answer is meant only to inform about some odd features of data frames.

As others have pointed out, the short answer is that this isn't possible. data.frames are intended to contain columns of a single atomic type. @Andrie's suggestion is a good one, but just for kicks I thought I'd point out a way to shoehorn this type of data into a data.frame.

You can convert the offending column to a list (this code assumes strings are not converted to factors, i.e. options(stringsAsFactors = FALSE), which has been the default since R 4.0.0):

dat <- read.table(textConnection("1,4,'m'
1,5,20
1,6,'Canada'
1,7,4
1,8,5"),header = FALSE,sep = ",")

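# Start from a list of numerics; non-numeric entries become NA (with a warning)
tmp <- as.list(as.numeric(dat$V3))
# Put the original strings back at the positions that failed to convert
tmp[c(1,3)] <- dat$V3[c(1,3)]
dat$V3 <- tmp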

str(dat)
'data.frame':   5 obs. of  3 variables:
 $ V1: int  1 1 1 1 1
 $ V2: int  4 5 6 7 8
 $ V3:List of 5
  ..$ : chr "m"
  ..$ : num 20
  ..$ : chr "Canada"
  ..$ : num 4
  ..$ : num 5

Now, there are all sorts of reasons why this is a bad idea. For one, lots of code that you'd expect to play nicely with data.frames will not like this and either fail, or behave very strangely. But I thought I'd point it out as a curiosity.
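
As one small illustration of that kind of failure, here is a hedged sketch using a cut-down, two-row data frame (built by hand for this example, not read from the file). The variable name res is only there to capture the outcome.

```r
# Build a small data frame with a list column, in the same spirit as above
dat <- data.frame(V1 = c(1L, 1L), V2 = c(4L, 5L))
dat$V3 <- list("m", 20)

# Each element of the list column keeps its own type
types <- sapply(dat$V3, class)

# Functions that expect an atomic vector do not accept the list column
res <- tryCatch(sum(dat$V3), error = function(e) "error")
```

Here sum() raises an error because its argument is a list, so res ends up holding the string "error" instead of a number.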

answered Oct 12 '22 14:10 by joran