
Data frames with mixed data types

I have been using R for a little while, but I am still struggling with factors and data frames. Here's my question.

I am trying to pre-allocate a data frame composed of several columns of different types, as follows:

cb <- data.frame(S = character(1000),
                 I = numeric(1000),
                 A = as.Date(rep(0, 1000), origin = "1900-01-01"),
                 SD = as.POSIXct(rep(0, 1000), origin = "1900-01-01 00:00:00"),
                 CC = numeric(1000),
                 stringsAsFactors = FALSE)

which gets me the column types that I want (output of str(cb)):

'data.frame':   1000 obs. of  5 variables:
 $ S : chr  "" "" "" "" ...
 $ I : num  0 0 0 0 0 0 0 0 0 0 ...
 $ A : Date, format: "1900-01-01" "1900-01-01" "1900-01-01" "1900-01-01" ...
 $ SD: POSIXct, format: "1900-01-01" "1900-01-01" "1900-01-01" "1900-01-01" ...
 $ CC: num  0 0 0 0 0 0 0 0 0 0 ...

When I assign the first row of the data frame, columns I and CC become character:

cb[1, ] <- c("ABCD", 4, "2005-12-12", "2008-04-03 20:30", 3)

output of str(cb):

'data.frame':   1000 obs. of  5 variables:
 $ S : chr  "ABCD" "" "" "" ...
 $ I : chr  "4" "0" "0" "0" ...
 $ A : Date, format: "2005-12-12" "1900-01-01" "1900-01-01" "1900-01-01" ...
 $ SD: POSIXct, format: "2008-04-03 20:30:00" "1900-01-01 00:00:00" "1900-01-01 00:00:00" "1900-01-01 00:00:00" ...
 $ CC: chr  "3" "0" "0" "0" ...

which makes it rather unusable for my purposes.

When I omit stringsAsFactors=FALSE in the data.frame definition, I (obviously) get an error message instead (having set options(warn = 2)):

Error in `[<-.factor`(`*tmp*`, iseq, value = "ABCD") : 
  (converted from warning) invalid factor level, NAs generated

which I understand but I am not sure how to overcome either.

What am I doing wrong? How can I make sure to keep the numeric type for columns I and CC? Thanks so much for your help.

Cheers

B

asked Apr 15 '13 21:04 by bdu


1 Answer

You can't mix types in an atomic vector, so c() coerces all of your values to character.

R> c("ABCD", 4, "2005-12-12", "2008-04-03 20:30", 3)
[1] "ABCD"             "4"               
[3] "2005-12-12"       "2008-04-03 20:30"
[5] "3"

[<-.data.frame then coerces the numeric columns of your data.frame to character, so that each column still holds a single type. I do find it a bit inconsistent that it doesn't also convert the Date/POSIXt columns to character...
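
As far as I can tell, the asymmetry comes from method dispatch on the individual columns: a plain numeric vector has no class-specific replacement method, so assigning a string turns the whole vector into character, while Date (and POSIXct) vectors have their own `[<-` methods that convert the incoming value instead. A minimal sketch on standalone vectors (the names num and d are only illustrative, not from the question):

```r
# Illustrative vectors, not taken from the original data frame.
num <- numeric(3)
d   <- as.Date(rep(0, 3), origin = "1900-01-01")

num[1] <- "4"           # plain vector: everything is coerced to character
d[1]   <- "2005-12-12"  # `[<-.Date` converts the string via as.Date()

class(num)  # "character"
class(d)    # "Date"
```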

You can mix types in a list. This replacement works because data.frames are lists underneath.

cb[1, ] <- list("ABCD", 4, "2005-12-12", "2008-04-03 20:30", 3)
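
To check that the list assignment really keeps the column types, here is a small sketch using the same layout with 3 rows instead of 1000 (the row count and the helper name types are my own choices for brevity):

```r
# Same column layout as in the question, shortened to 3 rows.
cb <- data.frame(S  = character(3),
                 I  = numeric(3),
                 A  = as.Date(rep(0, 3), origin = "1900-01-01"),
                 SD = as.POSIXct(rep(0, 3), origin = "1900-01-01 00:00:00"),
                 CC = numeric(3),
                 stringsAsFactors = FALSE)

# A list keeps each element's own type, so nothing is coerced up front.
cb[1, ] <- list("ABCD", 4, "2005-12-12", "2008-04-03 20:30", 3)

# First class of every column after the assignment.
types <- sapply(cb, function(col) class(col)[1])
print(types)
```

I and CC should still be reported as numeric here, and the date strings are converted by the Date and POSIXct columns themselves.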

When you look back at your code later, it might make more sense to replace one row of your data.frame with a 1-row data.frame:

cb[1, ] <- data.frame("ABCD", 4, "2005-12-12", "2008-04-03 20:30", 3,
                      stringsAsFactors=FALSE)
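
A possible variant, which is my own suggestion rather than something the question asked for, is to name the columns and convert the date and time strings explicitly when building the 1-row data.frame, so the intended type of every value is visible at the call site. The sketch below again uses 3 rows, and new_row is a hypothetical name:

```r
# Same column layout as in the question, shortened to 3 rows.
cb <- data.frame(S  = character(3),
                 I  = numeric(3),
                 A  = as.Date(rep(0, 3), origin = "1900-01-01"),
                 SD = as.POSIXct(rep(0, 3), origin = "1900-01-01 00:00:00"),
                 CC = numeric(3),
                 stringsAsFactors = FALSE)

# Build the replacement row with explicit types (local time zone assumed).
new_row <- data.frame(S  = "ABCD",
                      I  = 4,
                      A  = as.Date("2005-12-12"),
                      SD = as.POSIXct("2008-04-03 20:30"),
                      CC = 3,
                      stringsAsFactors = FALSE)

cb[1, ] <- new_row
str(cb)
```

The values are still assigned by position, so the column order of new_row has to match cb.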

answered Oct 01 '22 06:10 by Joshua Ulrich