
Is there something in R that automatically converts the columns of a data frame or table to their original vector types?

I am concerned about how the data arrives in different vector types. Some columns are originally of type integer or numeric but are read in as character.

If I read a data frame with read.csv(), it guesses the type of each column and converts it automatically. I could not find the same behaviour with fread() and data.table(). A sample of the data is attached here:

structure(list(V1 = c("1", "2", "3", "4", "5", "6"), ID = c("109", 
"110", "111", "112", "113", "114"), SignalIntensity = c(7.58043495940162, 
11.2698560261255, 8.60063586764357, 9.54355755391806, 10.1812351379984, 
8.11689493952339), SNR = c(1.34218273720186, 9.75097840763912, 
1.80485348504829, 3.20137685049428, 4.64599368338536, 1.42263609838542
)), .Names = c("V1", "ID", "SignalIntensity", "SNR"), row.names = c(NA, 
6L), class = "data.frame")

When I read the data with read.csv():

str(df)

'data.frame':    20469 obs. of  4 variables:
 $ X              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ID             : int  109 110 111 112 113 114 116 117 118 119 ...
 $ SignalIntensity: num  6.18 10.17 7.29 8.9 9.59 ...
 $ SNR            : num  0.845 4.384 1.073 2.319 3.713 ...

The same data read with fread():

'data.frame':   20469 obs. of  4 variables:
 $ V1             : chr  "1" "2" "3" "4" ...
 $ ID             : chr  "109" "110" "111" "112" ...
 $ SignalIntensity: num  6.18 10.17 7.29 8.9 9.59 ...
 $ SNR            : num  0.845 4.384 1.073 2.319 3.713 ...


And with read.table():
'data.frame':   20470 obs. of  2 variables:
 $ V1: int  NA 1 2 3 4 5 6 7 8 9 ...
 $ V2: chr  ",\"ID\",\"SignalIntensity\",\"SNR\"" ",\"109\",6.18230893141024,0.845357691456258" ",\"110\",10.1727771385494,4.38370775906105" ",\"111\",7.29227469267823,1.07257511609212" ...
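The two-column result from read.table() above is what one would expect if the call relied on the default whitespace separator, since read.csv() is a wrapper around read.table() with sep = "," and header = TRUE. A minimal sketch, assuming a small inline CSV (`csv_text` and its values are made up for illustration, not the real file):

```r
# Hypothetical miniature of the CSV, written inline so the example is self-contained.
csv_text <- '"","ID","SignalIntensity","SNR"
"1","109",6.18,0.845
"2","110",10.17,4.384
"3","111",7.29,1.073'

# read.table() splits on whitespace unless told otherwise,
# so the separator and the header row are passed explicitly.
df_rt <- read.table(text = csv_text, sep = ",", header = TRUE)

# The quoted numbers are converted to integer, as with read.csv().
str(df_rt)
```

With these arguments read.table() should apply the same type guessing that read.csv() does.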

I would like to know whether anything takes care of this overhead of restoring the original vector types. Is there any automatic conversion other than read.csv()?

Edit: fread(....,verbose=TRUE)

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000949 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 4 columns
First row with 4 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 20470
Subtracted 1 for last eol and any trailing empty lines, leaving 20469 data rows
Type codes (   first 5 rows): 4433
Type codes (+ middle 5 rows): 4433
Type codes (+   last 5 rows): 4433
Type codes: 4433 (after applying colClasses and integer64)
Type codes: 4433 (after applying drop or select (if supplied)
Allocating 4 column slots (4 - 0 dropped)
   0.001s (  2%) Memory map (rerun may be quicker)
   0.000s (  1%) sep and header detection
   0.004s ( 12%) Count rows (wc -l)
   0.001s (  2%) Column type detection (first, middle and last 5 rows)
   0.000s (  0%) Allocation of 20469x4 result (xMB) in RAM
   0.025s ( 82%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.000s (  0%) Coercing data already read in type bumps (if any)
   0.000s (  0%) Changing na.strings to NA
   0.030s        Total
Agaz Wani asked May 12 '15 13:05



1 Answer

There seems to be a bug(?) in fread when setting colClasses (I'll wait for a response from @Arun). In the meantime, you can fix this by applying type.convert after reading the data (here df is the data.table returned by fread) and reassigning the columns by reference:

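library(data.table)
# Positions of the columns that were read in as character
indx <- which(sapply(df, is.character))
# Convert those columns by reference; as.is = TRUE keeps non-numeric text as character instead of factor
df[, (indx) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = indx]
str(df)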
# Classes ‘data.table’ and 'data.frame':  6 obs. of  4 variables:
# $ V1             : int  1 2 3 4 5 6
# $ ID             : int  109 110 111 112 113 114
# $ SignalIntensity: num  7.58 11.27 8.6 9.54 10.18 ...
# $ SNR            : num  1.34 9.75 1.8 3.2 4.65 ...
# - attr(*, ".internal.selfref")=<externalptr> 
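If the data is a plain data.frame rather than a data.table, the same idea can be sketched in base R without `:=`. This is only an illustration under that assumption; the small `df` below is a shortened, rounded copy of the sample data from the question, and `is_chr` is a name introduced here:

```r
# Shortened copy of the question's sample data (values rounded for brevity).
df <- data.frame(
  V1 = c("1", "2", "3"),
  ID = c("109", "110", "111"),
  SignalIntensity = c(7.58, 11.27, 8.60),
  SNR = c(1.34, 9.75, 1.80),
  stringsAsFactors = FALSE  # keep character columns as character on older R versions
)

# Logical mask of the columns stored as character.
is_chr <- vapply(df, is.character, logical(1))

# Let type.convert() guess a better type for each of those columns;
# as.is = TRUE leaves genuinely textual columns as character.
df[is_chr] <- lapply(df[is_chr], type.convert, as.is = TRUE)

str(df)
```

Unlike the data.table version, this assigns new columns rather than updating by reference, which is usually fine for data of this size.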
David Arenburg answered Oct 17 '22 00:10
