 

In R how do I read a CSV file line by line and have the contents recognised as the correct data type?

Tags:

r

csv

I want to read in a CSV file whose first line holds the variable names and whose subsequent lines hold the values of those variables. Some of the variables are numeric, some are text, and some are even empty.

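path = "path/file.csv"
f = file(path, 'r')                               # open once, then read sequentially
varnames = strsplit(readLines(f, 1), ",")[[1]]    # first line: variable names
data = strsplit(readLines(f, 1), ",")[[1]]        # next line: one row of values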

Now that data contains the values of all the variables, how do I make R recognise the data type of each one, just as read.csv does?
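One way to get read.csv's type guessing for a line you have already read is to hand that line back to read.csv through its text argument. This is a minimal sketch; the line and the column names below are made-up example values, not taken from the question.

```r
# A made-up line of the kind readLines(f, 1) returns, plus its column names
line <- "42,apple,3.14,"
varnames <- c("id", "fruit", "price", "comment")

# read.csv(text = ...) applies the same type guessing as reading a file:
# "42" becomes integer, "3.14" numeric, "apple" character, and the empty
# last field becomes NA (a logical column, since it holds no values)
row <- read.csv(text = line, header = FALSE, col.names = varnames,
                stringsAsFactors = FALSE)
str(row)
```

This also sidesteps a strsplit() gotcha that matters when some fields are empty: strsplit("42,apple,3.14,", ",")[[1]] returns only three elements, because an empty field at the end of the string is dropped.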

I need to read the data line by line (or n lines at a time) because the whole dataset is too big to read into R at once.
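A sketch of reading n lines at a time from a single open connection, assuming a simple CSV with no quoted fields that span several lines (readLines() would split such a record across chunks). The example file, chunk_size and the other variable names are illustrative. Types are guessed on the first chunk and then fixed through colClasses, so that a column cannot change type from one chunk to the next.

```r
# Build a small example file so the sketch runs on its own;
# in practice `path` would point at the large CSV
path <- tempfile(fileext = ".csv")
writeLines(c("id,fruit,price",
             "1,apple,0.5",
             "2,banana,",
             "3,cherry,2",
             "4,date,1.75",
             "5,elderberry,3"), path)

chunk_size <- 2   # rows per chunk; pick something that fits in memory
con <- file(path, open = "r")
varnames <- strsplit(readLines(con, n = 1), ",")[[1]]  # header line only

col_classes <- NA   # NA lets read.csv guess the types for the first chunk
chunks <- list()
repeat {
    raw_lines <- readLines(con, n = chunk_size)  # next chunk of raw lines
    if (length(raw_lines) == 0) break            # end of file reached
    chunk <- read.csv(text = raw_lines, header = FALSE, col.names = varnames,
                      colClasses = col_classes, stringsAsFactors = FALSE)
    # Reuse the first chunk's column types so later chunks match; without
    # this, the last chunk's price column ("3") would come back as integer
    if (identical(col_classes, NA)) col_classes <- sapply(chunk, class)
    chunks[[length(chunks) + 1]] <- chunk        # or process and discard it
}
close(con)
```

The design choice here is that the connection stays open, so each chunk continues where the previous one stopped instead of rescanning the file. The weak spot is the first chunk: if one of its columns is entirely empty, it is guessed as logical, and later numeric values in that column would then fail to parse. Supplying colClasses by hand avoids that.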

asked May 25 '11 04:05 by xiaodai




1 Answer

Based on DWin's comment, you can try something like this:

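read.clump <- function(file, lines, clump){
    if(clump > 1){
        # Read only the header row so its names can be reused below
        header <- read.csv(file, nrows=1, header=FALSE, stringsAsFactors=FALSE)
        # A textConnection is already open, so the header line has now been
        # consumed. If `file` is a path, the file is reopened on each call,
        # so use skip = lines*(clump-1) + 1 to skip the header as well.
        p = read.csv(file, skip = lines*(clump-1),
                     nrows = lines, header=FALSE)
        # unlist() turns the one-row header data frame into a name vector
        names(p) = unlist(header)
    } else {
        p = read.csv(file, skip = lines*(clump-1), nrows = lines)
    }
    return(p)
}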

You should probably add some error handling/checking to the function, too.

Then with

x = "letter1, letter2
a, b
c, d
e, f
g, h
i, j
k, l"


> read.clump(textConnection(x), lines = 2, clump = 1)
  letter1 letter2
1       a       b
2       c       d

> read.clump(textConnection(x), lines = 2, clump = 2)
  letter1  letter2
1       e        f
2       g        h

> read.clump(textConnection(x), lines = 3, clump = 1)
  letter1 letter2
1       a       b
2       c       d
3       e       f


> read.clump(textConnection(x), lines = 3, clump = 2)
  letter1  letter2
1       g        h
2       i        j
3       k        l

Now you just have to iterate over the clumps, for example with lapply() or another function from the *apply family.
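A sketch of applying over the clumps when reading from a file path rather than a textConnection. read.clump.path() is a hypothetical variant built on the "+1 if not a textConnection" note in the answer: each call reopens the file, so the header line has to be skipped explicitly. The number of clumps is derived with count.fields(); the example file and variable names are illustrative.

```r
# Hypothetical file-path variant of read.clump(): every call reopens the
# file, so the header line is skipped explicitly with the extra + 1
read.clump.path <- function(path, lines, clump) {
    header <- names(read.csv(path, nrows = 1))
    read.csv(path, skip = lines * (clump - 1) + 1, nrows = lines,
             header = FALSE, col.names = header, stringsAsFactors = FALSE)
}

# Small example file so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("letter1,letter2", "a,b", "c,d", "e,f", "g,h", "i,j"), path)

rows_per_clump <- 2
# count.fields() returns one entry per non-blank line; drop the header
n_rows <- length(count.fields(path, sep = ",")) - 1
n_clumps <- ceiling(n_rows / rows_per_clump)

# Apply over the clump indices; each element of `clumps` is one data frame
clumps <- lapply(seq_len(n_clumps), function(k)
    read.clump.path(path, lines = rows_per_clump, clump = k))
```

Because skip makes every call rescan the file from the top, the total reading work grows roughly quadratically with the number of clumps. That is fine for moderate files but slow for very large ones, where keeping a single connection open and reading successive chunks from it scales better.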

answered Oct 12 '22 12:10 by Greg