I want to read in a CSV file whose first line holds the variable names and whose subsequent lines hold the values of those variables. Some of the variables are numeric, some are text, and some are empty.
file = "path/file.csv"
f = file(file,'r')
varnames = strsplit(readLines(f,1),",")[[1]]
data = strsplit(readLines(f,1),",")[[1]]
Now that data contains all the variables, how do I make data recognise the type of each value being read in, just as it would if I had used read.csv?
I need to read the data line by line (or n lines at a time) as the whole dataset is too big to be read into R.
Based on DWin's comment, you can try something like this:
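read.clump <- function(file, lines, clump){
  if(clump > 1){
    # Read the header row on its own so the column names can be restored below.
    header <- read.csv(file, nrows = 1, header = FALSE, stringsAsFactors = FALSE)
    # With an open connection such as textConnection(), the header read above
    # has already consumed line 1. With a file path, every read starts again
    # from the top, so use skip = (lines*(clump-1)) + 1 instead.
    p <- read.csv(file, skip = lines*(clump-1), nrows = lines, header = FALSE)
    names(p) <- as.character(unlist(header))
  } else {
    # The first clump includes the header line, so read.csv can use it directly.
    p <- read.csv(file, skip = lines*(clump-1), nrows = lines)
  }
  return(p)
}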
You should probably add some error handling/checking to the function, too.
Then with
x = "letter1, letter2
a, b
c, d
e, f
g, h
i, j
k, l"
> read.clump(textConnection(x), lines = 2, clump = 1)
letter1 letter2
1 a b
2 c d
> read.clump(textConnection(x), lines = 2, clump = 2)
letter1 letter2
1 e f
2 g h
> read.clump(textConnection(x), lines = 3, clump = 1)
letter1 letter2
1 a b
2 c d
3 e f
> read.clump(textConnection(x), lines = 3, clump = 2)
letter1 letter2
1 g h
2 i j
3 k l
Now you just have to *apply (for example lapply or sapply) over the clumps.
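One possible way to do that last step, as a rough sketch rather than a definitive implementation: instead of calling read.csv with a growing skip for every clump (which rescans the input from the top each time when given a file path), keep one connection open, pull n lines at a time with readLines, and let read.csv(text = ...) work out the column types for each chunk. The helper name process_chunks, its arguments, and the sample data are all made up for illustration. The sketch also assumes that no quoted field contains a line break or a comma in the header.

```r
# Hypothetical helper (not part of the answer above): apply `fun` to
# successive chunks of at most `n` data rows read from an open connection.
process_chunks <- function(con, n, fun) {
  # The first line holds the column names; reading it advances the connection.
  header <- trimws(strsplit(readLines(con, n = 1), ",")[[1]])
  results <- list()
  repeat {
    chunk_lines <- readLines(con, n = n)
    # readLines() returns character(0) once the input is exhausted.
    if (length(chunk_lines) == 0) break
    # read.csv() infers the type of each column for this chunk only.
    chunk <- read.csv(text = chunk_lines, header = FALSE,
                      col.names = header, stringsAsFactors = FALSE)
    results[[length(results) + 1]] <- fun(chunk)
  }
  results
}

# Made-up sample input: one numeric column and one text column.
lines_in <- c("id,letter", "1,a", "2,b", "3,c", "4,d", "5,e")

con <- textConnection(lines_in)
sums <- process_chunks(con, n = 2, fun = function(chunk) sum(chunk$id))
close(con)
```

For a real file you would pass file("path/file.csv", "r") instead of the text connection. Because types are inferred per chunk, a column could come back as integer in one chunk and character in another; passing colClasses to read.csv is one way to pin them down if that matters.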