
Reading Tab-Delimited Data into R

I am trying to read a large tab-delimited file into R.

First I tried this:

data <- read.table("data.csv", sep="\t")

But it reads some of the numeric variables in as factors.

So I tried specifying the type I want for each variable, like this:

data <- read.table("data.csv", sep="\t", colClasses=c("character","numeric","numeric","character","boolean","numeric"))

But when I try this it gives me an error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got '"4"'

I think it might be that there are quotes around some of the numeric values in the original raw file, but I'm not sure.

Ford asked Jul 26 '12 18:07



1 Answer

Without seeing your data, it is hard to say for certain, but the likely causes include: the fields are not all separated by tabs; there are embedded tabs within single observations; or any of a litany of other problems.

One way to sort this out is to set options(stringsAsFactors=FALSE) and then rerun your first read.table() call, so that text columns come in as character vectors instead of factors.
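
A minimal sketch of this suggestion follows. The sample file is made up for illustration (the original data was not posted), and stringsAsFactors is passed directly to read.table() rather than set globally through options(), which has the same effect for this one call. In R 4.0.0 and later, FALSE is already the default.

```r
# Hypothetical sample data standing in for the asker's file (an assumption,
# not taken from the original post). The "n/a" entry is the deliberate problem.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue\tflag",
             "a\t1.5\tTRUE",
             "b\tn/a\tFALSE",
             "c\t4\tTRUE"), tmp)

# Keep text columns as character vectors instead of converting them to factors.
data <- read.table(tmp, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Inspect the type each column ended up with.
str(data)
```

A column that was expected to be numeric but shows up as chr in the str() output is one that contains at least one value R could not parse as a number.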

Check out str(data) and try to figure out which rows are the culprits. The reason some of the numeric columns are read as factors is that something in those columns cannot be parsed as a number, so R treats the whole column as character (and, by default, converts it to a factor). It usually takes some digging, but the problem is almost surely with your input file.
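
One possible way to do that digging is sketched below. It assumes a made-up two-column file, and the names raw, converted, and bad_rows are hypothetical. The idea is to read everything as character, attempt the numeric conversion yourself, and list the rows where it fails.

```r
# Hypothetical sample data (an assumption, not from the original post):
# one entry in the 'value' column is not a number.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue",
             "a\t1.5",
             "b\tn/a",
             "c\t4"), tmp)

# Read every column as character so no automatic type conversion happens.
raw <- read.table(tmp, sep = "\t", header = TRUE, colClasses = "character")

# Attempt the conversion; entries that cannot be parsed become NA.
converted <- suppressWarnings(as.numeric(raw$value))

# Rows that had a value in the file but failed to convert are the culprits.
bad_rows <- which(is.na(converted) & !is.na(raw$value))
raw[bad_rows, ]
```

Once the offending values are known, they can be fixed in the input file or handled at read time, for example by listing them in the na.strings argument of read.table().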

This is a common data munging issue. Good luck!

Justin answered Sep 23 '22 11:09