
Reading a text file with multiple spaces as delimiter in R

I have a large data set with about 94 columns and 3 million rows. The file uses single as well as multiple spaces as delimiters between columns. I need to read some of the columns from this file in R. I tried read.table() with the options shown in the code below:

    ### Define the columns to read from the file: the first 5 columns are read,
    ### the next 24 are skipped, the next 5 are read, and the last 60 are skipped
    col_classes = c(rep("character", 2), rep("numeric", 3), rep("NULL", 24),
                    rep("numeric", 5), rep("NULL", 60))

    ### Read the first 100 rows of the data
    data <- read.table(file, sep = " ", header = F, nrows = 100,
                       na.strings = "", stringsAsFactors = F)

Since the file has more than one space as the delimiter between some of the columns, the above method does not work. Is there a way to read this file efficiently?

Pawan asked Jun 07 '13 08:06




1 Answer

You need to change your delimiter. sep = " " means exactly one space character; sep = "" means whitespace of any length is the delimiter.

    data <- read.table(file, sep = "", header = F, nrows = 100,
                       na.strings = "", stringsAsFactors = F)

From the manual:

If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
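As a minimal sketch of how sep = "" combines with the column selection from the question, the example below writes a small made-up file (the values and the temporary file are assumptions for illustration, not the asker's data) and skips one column by giving it the class "NULL" in colClasses:

```r
# Toy data invented for illustration: four columns separated by
# one or more spaces.
lines <- c("a1 b1   1.5  2",
           "a2   b2 2.5    3")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# A class of "NULL" tells read.table to skip that column;
# here the second column is dropped.
col_classes <- c("character", "NULL", "numeric", "numeric")

# sep = "" treats any run of whitespace as a single delimiter.
data <- read.table(tmp, sep = "", header = FALSE,
                   colClasses = col_classes, stringsAsFactors = FALSE)
str(data)
```

For the 94-column file, the same call should work with the col_classes vector from the question passed as colClasses.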

Also, with a large data file you may want to consider data.table::fread to quickly read data straight into a data.table. I was using this function myself this morning. It is still experimental, but I find it works very well indeed.
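A rough sketch of the fread route, assuming the data.table package is installed and that your version of fread treats runs of spaces as one separator when sep = " " (worth checking on a few rows of the real file first). The toy file and the chosen column indices are made up for illustration:

```r
# Toy data invented for illustration: four columns separated by
# one or more spaces.
lines <- c("a1 b1   1.5  2",
           "a2   b2 2.5    3")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# select keeps only the listed columns (here 1, 3 and 4), which plays
# the role of the "NULL" entries in read.table's colClasses.
# Assumption: consecutive spaces are collapsed into one separator.
dt <- data.table::fread(tmp, sep = " ", header = FALSE,
                        select = c(1, 3, 4))
print(dt)
```

For a quick trial on the large file, the nrows argument of fread can limit the read to the first 100 rows, as in the read.table call.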

Simon O'Hanlon answered Sep 21 '22 18:09