Exceptions to sep = " " when reading table into R? Dealing with whitespace within fields

Question

I need to import a table into R that is separated by spaces. Unfortunately, within some of the fields, there are spaces which cause R to separate into a new row. Is there any way of making those fields 'stick together'?

For example, the table looks like this:

V1    V2    V3    V4
Text  More  0.11  (a)kdfs hdfa ag$
Text  More  1.12  a
Text  More  0.21  v
Text  More  1222  (a)sdfs sdfa->g
Text  More  1232  (a)sdfs sdfa->g

But it gets turned into this when R reads it (using read.delim):

V1    V2    V3    V4
Text  More  0.11  (a)kdfs 
hdfa  ag$
Text  More  1.12  a
Text  More  0.21  v
Text  More  1222  (a)sdfs 
sdfa->g
Text  More  1232  (a)sdfs 
sdfa->g

Those fields all have weird characters that aren't all shared with the other columns/rows. However, as seen, the spaces aren't flanked by the same characters.

In the original file, the rows are separated properly. Is there a way to do any of the following?

Stop separating by spaces after the fourth column is created
Have fields starting/ending with certain characters be stuck together as a string/add a non-space character where the spaces are
Generically, allow exceptions to sep

Quite new to R so sorry if this is very naive. Here is what my script looks like up to then:

strs <- readLines("file")
dat <- read.delim(text = strs, 
            skip = 17, 
            col.names = c("V1", "V2", "V3", "V4"),
            sep = " ", header = F)

Is there anything I can add to either read.delim or readLines or in between those to fix this problem? As there is fluff that needs to be cut out (hence the skip) I can't use read.table (correct me if I'm wrong).

Some of the characters around the spaces are shared, so I would be willing to use a more tedious method to put other characters in place of the spaces in between e.g. 's' and 's'. Would that be possible with gsub if there isn't an easier method?

Thanks so much!

EDIT: Flash of insight, would it be possible to make the fourth column a new table (that's of course not separated by spaces), then replace all spaces in that table with something else? How would I go about 'breaking off' the fourth column/columns after the third column?

G. Grothendieck · Accepted Answer

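1) Replace the first three runs of spaces in each line with commas, then read the result with read.csv. This assumes strs holds only the table lines, header row included: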

for(i in 1:3) strs <- sub(" +", ",", strs)
read.csv(text = strs)

The result of the last line is:

    V1   V2      V3               V4
1 Text More    0.11 (a)kdfs hdfa ag$
2 Text More    1.12                a
3 Text More    0.21                v
4 Text More 1222.00  (a)sdfs sdfa->g
5 Text More 1232.00  (a)sdfs sdfa->g
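The question's file has leading lines of fluff, which need to be dropped before the substitution. Here is a minimal sketch of that, using inline data instead of a file. The raw vector and n_skip are made-up stand-ins (the question skips 17 lines), and it assumes the header row comes right after the fluff and that the fourth column contains no commas or double quotes:

```r
# Hypothetical input: two lines of fluff, then the table.
raw <- c("some fluff",
         "more fluff",
         "V1    V2    V3    V4",
         "Text  More  0.11  (a)kdfs hdfa ag$",
         "Text  More  1.12  a",
         "Text  More  1222  (a)sdfs sdfa->g")

n_skip <- 2                            # assumed number of leading lines to drop
strs <- raw[seq_along(raw) > n_skip]   # keep only the table lines

# Turn the first three runs of spaces in each line into commas;
# spaces inside the fourth field are left alone.
for (i in 1:3) strs <- sub(" +", ",", strs)

dat <- read.csv(text = strs, stringsAsFactors = FALSE)
str(dat)
```

With a real file, raw would come from readLines("file") as in the question.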

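2) Here is a second solution, which does the same replacement in a single sub call using capture groups (note the doubled backslashes needed inside an R string):

strs.comma <- sub("^(\\S+) +(\\S+) +(\\S+) +", "\\1,\\2,\\3,", strs)
read.csv(text = strs.comma)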
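Regarding the EDIT in the question about breaking off the fourth column: the same capture-group idea can build the data frame directly, without going through CSV, so commas in the last field do no harm. This is a sketch of a variation rather than part of either solution above. The sample strs and the name pat are illustrative, and it assumes every data line has at least four fields (a line that fails to match would be silently dropped):

```r
# Illustrative table lines; note the comma inside the fourth field of the first row.
strs <- c("V1    V2    V3    V4",
          "Text  More  0.11  (a)kdfs, hdfa ag$",
          "Text  More  1.12  a",
          "Text  More  1222  (a)sdfs sdfa->g")

body <- strs[-1]                          # drop the header line
pat  <- "^(\\S+) +(\\S+) +(\\S+) +(.*)$"  # three space-free fields, then the rest

# regexec/regmatches return, per line, the full match followed by the four groups.
parts <- regmatches(body, regexec(pat, body))
mat   <- do.call(rbind, parts)[, -1, drop = FALSE]  # discard the full-match column

dat <- data.frame(V1 = mat[, 1],
                  V2 = mat[, 2],
                  V3 = as.numeric(mat[, 3]),
                  V4 = mat[, 4],
                  stringsAsFactors = FALSE)
str(dat)
```

The fourth column comes back as a single string per row, so any further cleanup of its spaces (for example with gsub) can be done on dat$V4 afterwards.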
