I need to import a table into R that is separated by spaces. Unfortunately, within some of the fields, there are spaces which cause R to separate into a new row. Is there any way of making those fields 'stick together'?
For example, the table looks like this:
V1 V2 V3 V4
Text More 0.11 (a)kdfs hdfa ag$
Text More 1.12 a
Text More 0.21 v
Text More 1222 (a)sdfs sdfa->g
Text More 1232 (a)sdfs sdfa->g
But it gets turned into this when R reads it (using read.delim):
V1 V2 V3 V4
Text More 0.11 (a)kdfs
hdfa ag$
Text More 1.12 a
Text More 0.21 v
Text More 1222 (a)sdfs
sdfa->g
Text More 1232 (a)sdfs
sdfa->g
Those fields all have weird characters that aren't all shared with the other columns/rows. However, as seen, the spaces aren't flanked by the same characters.
In the original file, the rows are separated properly. Is there a way to do any of the following?
sep
Quite new to R so sorry if this is very naive. Here is what my script looks like up to then:
strs <- readLines("file")
dat <- read.delim(text = strs,
                  skip = 17,
                  col.names = c("V1", "V2", "V3", "V4"),
                  sep = " ", header = FALSE)
Is there anything I can add to either read.delim or readLines or in between those to fix this problem? As there is fluff that needs to be cut out (hence the skip) I can't use read.table (correct me if I'm wrong).
Some of the characters around the spaces are shared, so I would be willing to use a more tedious method to put other characters in place of the spaces in between e.g. 's' and 's'. Would that be possible with gsub if there isn't an easier method?
Thanks so much!
EDIT: Flash of insight, would it be possible to make the fourth column a new table (that's of course not separated by spaces), then replace all spaces in that table with something else? How would I go about 'breaking off' the fourth column/columns after the third column?
1) Replace the first three runs of spaces on each line with commas, then read the result as CSV:
for(i in 1:3) strs <- sub(" +", ",", strs)
read.csv(text = strs)
The result of the last line is:
V1 V2 V3 V4
1 Text More 0.11 (a)kdfs hdfa ag$
2 Text More 1.12 a
3 Text More 0.21 v
4 Text More 1222.00 (a)sdfs sdfa->g
5 Text More 1232.00 (a)sdfs sdfa->g
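To make the first approach easy to try, here is a minimal self-contained sketch. It assumes `strs` already holds only the header and data lines (in the question's setup the leading fluff would have to be dropped first, for example by indexing the result of readLines), and it assumes the fourth field never contains a comma or a double quote, since either would confuse read.csv. The sample lines are copied from the question.

```r
# Sample lines from the question; in practice these would come from
# readLines("file") with the leading non-table lines removed (assumption).
strs <- c(
  "V1 V2 V3 V4",
  "Text More 0.11 (a)kdfs hdfa ag$",
  "Text More 1.12 a",
  "Text More 0.21 v",
  "Text More 1222 (a)sdfs sdfa->g",
  "Text More 1232 (a)sdfs sdfa->g"
)

# sub() changes only the first match in each line, so each pass converts
# one more separator; spaces after the third field are never touched.
for (i in 1:3) strs <- sub(" +", ",", strs)

dat <- read.csv(text = strs, stringsAsFactors = FALSE)
str(dat)
```

Using sub rather than gsub is what keeps the spaces inside V4: gsub would replace every run of spaces and reintroduce the original problem.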
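2) Here is a second solution, which does the same replacement in a single sub call by capturing the first three fields with a regular expression: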
strs.comma <- sub("^(\\S+) +(\\S+) +(\\S+) +", "\\1,\\2,\\3,", strs)
read.csv(text = strs.comma)
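If the fourth field might itself contain commas, a variant that skips the CSV step may be safer. The sketch below is not part of the answer above: it extends the same pattern with a fourth capture group and passes it to base R's strcapture (available since R 3.4.0), which effectively "breaks off" everything after the third field. The names `lines`, `pattern`, and `proto` are illustrative, and the header row is assumed to have been dropped already.

```r
# Data lines from the question, without the header row (assumption: the
# header and any leading fluff have already been removed).
lines <- c(
  "Text More 0.11 (a)kdfs hdfa ag$",
  "Text More 1.12 a",
  "Text More 0.21 v",
  "Text More 1222 (a)sdfs sdfa->g",
  "Text More 1232 (a)sdfs sdfa->g"
)

# Three space-free fields, then everything that remains as the fourth.
pattern <- "^(\\S+) +(\\S+) +(\\S+) +(.*)$"

# The prototype supplies the column names and the target type of each field.
proto <- data.frame(V1 = character(), V2 = character(),
                    V3 = numeric(), V4 = character(),
                    stringsAsFactors = FALSE)

dat <- strcapture(pattern, lines, proto, perl = TRUE)
str(dat)
```

Lines that do not match the pattern come back as rows of NA, so it is worth checking for those before relying on the result.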