Reading text into data.frame where string values contain spaces

Question

What's the easiest way to read text from a printed data.frame into a data.frame when there are string values containing spaces that interfere with read.table? For instance, this data.frame excerpt does not pose a problem:

     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173

I can paste it into a read.table call without a problem:

dat <- read.table(text = "     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173", header = TRUE)

But if the data has strings with spaces like this:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173

Then read.table throws an error as it interprets "Barack" and "Obama" as two separate variables.

G. Grothendieck · Accepted Answer

Read the file into L, remove the row numbers, and use sub with the regular expression shown to insert commas between the remaining fields. (In the pattern, \d matches any digit and \S matches any non-whitespace character; inside an R string literal each backslash must be doubled, so they are written "\\d" and "\\S".) Then re-read the result using read.csv:

Lines <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

# L <- readLines("myfile")  # read file; for demonstration use next line instead
L <- readLines(textConnection(Lines))

L2 <- sub("^ *\\d+ *", "", L)  # remove row numbers
# put commas between the three fields; only the first field may contain spaces
read.csv(text = sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L2), as.is = TRUE)

giving:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173
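
The same steps can be wrapped in a small function for reuse. This is only a sketch: the name read_printed_df is made up for illustration, and it assumes exactly three columns where only the first may contain spaces, and no commas in any value.

```r
# Hypothetical helper (the name is illustrative, not from any package).
# Assumes three columns; only the first column may contain spaces.
read_printed_df <- function(txt) {
  con <- textConnection(txt)
  on.exit(close(con))                      # always release the connection
  L <- readLines(con)
  L2 <- sub("^ *\\d+ *", "", L)            # remove row numbers
  csv <- sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L2)
  read.csv(text = csv, as.is = TRUE)
}

Lines <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

dat <- read_printed_df(Lines)
str(dat)
```

Because the row-number step strips leading digits, a first-column value that itself begins with a number would be damaged; in that case the pattern would need adjusting.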

Here is the regular expression on its own:

^ *(.*\S) +(\S+) +(\S+)$
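
To see what each parenthesized group captures, the pattern can be applied to a single row (with the row number already removed) using regexec and regmatches from base R. The variable names below are illustrative only.

```r
# One data row after the row number has been stripped
x <- "Barack Obama     D       365"
pattern <- "^ *(.*\\S) +(\\S+) +(\\S+)$"

# regexec returns the positions of the full match and each group;
# regmatches extracts the corresponding substrings
parts <- regmatches(x, regexec(pattern, x))[[1]]
parts[-1]  # drop the full match, keep the three captured groups
# [1] "Barack Obama" "D"            "365"
```

The last two groups can only match runs without spaces, so everything before them, including the internal space in the name, ends up in the first group.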


Tags: r, read.table

Asked by Sam Firke; 1 answer, by G. Grothendieck.