
Reading text into data.frame where string values contain spaces

Tags: r, read.table

What's the easiest way to read text from a printed data.frame into a data.frame when there are string values containing spaces that interfere with read.table? For instance, this data.frame excerpt does not pose a problem:

     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173

I can paste it into a read.table call without a problem:

dat <- read.table(text = "     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173", header = TRUE)

But if the data has strings with spaces like this:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173

Then read.table throws an error as it interprets "Barack" and "Obama" as two separate variables.
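A minimal sketch of the failure, for anyone who wants to reproduce it. The variable names `txt` and `res` are only illustrative; `tryCatch` is used so the script keeps running and the error can be inspected instead of halting execution.

```r
# Same excerpt as above, with spaces inside the candidate names
txt <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

# The header line has 3 fields but each data line splits into 5
# whitespace-separated fields, so read.table cannot line them up.
res <- tryCatch(
  read.table(text = txt, header = TRUE),
  error = function(e) e
)

inherits(res, "error")   # TRUE when read.table signalled an error
conditionMessage(res)    # the text of that error
```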

Sam Firke asked Mar 15 '23 22:03


1 Answer

Read the file into L, remove the row numbers and use sub with the indicated regular expression to insert commas between the remaining fields. (Note that "\\d" matches any digit and "\\S" matches any non-whitespace character.) Now re-read it using read.csv:

Lines <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

# L <- readLines("myfile")  # read file; for demonstration use next line instead
L <- readLines(textConnection(Lines))

L2 <- sub("^ *\\d+ *", "", L)  # remove row numbers
read.csv(text = sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L2), as.is = TRUE)

giving:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173
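It may help to look at the intermediate values. The sketch below repeats the same steps but stores the comma-separated lines in a variable (`csv_lines`, a name introduced here for illustration) before parsing them. Note that it relies on the same assumption as the regular expression: only the first column contains spaces, and exactly two space-free columns follow it.

```r
Lines <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

L <- readLines(textConnection(Lines))

# Step 1: strip leading row numbers (the header line has none, so it is unchanged)
L2 <- sub("^ *\\d+ *", "", L)

# Step 2: put commas between the three fields
csv_lines <- sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L2)
print(csv_lines)
# "candname,party,elecVotes" "Barack Obama,D,365" "John McCain,R,173"

# Step 3: parse the comma-separated lines
dat <- read.csv(text = csv_lines, as.is = TRUE)
str(dat)
```

With `as.is = TRUE` the name and party columns stay character rather than being converted to factors, which matters on R versions before 4.0.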

The regular expression, written without R's string escaping, is:

^ *(.*\S) +(\S+) +(\S+)$
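To see what each capture group picks up, the pattern can be applied to a single data line with `regexec` and `regmatches`. This is only an illustrative sketch; `pattern`, `line` and `parts` are names made up for the example, and the row number is assumed to have been removed already.

```r
pattern <- "^ *(.*\\S) +(\\S+) +(\\S+)$"
line <- "Barack Obama     D       365"

# regexec returns the position of the full match and of each capture group;
# regmatches extracts the corresponding substrings.
parts <- regmatches(line, regexec(pattern, line))[[1]]

# Drop the first element (the whole match) to keep just the groups:
# group 1 = everything up to the last two fields, ending in a non-space
# group 2 = second-to-last field, group 3 = last field
parts[-1]
# "Barack Obama" "D" "365"
```

Because the last two groups only accept runs of non-whitespace, any internal spaces are forced into the first group, which is what keeps "Barack Obama" together.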

G. Grothendieck answered Mar 18 '23 14:03