Import raw data into R

Question

please anyone can help me to import this data into R from a text or dat file. It has space delimited, but cities names should not considered as two names. Like NEW YORK.

1 NEW YORK  7,262,700
2 LOS ANGELES  3,259,340
3 CHICAGO  3,009,530
4 HOUSTON  1,728,910
5 PHILADELPHIA  1,642,900
6 DETROIT  1,086,220
7 SAN DIEGO  1,015,190
8 DALLAS  1,003,520
9 SAN ANTONIO  914,350
10 PHOENIX  894,070

Hugh · Accepted Answer

For your particular data frame, where true spaces only occur between capital letters, consider using a regular expression:

gsub("(*[A-Z]) ([A-Z]+)", "\1-\2", "1 NEW YORK  7,262,700")
# [1] "1 NEW-YORK 7,262,700"
gsub("(*[A-Z]) ([A-Z]+)", "\1-\2", "3 CHICAGO  3,009,530")
# [1] "3 CHICAGO  3,009,530"

You can then interpret spaces as field separators.

A5C1D2H2I1M1N2O1R2T1 · Answer

A variation on a theme... but first, some sample data:

cat("1 NEW YORK  7,262,700",
    "2 LOS ANGELES  3,259,340",
    "3 CHICAGO  3,009,530",
    "4 HOUSTON  1,728,910",
    "5 PHILADELPHIA  1,642,900",
    "6 DETROIT  1,086,220",
    "7 SAN DIEGO  1,015,190",
    "8 DALLAS  1,003,520",
    "9 SAN ANTONIO  914,350",
    "10 PHOENIX  894,070", sep = "
", file = "test.txt")

Step 1: Read the data in with readLines

x <- readLines("test.txt")

Step 2: Figure out a regular expression that you can use to insert delimiters. Here, the pattern seems to be (looking from the end of the lines) a set of numbers and commas preceded by space preceded by some words in ALL CAPS. We can capture those groups and insert some "tab" delimiters ( ). The extra slashes are to properly escape them.

gsub("([A-Z ]+)(\s?[0-9,]+$)", "\	\1\	\2", x)
#  [1] "1	 NEW YORK  	7,262,700"     "2	 LOS ANGELES  	3,259,340" 
#  [3] "3	 CHICAGO  	3,009,530"      "4	 HOUSTON  	1,728,910"     
#  [5] "5	 PHILADELPHIA  	1,642,900" "6	 DETROIT  	1,086,220"     
#  [7] "7	 SAN DIEGO  	1,015,190"    "8	 DALLAS  	1,003,520"      
#  [9] "9	 SAN ANTONIO  	914,350"    "10	 PHOENIX  	894,070"

Step 3: Since we know our gsub is working, and we know that read.delim has a "text" argument that can be used instead of a "file" argument, we can use read.delim directly on the result of gsub:

out <- read.delim(text = gsub("([A-Z ]+)(\s?[0-9,]+$)", "\	\1\	\2", x), 
                  header = FALSE, strip.white = TRUE)
out
#    V1           V2        V3
# 1   1     NEW YORK 7,262,700
# 2   2  LOS ANGELES 3,259,340
# 3   3      CHICAGO 3,009,530
# 4   4      HOUSTON 1,728,910
# 5   5 PHILADELPHIA 1,642,900
# 6   6      DETROIT 1,086,220
# 7   7    SAN DIEGO 1,015,190
# 8   8       DALLAS 1,003,520
# 9   9  SAN ANTONIO   914,350
# 10 10      PHOENIX   894,070

One possible last step would be to convert the third column to numeric:

out$V3 <- as.numeric(gsub(",", "", out$V3))

Import raw data into R

Tags:

import

r

read.table

Mike

2 Answers

Hugh

A5C1D2H2I1M1N2O1R2T1

Recent Activity

Donate For Us

Import raw data into R

Tags:

import

r

read.table

Mike

2 Answers

Hugh

A5C1D2H2I1M1N2O1R2T1

Related questions

Recent Activity

Donate For Us