Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Import txt file in R ignoring first few lines

downloaded data from the MET office on rainfall for Scotland.

First few lines:

Scotland Rainfall (mm)
Areal series, starting from 1910
Allowances have been made for topographic, coastal and urban effects where relationships are found to exist.
Seasons: Winter=Dec-Feb, Spring=Mar-May, Summer=June-Aug, Autumn=Sept-Nov. (Winter: Year refers to Jan/Feb).
Values are ranked and displayed to 1 dp. Where values are equal, rankings are based in order of year descending.
Data are provisional from February 2015 & Winter 2015. Last updated 26/11/2015

     JAN  Year     FEB  Year     MAR  Year     APR  Year     MAY  Year     JUN  Year     JUL  Year     AUG  Year    SEP   Year     OCT  Year     NOV  Year     DEC  Year     WIN  Year     SPR  Year     SUM  Year     AUT  Year     ANN  Year
   293.8  1993   278.1  1990   238.5  1994   191.1  1947   191.4  2011   155.0  1938   185.6  1940   216.5  1985   267.6  1950   258.1  1935   262.0  2009   300.7  2013   743.6  2014   409.5  1986   455.6  1985   661.2  1981  1886.4  2011
   292.2  1928   258.8  1997   233.4  1990   149.0  1910   168.7  1986   137.9  2002   181.4  1988   211.9  1992   221.2  1981   254.0  1954   244.8  1938   268.5  1986   649.5  1995   401.3  2015   435.6  1948   633.8  1954  1828.1  1990
   275.6  2008   244.7  2002   201.3  1992   146.8  1934   155.9  1925   137.8  1948   170.1  1939   202.3  2009   193.9  1982   248.8  2014   242.2  2006   267.2  1929   645.4  2000   393.7  1994   427.8  2009   615.8  1938  1756.8  2014

I am trying to read in this txt file into R and tried the following:

fileURL <- "http://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/ranked/Scotland.txt"

if(!file.exists("scotland_rainfall.txt")){
        #this will download the file in the current working directory
        download.file(fileURL,destfile = "scotland_rainfall.txt")
        dateDownload <- Sys.Date() #30-11-2015
}

scotland_weather <- read.table("scotland_rainfall.txt",skip = 8,header = F,sep = "\t",na.strings = "") 

It interprets everything in factor with various levels:

> head(scotland_weather)
                                                                                                                                                                                                                                              V1
1    293.8  1993   278.1  1990   238.5  1994   191.1  1947   191.4  2011   155.0  1938   185.6  1940   216.5  1985   267.6  1950   258.1  1935   262.0  2009   300.7  2013   743.6  2014   409.5  1986   455.6  1985   661.2  1981  1886.4  2011
2    292.2  1928   258.8  1997   233.4  1990   149.0  1910   168.7  1986   137.9  2002   181.4  1988   211.9  1992   221.2  1981   254.0  1954   244.8  1938   268.5  1986   649.5  1995   401.3  2015   435.6  1948   633.8  1954  1828.1  1990
3    275.6  2008   244.7  2002   201.3  1992   146.8  1934   155.9  1925   137.8  1948   170.1  1939   202.3  2009   193.9  1982   248.8  2014   242.2  2006   267.2  1929   645.4  2000   393.7  1994   427.8  2009   615.8  1938  1756.8  2014
4    252.3  2015   227.9  1989   200.2  1967   142.1  1949   149.5  2015   137.7  1931   165.8  2010   191.4  1962   189.7  2011   247.7  1938   231.3  1917   265.4  2011   638.3  2007   393.2  1967   422.6  1956   594.5  1935  1735.8  1938
5    246.2  1974   224.9  2014   180.2  1979   133.5  1950   137.4  2003   135.0  1966   162.9  1956   190.3  2014   189.7  1927   242.3  1983   229.9  1981   264.0  2006   608.9  1990   391.7  1992   397.0  2004   590.6  1982  1720.0  2008
6    245.0  1975   195.6  1995   180.0  1989   132.9  1932   129.7  2007   131.7  2004   159.9  1985   189.1  2004   189.6  1985   240.9  2001   224.9  1951   261.0  1912   592.8  2015   389.1  1913   390.1  1938   589.2  2006  1716.5  1954

> str(scotland_weather)
'data.frame':   106 obs. of  1 variable:
 $ V1: Factor w/ 106 levels "    38.6  1963    10.3  1932    28.7  1929    14.0  1974    22.5  1984    30.1  1988    32.7  1913     5.1  1947    31.7  1972 "| __truncated__,..: 106 105 104 103 102 101 100 99 98 97 ...

Also tried Header=T

> scotland_weather <- read.table("scotland_rainfall.txt",skip = 8,header = T,sep = "\t",na.strings = "")
> head(scotland_weather)
    X293.8..1993...278.1..1990...238.5..1994...191.1..1947...191.4..2011...155.0..1938...185.6..1940...216.5..1985...267.6..1950...258.1..1935...262.0..2009...300.7..2013...743.6..2014...409.5..1986...455.6..1985...661.2..1981..1886.4..2011
1    292.2  1928   258.8  1997   233.4  1990   149.0  1910   168.7  1986   137.9  2002   181.4  1988   211.9  1992   221.2  1981   254.0  1954   244.8  1938   268.5  1986   649.5  1995   401.3  2015   435.6  1948   633.8  1954  1828.1  1990
2    275.6  2008   244.7  2002   201.3  1992   146.8  1934   155.9  1925   137.8  1948   170.1  1939   202.3  2009   193.9  1982   248.8  2014   242.2  2006   267.2  1929   645.4  2000   393.7  1994   427.8  2009   615.8  1938  1756.8  2014
3    252.3  2015   227.9  1989   200.2  1967   142.1  1949   149.5  2015   137.7  1931   165.8  2010   191.4  1962   189.7  2011   247.7  1938   231.3  1917   265.4  2011   638.3  2007   393.2  1967   422.6  1956   594.5  1935  1735.8  1938
4    246.2  1974   224.9  2014   180.2  1979   133.5  1950   137.4  2003   135.0  1966   162.9  1956   190.3  2014   189.7  1927   242.3  1983   229.9  1981   264.0  2006   608.9  1990   391.7  1992   397.0  2004   590.6  1982  1720.0  2008
5    245.0  1975   195.6  1995   180.0  1989   132.9  1932   129.7  2007   131.7  2004   159.9  1985   189.1  2004   189.6  1985   240.9  2001   224.9  1951   261.0  1912   592.8  2015   389.1  1913   390.1  1938   589.2  2006  1716.5  1954
6    241.9  2005   194.8  1998   179.6  1921   132.3  1927   129.6  1920   130.4  1980   158.0  1953   188.8  1948   187.5  1935   238.1  2008   223.2  1986   260.8  1949   580.6  1920   386.5  1947   387.5  2012   587.8  1984  1696.7  2004
> str(scotland_weather)
'data.frame':   105 obs. of  1 variable:
 $ X293.8..1993...278.1..1990...238.5..1994...191.1..1947...191.4..2011...155.0..1938...185.6..1940...216.5..1985...267.6..1950...258.1..1935...262.0..2009...300.7..2013...743.6..2014...409.5..1986...455.6..1985...661.2..1981..1886.4..2011: Factor w/ 105 levels "    38.6  1963    10.3  1932    28.7  1929    14.0  1974    22.5  1984    30.1  1988    32.7  1913     5.1  1947    31.7  1972 "| __truncated__,..: 105 104 103 102 101 100 99 98 97 96 ...

I wish to retain the same column names as the txt file.

Any other ideas would be appreciated.

Thanks

like image 575
Shery Avatar asked Nov 30 '15 14:11

Shery


People also ask

How do I read a specific line in a text file in R?

Read Lines from a File in R Programming – readLines() Function. readLines() function in R Language reads text lines from an input file. The readLines() function is perfect for text files since it reads the text line by line and creates character objects for each of the lines.

Can you import a TXT file into R?

R programming language can load TXT files.


1 Answers

It seems that the file really has fixed width fields but the header is not consistent with the data rows so read the header and data separately like this. No packages needed.

hdr <- read.table(fileURL, skip = 7, nrow = 1, as.is = TRUE)
widths <- rep(c(8, 6), times = 17) # 8, 6, 8, 6, ..., 8, 6
dd <- read.fwf(fileURL, widths, skip = 8, col.names = hdr, check.names = FALSE)

Note: It would be possible to compute widths from the first line of the data like this:

one.line <- readLines(fileURL, n = 9)[9] # char string with 1st line of data
widths <- diff(c(0, gregexpr("\\S(?=\\s)", paste(one.line, ""), perl = TRUE)[[1]]))
like image 75
G. Grothendieck Avatar answered Nov 01 '22 18:11

G. Grothendieck