Read fixed width text file

Tags:

r

fixed-width

I'm trying to load this ugly-formatted dataset into my R session: http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for

Weekly SST data starts week centered on 3Jan1990

Nino1+2      Nino3        Nino34        Nino4
Week          SST SSTA     SST SSTA     SST SSTA     SST SSTA 
03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3 
10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3 
17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3

So far, I can read the lines with

  x = readLines(path)

But the fields are separated sometimes by white space and sometimes only by the '-' of a negative value, and I'm not a regex expert. I'd appreciate any help turning this into a nice, clean R data frame. Thanks!
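For what it's worth, the regex route is workable for this particular layout, because a '-' only ever appears as the sign of a value glued to the previous field. A minimal sketch (not taken from the answers below), using the three sample lines above with the header lines already dropped: insert a space before each minus sign, then let read.table() split on white space. With the real file you would start from readLines(path) and drop the header lines first; the four-line header is an assumption based on the skip = 4 the answers use.

```r
# Sample data lines from the question's excerpt (header lines already removed).
# With the real file this would be something like readLines(path)[-(1:4)],
# ASSUMING four header lines, as the answers' skip = 4 implies.
x <- c("03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3",
       "10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3",
       "17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3")

# Put a space in front of every minus sign, so "23.4-0.4" becomes "23.4 -0.4".
# This only works because no other field (e.g. the date) contains a '-'.
x_spaced <- gsub("-", " -", x, fixed = TRUE)

# Every field is now separated by white space, which read.table() handles.
df <- read.table(text = x_spaced, stringsAsFactors = FALSE)
head(df)
```

The fixed-width approach in the answers is more robust, since it does not depend on which characters happen to appear in the data.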


asked Jan 17 '13 16:01

Fernando

2 Answers

This is a fixed-width file. Use read.fwf() and give the width of each column:

x <- read.fwf(
  file=url("http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for"),
  skip=4,                                  # skip the 4 header lines
  widths=c(12, 7, 4, 9, 4, 9, 4, 9, 4))    # date, then SST/SSTA for each Nino region

head(x)

            V1   V2   V3   V4   V5   V6   V7   V8  V9
1  03JAN1990   23.4 -0.4 25.1 -0.3 26.6  0.0 28.6 0.3
2  10JAN1990   23.4 -0.8 25.2 -0.3 26.6  0.1 28.6 0.3
3  17JAN1990   24.2 -0.3 25.3 -0.3 26.5 -0.1 28.6 0.3
4  24JAN1990   24.4 -0.5 25.5 -0.4 26.5 -0.1 28.4 0.2
5  31JAN1990   25.1 -0.2 25.8 -0.2 26.7  0.1 28.4 0.2
6  07FEB1990   25.8  0.2 26.1 -0.1 26.8  0.1 28.4 0.3
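To try the same call without downloading anything, here is a minimal offline sketch. It rebuilds a few lines from the excerpt in the question and assumes each data line starts with a single space (which is what the 12-character first width implies). The column names, strip.white = TRUE and the date parsing are illustrative additions, not part of the answer above; the names are hypothetical and can be anything you like.

```r
# Offline sketch: lines rebuilt from the question's excerpt, ASSUMING each
# data line starts with one space (as the 12-character first width implies).
sst_lines <- c(
  " Weekly SST data starts week centered on 3Jan1990",
  "",
  "                Nino1+2      Nino3        Nino34        Nino4",
  " Week          SST SSTA     SST SSTA     SST SSTA     SST SSTA",
  " 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3",
  " 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3",
  " 17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3"
)
tmp <- tempfile(fileext = ".for")
writeLines(sst_lines, tmp)

sst <- read.fwf(
  tmp,
  skip = 4,                                  # the four header lines
  widths = c(12, 7, 4, 9, 4, 9, 4, 9, 4),    # same widths as the answer
  # hypothetical column names, one per kept field
  col.names = c("week",
                "nino12_sst", "nino12_ssta", "nino3_sst", "nino3_ssta",
                "nino34_sst", "nino34_ssta", "nino4_sst", "nino4_ssta"),
  strip.white = TRUE,        # passed on to read.table(): trims the padded date
  stringsAsFactors = FALSE
)
unlink(tmp)

# "03JAN1990" uses English month abbreviations, so parse in the C locale
# and restore the previous LC_TIME setting afterwards.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
sst$week <- as.Date(sst$week, format = "%d%b%Y")
invisible(Sys.setlocale("LC_TIME", old_locale))

str(sst)
```

Without strip.white = TRUE the first column keeps its padding (as in the head(x) output above), and as.Date() would then need trimws() first.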

Update

The readr package (released April 2015) provides a simple and fast alternative:

library(readr)

x <- read_fwf(
  file="http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for",   
  skip=4,
  fwf_widths(c(12, 7, 4, 9, 4, 9, 4, 9, 4)))

Speed comparison: readr::read_fwf() was roughly 2x faster than utils::read.fwf().


answered Oct 23 '22 07:10

Andrie

Another way to specify the widths is to use negative values for the runs of characters you want to skip:

df <- read.fwf(
  file=url("http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for"),
  widths=c(-1, 9, -5, 4, 4, -5, 4, 4, -5, 4, 4, -5, 4, 4),
  skip=4
)

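A negative width tells read.fwf() to skip that many characters instead of reading them into a column. The -1 skips the one-character leading column, and each -5 skips a five-character run of spaces; the positive widths (9 for the date, 4 for each SST and SSTA value) are the columns that are kept.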

Reference: R Cookbook by Paul Teetor (1st edition), chapter 4, recipe 4.6: https://www.inkling.com/read/r-cookbook-paul-teetor-1st/chapter-4/recipe-4-6
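A small self-contained check of the negative-width idea, using two data lines from the question's excerpt written to a temporary file (again assuming one leading space per line, which the -1 implies). Only the positive widths produce columns, so the result has nine columns and the date comes back without the padding that the 12-character width in the other answer picks up.

```r
# Two data lines from the question's excerpt, ASSUMING one leading space each.
data_lines <- c(" 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3",
                " 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3")
tmp <- tempfile()
writeLines(data_lines, tmp)

# Negative widths are skipped: -1 drops the leading space, 9 keeps the date,
# each -5 drops a run of five spaces, and each pair of 4s keeps SST and SSTA.
df <- read.fwf(tmp, widths = c(-1, 9, -5, 4, 4, -5, 4, 4, -5, 4, 4, -5, 4, 4))
unlink(tmp)
df
```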

answered Oct 23 '22 06:10

Pavithra Gunasekara
