Import fixed width data file with no line separator

Tags: import, r, dbf

I have fixed-width data files (.dbf) that have no line separators. Here is what two lines of such a file look like:

20141101 77h  3.210                                  0    3 20141102 76h  3.090                                  0    3

The widths for one line are c(8,4,7,41): the date (8), some time measure (4), the data point (7), and some other columns that I can summarize in one "rest" column (41), for a total of 60 characters. There is no separator after a line; the next line is simply appended to the previous one, so all time steps are written consecutively in one massive line. The file contains only digits, letters and white space.

With read.fwf('filepath', widths = c(8,4,7,41)), R stops reading after the first line because there is no line separator.

Is there an argument that tells read.fwf() where a new line starts when there is no line separator? Or should I use a different read command?

Thanks in advance.

asked Feb 05 '16 10:02

Ben

2 Answers

Maybe not the best idea but this should work:

content <- scan('filepath','character',sep='~') # Caution: choose a sep that does not appear in the data, so the whole file is read as one string.
# Split content in lines:
lines <- regmatches(content,gregexpr('.{60}',content))[[1]]
x <- tempfile()
write(lines,x)
data <- read.fwf(x, widths = c(8,4,7,41))
unlink(x)

The idea is to read the whole file, put each run of 60 characters into its own entry, write these entries to a temporary file, read the data back from that file with read.fwf, and then delete the temporary file.
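If managing the temporary file by hand is undesirable, the same split can be handed to read.fwf through a text connection. This is only a minimal sketch, assuming the file consists of complete 60-character records; the `content` string is built in place from two made-up records in the question's layout and stands in for what `scan()` would return for the real file.

```r
# Illustrative stand-in for the file contents: two records padded to
# 8 + 4 + 7 + 41 = 60 characters each (not the real data file).
content <- paste0(
  sprintf("%-8s%4s%7s%41s", "20141101", "77h", "3.210", "0    3 "),
  sprintf("%-8s%4s%7s%41s", "20141102", "76h", "3.090", "0    3 ")
)

widths  <- c(8, 4, 7, 41)
rec_len <- sum(widths)  # length of one record

# Guard against a truncated file before splitting.
stopifnot(nchar(content) %% rec_len == 0)

# Cut the single long string into one string per record.
starts  <- seq(1, nchar(content), by = rec_len)
records <- substring(content, starts, starts + rec_len - 1)

# read.fwf accepts a connection, so the records can be passed directly.
con  <- textConnection(records)
data <- read.fwf(con, widths = widths, stringsAsFactors = FALSE)
close(con)

print(data)
```

The length check is an added safeguard, not part of the original answer: it fails early if the file ends in a partial record.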

Another approach uses a regex and the stringr package (still with content as returned by scan above):

library(stringr)
d <- data.frame( str_match_all( content, "(.{8})(.{4})(.{7})(.{41})")[[1]][,2:5], stringsAsFactors=FALSE)

which gives:

        X1   X2      X3                                        X4
1 20141101  77h   3.210                                   0    3 
2 20141102  76h   3.090                                   0    3

str_match_all returns a list, here with 1 element because there is only one line of input, so we extract that element with [[1]].

The result is a matrix with 5 columns: the first is the full match and the others are the capture groups. We therefore subset the matrix to columns 2 to 5 to keep only the 4 columns we need, and wrap it in data.frame to get a data frame at the end.

You can then name the columns with colnames(d) <- c('date','time','data_point','rest')
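For comparison, base R can do a similar capture-group extraction without stringr. The following is a sketch, assuming R 3.4.0 or later (for `utils::strcapture`); `content` is again a made-up two-record stand-in, and declaring `data_point` as numeric in `proto` is an illustrative choice rather than something the answer prescribes.

```r
# Illustrative stand-in for `content`: two 60-character records
# in the question's layout (not the real data file).
content <- paste0(
  sprintf("%-8s%4s%7s%41s", "20141101", "77h", "3.210", "0    3 "),
  sprintf("%-8s%4s%7s%41s", "20141102", "76h", "3.090", "0    3 ")
)

# One string per 60-character record.
records <- regmatches(content, gregexpr(".{60}", content))[[1]]

# proto supplies the column names and types of the result.
proto <- data.frame(date = character(), time = character(),
                    data_point = numeric(), rest = character(),
                    stringsAsFactors = FALSE)

# Each capture group becomes one column, converted to the type given in proto.
d <- strcapture("^(.{8})(.{4})(.{7})(.{41})$", records, proto)

str(d)
```

Because `strcapture` names and converts the columns in one step, the separate `colnames()` call is not needed in this variant.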

If you wish to clean up the white space, you can wrap the str_match_all result in trimws (thanks to @Jaap for the reminder about this function) like this:

td <- data.frame( trimws( str_match_all( content, "(.{8})(.{4})(.{7})(.{41})")[[1]][,2:5] ), stringsAsFactors=FALSE)

Output:

        X1  X2    X3     X4
1 20141101 77h 3.210 0    3
2 20141102 76h 3.090 0    3

answered Oct 24 '22 08:10

Tensibai

A different, and probably less elegant, solution with readLines, substr, trimws, separate (tidyr) and mutate_all (dplyr):

txt <- readLines('filepath')
dfx <- data.frame(V1 = sapply(seq(from=1, to=nchar(txt), by=60),
                              function(x) substr(txt, x, x+59)))

library(dplyr)
library(tidyr)
dfx %>% 
  separate(V1, c(paste0("V",LETTERS[1:5])), c(8,12,19,55)) %>% 
  mutate_all(trimws)

which gives:

        VA  VB    VC VD VE
1 20141101 77h 3.210  0  3
2 20141102 76h 3.090  0  3

To get different column names, just replace c(paste0("V",LETTERS[1:5])) with a vector of the column names you want.

If you want to convert the columns to their correct classes instead of leaving them as character, you can use funs(ul = type.convert(trimws(.))) inside mutate_all.
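Since funs() has been deprecated in newer dplyr releases, the same conversion can also be sketched in base R. The small data frame below is a hand-made stand-in for the split but untrimmed character columns; `as.is = TRUE` is an added choice so that text columns stay character instead of becoming factors.

```r
# Hand-made stand-in for the split but untrimmed character columns.
dfx <- data.frame(VA = c("20141101", "20141102"),
                  VB = c(" 77h", " 76h"),
                  VC = c("  3.210", "  3.090"),
                  stringsAsFactors = FALSE)

# Trim each column, then let type.convert choose integer, numeric or character.
dfx[] <- lapply(dfx, function(col) type.convert(trimws(col), as.is = TRUE))

str(dfx)
```

If the first column should become a real date rather than an integer, as.Date(as.character(dfx$VA), format = "%Y%m%d") is one way to convert it afterwards.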

answered Oct 24 '22 08:10

Jaap
