R - Reading lines from a .txt-file after a specific line

Tags:

import

r

I have a bunch of output .txt-files, each consisting of a large parameter list followed by an X-Y-coordinate set. I need to extract these coordinates from all files so that only those lines are imported to a vector. This would work fine with

impcoord <- read.table("file.txt", skip = , nrows = , ...)

but the coordinate sets appear after a different number of supporting parameter lines in each file.

Luckily the coordinates always start after a line containing certain words.

Thus my question is, how do I start reading the .txt-file after these words? Let's say they are:

coordinatesXY

Thanks a lot for your time and help!

-Olli

--Edit--

Sorry for the confusion.

The part of the file is as follows:

##XYDATA= (X++(Y..Y))
131071    -2065
131070    -4137
131069    -6408
131068    -8043 
...       ...
...       ...

The first line is where the skip should end, and the coordinates that follow need to be imported to a vector. As you can see, the X-coordinates start from 131071 and end at 0.

Olli J, asked Sep 05 '14 07:09


1 Answer

1) read.pattern

read.pattern in the gsubfn package can be used to read only the lines matching a specific pattern. In this example we match the beginning of the line, optional space(s), 1 or more digits, 1 or more spaces, an optional minus followed by 1 or more digits, optional space(s), and the end of the line. The portions of each line matching the parenthesized parts of the regexp are returned as columns in a data.frame. In this self-contained example, text = Lines can be replaced with "myfile.txt", say, if the data is coming from a file. Modify the pattern to suit.

Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071    -2065
131070    -4137
131069    -6408
131068    -8043"

library(gsubfn)
DF <- read.pattern(text = Lines, pattern = "^ *(\\d+) +(-?\\d+) *$")

giving:

> DF
      V1    V2
1 131071 -2065
2 131070 -4137
3 131069 -6408
4 131068 -8043
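If installing gsubfn is not an option, the same filter-by-pattern idea can be sketched in base R. This is only an approximation of the approach above, not a description of how read.pattern works internally: it keeps the lines that look like two integers and passes them to read.table. The names all_lines, keep and DF2 are made up for this illustration.

```r
Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071    -2065
131070    -4137
131069    -6408
131068    -8043"

# Split the text into individual lines; for a file, readLines("myfile.txt")
# would take the place of this step.
all_lines <- strsplit(Lines, "\n", fixed = TRUE)[[1]]

# TRUE for lines made of an integer, spaces, and an optionally negative integer
keep <- grepl("^ *[0-9]+ +-?[0-9]+ *$", all_lines)

# Parse only the matching lines into a two-column data.frame (V1, V2)
DF2 <- read.table(text = all_lines[keep])
```

Unlike read.pattern, this sketch does not use the capture groups; read.table splits the kept lines on whitespace instead.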

2) read twice

Another possibility, using only base R, is simply to read the input once to determine the value of skip= and a second time to do the actual read using that value. grep returns the line number of the marker line, which is exactly the number of lines to skip. To read from a file myfile.txt, replace both text = Lines and textConnection(Lines) with "myfile.txt".

read.table(text = Lines, 
    skip = grep("##XYDATA=", readLines(textConnection(Lines))))
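Since the question asks for the coordinates from many files, the same marker search can be wrapped in a small helper and applied with lapply. This is a minimal sketch under some assumptions: read_xy, its marker argument, the column names x and y, and the temporary demo files are all hypothetical, and each file is assumed to contain the marker line exactly once with only coordinate rows after it. It reads each file once with readLines and parses the remaining lines, rather than reading the file twice.

```r
# Hypothetical helper: return the X-Y block that follows the marker line
read_xy <- function(path, marker = "##XYDATA=") {
  lines <- readLines(path)
  # Position of the first line containing the marker (matched literally)
  start <- grep(marker, lines, fixed = TRUE)[1]
  if (is.na(start)) stop("marker not found in ", path)
  # Drop everything up to and including the marker line, then parse the rest
  read.table(text = lines[-seq_len(start)], col.names = c("x", "y"))
}

# Demo: two temporary files whose parameter sections differ in length
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("junk", "##XYDATA= (X++(Y..Y))",
             "131071    -2065", "131070    -4137"), f1)
writeLines(c("junk", "junk", "junk", "##XYDATA= (X++(Y..Y))",
             "131071    -100"), f2)

# One data.frame per file, then one vector of Y values per file
coords <- lapply(c(f1, f2), read_xy)
y_vectors <- lapply(coords, function(d) d$y)
```

For real data, something like list.files(pattern = "\\.txt$") could supply the paths in place of the two temporary files.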

Update: made some revisions and added the second approach.

G. Grothendieck, answered Sep 21 '22 23:09