Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Skipping rows until row with a certain value

I need to to read a .txt file from an URL, but would like to skip the rows until a row with a certain value. The URL is https://fred.stlouisfed.org/data/HNOMFAQ027S.txt and the data takes the following form:

"

... (number of rows)

... (number of rows)

... (number of rows)

DATE VALUE

1945-01-01 144855

1946-01-01 138515

1947-01-01 136405

1948-01-01 135486

1949-01-01 142455

"

I would like to skip all rows until the row with "DATE // VALUE" and start importing the data from this line onwards (including "DATE // VALUE"). Is there a way to do this with data.table's fread() - or any other way, such as with dplyr?

Thank you very much in advance for your effort and your time!

Best,

c.

like image 343
cthulhukk Avatar asked Feb 26 '26 01:02

cthulhukk


2 Answers

Here's a way to get to extract that info from those text files using readr::read_lines, dplyr, and string handling from stringr.

library(tidyverse)
library(stringr)

df <- data_frame(lines = read_lines("https://fred.stlouisfed.org/data/HNOMFAQ027S.txt")) %>%
  filter(str_detect(lines, "^\\d{4}-\\d{2}-\\d{2}")) %>%
  mutate(date = str_extract(lines, "^\\d{4}-\\d{2}-\\d{2}"),
         value = as.numeric(str_extract(lines, "[\\d-]+$"))) %>%
  select(-lines)

df
#> # A tibble: 286 x 2
#>          date value
#>         <chr> <dbl>
#>  1 1945-10-01  1245
#>  2 1946-01-01    NA
#>  3 1946-04-01    NA
#>  4 1946-07-01    NA
#>  5 1946-10-01  1298
#>  6 1947-01-01    NA
#>  7 1947-04-01    NA
#>  8 1947-07-01    NA
#>  9 1947-10-01  1413
#> 10 1948-01-01    NA
#> # ... with 276 more rows

I filtered for all the lines you want to keep using stringr::str_detect, then extracted out the info you want from the string using stringr::str_extract and regexes.

like image 92
Julia Silge Avatar answered Feb 28 '26 15:02

Julia Silge


Combining fread with unix tools:

> fread("curl -s https://fred.stlouisfed.org/data/HNOMFAQ027S.txt | sed -n -e '/^DATE.*VALUE/,$p'")
           DATE   VALUE
  1: 1945-10-01    1245
  2: 1946-01-01       .
  3: 1946-04-01       .
  4: 1946-07-01       .
  5: 1946-10-01    1298
 ---                   
282: 2016-01-01 6566888
283: 2016-04-01 6741075
284: 2016-07-01 7022321
285: 2016-10-01 6998898
286: 2017-01-01 7448792
> 
like image 43
Clayton Stanley Avatar answered Feb 28 '26 14:02

Clayton Stanley