I am trying to make a loop that reads in multiple CSV files that all have the same type of air temperature data. However there are rows that I want to skip above the data. These are "alarms" in the dataset. Each file may have a different amount of alarms, thus a different amount of rows to skip. See below:
-------------First CSV file---------------
Logger 001
Alarm 1
Alarm 2
Alarm 3
Alarm 4
Date, Temp
01/01/2011, -1.2
01/02/2011, -1.3
01/03/2011, -1.1
01/04/2011, -1.2
-------------Second CSV file---------------
Logger 001
Alarm 1
Alarm 2
Alarm 3
Alarm 4
Alarm 5
Alarm 6
Alarm 7
Date, Temp
01/01/2011, -1.2
01/02/2011, -1.3
01/03/2011, -1.1
01/04/2011, -1.2
How can I get the index of Date
, Temp
to tell read.csv
to skip to that row?
for (i in 1:length(csv.list)) {
df = read.csv(csv.list[i], header = T, skip=????????)
}
You could add a couple of lines prior to your read.table
in your loop
Use readLines
to read in the data
r <- readLines(textConnection("Logger 001
Alarm 1
Alarm 2
Alarm 3
Alarm 4
Date, Temp
01/01/2011, -1.2
01/02/2011, -1.3
01/03/2011, -1.1
01/04/2011, -1.2"))
[but without the textConnection for you ie r <- readLines("yourcsv")
]
Find the row number where the actual headers begin - using grep
dt <- grep("Date",r)
Read in your data - skipping the lines prior to the headers
read.table(text=r , header=TRUE, sep="," , skip = dt-1)
So to read in your multiple csv files - these will be stored in a list of data,frames
df.lst <- lapply(csv.list , function(i) {
r <- readLines(i)
dt <- grep("Date",r)
read.table(text=r , header=TRUE, sep="," , skip = dt-1)
})
fread
from the "data.table" package might be useful for you since it is pretty good at auto-detecting "junk" header rows. Here's an example:
First, create two sample csv files
cat("Logger 001
Alarm 1
Alarm 2
Alarm 3
Alarm 4
Date, Temp
01/01/2011, -1.2
01/02/2011, -1.3
01/03/2011, -1.1
01/04/2011, -1.2
", file = "socsv1.csv", sep = "\n")
cat("Logger 001
Alarm 1
Alarm 2
Alarm 3
Alarm 4
Alarm 5
Alarm 6
Alarm 7
Date, Temp
01/01/2011, -1.2
01/02/2011, -1.3
01/03/2011, -1.1
01/04/2011, -1.2", file = "socsv2.csv", sep = "\n")
## Check that they were created
list.files(pattern = "socsv")
# [1] "socsv1.csv" "socsv2.csv"
Now, just use fread
and specify the sep
instead of letting fread
guess.
library(data.table)
lapply(list.files(pattern = "socsv"), fread, sep = ",")
# [[1]]
# Date Temp
# 1: 01/01/2011 -1.2
# 2: 01/02/2011 -1.3
# 3: 01/03/2011 -1.1
# 4: 01/04/2011 -1.2
#
# [[2]]
# Date Temp
# 1: 01/01/2011 -1.2
# 2: 01/02/2011 -1.3
# 3: 01/03/2011 -1.1
# 4: 01/04/2011 -1.2
count.fields
can identify the first line of each file that has two comma-separated fields. You can then use this to specify skip
. For example:
writeLines('Logger 001
Alarm 1
Alarm 2
Alarm 3
Alarm 4
Date, Temp
01/01/2011, -1.2
01/02/2011, -1.3
01/03/2011, -1.1
01/04/2011, -1.2', f <- tempfile())
writeLines('Logger 001
Alarm 1
Alarm 2
Alarm 3
Alarm 4
Alarm 5
Alarm 6
Alarm 7
Date, Temp
01/01/2011, -1.2
01/02/2011, -1.3
01/03/2011, -1.1
01/04/2011, -1.2', f2 <- tempfile())
for (x in c(f, f2)) {
ind <- match(2, count.fields(x, ','))
df <- read.csv(x, header = T, skip=ind - 1)
print(df)
}
# Date Temp
# 1 01/01/2011 -1.2
# 2 01/02/2011 -1.3
# 3 01/03/2011 -1.1
# 4 01/04/2011 -1.2
# Date Temp
# 1 01/01/2011 -1.2
# 2 01/02/2011 -1.3
# 3 01/03/2011 -1.1
# 4 01/04/2011 -1.2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With