What is the best way to read a file into R when the header spans two lines and both are needed?
This happens to me all the time, as people often use one line for the column name and then include another line underneath it for the unit of measurement. I don't want to skip anything. I want the names and the units to carry through.
Here is what a typical file with two headers might look like:
```
trt    biomass  yield
crop   Mg/ha    bu/ac
C2     17.76    205.92
C2     17.96    207.86
CC     17.72    197.22
CC     18.42    205.20
CCW    18.15    200.51
CCW    17.45    190.59
P      3.09     0.00
P      3.34     0.00
S2     5.13     49.68
S2     5.36     49.72
```
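For reference, here is a minimal sketch of why a plain `read.table(header = TRUE)` is not enough. It assumes the file is tab-separated and writes the sample to a temporary file so the snippet runs on its own:

```r
# Sample file from the question, written tab-separated to a temporary file
# (the tab delimiter is an assumption about the original file).
f <- tempfile(fileext = ".txt")
writeLines(c("trt\tbiomass\tyield", "crop\tMg/ha\tbu/ac",
             "C2\t17.76\t205.92", "C2\t17.96\t207.86",
             "CC\t17.72\t197.22", "CC\t18.42\t205.20",
             "CCW\t18.15\t200.51", "CCW\t17.45\t190.59",
             "P\t3.09\t0.00", "P\t3.34\t0.00",
             "S2\t5.13\t49.68", "S2\t5.36\t49.72"), f)

# Naive read: the units line becomes the first data row, so the
# numeric columns can no longer be parsed as numbers.
bad <- read.table(f, header = TRUE, sep = "\t")
str(bad)
```

Because the units line ends up in the data, `biomass` and `yield` come back as character (or factor in R < 4.0) instead of numeric.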
I would do this in two steps, assuming the first row contains the labels and there are always exactly two header lines:
```r
header <- scan("file.txt", nlines = 1, what = character())
data <- read.table("file.txt", skip = 2, header = FALSE)
```
Then assign the character vector `header` as the `names` component:

```r
names(data) <- header
```
For your data this would be:

```r
header <- scan("data.txt", nlines = 1, what = character())
data <- read.table("data.txt", skip = 2, header = FALSE)
names(data) <- header
head(data)
```

```
> head(data)
  trt biomass  yield
1  C2   17.76 205.92
2  C2   17.96 207.86
3  CC   17.72 197.22
4  CC   18.42 205.20
5 CCW   18.15 200.51
6 CCW   17.45 190.59
```
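A variant of the same idea, sketched under the assumption of a tab-separated sample file: `read.table()` also accepts `col.names`, so the names can be supplied at read time instead of in a separate `names<-` step. Note that the default `check.names = TRUE` runs `col.names` through `make.names()`, which is harmless for these plain names but would alter names containing characters such as `/`.

```r
# Write the sample to a temporary file so the sketch is self-contained.
f <- tempfile(fileext = ".txt")
writeLines(c("trt\tbiomass\tyield", "crop\tMg/ha\tbu/ac",
             "C2\t17.76\t205.92", "C2\t17.96\t207.86",
             "CC\t17.72\t197.22", "CC\t18.42\t205.20",
             "CCW\t18.15\t200.51", "CCW\t17.45\t190.59",
             "P\t3.09\t0.00", "P\t3.34\t0.00",
             "S2\t5.13\t49.68", "S2\t5.36\t49.72"), f)

# First line holds the names; quiet = TRUE suppresses the "Read 3 items" message.
header <- scan(f, nlines = 1, what = character(), quiet = TRUE)

# Skip both header lines and name the columns in the same call.
data <- read.table(f, skip = 2, header = FALSE, col.names = header)
head(data)
```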
If you also want the units, as in @DWin's answer, do a second `scan()` on line 2:
```r
header2 <- scan("data.txt", skip = 1, nlines = 1, what = character())
names(data) <- paste0(header, header2)
head(data)
```

```
> head(data)
  trtcrop biomassMg/ha yieldbu/ac
1      C2        17.76     205.92
2      C2        17.96     207.86
3      CC        17.72     197.22
4      CC        18.42     205.20
5     CCW        18.15     200.51
6     CCW        17.45     190.59
```
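If pasted names such as `biomassMg/ha` are awkward to work with, one option is to keep the plain names and store the units next to them in an attribute. This is a convention of my own, not something base R interprets; the attribute name `units` is an assumption, and the sample is again assumed to be tab-separated:

```r
# Write the sample to a temporary file so the sketch is self-contained.
f <- tempfile(fileext = ".txt")
writeLines(c("trt\tbiomass\tyield", "crop\tMg/ha\tbu/ac",
             "C2\t17.76\t205.92", "C2\t17.96\t207.86",
             "CC\t17.72\t197.22", "CC\t18.42\t205.20",
             "CCW\t18.15\t200.51", "CCW\t17.45\t190.59",
             "P\t3.09\t0.00", "P\t3.34\t0.00",
             "S2\t5.13\t49.68", "S2\t5.36\t49.72"), f)

header <- scan(f, nlines = 1, what = character(), quiet = TRUE)
units  <- scan(f, skip = 1, nlines = 1, what = character(), quiet = TRUE)
data   <- read.table(f, skip = 2, col.names = header)

# Hypothetical convention: a named character vector of units,
# stored in a custom "units" attribute on the data frame.
attr(data, "units") <- setNames(units, header)
attr(data, "units")["biomass"]
```

Some data-frame operations do not carry custom attributes along, so treat this as a lightweight convention rather than a guarantee.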
Use `readLines` with `n = 2` to grab the two header lines, parse them, `paste0` them together, and then read the data with `read.table` using `skip = 2` and `header = FALSE` (the default). Finish by assigning the column names:
```r
dat <- "trt\tbiomass\tyield
crop\tMg/ha\tbu/ac
C2\t17.76\t205.92
C2\t17.96\t207.86
CC\t17.72\t197.22
CC\t18.42\t205.20
CCW\t18.15\t200.51
CCW\t17.45\t190.59
P\t3.09\t0.00
P\t3.34\t0.00
S2\t5.13\t49.68
S2\t5.36\t49.72
"
```
In practice you would probably pass a file name, but using the `text` argument of `read.table` (and a `textConnection` for `readLines`) makes this example self-contained:
```r
readLines(textConnection(dat), n = 2)
#[1] "trt\tbiomass\tyield" "crop\tMg/ha\tbu/ac"

# Parse the two header lines into a 2-row table of character values
head2 <- read.table(text = readLines(textConnection(dat), n = 2),
                    sep = "\t", stringsAsFactors = FALSE)
paste0(head2[1, ], head2[2, ])
# [1] "trtcrop"      "biomassMg/ha" "yieldbu/ac"

joinheadrs <- paste0(head2[1, ], head2[2, ])

# Read the data rows, skipping both header lines, then attach the names
newdat <- read.table(text = dat, sep = "\t", skip = 2)
colnames(newdat) <- joinheadrs
newdat
```

```
> newdat
   trtcrop biomassMg/ha yieldbu/ac
1       C2        17.76     205.92
2       C2        17.96     207.86
3       CC        17.72     197.22
4       CC        18.42     205.20
5      CCW        18.15     200.51
6      CCW        17.45     190.59
7        P         3.09       0.00
8        P         3.34       0.00
9       S2         5.13      49.68
10      S2         5.36      49.72
```
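The combined names contain `/`, so they are not syntactic R names. A short sketch of the two usual ways to cope: quote them with backticks, or convert them with `make.names()`. The small data frame here is only illustrative:

```r
# Column names as produced by paste0() above
joined <- c("trtcrop", "biomassMg/ha", "yieldbu/ac")

# make.names() replaces characters that are invalid in R names with "."
make.names(joined)
# [1] "trtcrop"      "biomassMg.ha" "yieldbu.ac"

# Or keep the names as they are and quote them with backticks
df <- data.frame(trt = c("C2", "CC"), biomass = c(17.76, 17.72))
names(df) <- joined[1:2]
df$`biomassMg/ha`
# [1] 17.76 17.72
```

If such names are passed through `read.table(col.names = ...)`, the default `check.names = TRUE` applies `make.names()` automatically.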
It might be better to use `paste` with `sep = "_"`:
```r
joinheadrs <- paste(head2[1, ], head2[2, ], sep = "_")
joinheadrs
#[1] "trt_crop"      "biomass_Mg/ha" "yield_bu/ac"
```
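If you do this often, the steps can be wrapped in a small function. This is a minimal sketch: `read_two_header()` and its `joiner` argument are hypothetical names, and it assumes a tab-separated file whose first line holds the names and whose second line holds the units:

```r
# Hypothetical helper: read a file whose header spans two lines and
# join the two lines into one column name per column.
read_two_header <- function(file, sep = "\t", joiner = "_") {
  # Read only the two header lines, keeping every field as character
  hdr <- read.table(file, nrows = 2, sep = sep, colClasses = "character")
  col_names <- paste(unlist(hdr[1, ]), unlist(hdr[2, ]), sep = joiner)
  # Read the data rows below the two header lines
  dat <- read.table(file, skip = 2, sep = sep)
  # Assign afterwards (not via col.names) so characters such as "/" survive
  names(dat) <- col_names
  dat
}

# Usage with the sample data written to a temporary file
f <- tempfile(fileext = ".txt")
writeLines(c("trt\tbiomass\tyield", "crop\tMg/ha\tbu/ac",
             "C2\t17.76\t205.92", "C2\t17.96\t207.86",
             "CC\t17.72\t197.22", "CC\t18.42\t205.20",
             "CCW\t18.15\t200.51", "CCW\t17.45\t190.59",
             "P\t3.09\t0.00", "P\t3.34\t0.00",
             "S2\t5.13\t49.68", "S2\t5.36\t49.72"), f)
newdat <- read_two_header(f)
names(newdat)
# [1] "trt_crop"      "biomass_Mg/ha" "yield_bu/ac"
```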