I have a large text file with a variable number of fields in each row. The first entry in each row corresponds to a biological pathway, and each subsequent entry corresponds to a gene in that pathway. The first few lines might look like this:
path1 gene1 gene2
path2 gene3 gene4 gene5 gene6
path3 gene7 gene8 gene9
I need to read this file into R as a list, with each element being a character vector, and the name of each element in the list being the first element on the line, for example:
> pathways <- list(
+   path1=c("gene1","gene2"),
+   path2=c("gene3","gene4","gene5","gene6"),
+   path3=c("gene7","gene8","gene9")
+ )
>
> str(pathways)
List of 3
 $ path1: chr [1:2] "gene1" "gene2"
 $ path2: chr [1:4] "gene3" "gene4" "gene5" "gene6"
 $ path3: chr [1:3] "gene7" "gene8" "gene9"
>
> str(pathways$path1)
 chr [1:2] "gene1" "gene2"
>
> print(pathways)
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"
...but I need to do this automatically for thousands of lines. I saw a similar question posted here previously, but I couldn't figure out how to do this from that thread.
Thanks in advance.
Here's one way to do it:
# Read in the data, one element per line
x <- scan("data.txt", what="", sep="\n")
# Split each line on runs of one or more whitespace characters
y <- strsplit(x, "[[:space:]]+")
# Extract the first vector element and set it as the list element name
names(y) <- sapply(y, `[[`, 1)
#names(y) <- sapply(y, function(x) x[[1]]) # same as above
# Remove the first vector element from each list element
y <- lapply(y, `[`, -1)
#y <- lapply(y, function(x) x[-1]) # same as above
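If you need this for more than one file, the same steps can be wrapped in a small function. This is only a sketch: the name `read_pathways` and the temporary demo file are my own assumptions, not part of the question. It also trims leading and trailing whitespace first, since a line that starts with a space would otherwise yield an empty first field.

```r
# Hypothetical helper (name is illustrative); same scan()/strsplit() steps as above.
read_pathways <- function(path) {
  # One element per line; quiet = TRUE suppresses the "Read N items" message
  lines <- scan(path, what = "", sep = "\n", quiet = TRUE)
  # Trim the ends, then split on runs of whitespace
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  # First field of each line becomes the name, the rest become the vector
  setNames(lapply(fields, `[`, -1),
           vapply(fields, `[[`, FUN.VALUE = character(1), 1))
}

# Demo on a temporary file holding the three example lines
tmp <- tempfile(fileext = ".txt")
writeLines(c("path1 gene1 gene2",
             "path2 gene3 gene4 gene5 gene6",
             "path3 gene7 gene8 gene9"), tmp)
pathways <- read_pathways(tmp)
str(pathways)
```

`vapply()` is used instead of `sapply()` only so that the names are guaranteed to come back as a character vector.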
One solution is to read the data in via read.table(), but use the fill = TRUE argument to pad the rows with fewer "entries", convert the resulting data frame to a list and then clean up the "empty" elements.
First, read your snippet of data in:
con <- textConnection("path1 gene1 gene2
path2 gene3 gene4 gene5 gene6
path3 gene7 gene8 gene9
")
dat <- read.table(con, fill = TRUE, stringsAsFactors = FALSE)
close(con)
Next we save the first column for use as the list names later, then drop it from the data frame:
nams <- dat[, 1]
dat <- dat[, -1]
Convert the data frame to a list. Here I just split the data frame on the indices 1,2,...,n where n is the number of rows:
ldat <- split(dat, seq_len(nrow(dat)))
Clean up the empty cells:
ldat <- lapply(ldat, function(x) x[x != ""])
Finally, apply the names:
names(ldat) <- nams
Giving:
> ldat
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"
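One caveat for a file with thousands of lines: read.table() works out the number of columns from the first five lines of input unless col.names is longer, so a wider row further down would not fit. A possible workaround, sketched below, is to get the widest row from count.fields() and pass that many column names. The helper name `read_ragged` and the demo file are my own assumptions, and I read everything as character so the padding is always "".

```r
# Hypothetical helper (not from the answer above): read.table() route,
# sized to the widest row in the whole file.
read_ragged <- function(path) {
  # Number of whitespace-separated fields on the widest line
  n_col <- max(count.fields(path, sep = ""))
  dat <- read.table(path, fill = TRUE, colClasses = "character",
                    col.names = paste0("V", seq_len(n_col)))
  # Each row minus its first column, with the "" padding removed
  out <- lapply(seq_len(nrow(dat)), function(i) {
    genes <- unlist(dat[i, -1], use.names = FALSE)
    genes[genes != ""]
  })
  names(out) <- dat[[1]]
  out
}

# Demo: the widest row is the sixth line, past the first five
tmp <- tempfile(fileext = ".txt")
writeLines(c("p1 a", "p2 b", "p3 c", "p4 d", "p5 e", "p6 f g h i"), tmp)
ragged <- read_ragged(tmp)
str(ragged$p6)
```

This costs an extra pass over the file for count.fields(), which seems a fair trade for not having rows silently wrapped.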