Text file to list in R

I have a large text file with a variable number of fields in each row. The first entry in each row corresponds to a biological pathway, and each subsequent entry corresponds to a gene in that pathway. The first few lines might look like this:

path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9

I need to read this file into R as a list, with each element being a character vector, and the name of each element in the list being the first element on the line, for example:

> pathways <- list(
+     path1=c("gene1","gene2"),
+     path2=c("gene3","gene4","gene5","gene6"),
+     path3=c("gene7","gene8","gene9")
+ )
>
> str(pathways)
List of 3
 $ path1: chr [1:2] "gene1" "gene2"
 $ path2: chr [1:4] "gene3" "gene4" "gene5" "gene6"
 $ path3: chr [1:3] "gene7" "gene8" "gene9"
>
> str(pathways$path1)
 chr [1:2] "gene1" "gene2"
>
> print(pathways)
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"

...but I need to do this automatically for thousands of lines. I saw a similar question posted here previously, but I couldn't figure out how to do this from that thread.

Thanks in advance.

Stephen Turner asked Jul 06 '11 20:07


2 Answers

Here's one way to do it:

# Read in the data
x <- scan("data.txt", what="", sep="\n")
# Separate elements by one or more whitespace
y <- strsplit(x, "[[:space:]]+")
# Extract the first vector element and set it as the list element name
names(y) <- sapply(y, `[[`, 1)
#names(y) <- sapply(y, function(x) x[[1]]) # same as above
# Remove the first vector element from each list element
y <- lapply(y, `[`, -1)
#y <- lapply(y, function(x) x[-1]) # same as above
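For a quick self-contained check, the same steps can be run on the sample lines through scan()'s text argument instead of a data.txt file. This is a sketch with two small departures from the answer, neither of which is in the original: a trimws() call, added because a line with leading whitespace would otherwise split into a leading empty string, and vapply() in place of sapply() to guarantee a character result.

```r
# Sample lines standing in for data.txt (hypothetical data; the last line
# has stray leading/trailing whitespace on purpose)
txt <- c("path1   gene1 gene2",
         "path2   gene3 gene4 gene5 gene6",
         "  path3   gene7 gene8 gene9  ")

# Read each line in as a single string
x <- scan(text = txt, what = "", sep = "\n", quiet = TRUE)

# Added step: trim whitespace so strsplit() cannot return a leading ""
x <- trimws(x)

# Split on runs of whitespace, name by the first field, keep the rest
y <- strsplit(x, "[[:space:]]+")
names(y) <- vapply(y, `[[`, character(1), 1)
y <- lapply(y, `[`, -1)

str(y)
```

Either sapply() or vapply() works for the naming step on well-formed input; vapply() simply fails loudly if a line ever yields something other than a single string.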
Joshua Ulrich answered Sep 23 '22 23:09



One solution is to read the data in via read.table() with the fill = TRUE argument, which pads shorter rows with blank fields, then convert the resulting data frame to a list and clean up the empty elements.

First, read your snippet of data in:

con <- textConnection("path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9
")
dat <- read.table(con, fill = TRUE, stringsAsFactors = FALSE)
close(con)
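One caveat worth checking on a large file: the read.table() documentation says the number of columns is determined from the first five lines of input (or from col.names if that is longer), so with fill = TRUE a longer row further down is wrapped onto extra rows instead of widening the data frame. A sketch of one workaround, using hypothetical data whose widest row comes last, is to count the fields on every line with count.fields() and pass matching col.names:

```r
# Hypothetical file whose widest row comes after line 5
lines <- c("path1 gene1 gene2",
           "path2 gene3",
           "path3 gene4",
           "path4 gene5",
           "path5 gene6",
           "path6 gene7 gene8 gene9 gene10")
tf <- tempfile(fileext = ".txt")
writeLines(lines, tf)

# Number of whitespace-separated fields on the widest line of the whole file
n <- max(count.fields(tf, sep = ""))

# Supplying col.names makes read.table() allocate n columns up front
dat <- read.table(tf, fill = TRUE, stringsAsFactors = FALSE,
                  col.names = paste0("V", seq_len(n)))
unlink(tf)
dim(dat)
```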

Next, we drop the first column, saving it first so it can supply the names of the list later:

nams <- dat[, 1]
dat <- dat[, -1]

Convert the data frame to a list. Here I just split the data frame on the indices 1,2,...,n where n is the number of rows:

ldat <- split(dat, seq_len(nrow(dat))) 

Clean up the empty cells:

ldat <- lapply(ldat, function(x) x[x != ""]) 
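To see why this cleanup step returns plain character vectors: split() hands back one-row data frames, and x != "" on a data frame is a logical matrix, which `[.data.frame` handles by indexing as.matrix(x). A small sketch on the sample data, read through read.table()'s text argument rather than textConnection() for brevity:

```r
# Sample data read via read.table()'s text argument
dat <- read.table(text = c("path1   gene1 gene2",
                           "path2   gene3 gene4 gene5 gene6",
                           "path3   gene7 gene8 gene9"),
                  fill = TRUE, stringsAsFactors = FALSE)
dat <- dat[, -1]

# split() on row indices gives a list of one-row data frames
ldat <- split(dat, seq_len(nrow(dat)))
class(ldat[[1]])  # "data.frame"

# x != "" is a logical matrix; indexing a data frame with a logical matrix
# goes through as.matrix(), so the result is an unnamed character vector
row1 <- ldat[[1]][ldat[[1]] != ""]
row1
```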

Finally, apply the names:

names(ldat) <- nams 

Giving:

> ldat
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"
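For thousands of lines it may be convenient to wrap these steps in one function. The sketch below is a hedged variant, not the answer as posted: read_pathways() is a hypothetical name, count.fields() sizes the columns from the widest line because read.table() otherwise looks only at the first five lines, and colClasses = "character" (an assumption about the gene IDs) keeps numeric-looking identifiers such as Entrez IDs as text so the x != "" filter still applies.

```r
# Hypothetical helper wrapping the steps above into one function
read_pathways <- function(file) {
  # read.table() infers the column count from the first five lines only,
  # so size the columns from the widest line in the file
  n <- max(count.fields(file, sep = ""))
  # colClasses = "character" keeps numeric-looking IDs (e.g. 1234) as text
  # and keeps padded fields as "" rather than NA
  dat <- read.table(file, fill = TRUE, colClasses = "character",
                    col.names = paste0("V", seq_len(n)))
  nams <- dat[, 1]
  dat <- dat[, -1, drop = FALSE]   # drop = FALSE guards the one-gene case
  ldat <- split(dat, seq_len(nrow(dat)))
  ldat <- lapply(ldat, function(x) x[x != ""])
  names(ldat) <- nams
  ldat
}

# Usage on the sample lines written to a temporary file
tf <- tempfile(fileext = ".txt")
writeLines(c("path1   gene1 gene2",
             "path2   gene3 gene4 gene5 gene6",
             "path3   gene7 gene8 gene9"), tf)
pathways <- read_pathways(tf)
unlink(tf)
str(pathways)
```

On real data, pass the path to the pathway file instead of the temporary file.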
Gavin Simpson answered Sep 19 '22 23:09
