Text file to list in R

I have a large text file with a variable number of fields in each row. The first entry in each row corresponds to a biological pathway, and each subsequent entry corresponds to a gene in that pathway. The first few lines might look like this:

path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9

I need to read this file into R as a list, with each element being a character vector, and the name of each element in the list being the first element on the line, for example:

> pathways <- list(
+     path1=c("gene1","gene2"),
+     path2=c("gene3","gene4","gene5","gene6"),
+     path3=c("gene7","gene8","gene9")
+ )
>
> str(pathways)
List of 3
 $ path1: chr [1:2] "gene1" "gene2"
 $ path2: chr [1:4] "gene3" "gene4" "gene5" "gene6"
 $ path3: chr [1:3] "gene7" "gene8" "gene9"
>
> str(pathways$path1)
 chr [1:2] "gene1" "gene2"
>
> print(pathways)
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"

...but I need to do this automatically for thousands of lines. I saw a similar question posted here previously, but I couldn't figure out how to do this from that thread.

Thanks in advance.

Here's one way to do it:

# Read in the data, one string per line
x <- scan("data.txt", what="", sep="\n")
# Separate elements by one or more whitespace characters
y <- strsplit(x, "[[:space:]]+")
# Extract the first vector element and set it as the list element name
names(y) <- sapply(y, `[[`, 1)
#names(y) <- sapply(y, function(x) x[[1]]) # same as above
# Remove the first vector element from each list element
y <- lapply(y, `[`, -1)
#y <- lapply(y, function(x) x[-1]) # same as above
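As a quick check of the steps above, here is a minimal sketch that wraps them in a function and runs it on the three sample lines from the question. The helper name `read_pathways` and the temporary file are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: read a whitespace-delimited file into a named list,
# using the first field of each line as the element name.
read_pathways <- function(path) {
  lines  <- scan(path, what = "", sep = "\n", quiet = TRUE)  # one string per line
  fields <- strsplit(lines, "[[:space:]]+")                  # split on runs of whitespace
  names(fields) <- sapply(fields, `[[`, 1)                   # first field becomes the name
  lapply(fields, `[`, -1)                                    # remaining fields are the genes
}

# Sample data from the question, written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("path1   gene1 gene2",
             "path2   gene3 gene4 gene5 gene6",
             "path3   gene7 gene8 gene9"), tmp)

pathways <- read_pathways(tmp)
str(pathways)
```

Because each line is split independently, rows can have any number of fields and no padding or cleanup is needed.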

One solution is to read the data in via read.table(), but use the fill = TRUE argument to pad the rows with fewer "entries", convert the resulting data frame to a list and then clean up the "empty" elements.

First, read your snippet of data in:

con <- textConnection("path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9
")
dat <- read.table(con, fill = TRUE, stringsAsFactors = FALSE)
close(con)
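One caveat for large files: according to the read.table() documentation, the number of columns is inferred from the first five lines of input unless col.names is supplied and is longer. With fill = TRUE, a wider row further down the file may therefore not be read as intended. Below is a minimal sketch of the workaround given in that documentation, using count.fields() to size col.names up front. The temporary file and its contents are made up for illustration.

```r
# Hypothetical sample in which the widest row is the sixth line,
# i.e. beyond the lines read.table() inspects to guess the column count.
tmp <- tempfile(fileext = ".txt")
writeLines(c("path1 gene1 gene2",
             "path2 gene3",
             "path3 gene4",
             "path4 gene5",
             "path5 gene6",
             "path6 gene7 gene8 gene9 gene10"), tmp)

# Count whitespace-separated fields on every line and take the maximum
n_cols <- max(count.fields(tmp, sep = ""))

# Supplying col.names of that length fixes the number of columns explicitly
dat <- read.table(tmp, fill = TRUE, stringsAsFactors = FALSE,
                  col.names = paste0("V", seq_len(n_cols)))
dim(dat)
```

The rest of the steps work unchanged on a data frame read this way.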

Next, save the first column to use as the list names later, then drop it from the data frame:

nams <- dat[, 1]
dat <- dat[, -1]

Convert the data frame to a list. Here I just split the data frame on the indices 1,2,...,n where n is the number of rows:

ldat <- split(dat, seq_len(nrow(dat)))

Clean up the empty cells:

ldat <- lapply(ldat, function(x) x[x != ""])
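To see why this step yields plain character vectors, note that each element produced by split() is a one-row data frame, and indexing a data frame with a logical matrix returns the selected cells as a vector. A small sketch on a single hypothetical padded row (the column names and values are assumptions for illustration):

```r
# One padded row, shaped like a row read with fill = TRUE (made-up values)
row <- data.frame(V2 = "gene1", V3 = "gene2", V4 = "", V5 = "",
                  stringsAsFactors = FALSE)

keep  <- row != ""   # 1 x 4 logical matrix marking the non-empty cells
genes <- row[keep]   # matrix-style indexing returns a character vector

str(genes)
```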

Finally, apply the names:

names(ldat) <- nams

Giving:

> ldat
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"

Tags: text, list, r, statistics

Stephen Turner

2 Answers

Joshua Ulrich

Gavin Simpson
