How to prevent 'read.table' from changing underscores and hyphens to dots?

Question

I have a number of files that I'm merging into one data frame. The file names look like this: unc.edu.b6530750-0410-43ec-bb79-f862ca3424a6.1918120.rsem.genes.results

And I want the file names to be the column names. I'm using the following code:

for (file in file_list){

  if (!exists("dataset")){
    dataset <- read.table(file, header=TRUE,
                          colClasses = c(rep("character", 2), rep("NULL", 2)),
                          col.names = c("gene_id", deparse(substitute(file)), "NuLL", "NULL"),
                          sep="\t")
    print(deparse(substitute(file)))
  }

  if (exists("dataset")){
    temp_dataset <- read.table(file, header=TRUE,
                               colClasses = c(rep("character", 2), rep("NULL", 2)),
                               col.names = c("gene_id", deparse(substitute(file)), "NuLL", "NULL"),
                               sep="\t")
    print(deparse(substitute(file)))
    dataset <- merge(dataset, temp_dataset, by = "gene_id")
    rm(temp_dataset)
  }
}

Everything works, except that the hyphens in the column names have been replaced by dots.

colnames(data)

[1] "gene_id"                                                                       
[2] "X...unc.edu.02cb8dbe.ef56.471c.b52d.41c29219fd95.1794854.rsem.genes.results..x"
[3] "X...unc.edu.02cb8dbe.ef56.471c.b52d.41c29219fd95.1794854.rsem.genes.results..y"
[4] "X...unc.edu.02f5dcba.bdcc.4424.aed4.195a8d551325.2085643.rsem.genes.results."

Any explanation as to what causes this would be helpful because I will need to change these names, using another file, later on.

Nick Kennedy · Accepted Answer

As @akrun stated in the comments, read.table(file, ..., check.names=FALSE) will solve the immediate problem.
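As for the cause: with the default check.names=TRUE, read.table passes the column names through make.names(), which turns any character that is not valid in an R variable name (such as a hyphen) into a dot. The sketch below illustrates this with inline tab-separated text standing in for a real file; the variable names nm, txt, checked and kept are made up for the example.

```r
# A file name containing hyphens, taken from the question.
nm <- "unc.edu.b6530750-0410-43ec-bb79-f862ca3424a6.1918120.rsem.genes.results"

# make.names() is what check.names = TRUE applies to the column names.
sanitised <- make.names(nm)
print(sanitised)

# Inline data used in place of a real file (an assumption for this example).
txt <- "gene_id\tcount\nTNF\t1\nIL8\t2"

# Default behaviour: the supplied column name is sanitised.
checked <- read.table(text = txt, header = TRUE, sep = "\t",
                      col.names = c("gene_id", nm))

# With check.names = FALSE the name is kept exactly as supplied.
kept <- read.table(text = txt, header = TRUE, sep = "\t",
                   col.names = c("gene_id", nm), check.names = FALSE)

print(names(checked)[2])
print(names(kept)[2])
```

A column with a non-syntactic name then has to be accessed with backticks or as a string, for example kept[[nm]]. The leading "X..." and trailing dot in the question's output would be consistent with the supplied name itself containing quote characters and a "./" prefix (for instance from deparse()), although that is an inference rather than something the post confirms.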

However, there are now neater ways to achieve what you're trying to do using some of the tidyverse packages.

First let's load packages and generate some sample data:

library(purrr)
library(readr)
data <- c("gene_id	result	random_a	random_b
TNF	1e-8	1.7	4.3
IL8	0.4	-0.3	8.6",
"gene_id	result	random_a	random_b
TNF	2.4e-7	1.7	4.3
IL8	0.9	0.8	8.3",
"gene_id	result	random_a	random_b
TNSF8	0.003	2.1	9.7
IL8	0.02	1.9	4.6")
file_list <- sprintf("file_%d.tsv", 1:3)
walk2(data, file_list, ~write_tsv(read_tsv(.x), .y))

Now here's the actual bit that reads and merges the data:

library(purrr)
library(readr)
library(dplyr)
dataset <- file_list %>%
  map(~read_tsv(.x, col_types = "cc__", col_names = c("gene_id", .x), skip = 1)) %>%
  reduce(full_join, by = "gene_id")

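This uses map to read each file in turn. skip = 1 drops the header row of each file, col_types = "cc__" keeps the first two columns as character and skips the third and fourth, and col_names names the two kept columns gene_id and the file name. By default readr does not convert column names into syntactically valid R names, so hyphens and underscores in the file names survive. The resulting data frames are then joined one after another on gene_id using dplyr::full_join and purrr::reduce.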
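For readers who prefer to stay in base R, roughly the same read-then-join pattern can be sketched with lapply, Reduce and merge. This is only an illustration under stated assumptions: the sample files, their names, and the helper read_one are hypothetical, and merge(..., all = TRUE) is used as the base R counterpart of a full join.

```r
# Hypothetical sample files written to a temporary directory for illustration.
dir <- tempfile("rsem-")
dir.create(dir)
contents <- c(
  "gene_id\tresult\trandom_a\trandom_b\nTNF\t1e-8\t1.7\t4.3\nIL8\t0.4\t-0.3\t8.6",
  "gene_id\tresult\trandom_a\trandom_b\nTNF\t2.4e-7\t1.7\t4.3\nIL8\t0.9\t0.8\t8.3"
)
files <- file.path(dir, c("sample-a_1.tsv", "sample-b_2.tsv"))
for (i in seq_along(files)) writeLines(contents[i], files[i])

# Read the first two columns of one file and name the second after the file.
# check.names = FALSE keeps hyphens and underscores in that name.
read_one <- function(path) {
  read.table(path, header = TRUE, sep = "\t",
             colClasses = c("character", "character", "NULL", "NULL"),
             col.names = c("gene_id", basename(path), "skip_a", "skip_b"),
             check.names = FALSE)
}

# Join all data frames on gene_id, keeping genes that appear in any file.
merged <- Reduce(function(x, y) merge(x, y, by = "gene_id", all = TRUE),
                 lapply(files, read_one))
print(names(merged))
```

Reading every file first and merging afterwards also avoids the exists("dataset") bookkeeping of the original loop, in which the first file is read twice and merged with itself.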

Although this question was asked a long time ago, this type of task is common, so I thought a tidyverse-based answer would still be useful. (And it's still in the 'unanswered questions with votes' filter.)

Tags: r, character, read.table

paul_dg