
Read tab delimited file with unusual characters, then write an exact copy

Tags: r

The problem

I have a tab delimited input file that looks like so:

Variable [1]    Variable [2]
111    Something
Nothing    222

The first row contains the column names and the next two rows contain the column values. As you can see, the column names include both spaces and some tricky characters.

Now, what I want to do is to import this file into R and then output it again to a new text file, making it look exactly the same as the input. For this purpose I have created the following script (assuming that the input file is called "Test.txt"):

file <- "Test.txt"
x <- read.table(file, header = TRUE, sep = "\t")
write.table(x, file = "TestOutput.txt", sep = "\t", col.names = TRUE, row.names = FALSE)

From this, I get an output that looks like this:

"Variable..1."  "Variable..2."
"1"    "111"    "Something"
"2"    "Nothing"    "222"

Now, there are a couple of problems with this output.

  1. The "[" and "]" signs have been converted to dots.
  2. The spaces have been converted to dots.
  3. Quote signs have appeared everywhere.

How can I make the output file look exactly the same as the input file?

What I've tried so far

Regarding problems one and two, I've tried specifying the column names by creating a vector, c("Variable [1]", "Variable [2]"), and passing it to the col.names option of read.table(). This gives me the exact same output. I've also tried different encodings through the encoding option of read.table(). If I print the vector mentioned above, the variable names appear as they should, so I guess the problem lies in the conversion between the "text -> R" and "R -> text" phases of the process. That is, if I look at the data frame created by read.table() without supplying the vector, the column names are wrong.

As for problem number three, I'm pretty much lost and haven't been able to figure out what I should try.

Speldosa asked Nov 23 '11 12:11




1 Answer

Given the following input file as test.txt:

Variable [1]    Variable [2]
111    Something
Nothing    222

Assuming the columns are tab-separated, you can use the following code to create an exact copy:

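# check.names = FALSE keeps "Variable [1]" as is instead of turning it into "Variable..1."
a <- read.table(file = 'test.txt', check.names = FALSE, sep = '\t', header = TRUE,
    stringsAsFactors = FALSE)
# quote = FALSE stops write.table() from wrapping strings and column names in quotes
write.table(x = a, file = 'test_copy.txt', quote = FALSE, row.names = FALSE,
    col.names = TRUE, sep = '\t')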
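
As a rough check, the round trip can be reproduced in a temporary directory. This is only a minimal sketch: the temp-file paths and the variable names input, output, mangled and same are illustrative assumptions, not part of the code above. It also calls make.names() to show where the dots came from, since read.table() runs headers through it when check.names is TRUE (the default).

```r
# Sketch: reproduce the round trip and check that the copy matches the input.
# The paths below are illustrative; any writable location would do.
input  <- file.path(tempdir(), "test.txt")
output <- file.path(tempdir(), "test_copy.txt")

# Write the sample input with real tab characters between the columns.
writeLines(c("Variable [1]\tVariable [2]",
             "111\tSomething",
             "Nothing\t222"), input)

# With check.names = TRUE, headers are passed through make.names(), which
# replaces characters that are invalid in R names (space, "[", "]") with dots.
mangled <- make.names(c("Variable [1]", "Variable [2]"))
print(mangled)  # "Variable..1." "Variable..2."

# Same calls as in the answer, pointed at the temporary files.
a <- read.table(file = input, check.names = FALSE, sep = "\t", header = TRUE,
                stringsAsFactors = FALSE)
write.table(x = a, file = output, quote = FALSE, row.names = FALSE,
            col.names = TRUE, sep = "\t")

# Compare the two files line by line; TRUE means the copy matches the input.
same <- identical(readLines(input), readLines(output))
print(same)
```

Both columns in this sample mix numbers and text, so they are read as character and written back unchanged. If a column were purely numeric, read.table() would convert it, and formatting such as leading or trailing zeros might not survive the round trip; passing colClasses = "character" to read.table() is one way to avoid that.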

SeeLittle answered Sep 18 '22 22:09