
R's read.csv prepending 1st column name with junk text [duplicate]

I have exported data from a result grid in SQL Server Management Studio to a csv file. The csv file looks correct.

But when I read the data into an R dataframe using read.csv, the first column name is prepended with "ï..". How do I get rid of this junk text?

Example:

str(trainData)
'data.frame':   64169 obs. of  20 variables:
 $ ï..Column1             : int  3232...
 $ Column2                : int  4242...

The data looks something like this (nothing special) :

Column1,Column2
100116577,100116577
100116698,100116702

Daniel PP Cabral asked Jul 04 '14 06:07


People also ask

Why does ï appear in R?

It is the byte order mark (BOM), which signals that the text that follows is encoded in Unicode (here, UTF-8). A program that reads the file as ISO-8859-1 or CP1252 instead displays the BOM's three bytes as ï»¿.

How do I read a csv file in R?

To load a .csv file into the current script and work with it, use the read.csv() function in base R. The result is a data frame whose rows are numbered with integers starting at 1.


1 Answer

You've got a Unicode UTF-8 BOM at the start of the file:

http://en.wikipedia.org/wiki/Byte_order_mark

A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters ï»¿ for this.

R is giving you the ï and then converting the other two into dots as they are non-alphanumeric characters.

Here:

http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html

Duncan Murdoch suggests:

You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input

So try your read.csv with fileEncoding="UTF-8-BOM" or persuade your SQL wotsit to not output a BOM.
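
A minimal sketch of the first suggestion, assuming a reasonably recent version of R in which "UTF-8-BOM" is accepted as an input encoding. The temporary file is made up for illustration and mirrors the sample data in the question:

```r
# Write a small CSV that starts with a UTF-8 BOM (bytes EF BB BF),
# imitating the kind of export described in the question.
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Column1,Column2\n100116577,100116577\n100116698,100116702\n"), con)
close(con)

# Declaring the encoding as "UTF-8-BOM" asks R to skip the BOM on input,
# so the first column name should come through without the junk prefix.
trainData <- read.csv(path, fileEncoding = "UTF-8-BOM")
print(names(trainData))
```

If the names still show the prefix, it is worth checking that the file really is UTF-8 and not some other encoding.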

Otherwise you may as well test if the first name starts with ï.. and strip it with substr (as long as you know you'll never have a column that does start like that genuinely...)
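
The fallback could look roughly like the sketch below. The helper name strip_bom_prefix is hypothetical (it is not part of R or of the original answer), and it assumes the junk prefix is exactly "ï.." as in the question:

```r
# Hypothetical helper: drop a leading "ï.." from the first column name only.
# Assumes no genuine column name starts with that prefix.
strip_bom_prefix <- function(df, prefix = "\u00ef..") {
  first <- names(df)[1]
  n <- nchar(prefix)
  # Test the start of the name with substr, then strip it with substr.
  if (substr(first, 1, n) == prefix) {
    names(df)[1] <- substr(first, n + 1, nchar(first))
  }
  df
}

# Made-up data frame carrying the junk prefix, for illustration.
df <- data.frame(a = c(100116577, 100116698), Column2 = c(100116577, 100116702))
names(df)[1] <- "\u00ef..Column1"

cleaned <- strip_bom_prefix(df)
print(names(cleaned))
```

Reading the file with the right encoding is the cleaner fix; this only patches the name after the fact.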

Spacedman answered Sep 23 '22 00:09