
When I import a text file into R, I get a special character appended to the first value of the first column

Tags:

r

Sometimes when I import a text file into R, I get the character "ï»¿" appended to the first value of the first column. Does anyone know why this is?

For example, a text file with the values:

2011_21,3130
2010_51,4153
2011_16,3168
2010_20,3945
2012_38,2099
2012_17,2436
2010_40,2090
2011_2 ,1462

brings up the following results in R:

First, I read the file in:

ts_data <- read.csv("yr_wk sales.csv", header=FALSE)
head(ts_data)

This is the data that's returned:

          V1   V2
1 ï»¿2011_21 3130
2    2010_51 4153
3    2011_16 3168
4    2010_20 3945
5    2012_38 2099
6    2012_17 2436

How can I avoid this?

hamel asked Mar 06 '13 22:03


3 Answers

The file starts with a UTF-8 byte order mark (BOM). Pass fileEncoding="UTF-8-BOM" so that R discards it when reading:

ts_data <- read.csv("yr_wk sales.csv", fileEncoding="UTF-8-BOM", header=FALSE)
head(ts_data)
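
To try this without the original file, here is a minimal sketch that writes a small CSV starting with the UTF-8 BOM bytes (EF BB BF) and reads it back. The temporary file and the two sample rows are assumptions for illustration only, and stringsAsFactors = FALSE is added so the first column stays character on older R versions.

```r
# Build a throwaway CSV whose first three bytes are the UTF-8 BOM (EF BB BF).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("2011_21,3130\n2010_51,4153\n"), con)
close(con)

# Plain read: whether stray characters show up in the first value
# depends on the platform and locale.
plain <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE)
print(plain$V1[1])

# Read with fileEncoding = "UTF-8-BOM": the BOM is discarded.
clean <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE,
                  fileEncoding = "UTF-8-BOM")
print(clean$V1[1])
```

With the BOM-aware read, the first value should come back as the bare string "2011_21".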

Northernlad answered Oct 21 '22 10:10


I ran into this problem after working with the .txt file in Microsoft Word. Copying the data from the file saved by Word into a new .txt file created in Notepad solved it.
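
If re-saving by hand is inconvenient, the same cleanup could probably be done from within R by dropping the three BOM bytes. This is only a rough sketch, assuming the file carries a UTF-8 BOM; strip_utf8_bom is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: copy `src` to `dest`, dropping a leading UTF-8 BOM if present.
strip_utf8_bom <- function(src, dest) {
  bytes <- readBin(src, what = "raw", n = file.size(src))
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3L && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]  # remove only the three BOM bytes
  }
  writeBin(bytes, dest)
  invisible(dest)
}
```

For example, strip_utf8_bom("yr_wk sales.csv", "yr_wk sales_clean.csv") would write a copy that read.csv can load without any encoding argument; the output file name here is made up.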

Eduardo Tiecher answered Oct 21 '22 09:10


As I've noted in the comments, this is the Byte Order Mark. There is discussion here (http://cran.r-project.org/doc/manuals/R-data.html) about dealing with it.
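
To confirm that a BOM really is the cause, one option is to look at the first three bytes of the file. A minimal sketch, assuming a UTF-8 BOM (bytes EF BB BF); has_utf8_bom is a hypothetical helper name, not part of base R.

```r
# Hypothetical helper: TRUE if the file begins with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3L)
  identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))
}
```

Calling has_utf8_bom("yr_wk sales.csv") on the file from the question would then tell you whether a BOM-aware encoding is needed.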

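If you know the file encoding, you can sort it out. Assuming it is UTF-8 with a BOM (and, since read.table splits on whitespace by default, setting the separator to a comma):

ts_data <- read.table("yr_wk sales.csv", sep = ",", fileEncoding = "UTF-8-BOM")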

alexwhan answered Oct 21 '22 11:10