Sometimes when I import a text file into R, I get the characters "ï»¿" prepended to the first value of the first column. Does anyone know why this is?
For example, a text file with the values:
2011_21,3130
2010_51,4153
2011_16,3168
2010_20,3945
2012_38,2099
2012_17,2436
2010_40,2090
2011_2 ,1462
gives the following results in R.
First I read the file in:
ts_data <- read.csv("yr_wk sales.csv", header=FALSE)
head(ts_data)
This is the data that's returned:
V1 V2
1 ï»¿2011_21 3130
2 2010_51 4153
3 2011_16 3168
4 2010_20 3945
5 2012_38 2099
6 2012_17 2436
How can I avoid this?
You need to use the following:
ts_data <- read.csv("yr_wk sales.csv", fileEncoding="UTF-8-BOM", header=FALSE)
head(ts_data)
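To try this without the original file, here is a minimal sketch that writes a small temporary CSV starting with the UTF-8 byte order mark and reads it back with fileEncoding = "UTF-8-BOM". The temporary path and the two sample rows are made up for illustration; the "UTF-8-BOM" encoding name needs R 3.0.0 or later.

```r
# Write a small CSV that starts with the UTF-8 byte order mark (bytes EF BB BF).
# The path and sample rows are illustrative, not the asker's real file.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("2011_21,3130\n2010_51,4153\n"), con)
close(con)

# Reading with fileEncoding = "UTF-8-BOM" drops the mark from the first value.
clean <- read.csv(path, header = FALSE, fileEncoding = "UTF-8-BOM",
                  stringsAsFactors = FALSE)
head(clean)
```

With the mark removed, the first value of V1 should come back as plain 2011_21.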
I ran into this problem after editing the txt file in Microsoft Word. Copying the data from the file saved by MS Word into a new txt file in Notepad solved it.
As I've noted in the comments, this is the Byte Order Mark. The R Data Import/Export manual (http://cran.r-project.org/doc/manuals/R-data.html) discusses dealing with it.
If you know the file encoding, you can sort it out. Assuming it is UTF-8 (and passing sep = "," because read.table splits on whitespace by default):
ts_data <- read.table("yr_wk sales.csv", sep = ",", fileEncoding = "UTF-8")
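If you are not sure whether a file carries a UTF-8 byte order mark, one way to check is to look at its first three bytes, which would be EF BB BF. The sketch below is only an illustration: has_utf8_bom is a hypothetical helper name, and the demo files are temporary files created on the spot.

```r
# Hypothetical helper: TRUE when the file starts with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  identical(readBin(path, what = "raw", n = 3), as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Demo file with a BOM in front of one sample row.
demo_path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("2011_21,3130\n")), demo_path)

# Demo file with the same row but no BOM.
plain_path <- tempfile(fileext = ".csv")
writeBin(charToRaw("2011_21,3130\n"), plain_path)

has_utf8_bom(demo_path)
has_utf8_bom(plain_path)
```

If the check returns TRUE, reading with fileEncoding = "UTF-8-BOM" is the likely fix.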