R's read.csv prepending 1st column name with junk text [duplicate]

Tags: r, utf-8, byte-order-mark

I have exported data from a result grid in SQL Server Management Studio to a csv file. The csv file looks correct.

But when I read the data into an R dataframe using read.csv, the first column name is prepended with "ï..". How do I get rid of this junk text?

Example:

str(trainData)
'data.frame':   64169 obs. of  20 variables:
 $ ï..Column1             : int  3232...
 $ Column2                : int  4242...

The data looks something like this (nothing special) :

Column1,Column2
100116577,100116577
100116698,100116702


asked Jul 04 '14 06:07

Daniel PP Cabral

1 Answer

You've got a Unicode UTF-8 BOM at the start of the file:

http://en.wikipedia.org/wiki/Byte_order_mark

"A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters ï»¿ for this"

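R is giving you the ï and then converting the other two characters (» and ¿) into dots, because they are not valid in a syntactic R name and read.csv, with its default check.names = TRUE, replaces such characters with dots.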

Here:

http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html

Duncan Murdoch suggests:

"You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input"

So try your read.csv with fileEncoding="UTF-8-BOM" or persuade your SQL wotsit to not output a BOM.
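
For illustration, here is a minimal sketch of the fileEncoding approach. It does not use the real SSMS export: it writes a small temporary CSV that starts with the UTF-8 BOM bytes (EF BB BF), which is an assumption about what the export contains, and then reads it back. The temporary file and its contents are made up for the example.

```r
# Build a small CSV that starts with the UTF-8 BOM bytes (EF BB BF).
# This stands in for the exported file; the path is a temporary one.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Column1,Column2\n100116577,100116577\n100116698,100116702\n"), con)
close(con)

# Declaring the encoding as "UTF-8-BOM" asks R to skip the BOM on input.
trainData <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(trainData)
```

With your own data, the only change should be passing the path of the exported file instead of tmp.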

Otherwise you may as well test if the first name starts with ï.. and strip it with substr (as long as you know you'll never have a column that does start like that genuinely...)
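
A rough sketch of that fallback, in case re-reading the file is not an option. The helper name strip_bom_prefix is made up for this example, and the prefix is assumed to be exactly "ï.." as in the question; other platforms or locales may mangle the BOM differently, so check names(trainData)[1] first.

```r
# Hypothetical helper: drop the "ï.." prefix from the first column name only.
# "\u00ef" is the ï character, written as an escape to avoid source-encoding issues.
strip_bom_prefix <- function(nms, prefix = "\u00ef..") {
  if (length(nms) > 0 && startsWith(nms[1], prefix)) {
    # Keep everything after the prefix.
    nms[1] <- substr(nms[1], nchar(prefix) + 1, nchar(nms[1]))
  }
  nms
}

# Example with column names like those in the question.
cols <- c("\u00ef..Column1", "Column2")
strip_bom_prefix(cols)
```

On a real data frame this would be applied as names(trainData) <- strip_bom_prefix(names(trainData)).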


answered Sep 23 '22 00:09

Spacedman
