Why are Xs added to data frame variable names when using read.csv?

When I use the read.csv() function in R to load data, I often find that an X has been added to variable names. I think I just about always see it in the first variable, but I could be wrong.

At first, I thought R might be doing this because I had a space at the beginning of the variable name - I don't.

Second, I had read somewhere that if you have a variable that starts with a number, or is a very short variable name, R would add the X. The variable name is all text and the length of the name of this variable is 12 characters, so it's not short.

Now, this is purely an annoyance. I can rename the column, but it does add a step, albeit a small one.

Is there a way to prevent this rogue X from infiltrating my data frame?

Here is my original code:

df <- read.csv("/file/location.filecsv", header=T, sep=",")

Here is the variable in question:

str(orders)
'data.frame':   2620276 obs. of  26 variables:
 $ X.OrderDetailID    : Factor w/ 2620193 levels "(2620182 row(s) affected)",..: 105845

asked Feb 01 '12 15:02

mikebmassey

3 Answers

read.table and read.csv have a check.names= argument that you can set to FALSE.

For example, try it with this input consisting of just a header:

> read.csv(text = "a,1,b")
[1] a  X1 b 
<0 rows> (or 0-length row.names)

versus

> read.csv(text = "a,1,b", check.names = FALSE)
[1] a 1 b
<0 rows> (or 0-length row.names)
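The same thing can be seen with actual data rows. This is only a small sketch: the CSV text and its column names are made up for illustration, and only read.csv and its check.names argument come from the answer itself.

```r
# Hypothetical CSV content; the header names are invented for this example.
csv_text <- "1st,order id,total\n10,A1,2.5\n20,B2,3.5"

# Default behaviour: header names are passed through make.names().
checked <- read.csv(text = csv_text)
names(checked)
# [1] "X1st"     "order.id" "total"

# check.names = FALSE keeps the header exactly as written in the file.
raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)
# [1] "1st"      "order id" "total"

# Non-syntactic names then need backticks or [[ ]] to be accessed.
raw$`1st`
raw[["order id"]]
```

The trade-off is that columns with non-syntactic names can no longer be typed as plain raw$name, so this is mostly useful when the original headers have to be preserved.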

answered Oct 13 '22 13:10

G. Grothendieck

It is surprising behavior, but I think we would need a reproducible example. Perhaps you have some invisible/special characters hiding in your file?

names(read.csv(textConnection(
"abcdefghijkl, a1,2x")))

behaves fine. Can you make an example along these lines that demonstrates your problem?

As described in the other answer, check.names=FALSE is a possible workaround. You can experiment with make.names to determine the behavior ...
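A quick experiment with make.names might look like the following. The header strings are invented for illustration, and the leading-space case is only a guess at how a name like X.OrderDetailID could come about (any invalid leading character would have the same effect); it is not a diagnosis of the file in the question.

```r
# Candidate header strings (hypothetical) to feed through make.names().
headers <- c("OrderDetailID",   # already valid, left alone
             " OrderDetailID",  # invalid first character
             "1",               # starts with a number
             "2x",              # starts with a number
             "order id",        # contains a space
             ".2way",           # dot followed by a number
             "if")              # reserved word

fixed <- make.names(headers)

# Show original and converted names side by side.
data.frame(original = headers, converted = fixed)
# Converted names, in order:
# "OrderDetailID" "X.OrderDetailID" "X1" "X2x" "order.id" "X.2way" "if."
```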

answered Oct 13 '22 14:10

Ben Bolker

As Gabor said, by default read.csv converts the names in your header row to valid variable names (use check.names = FALSE to turn this off). This is done using the function make.names. The help page for that function explains what constitutes a valid variable name.

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.

The list of reserved words is found on the help page ?reserved.

The other condition is that the variable name must be 10000 characters or less, but make.names won't shorten it. So be careful of being really verbose with your variable names.

You can check for valid variable names using

library(assertive.code)
is_valid_variable_name(x)
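If installing a package is not an option, a rough base-R alternative is to compare each name with what make.names would turn it into. This is a sketch rather than the assertive.code implementation, and is_syntactic is a hypothetical helper name.

```r
# Hypothetical helper: a name counts as valid here if make.names()
# would leave it unchanged. Intended for non-missing character input.
is_syntactic <- function(x) {
  x == make.names(x)
}

candidates <- c("OrderDetailID", "a_b", "2x", ".2way", "if")
result <- is_syntactic(candidates)
result
# [1]  TRUE  TRUE FALSE FALSE FALSE
```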

answered Oct 13 '22 14:10

Richie Cotton

Tags: dataframe, r, names, read.table