
Avoid that space in column name is replaced with period (".") when using read.csv()

Tags:

r

names

read.csv

I am using R for data pre-processing and have run into a problem. I read the data with read.csv(filename, header=TRUE), and the spaces in variable names become "."; for example, a variable named Full Code becomes Full.Code in the resulting data frame. After processing, I export the results with write.xlsx(filename), and the changed variable names are written to the file. How can I keep the original names?

Also, in the output .xlsx file the first column becomes a row index (i.e., 1 to N), which is not what I expect.

zeno tsang asked Jun 17 '13 16:06



3 Answers

If you set check.names=FALSE in read.csv when you read the data in, the names will not be changed and you will not need to edit them before writing the data back out. This does mean that, while editing, you need to quote the column names (with backquotes in some cases) or refer to the columns by position rather than by name.
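As a minimal sketch of the difference, assuming a small made-up CSV with a "Full Code" column (the file contents and the names csv_path, default_df, and raw_df are hypothetical, not from the question):

```r
# Hypothetical sample file, written to a temporary location for the demo.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Full Code,Value", "A1,10", "B2,20"), csv_path)

# Default behaviour: header names are made syntactically valid,
# so the space in "Full Code" is turned into a dot.
default_df <- read.csv(csv_path, header = TRUE)
names(default_df)

# With check.names = FALSE the header is kept exactly as written.
raw_df <- read.csv(csv_path, header = TRUE, check.names = FALSE)
names(raw_df)

# A name containing a space needs backquotes with `$`,
# a quoted string with `[[`, or a column position.
raw_df$`Full Code`
raw_df[["Full Code"]]
raw_df[[1]]
```

All three access forms at the end return the same column; which one to use is mostly a matter of taste.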

Greg Snow answered Oct 23 '22 10:10



To get spaces back in the names, do this right before you export (R does let you have spaces in variable names, but it's a pain):

# A simple regular expression to replace dots with spaces
# This might have unintended consequences, so be sure to check the results
names(yourdata) <- gsub(x = names(yourdata),
                        pattern = "\\.",
                        replacement = " ")

To drop the first-column index, just add row.names = FALSE to your write.xlsx(). That's a common argument for functions that write out data in tabular format (write.csv() has it, too).
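A small sketch of the row.names argument, shown with write.csv() from base R so it runs without an extra package. The data frame contents and out_path are made up for illustration:

```r
# Hypothetical data frame with a space in one column name.
yourdata <- data.frame(`Full Code` = c("A1", "B2"),
                       Value = c(10, 20),
                       check.names = FALSE)
out_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the 1..N index out of the first column.
write.csv(yourdata, out_path, row.names = FALSE)

# Inspect the header line of the written file.
header_line <- readLines(out_path, n = 1)
header_line
```

The header line starts directly with the first data column; without row.names = FALSE it would start with an empty name for the index column.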

Matt Parker answered Oct 23 '22 09:10



Here's a function (sorry, I know it could be refactored) that makes nice column names even if there are multiple consecutive dots and trailing dots:

makeColNamesUserFriendly <- function(ds) {
  # FIXME: Repetitive.

  # Convert any number of consecutive dots to a single space.
  names(ds) <- gsub(x = names(ds),
                    pattern = "(\\.)+",
                    replacement = " ")

  # Drop the trailing spaces.
  names(ds) <- gsub(x = names(ds),
                    pattern = "( )+$",
                    replacement = "")
  ds
}

Example usage:

ds <- makeColNamesUserFriendly(ds)
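
To see what the two substitutions do, here is a sketch that applies the same patterns to a plain character vector. The sample names in mangled are invented for illustration:

```r
# Hypothetical names of the kind read.csv() produces from headers
# with spaces and punctuation.
mangled <- c("Full.Code", "Unit.Price...", "Qty..Ordered.")

# Step 1: collapse each run of dots into a single space.
step1 <- gsub(pattern = "(\\.)+", replacement = " ", x = mangled)

# Step 2: drop any spaces left at the end of a name.
step2 <- gsub(pattern = "( )+$", replacement = "", x = step1)
step2
```

Note that every dot is treated as a former space, so a header that genuinely contained a dot would lose it as well.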
Marcin Bilski answered Oct 23 '22 09:10
