
How to remove '.' from column names in a dataframe?

Tags:

r

My data frame, which I read from a CSV file, has column names like this:

abc.def, ewf.asd.fkl, qqit.vsf.addw.coil

I want to remove the '.' from all the names and convert them to

abcdef, ewfasdfkl, qqitvsfaddwcoil.

I tried sub(".","",colnames(dataframe)), but this removed the first letter of each column name instead, and the names changed to

bc.def, wf.asd.fkl, qit.vsf.addw.coil

Does anyone know another command to do this? I could change the column names one by one, but I have a lot of files with 30 or more columns each.

Again, I want to remove the "." from all the column names. I am trying to do this so I can use "sqldf" commands, which don't deal well with ".".

Thank you for your help

Amit Singh Parihar asked Apr 26 '15 17:04




2 Answers

UPDATE (dplyr 0.8.0): as of dplyr 0.8, funs() is soft-deprecated, so use formula notation instead.

A dplyr way to do this using stringr:

library(dplyr)
library(stringr)

data <- data.frame(abc.def = 1, ewf.asd.fkl = 2, qqit.vsf.addw.coil = 3)
renamed_data <- data %>%
  rename_all(~str_replace_all(.,"\\.","_")) # note we have to escape the '.' character with \\

Make sure you install the packages with install.packages().
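For newer dplyr versions, the same idea can be sketched with rename_with(), which supersedes rename_all() as of dplyr 1.0.0. This is only a sketch, assuming dplyr 1.0.0 or later is installed; it removes the dots outright (as the question asks) rather than replacing them with underscores, and uses base gsub() so stringr is not needed.

```r
library(dplyr)

data <- data.frame(abc.def = 1, ewf.asd.fkl = 2, qqit.vsf.addw.coil = 3)

# rename_with() applies a function to the column names (dplyr >= 1.0.0).
# fixed = TRUE makes gsub() treat "." as a literal dot, so no escaping is needed.
renamed_data <- data %>%
  rename_with(~ gsub(".", "", .x, fixed = TRUE))

names(renamed_data)
# "abcdef" "ewfasdfkl" "qqitvsfaddwcoil"
```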

Remember to escape the . character as \\. because functions like str_replace_all treat the pattern as a regular expression, in which an unescaped . is a wildcard that matches any character.
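To see why the original sub() attempt dropped the first letter, here is a small base R comparison using the column names from the question. The variable names (cols, first_any, first_dot, all_dots) are made up for illustration.

```r
cols <- c("abc.def", "ewf.asd.fkl", "qqit.vsf.addw.coil")

# Unescaped "." matches any character, and sub() replaces only the first match,
# so the first letter of each name disappears.
first_any <- sub(".", "", cols)
first_any
# "bc.def" "wf.asd.fkl" "qit.vsf.addw.coil"

# Escaping the dot makes it literal, but sub() still stops after the first match.
first_dot <- sub("\\.", "", cols)
first_dot
# "abcdef" "ewfasd.fkl" "qqitvsf.addw.coil"

# gsub() replaces every match, which removes all the dots.
all_dots <- gsub("\\.", "", cols)
all_dots
# "abcdef" "ewfasdfkl" "qqitvsfaddwcoil"
```

So the fix needs two things at once: a literal dot (escaped, or with fixed = TRUE) and gsub() instead of sub().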

blakiseskream answered Sep 21 '22 17:09


1) sqldf can deal with names having dots in them if you quote the names:

library(sqldf)
d0 <- read.csv(text = "A.B,C.D\n1,2")
sqldf('select "A.B", "C.D" from d0')

giving:

  A.B C.D
1   1   2

2) When reading the data using read.table or read.csv use the check.names=FALSE argument.

Compare:

Lines <- "A B,C D
1,2
3,4"
read.csv(text = Lines)
##   A.B C.D
## 1   1   2
## 2   3   4
read.csv(text = Lines, check.names = FALSE)
##   A B C D
## 1   1   2
## 2   3   4

However, in this example the names would still have to be quoted in sqldf, because they now contain embedded spaces.

3) To simply remove the periods, if DF is a data frame:

names(DF) <- gsub(".", "", names(DF), fixed = TRUE)

or it might be nicer to convert the periods to underscores so that it is reversible:

names(DF) <- gsub(".", "_", names(DF), fixed = TRUE)

This last line could alternatively be written as:

names(DF) <- chartr(".", "_", names(DF))
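Since the question mentions many files, the gsub() approach could be wrapped in a small function and applied to each file in turn. The following is a rough sketch in base R: read_without_dots is a hypothetical helper name, and the two temporary files only stand in for the real CSV files.

```r
# Hypothetical helper: read a CSV and strip the dots from its column names.
read_without_dots <- function(path) {
  df <- read.csv(path)
  names(df) <- gsub(".", "", names(df), fixed = TRUE)
  df
}

# Two throwaway files stand in for the real ones.
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
writeLines(c("abc.def,ewf.asd.fkl", "1,2"), paths[1])
writeLines(c("qqit.vsf.addw.coil,x.y", "3,4"), paths[2])

# Read every file and clean its names in one pass.
frames <- lapply(paths, read_without_dots)
cleaned_names <- lapply(frames, names)
cleaned_names
```

Note that if two columns differ only by their dots (for example a.b and ab), removing the dots makes them collide, which is one reason converting to underscores may be safer.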

G. Grothendieck answered Sep 18 '22 17:09