Remove variable labels attached with foreign/Hmisc SPSS import functions

Tags: import, class, r, label, spss

As usual, I got an SPSS file, which I imported into R with the spss.get function from the Hmisc package. I'm bothered by the labelled class that Hmisc::spss.get adds to all variables in the data.frame, so I want to remove it.

The labelled class gives me headaches when I try to run ggplot, or even when I want to do some basic analysis! One solution would be to remove the labelled class from each variable in the data.frame. How can I do that? Is it possible at all? If not, what are my other options?

I really want to avoid re-creating the variables "from scratch" with as.data.frame(lapply(x, as.numeric)) and as.character where applicable... And I certainly don't want to run SPSS and remove the labels manually (I don't like SPSS, nor do I care to install it)!

Thanks!

asked Mar 07 '10 02:03

aL3xa

3 Answers

Here's how I get rid of the labels altogether. It is similar to Jyotirmoy's solution but works for a vector as well as a data.frame. (Partial credit to Frank Harrell.)

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
    for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
  }
  else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}

Use as follows:

my.unlabelled.df <- clear.labels(my.labelled.df)

EDIT

Here's a bit of a cleaner version of the function, same results:

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
      attr(x[[i]],"label") <- NULL
    } 
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}
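
For what it's worth, here is a rough sketch of the same idea written with lapply, plus a quick check that it worked. The toy data frame, its column names, the label text, and the helper name drop_label are all made up for illustration, and the labelled columns are faked by hand so the snippet runs without Hmisc.

```r
# Toy data frame; column names and label text are made up for illustration.
df <- data.frame(age = c(31L, 45L, 27L), sex = factor(c("f", "m", "f")))

# Mimic labelled columns by hand (no Hmisc required for this demo).
for (nm in names(df)) {
  attr(df[[nm]], "label") <- paste("Label for", nm)
  class(df[[nm]]) <- c("labelled", class(df[[nm]]))
}

# Hypothetical helper: drop the "labelled" class and the "label" attribute.
drop_label <- function(v) {
  if (inherits(v, "labelled")) {
    class(v) <- setdiff(class(v), "labelled")
  }
  attr(v, "label") <- NULL
  v
}

# df[] keeps the data.frame structure while replacing every column.
df[] <- lapply(df, drop_label)

# Check whether any column is still flagged as labelled.
any(vapply(df, inherits, logical(1), what = "labelled"))
```

The df[] <- lapply(...) form is used so the result stays a data.frame rather than becoming a plain list.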

answered Oct 14 '22 10:10

Dominic Comtois

A belated note/warning regarding class membership in R objects. The correct way to identify "labelled" is not to test with an is function or equality (==) but rather with inherits. Methods that test for a specific position in the class vector will not pick up cases where the classes are not in the assumed order.
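
A small illustration of the difference, using two hand-built vectors (the values and label text are invented, and no Hmisc is needed). They differ only in where "labelled" sits in the class vector:

```r
# Same data, but "labelled" appears first in one and second in the other.
a <- structure(factor(c("lo", "hi")), class = c("labelled", "factor"), label = "A label")
b <- structure(factor(c("lo", "hi")), class = c("factor", "labelled"), label = "B label")

# A positional test only finds "labelled" when it happens to come first.
first_is_labelled <- c(a = class(a)[1] == "labelled",
                       b = class(b)[1] == "labelled")

# inherits() looks at the whole class vector, so order does not matter.
inherits_labelled <- c(a = inherits(a, "labelled"),
                       b = inherits(b, "labelled"))

first_is_labelled
inherits_labelled
```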

You can avoid creating "labelled" variables in spss.get with the argument use.value.labels=FALSE:

w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))

The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor"), in which case it should have been:

class(x[[i]]) <- NULL  # no error from assignment of empty vector

The error you report can be reproduced with this code:

> library(Hmisc)  # provides label<-
> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled'  atomic [1:3] 4 5 6
  ..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] : 
  invalid replacement object to be a class string
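
One way to sidestep that case is to assign NULL whenever nothing is left after dropping "labelled". This is only a sketch: the vector is built by hand to stand in for one whose only class is "labelled", and the name remaining is just for illustration.

```r
# Hand-built stand-in for a vector whose only class is "labelled"
# (no Hmisc needed; the label text is just an example).
b <- structure(4:6, class = "labelled", label = "B Label")

remaining <- setdiff(class(b), "labelled")

# Assign NULL when no classes are left, instead of a zero-length vector.
class(b) <- if (length(remaining) > 0) remaining else NULL
attr(b, "label") <- NULL

# b should now be a plain integer vector with no attributes.
is.null(attributes(b))
```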

answered Oct 14 '22 11:10

IRTFM

You can try out the read.spss function from the foreign package.

A rough and ready way to get rid of the labelled class created by spss.get:

for (i in seq_len(ncol(x))) {
    z <- class(x[[i]])
    if (z[[1]] == 'labelled') {
        class(x[[i]]) <- z[-1]
        attr(x[[i]], 'label') <- NULL
    }
}

But can you please give an example where labelled causes problems?

If I have a variable MAED in a data frame x created by spss.get, I have:

> class(x$MAED)
[1] "labelled" "factor"  
> is.factor(x$MAED)
[1] TRUE

So well-written code that expects a factor (say) should not have any problems.

answered Oct 14 '22 12:10

Jyotirmoy Bhattacharya
