As usual, I got some SPSS file that I've imported into R with spss.get
function from Hmisc
package. I'm bothered with labelled
class that Hmisc::spss.get
adds to all variables in data.frame
, hence want to remove it.
labelled
class gives me headaches when I try to run ggplot
or even when I want to do some menial analysis! One solution would be to remove labelled
class from each variable in data.frame
. How can I do that? Is that possible at all? If not, what are my other options?
I really want to bypass reediting variables "from scratch" with as.data.frame(lapply(x, as.numeric))
and as.character
where applicable... And I certainly don't want to run SPSS and remove labels manually (don't like SPSS, nor care to install it)!
Thanks!
How do I have SPSS show the variable names instead of the variable labels in the list of variables? Edit Options… Under the “General” tab, in the upper left corner, click on the radio button to change from showing the variable labels to showing the variable names.
There's two main options. 2) Read the xls file into SPSS => declare a long empty string variable => CONCAT lines of ADD VALUE LABELS commands into this string variable => save this variable as text with ". sps" as file extension instead of ". txt" => use INSERT FILE for running the syntax.
Here's how I get rid of the labels altogether. Similar to Jyotirmoy's solution but works for a vector as well as a data.frame. (Partial credits to Frank Harrell)
clear.labels <- function(x) {
if(is.list(x)) {
for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
}
else {
class(x) <- setdiff(class(x), "labelled")
attr(x, "label") <- NULL
}
return(x)
}
Use as follows:
my.unlabelled.df <- clear.labels(my.labelled.df)
EDIT
Here's a bit of a cleaner version of the function, same results:
clear.labels <- function(x) {
if(is.list(x)) {
for(i in seq_along(x)) {
class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
attr(x[[i]],"label") <- NULL
}
} else {
class(x) <- setdiff(class(x), "labelled")
attr(x, "label") <- NULL
}
return(x)
}
A belated note/warning regarding class membership in R objects. The correct method for identification of "labelled" is not to test for with an is
function or equality {==
) but rather with inherits
. Methods that test for a specific location will not pick up cases where the order of existing classes are not the ones assumed.
You can avoid creating "labelled" variables in spss.get with the argument: , use.value.labels=FALSE.
w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))
The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor") in which case it should have been:
class(x[[i]]) <- NULL # no error from assignment of empty vector
The error you report can be reproduced with this code:
> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled' atomic [1:3] 4 5 6
..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] :
invalid replacement object to be a class string
You can try out the read.spss
function from the foreign
package.
A rough and ready way to get rid of the labelled
class created by spss.get
for (i in 1:ncol(x)) {
z<-class(x[[i]])
if (z[[1]]=='labelled'){
class(x[[i]])<-z[-1]
attr(x[[i]],'label')<-NULL
}
}
But can you please give an example where labelled
causes problems?
If I have a variable MAED
in a data frame x
created by spss.get
, I have:
> class(x$MAED)
[1] "labelled" "factor"
> is.factor(x$MAED)
[1] TRUE
So well-written code that expects a factor (say) should not have any problems.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With