I have the following data frame in R with 274569 rows and 15 columns:
> str(x2)
'data.frame': 274569 obs. of 15 variables:
$ ykod : int 99 99 99 99 99 99 99 99 99 99 ...
$ yad : Factor w/ 43 levels "BAKUGAN","BARBIE",..: 2 2 2 2 2 2 2 2 2 2 ...
$ per : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ...
$ donem: int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
$ sayi : int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
$ mkod : int 359 361 362 363 366 847 849 850 1505 1506 ...
$ mad : Factor w/ 11045 levels " Hilal Gida ",..: 5163 3833 10840 8284 10839 2633 10758 10293 6986 6984 ...
$ mtip : Factor w/ 30 levels "Abone Bürosu ",..: 20 20 20 20 20 2 2 2 11 11 ...
$ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 1 1 ...
$ bkod : int 110006 110006 110006 110006 110006 110006 110006 110006 110006 110006 ...
$ bad : Factor w/ 213 levels "4. Levent","500 Evler",..: 25 25 25 25 25 25 25 25 25 25 ...
$ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ...
$ sevk : int 5 2 2 2 10 0 4 3 13 32 ...
$ iade : int 0 2 1 2 4 0 3 2 0 8 ...
$ satis: int 5 0 1 0 6 0 1 1 13 24 ...
I create a subset of it and display its structure:
> msub <- x2[x2$ykod == 99,]
> str(msub)
'data.frame': 14367 obs. of 15 variables:
$ ykod : int 99 99 99 99 99 99 99 99 99 99 ...
$ yad : Factor w/ 43 levels "BAKUGAN","BARBIE",..: 2 2 2 2 2 2 2 2 2 2 ...
$ per : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ...
$ donem: int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
$ sayi : int 201106 201106 201106 201106 201106 201106 201106 201106 201106 201106 ...
$ mkod : int 359 361 362 363 366 847 849 850 1505 1506 ...
$ mad : Factor w/ 11045 levels " Hilal Gida ",..: 5163 3833 10840 8284 10839 2633 10758 10293 6986 6984 ...
$ mtip : Factor w/ 30 levels "Abone Bürosu ",..: 20 20 20 20 20 2 2 2 11 11 ...
$ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 1 1 ...
$ bkod : int 110006 110006 110006 110006 110006 110006 110006 110006 110006 110006 ...
$ bad : Factor w/ 213 levels "4. Levent","500 Evler",..: 25 25 25 25 25 25 25 25 25 25 ...
$ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ...
$ sevk : int 5 2 2 2 10 0 4 3 13 32 ...
$ iade : int 0 2 1 2 4 0 3 2 0 8 ...
$ satis: int 5 0 1 0 6 0 1 1 13 24 ...
Now I have a subset with 14367 rows and 15 columns, but all the original factor levels are still there. I expected the number of levels to shrink. For example, yad should have only one level.
How can I easily make str(msub) report only the factor levels that actually occur in the subset?
This is expected behavior. Factor levels that are not represented in your subset do not disappear until you remove them explicitly. Since R 2.12.0, you can use droplevels() to do that.
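As a minimal sketch of what droplevels() does, here is a toy data frame standing in for x2. The values and the names df and sub are made up for illustration; only the column names ykod and yad come from the question.

```r
# Toy stand-in for x2 (hypothetical values, not the real data)
df <- data.frame(
  ykod = c(99L, 99L, 12L, 12L),
  yad  = factor(c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"))
)

# Row subsetting keeps every level of the original factor
sub <- df[df$ykod == 99, ]
levels_before <- nlevels(sub$yad)

# droplevels() removes the unused levels from all factor columns
sub <- droplevels(sub)
levels_after <- nlevels(sub$yad)

str(sub)
```

Applied to the question, that would be msub <- droplevels(msub) followed by str(msub).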
In fact, str is showing you the correct structural information: the factor is still able to take any of the levels shown. Imagine concatenating two of your subsets, where one contained some of the levels and the other a different set: merging them would be somewhat of a hassle if the unused levels had been dropped silently. This is simply how factors work in R.
If you want to know which levels are actually present in your data, one option is to use table to count the occurrences.
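A small illustration of the table approach, using a made-up factor f (not from the question) that has one unused level:

```r
# Hypothetical factor with three declared levels, only two of them used
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# table() counts every declared level, including those with zero occurrences
counts <- table(f)
counts

# Keep only the names of the levels that actually occur
present <- names(counts)[counts > 0]
present
```

Levels with a count of zero are the ones that droplevels() or factor() would remove.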
If you want your factor reduced, so it only contains the levels that are actually present, you can reapply factor to it:
myfact<-factor(rep(1:2,5), levels=1:3, labels=letters[1:3])
myfact
# [1] a b a b a b a b a b
#Levels: a b c
factor(myfact)
# [1] a b a b a b a b a b
#Levels: a b
You can simply apply this to all the factor columns of your data.frame to get what you say you want.
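One way to do that is sketched below on a made-up data frame (df, id, f, and g are illustrative names, not from the question). It finds the factor columns and passes each one through factor() again:

```r
# Hypothetical data frame with unused levels in both factor columns
df <- data.frame(
  id = 1:4,
  f  = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c")),
  g  = factor(c("x", "x", "x", "x"), levels = c("x", "y"))
)

# Logical vector marking which columns are factors
is_fac <- vapply(df, is.factor, logical(1))

# Re-create each factor column so that only the levels in use remain
df[is_fac] <- lapply(df[is_fac], factor)

str(df)
```

For the data in the question, the same two lines would be applied to msub instead of df. Non-factor columns are left untouched.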