I have a df of numbers and am sorting it. The output places 7 next to 70, as if 7 were 70. Why is this happening? The pasted lines below are the actual output. Notice how 263 is treated as smaller than 27, as if there were a 0 after the 7 in 27, and 4 comes after 38, as if 4 meant 40. I'm using order().
feat_1 25
feat_10 26
feat_24 263
feat_48 27
feat_55 27
feat_75 36
feat_16 37
feat_53 38
feat_89 38
feat_28 4
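This is happening because you are sorting character strings instead of numbers. Strings are compared character by character, so "263" sorts before "27" (the second characters are '6' and '7') and "4" sorts after "38" ('4' comes after '3'). It is a common problem, and an easy one to miss because the values print just like numbers. order is an easy way to sort a data.frame, so that's what I'll use to demonstrate a solution in a small test case.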
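You can see this directly in the console. This is a minimal sketch: the vector x is just a few of the question's values typed in as strings.

```r
# Character vectors are sorted lexicographically, one character at a time
x <- c("25", "263", "27", "4", "38")
sort(x)              # "25" "263" "27" "38" "4"
"263" < "27"         # TRUE: the second characters, '6' vs '7', decide it
# Converting to numeric first gives the intended numeric order
sort(as.numeric(x))  # 4 25 27 38 263
```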
You should try this:
col1 <- c('a', 'b', 'c')
col2 <- c("25", "42" ,"4")
df <- data.frame(col1, col2)
## This is the wrong approach:
df[order(df$col2),]
col1 col2
1 a 25
3 c 4
2 b 42
## This is the right approach: convert the second column to a numeric vector first
df$col2 <- as.numeric(as.character(df$col2))
df[order(df$col2),]
col1 col2
3 c 4
1 a 25
2 b 42
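The as.character() step matters because in R versions before 4.0, data.frame() turned character columns into factors by default. Calling as.numeric() directly on a factor returns its internal level codes, not the numbers you see printed. Here is a small sketch; the factor f is only illustrative:

```r
# Levels are sorted as strings: "25", "4", "42"
f <- factor(c("25", "42", "4"))
as.numeric(f)                 # level codes: 1 3 2 (not what you want)
as.numeric(as.character(f))   # 25 42 4
as.numeric(levels(f))[f]      # 25 42 4, the idiom suggested in the R FAQ
```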
You could also use mixedsort or mixedorder from the gtools package as a quick alternative. With these there is no need to convert the column to numeric, because they handle both numbers stored as character and alphanumeric strings:
Data
df <- read.table(text='feat_1 "25"
feat_10 "26"
feat_24 "263"
feat_48 "27"
feat_55 "27"
feat_75 "36"
feat_16 "37"
feat_53 "38"
feat_89 "38"
feat_28 "4"', colClasses = "character")
Solution
library(gtools)
# mixedorder is used in exactly the same way as base order
df[mixedorder(df$V2),]
V1 V2
10 feat_28 4
1 feat_1 25
2 feat_10 26
4 feat_48 27
5 feat_55 27
6 feat_75 36
7 feat_16 37
8 feat_53 38
9 feat_89 38
3 feat_24 263
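If you would rather not add a package dependency, base R can do the same for this data. This is a sketch that assumes both columns are read as character and that every label has the form feat_<number>, because the sub() pattern relies on that:

```r
# Rebuild the question's data, keeping both columns as character
df <- read.table(text = 'feat_1 25
feat_10 26
feat_24 263
feat_48 27
feat_55 27
feat_75 36
feat_16 37
feat_53 38
feat_89 38
feat_28 4', colClasses = "character")

# Sort by the numeric value of V2 without changing the column itself
by_value <- df[order(as.numeric(df$V2)), ]
by_value

# Sort labels like "feat_10" by their numeric suffix (assumes the "feat_" prefix)
by_label <- df[order(as.numeric(sub("^feat_", "", df$V1))), ]
by_label
```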