Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an equivalent R function to Stata 'order' command?

Tags:

r

stata

'order' in R seems like 'sort' in Stata. Here's a dataset for example (only variable names listed):

v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18

and here's the output I expect:

v1 v2 v3 v4 v5 v7 v8 v9 v10 v11 v12 v17 v18 v13 v14 v15 v6 v16

In R, I have 2 ways:

data <- data[,c(1:5,7:12,17:18,13:15,6,16)]

OR

names <- c("v1", "v2", "v3", "v4", "v5", "v7", "v8", "v9", "v10", "v11", "v12",  "v17", "v18", "v13", "v14", "v15", "v6", "v16")
data <- data[names]

To get the same output in Stata, I may run 2 lines:

order v17 v18, before(v13)
order v6 v16, last

In the ideal data above, we can know the positions of the variables we want to deal with. But in most real cases, we have variables like 'age' 'gender' with no position indicators and we may have more than 50 variables in one dataset. Then the advantage of 'order' in Stata could be more obvious. We don't need to know the exact place of the variable and just type its name:

order age, after(gender)

Is there a base function in R to deal with this issue or could I get a package? Thanks in advance.

tweetinfo <- data.frame(uid=1:50, mid=2:51, annotations=3:52, bmiddle_pic=4:53, created_at=5:54, favorited=6:55, geo=7:56, in_reply_to_screen_name=8:57, in_reply_to_status_id=9:58, in_reply_to_user_id=10:59, original_pic=11:60, reTweetId=12:61, reUserId=13:62, source=14:63, thumbnail_pic=15:64, truncated=16:65)
noretweetinfo <- data.frame(uid=21:50, mid=22:51, annotations=23:52, bmiddle_pic=24:53, created_at=25:54, favorited=26:55, geo=27:56, in_reply_to_screen_name=28:57, in_reply_to_status_id=29:58, in_reply_to_user_id=30:59, original_pic=31:60, reTweetId=32:61, reUserId=33:62, source=34:63, thumbnail_pic=35:64, truncated=36:65)
retweetinfo <- data.frame(uid=41:50, mid=42:51, annotations=43:52, bmiddle_pic=44:53, created_at=45:54, deleted=46:55, favorited=47:56, geo=48:57, in_reply_to_screen_name=49:58, in_reply_to_status_id=50:59, in_reply_to_user_id=51:60, original_pic=52:61, source=53:62, thumbnail_pic=54:63, truncated=55:64)
tweetinfo$type <- "ti"
noretweetinfo$type <- "nr"
retweetinfo$type <- "rt"
gtinfo <- rbind(tweetinfo, noretweetinfo)
gtinfo$deleted=""
gtinfo <- gtinfo[,c(1:16,18,17)]
retweetinfo <- transform(retweetinfo, reTweetId="", reUserId="")
retweetinfo <- retweetinfo[,c(1:5,7:12,17:18,13:15,6,16)]
gtinfo <- rbind(gtinfo, retweetinfo)
write.table(gtinfo, file="C:/gtinfo.txt", row.names=F, col.names=T, sep="\t", quote=F)
# rm(list=ls(all=T))
like image 520
leoce Avatar asked Sep 22 '12 14:09

leoce


People also ask

Can you use R in Stata?

Running R from StataThe trick to running R from within your do-file is first to save the data you want to pass to R, then call the . R file with the commands you want to run in R (the “R script”), then—if necessary—reload the R output into Stata.

What does order do in Stata?

order relocates varlist to a position depending on which option you specify. If no option is specified, order relocates varlist to the beginning of the dataset in the order in which the variables are specified.

Is R similar to Stata?

R and Stata are very different with regard to storing, accessing, and modifying data and variables. Stata only loads one data set at a time. R stores all object including data as objects and can have an arbitrary number loaded at once. Below I create an interaction term between x and z and a discrete version of x .

What does Varlist mean in Stata?

In this diagram, varlist denotes a list of variable names, command denotes a Stata command, exp denotes an algebraic expression, range denotes an observation range, weight denotes a weighting expression, and options denotes a list of options.


1 Answers

Because I'm procrastinating and experimenting with different things, here's a function that I whipped up. Ultimately, it depends on append:

moveme <- function(invec, movecommand) {
  movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]], ",|\\s+"), 
                        function(x) x[x != ""])
  movelist <- lapply(movecommand, function(x) {
    Where <- x[which(x %in% c("before", "after", "first", "last")):length(x)]
    ToMove <- setdiff(x, Where)
    list(ToMove, Where)
  })
  myVec <- invec
  for (i in seq_along(movelist)) {
    temp <- setdiff(myVec, movelist[[i]][[1]])
    A <- movelist[[i]][[2]][1]
    if (A %in% c("before", "after")) {
      ba <- movelist[[i]][[2]][2]
      if (A == "before") {
        after <- match(ba, temp)-1
      } else if (A == "after") {
        after <- match(ba, temp)
      }    
    } else if (A == "first") {
      after <- 0
    } else if (A == "last") {
      after <- length(myVec)
    }
    myVec <- append(temp, values = movelist[[i]][[1]], after = after)
  }
  myVec
}

Here's some sample data representing the names of your dataset:

x <- paste0("v", 1:18)

Imagine now that we wanted "v17" and "v18" before "v3", "v6" and "v16" at the end, and "v5" at the beginning:

moveme(x, "v17, v18 before v3; v6, v16 last; v5 first")
#  [1] "v5"  "v1"  "v2"  "v17" "v18" "v3"  "v4"  "v7"  "v8"  "v9"  "v10" "v11" "v12"
# [14] "v13" "v14" "v15" "v6"  "v16"

So, the obvious usage would be, for a data.frame named "df":

df[moveme(names(df), "how you want to move the columns")]

And, for a data.table named "DT" (which, as @mnel points out, would be more memory efficient):

setcolorder(DT, moveme(names(DT), "how you want to move the columns"))

Note that compound moves are specified by semicolons.

The recognized moves are:

  • before (move the specified columns to before another named column)
  • after (move the specified columns to after another named column)
  • first (move the specified columns to the first position)
  • last (move the specified columns to the last position)
like image 135
A5C1D2H2I1M1N2O1R2T1 Avatar answered Nov 15 '22 07:11

A5C1D2H2I1M1N2O1R2T1