Splitting a dataframe into groups, sorting the groups by size, and assigning each group to its own variable dynamically

Question

For example, the input dataframe is:

INPUT     group
4000       1
4000       1
2000       2
3000       3
2000       4
2000       4
2000       4

Output: dynamically assign each group to its own variable, so that the first dataframe holds the group with the most rows, the second dataframe holds the group with the second most rows, and so on:

OUTPUT
1. First

INPUT     group
2000        4
2000        4
2000        4

2. Second

INPUT        group
4000        1
4000        1

3. Third

INPUT        group
2000        2

4. Fourth

INPUT        group
3000        3

In simpler words: I want to split the dataframe into groups and get all of the groups back, ordered by size from largest to smallest, with each group assigned to its own variable. What I have tried so far is this:

x<-setDT(df)[, group := rleid(df$INPUT)]

This assigns a group id to each run of equal INPUT values. I tried one more command:

y<-x[x$group == which.max(tabulate(x$group)), ]

But this returns only the group with the largest number of rows.
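For readers following along in Python, the two attempts above can be mirrored with the standard library alone. This is only a sketch under stated assumptions: `run_length_ids` is a hypothetical helper standing in for data.table's `rleid`, and `Counter` stands in for `tabulate`. It illustrates why the second command yields a single group: taking the maximum keeps one group id, whereas ranking every id by its count gives the full ordering.

```python
from collections import Counter
from itertools import groupby


def run_length_ids(values):
    """Number each run of consecutive equal values 1, 2, 3, ... (hypothetical helper)."""
    ids = []
    for run_id, (_, run) in enumerate(groupby(values), start=1):
        ids.extend(run_id for _ in run)
    return ids


inputs = [4000, 4000, 2000, 3000, 2000, 2000, 2000]
groups = run_length_ids(inputs)

counts = Counter(groups)

# Counterpart of which.max(tabulate(...)): this yields one group id only.
largest_group = max(counts, key=counts.get)

# Ranking every id by its count covers all groups; tied ids keep first-seen order.
ranked_groups = sorted(counts, key=counts.get, reverse=True)
```

With the sample data, `ranked_groups` lists the group ids from the most frequent to the least frequent, which is the order the question asks for.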

Tal J. Levy · Accepted Answer

I am not sure whether you need all your outputs at once or not. But here is an idea that might help. I am using the dplyr package for this. So first let me recreate the dataset you provided as input:

library(dplyr)
DF <- data.frame(INPUT = c(4000,4000,2000,3000,2000,2000,2000), group = c(1,1,2,3,4,4,4))
df <- as_tibble(DF)  # tbl_df() is deprecated in current dplyr; as_tibble() is its replacement
df

output

  INPUT group
  (dbl) (dbl)
1  4000     1
2  4000     1
3  2000     2
4  3000     3
5  2000     4
6  2000     4
7  2000     4

Now I will create an auxiliary table that tells me how many rows each group has. The table is already sorted from the largest count to the smallest:

aux <- df %>% group_by(group) %>% summarise(n = n()) %>% arrange(-n)
aux

output

  group     n
  (dbl) (int)
1     4     3
2     1     2
3     2     1
4     3     1

So we see that group 4 appears 3 times, group 1 appears twice and so on and so forth. Now I can easily "extract" the groups I want from max to min:

ymax <- df %>% filter(group == aux$group[1])
y2 <- df %>% filter(group == aux$group[2])
y3 <- df %>% filter(group == aux$group[3])
ymin <- df %>% filter(group == aux$group[4])

output

ymax
  INPUT group
  (dbl) (dbl)
1  2000     4
2  2000     4  
3  2000     4  

y2
  INPUT group
  (dbl) (dbl)
1  4000     1
2  4000     1  

y3
  INPUT group
  (dbl) (dbl)
1  2000     2  

ymin
  INPUT group
  (dbl) (dbl)
1  3000     3

I hope this helps.
I just want to add that you can get all of them at once of course:

ylist <- lapply(1:nrow(aux), function(x) {filter(df, group == aux$group[x])})

output

[[1]]
Source: local data frame [3 x 2]

  INPUT group
  (dbl) (dbl)
1  2000     4
2  2000     4
3  2000     4

[[2]]
Source: local data frame [2 x 2]

  INPUT group
  (dbl) (dbl)
1  4000     1
2  4000     1

[[3]]
Source: local data frame [1 x 2]

  INPUT group
  (dbl) (dbl)
1  2000     2

[[4]]
Source: local data frame [1 x 2]

  INPUT group
  (dbl) (dbl)
1  3000     3

Ezer K · Answer

In Python with pandas, you could do the following.

Create the DataFrame:

import pandas as pd
df = pd.DataFrame()
df['INPUT'] = [4000,4000,2000,3000,2000,2000,2000]
df['group'] = [1,1,2,3,4,4,4]

Group by the group column and get the size of each group, add this size to the DataFrame as a column, and sort by it in descending order:

df = df.merge(pd.DataFrame(df.groupby('group').size()).reset_index()).sort_values(0,ascending=False)
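The merge above leaves the group size in a column named `0`. A variant that may be easier to read is sketched below. It is not part of the answer: the names `data`, `sized`, `sorted_data` and the column name `n` are my own choices, and it assumes a reasonably recent pandas. It attaches the size with `transform('size')` instead of a merge and uses a merge sort, which is stable.

```python
import pandas as pd

# Same sample data as above, under a different name so the answer's df is untouched.
data = pd.DataFrame({
    'INPUT': [4000, 4000, 2000, 3000, 2000, 2000, 2000],
    'group': [1, 1, 2, 3, 4, 4, 4],
})

# Attach each row's group size as a named column, without a merge.
sized = data.assign(n=data.groupby('group')['group'].transform('size'))

# Largest groups first; mergesort is a stable sorting algorithm.
sorted_data = sized.sort_values('n', ascending=False, kind='mergesort')
```

Because the helper column has a name here, it can later be dropped with `sorted_data.drop(columns='n')` rather than by position.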

Then, loop through the DF to get the part you need each time:

for i, x in enumerate(df['group'].unique()):
    print('output', i)
    # .iloc[:, :-1] drops the helper size column added by the merge
    print(df[df['group'] == x].iloc[:, :-1].reset_index(drop=True))
    print()

This gives you the following:

output 0
   INPUT  group
0   2000      4
1   2000      4
2   2000      4

output 1
   INPUT  group
0   4000      1
1   4000      1

output 2
   INPUT  group
0   2000      2

output 3
   INPUT  group
0   3000      3
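If the groups need to be kept rather than printed, one option is to collect them in a list indexed by rank, which plays the role of the "dynamically assigned" variables from the question. The following is a minimal sketch, not part of the answer above: the names `data`, `order` and `frames` are illustrative, and it relies on `value_counts` returning counts in descending order. The relative order of groups with equal counts is not something I would rely on across pandas versions.

```python
import pandas as pd

data = pd.DataFrame({
    'INPUT': [4000, 4000, 2000, 3000, 2000, 2000, 2000],
    'group': [1, 1, 2, 3, 4, 4, 4],
})

# Group ids ordered by how often they occur, most frequent first.
order = data['group'].value_counts().index

# One DataFrame per group, stored by rank instead of in separate variables.
frames = [data[data['group'] == g].reset_index(drop=True) for g in order]
```

Here `frames[0]` holds the largest group, `frames[1]` the second largest, and so on, so no variable names have to be generated at run time.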

Tags: python, dataframe, r

azad
