For example, the input data frame is:
INPUT group
4000 1
4000 1
2000 2
3000 3
2000 4
2000 4
2000 4
Output: dynamically assign each group to its own variable and generate the following: the first data frame holds the group with the most repeated rows, the second data frame holds the group with the second most, and so on.
OUTPUT
1. First
INPUT group
2000 4
2000 4
2000 4
2. Second
INPUT group
4000 1
4000 1
3. Third
INPUT group
2000 2
4. Fourth
INPUT group
3000 3
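The ordering rule this example implies (largest group first, ties kept in order of first appearance) can be sketched in plain Python with `collections.Counter`. The tie-breaking rule is an assumption read off the expected output, since groups 2 and 3 both have one row; the `rows` list is just the example table typed in.

```python
from collections import Counter

# Example rows as (INPUT, group) pairs, copied from the input table above.
rows = [(4000, 1), (4000, 1), (2000, 2), (3000, 3),
        (2000, 4), (2000, 4), (2000, 4)]

# Count rows per group; Counter remembers first-insertion order.
counts = Counter(group for _, group in rows)

# most_common() sorts by count, largest first; equal counts keep the order
# in which they were first encountered, so group 2 precedes group 3.
ranked_groups = [group for group, _ in counts.most_common()]

# One list of rows per group, largest group first.
parts = [[row for row in rows if row[1] == g] for g in ranked_groups]

for rank, part in enumerate(parts, start=1):
    print(rank, part)
```

This prints the four parts in the same order as the expected output: group 4, group 1, group 2, group 3.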
In simpler words: I want to split the data frame into groups and get all the groups back ordered by size, from the largest group to the smallest, each assigned to its own variable. What I have tried so far is this:
library(data.table)
x <- setDT(df)[, group := rleid(INPUT)]
This assigns a group id to each run of consecutive identical INPUT values. I tried one more command:
y <- x[group == which.max(tabulate(group))]
But this returns only the group with the most rows.
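Note that `rleid()` numbers runs of consecutive identical values, not distinct values, which is why the two separate runs of 2000 end up in groups 2 and 4. For readers coming from pandas, a rough counterpart (a common idiom, not something taken from this thread) compares each value with the previous row and takes a cumulative sum of the "run starts":

```python
import pandas as pd

df = pd.DataFrame({'INPUT': [4000, 4000, 2000, 3000, 2000, 2000, 2000]})

# A new run starts wherever the value differs from the previous row
# (the first row compares against NaN, so it always starts a run).
run_start = df['INPUT'] != df['INPUT'].shift()

# Summing the run starts numbers the runs 1, 2, 3, ...
df['group'] = run_start.cumsum()
print(df)
```

On the example data this reproduces the `group` column from the question: 1, 1, 2, 3, 4, 4, 4.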
I am not sure whether you need all your outputs at once, but here is an idea that might help. I am using the dplyr package for this. First, let me recreate the dataset you provided as input:
library(dplyr)
DF <- data.frame(INPUT = c(4000,4000,2000,3000,2000,2000,2000), group = c(1,1,2,3,4,4,4))
df <- tbl_df(DF)
df
INPUT group
(dbl) (dbl)
1 4000 1
2 4000 1
3 2000 2
4 3000 3
5 2000 4
6 2000 4
7 2000 4
Now I create an auxiliary table that tells me how many rows each group has; it is already ordered from max to min:
aux <- df %>% group_by(group) %>% summarise(n = n()) %>% arrange(desc(n))
aux
group n
(dbl) (int)
1 4 3
2 1 2
3 2 1
4 3 1
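For comparison, the same auxiliary count table can be sketched in pandas. Groups 2 and 3 tie with one row each; this sketch assumes ties should stay in order of first appearance and uses a stable sort (`kind='mergesort'`) so that the order is guaranteed rather than incidental.

```python
import pandas as pd

df = pd.DataFrame({'INPUT': [4000, 4000, 2000, 3000, 2000, 2000, 2000],
                   'group': [1, 1, 2, 3, 4, 4, 4]})

# Rows per group, listed in order of first appearance (sort=False) ...
aux = df.groupby('group', sort=False).size()

# ... then largest first; mergesort is stable, so tied groups keep that order.
aux = aux.sort_values(ascending=False, kind='mergesort')
print(aux)
```

The index of `aux` then plays the role of `aux$group` in the R code: 4, 1, 2, 3.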
So group 4 appears 3 times, group 1 appears twice, and so on. Now I can easily "extract" the groups from max to min:
ymax <- df %>% filter(group == aux$group[1])
y2 <- df %>% filter(group == aux$group[2])
y3 <- df %>% filter(group == aux$group[3])
ymin <- df %>% filter(group == aux$group[4])
ymax
INPUT group
(dbl) (dbl)
1 2000 4
2 2000 4
3 2000 4
y2
INPUT group
(dbl) (dbl)
1 4000 1
2 4000 1
y3
INPUT group
(dbl) (dbl)
1 2000 2
ymin
INPUT group
(dbl) (dbl)
1 3000 3
I hope this helps.
Of course, you can also get all of them at once:
ylist <- lapply(seq_len(nrow(aux)), function(i) filter(df, group == aux$group[i]))
ylist
[[1]]
Source: local data frame [3 x 2]
INPUT group
(dbl) (dbl)
1 2000 4
2 2000 4
3 2000 4
[[2]]
Source: local data frame [2 x 2]
INPUT group
(dbl) (dbl)
1 4000 1
2 4000 1
[[3]]
Source: local data frame [1 x 2]
INPUT group
(dbl) (dbl)
1 2000 2
[[4]]
Source: local data frame [1 x 2]
INPUT group
(dbl) (dbl)
1 3000 3
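A pandas analogue of this `lapply()` list might iterate the groupby directly and sort the sub-frames by their row count. This is a minimal sketch, assuming the same tie rule as above (equal-sized groups keep first-appearance order, which Python's stable `sorted()` provides); `ylist` simply mirrors the R name.

```python
import pandas as pd

df = pd.DataFrame({'INPUT': [4000, 4000, 2000, 3000, 2000, 2000, 2000],
                   'group': [1, 1, 2, 3, 4, 4, 4]})

# Iterating a groupby yields (label, sub_frame) pairs; sort=False keeps
# the groups in order of first appearance.
groups = [sub.reset_index(drop=True)
          for _, sub in df.groupby('group', sort=False)]

# len() of a DataFrame is its row count; sorted() is stable, so
# equal-sized groups keep first-appearance order.
ylist = sorted(groups, key=len, reverse=True)

for part in ylist:
    print(part, end='\n\n')
```

`ylist[0]` is the three-row group 4 and `ylist[3]` is the single row of group 3, matching `[[1]]` through `[[4]]` above.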
In Python with pandas, you could do the following:
Create the DataFrame:
import pandas as pd
df = pd.DataFrame()
df['INPUT'] = [4000,4000,2000,3000,2000,2000,2000]
df['group'] = [1,1,2,3,4,4,4]
Group by group and get the size of each group, add this size to the DataFrame as a column, and sort by it in descending order:
sizes = df.groupby('group').size().reset_index(name='size')
df = df.merge(sizes).sort_values('size', ascending=False, kind='mergesort')
Then, loop through the DF to get the part you need each time:
for i, x in enumerate(df['group'].unique()):
    print('output', i)
    # drop the helper size column (the last one) and renumber the rows
    print(df[df['group'] == x].iloc[:, :-1].reset_index(drop=True))
    print()
This gives you the following:
output 0
INPUT group
0 2000 4
1 2000 4
2 2000 4
output 1
INPUT group
0 4000 1
1 4000 1
output 2
INPUT group
0 2000 2
output 3
INPUT group
0 3000 3
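Since the question asks for each part in its own variable, a dict keyed by rank is a common Python substitute for creating variables dynamically. The sketch below also swaps the merge step for `transform('size')`, which broadcasts each group's row count back onto its rows. The `labels` list and the `outputs` name are hypothetical choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'INPUT': [4000, 4000, 2000, 3000, 2000, 2000, 2000],
                   'group': [1, 1, 2, 3, 4, 4, 4]})

# transform('size') returns a Series aligned with df holding each row's
# group size, so no separate merge is needed.
size = df.groupby('group')['INPUT'].transform('size')

# Stable descending sort on the sizes; rows with equal size keep their order.
order = size.sort_values(ascending=False, kind='mergesort').index
ranked = df.loc[order]

# Hypothetical rank labels; zip() stops at the shorter sequence, so extend
# the list if there can be more groups than labels.
labels = ['first', 'second', 'third', 'fourth']
outputs = {label: ranked[ranked['group'] == g].reset_index(drop=True)
           for label, g in zip(labels, ranked['group'].unique())}

print(outputs['second'])
```

`outputs['first']` then holds the group 4 rows, `outputs['second']` the group 1 rows, and so on.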