I have big dataset (but the following is small one for example). I can split the dataframe and then I want to output to multiple text file corresponding to lavel used to split.
mydata <- data.frame (var1 = rep(c("k", "l", "c"), each = 5), var2 = rnorm(5),
var3 = rnorm(5))
mydata
var1 var2 var3
1 k 0.5406022 0.3654706
2 k -0.6356879 -0.9160001
3 k 0.2946240 -0.1072241
4 k -0.2609121 0.1036626
5 k 0.6206579 0.6111655
6 l 0.5406022 0.3654706
7 l -0.6356879 -0.9160001
8 l 0.2946240 -0.1072241
9 l -0.2609121 0.1036626
10 l 0.6206579 0.6111655
11 c 0.5406022 0.3654706
12 c -0.6356879 -0.9160001
13 c 0.2946240 -0.1072241
14 c -0.2609121 0.1036626
15 c 0.6206579 0.6111655
Now split
> spt1 <- split(mydata, mydata$var1)
> spt1
$c
var1 var2 var3
11 c 0.5406022 0.3654706
12 c -0.6356879 -0.9160001
13 c 0.2946240 -0.1072241
14 c -0.2609121 0.1036626
15 c 0.6206579 0.6111655
$k
var1 var2 var3
1 k 0.5406022 0.3654706
2 k -0.6356879 -0.9160001
3 k 0.2946240 -0.1072241
4 k -0.2609121 0.1036626
5 k 0.6206579 0.6111655
$l
var1 var2 var3
6 l 0.5406022 0.3654706
7 l -0.6356879 -0.9160001
8 l 0.2946240 -0.1072241
9 l -0.2609121 0.1036626
10 l 0.6206579 0.6111655
I want to write.table in name of outputc
, outputk
, and outputl
. Thus output is common prefix followed by name of label for grouping variable.
write.table (spt1)
You can use the groupby command, if you already have some labels for your data. Show activity on this post. it converts a DataFrame to multiple DataFrames, by selecting each unique value in the given column and putting all those entries into a separate DataFrame.
Using the iloc() function to split DataFrame in Python We can use the iloc() function to slice DataFrames into smaller DataFrames. The iloc() function allows us to access elements based on the index of rows and columns. Using this function, we can split a DataFrame based on rows or columns.
DataFrame elements can be divided by a pandas series or by a Python sequence as well. Calling div() on a DataFrame instance is equivalent to invoking the division operator (/). The div() method provides the fill_value parameter which is used for replacing the np.
Using lapply over the names of spt1 will allow us to access the dataframes in spt1 and the name that we can use in paste to create our files.
lapply(names(spt1), function(x){write.table(spt1[[x]], file = paste("output", x, sep = ""))})
You could add a common extension in the paste if you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With