I'm new to R. I have a data frame with column names of such type:
file_001 file_002 block_001 block_002 red_001 red_002 ....etc'
0.05 0.2 0.4 0.006 0.05 0.3
0.01 0.87 0.56 0.4 0.12 0.06
I want to split them into groups by the column name, to get a result like this:
group_file
file_001 file_002
0.05 0.2
0.01 0.87
group_block
block_001 block_002
0.4 0.006
0.56 0.4
group_red
red_001 red_002
0.05 0.3
0.12 0.06
...etc'
My file is huge. I don't have a certain number of groups. It needs to be just by the column name's start.
In the above example, the data frame 'df' is split into 2 parts 'df1' and 'df2' on the basis of values of column 'Weight'. Method 2: Using Dataframe. groupby(). This method is used to split the data into groups based on some criteria.
Use underscore as delimiter to split the column into two columns. # Adding two new columns to the existing dataframe. # splitting is done on the basis of underscore.
In base R, you can use sub
and split.default
like this to return a list of data.frames:
myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))
this returns
myDfList
$block
block_001 block_002
1 0.40 0.006
2 0.56 0.400
$file
file_001 file_002
1 0.05 0.20
2 0.01 0.87
$red
red_001 red_002
1 0.05 0.30
2 0.12 0.06
split.default
will split data.frames by variable according to its second argument. Here, we use sub
and the regular expression "_\d+" to remove the underscore and all numeric values following it in order to return the splitting values "block", "file", and "red".
As a side note, it is typically a good idea to keep these data.frames in a list and work with them through functions like lapply
. See gregor's answer to this post for some motivating examples.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With