I'm new to R. I have a data frame with column names of such type:
file_001 file_002 block_001 block_002 red_001 red_002 ....etc'
0.05 0.2 0.4 0.006 0.05 0.3
0.01 0.87 0.56 0.4 0.12 0.06
I want to split them into groups by the column name, to get a result like this:
group_file
file_001 file_002
0.05 0.2
0.01 0.87
group_block
block_001 block_002
0.4 0.006
0.56 0.4
group_red
red_001 red_002
0.05 0.3
0.12 0.06
...etc'
My file is huge. I don't have a certain number of groups. It needs to be just by the column name's start.
In the above example, the data frame 'df' is split into 2 parts 'df1' and 'df2' on the basis of values of column 'Weight'. Method 2: Using Dataframe. groupby(). This method is used to split the data into groups based on some criteria.
Use underscore as delimiter to split the column into two columns. # Adding two new columns to the existing dataframe. # splitting is done on the basis of underscore.
In base R, you can use sub and split.default like this to return a list of data.frames:
myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))
this returns
myDfList
$block
block_001 block_002
1 0.40 0.006
2 0.56 0.400
$file
file_001 file_002
1 0.05 0.20
2 0.01 0.87
$red
red_001 red_002
1 0.05 0.30
2 0.12 0.06
split.default will split data.frames by variable according to its second argument. Here, we use sub and the regular expression "_\d+" to remove the underscore and all numeric values following it in order to return the splitting values "block", "file", and "red".
As a side note, it is typically a good idea to keep these data.frames in a list and work with them through functions like lapply. See gregor's answer to this post for some motivating examples.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With