Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: how to expand a row containing a "list" to several rows...one for each list member?

I am sure there is a simple solution to this, but i am going nuts trying to find it. Any help is very much appreciated.

I have a data frame with 2 columns; "pro" and "pep". pro is formatted as factors and contains entries in the form 220;300;4 sometimes more numbers (seperated by ";") and sometimes just a single number (and no ";"). The pep column is formatted as integers and contains single numbers, e.g. 20. What i would like to do is to "expand" e.g. the row pro: 220;300;4 and pep: 20 to three rows one with pro: 220 and pep: 20, one with pro: 300 and pep: 20 and one with pro: 4 and pep: 20.

I want to do this for the whole data frame and thus end up with a data frame with two character formatted columns where all the rows originally containing multiple ";" seperated numbers have been expanded.

I would prefer to avoid loops since the data frame is fairly large (>100000 rows)

I am sorry that i havent been able to post this in a more case-representative way...i am new here and got lost in the code format.

On a much appreciated request from simon:

    > dput( head( dat , 10 ) )
structure(list(Protein.Group.IDs = structure(c(1095L, 60L, 299L, 
242L, 1091L, 147L, 161L, 884L, 783L, 1040L), .Label = c("0", 
"1", "10", "100", "101", "102", "103", "104", "105", "106", "107", 
"108", "109", "11", "110", "111", "112", "113", "114", "114;680", 
"115", "116", "117", "118", "119", "12", "120", "121", "121;920;530", 
"121;920;530;589", "121;920;530;589;934", "121;920;589", "121;920;934", 
"122;351", "122;351;950", "122;351;950;224;904", "122;351;950;687", 
"122;901;224;904", "122;901;351", "122;901;351;950", "122;901;351;950;224", 
"122;901;351;950;224;890;904", "122;901;351;950;224;890;904;687", 
"122;901;351;950;890;687", "122;901;950", "122;901;950;904;687", 
"122;950", "123", "124", "125", "126", "127", "127;952", "128", 
"129", "13", "130", "131", "131;204", "132", "133", "134", "135", 
"136", "137", "138", "139", "14", "140", "140;259;436", "141", 
"142", "143", "144", "145", "146", "147", "148", "149", "15", 
"150", "151", "152", "153", "154", "155", "156", "157", "158", 
"159", "16", "16;331", "16;331;329", "16;331;329;62", "16;331;329;910", 
"16;331;329;910;62", "16;331;62", "16;331;910", "160", "161", 
"162", "163", "164", "165", "166", "166;743", "167", "167;595", 
"168", "169", "17", "170", "170;48", "171", "172", "173", "174", 
"175", "176", "177", "178", "179", "18", "180", "181", "182", 
"183", "184", "185", "186", "187", "188", "188;813", "188;813;852", 
"189", "19", "19;14", "19;6;9;14;11", "19;884;6;9;14;20;26;11;1", 
"19;9", "19;9;14", "190", "190;260", "191", "192", "193", "194", 
"195", "196", "197", "198", "199", "2", "20", "20;26", "200", 
"201", "202", "203", "204", "205", "206", "207", "208", "209", 
"21", "21;4", "210", "211", "212", "213", "214", "215", "216", 
"217", "218", "219", "22", "220", "221", "222", "223", "224", 
"224;890", "224;890;904", "225", "225;221", "225;221;308", "225;295", 
"226", "227", "228", "228;396", "228;396;73", "228;73", "229", 
"23", "23;137", "23;17;137", "230", "231", "232", "233", "234", 
"235", "236", "237", "238", "239", "24", "240", "241", "242", 
"242;171", "243", "244", "245", "246", "247", "248", "249", "25", 
"250", "251", "252", "253", "254", "255", "256", "257", "258", 
"259", "26", "260", "261", "262", "263", "264", "265", "266", 
"267", "268", "269", "27", "270", "271", "272", "273", "273;541;905", 
"273;905", "274", "275", "276", "277", "278", "279", "28", "280", 
"281", "281;192", "282", "283", "284", "285", "286", "287", "288", 
"289", "29", "290", "291", "292", "293", "294", "295", "296", 
"297", "298", "299", "3", "30", "300", "301", "302", "303", "304", 
"304;770", "305", "306", "307", "308", "309", "31", "310", "311", 
"312", "313;293", "314", "314;658", "315", "316", "317", "318", 
"319", "32", "320", "321", "322", "323", "324", "324;34;564;637;282;229;565", 
"324;564;282", "324;637;229;565", "325", "326", "327", "328", 
"328;586", "329", "33", "330", "331", "332", "333", "334", "335", 
"336", "337", "338", "339", "34", "340", "341", "342", "343", 
"344", "345", "346", "346;523", "347", "348", "349", "35", "350", 
"351", "351;890", "352", "353", "353;277", "354", "355", "356", 
"357", "358", "359", "36", "360", "361", "362", "363", "364", 
"365", "366", "367", "368", "369", "37", "370", "371", "372", 
"373", "374", "375", "376", "377", "377;938", "378", "379", "38", 
"380", "381", "382", "382;147", "383", "384", "385", "386", "387", 
"388", "389", "39", "39;417", "390", "391", "392", "393", "394", 
"395", "396", "397", "398", "399", "399;955", "4", "40", "400", 
"401", "402", "403", "404", "405", "406", "407", "408", "409", 
"41", "410", "411", "412", "413", "414", "415", "416", "417", 
"418", "419", "42", "420", "421", "422", "423", "424", "424;640", 
"425", "426", "427", "427;930", "428", "429", "43", "430", "431", 
"432", "433", "434", "435", "436", "437", "438", "438;178", "439", 
"44", "440", "441", "442", "443", "444", "445", "446", "447", 
"448", "449", "45", "450", "451", "452", "453", "454", "455", 
"456", "457", "458", "459", "46", "460", "461", "462", "463", 
"464", "465", "466", "467", "468", "469", "47", "470", "471", 
"472", "473", "474", "475", "476", "477", "478", "479", "48", 
"480", "481", "482", "483", "484", "485", "486", "487", "488", 
"488;648", "489", "49", "490", "491", "492", "493", "494", "495", 
"496", "497", "498", "499", "5", "50", "500", "501", "502", "503", 
"504", "505", "506", "507", "508", "509", "51", "510", "511", 
"512", "513", "514", "515", "516", "516;603;845", "516;603;845;837", 
"517", "518", "519", "52", "520", "521", "522", "523", "524", 
"525", "526", "527", "527;509", "528", "529", "53", "530", "531", 
"532", "533", "534", "535", "536", "537", "538", "539", "54", 
"540", "540;67", "541", "542", "543", "544", "545", "546", "547", 
"548", "549", "55", "550", "550;549", "551", "552", "553", "554", 
"555", "556", "557", "558", "559", "56", "560", "561", "562", 
"563", "564", "564;282", "564;637", "565", "566", "567", "568", 
"568;569", "568;569;286", "568;569;574", "568;569;574;286", "568;574", 
"569", "57", "570", "571", "572", "573", "574", "575", "576", 
"577", "578", "579", "579;577;578", "579;577;580", "579;577;580;578", 
"58", "580", "581", "582", "583", "584", "585", "585;609", "586", 
"587", "587;167", "587;167;595", "587;167;595;557", "588", "589", 
"59", "590", "591", "592", "593", "594", "595", "596", "597", 
"598", "599", "6", "60", "600", "601", "601;10", "602", "603", 
"604", "605", "606", "607", "608", "609", "61", "610", "611", 
"612", "613", "614", "615", "615;269", "615;926;269", "616", 
"617", "618", "619", "62", "620", "621", "622", "623", "624", 
"625", "626", "627", "628", "629", "63", "63;397", "630", "631", 
"632", "633", "634", "635", "636", "637", "638", "639", "64", 
"64;72", "640", "641", "642", "643", "643;529", "644", "645", 
"646", "647", "648", "649", "65", "650", "651", "652", "653", 
"654", "655", "656", "657", "658", "659", "66", "660", "661", 
"662", "663", "663;819", "664", "665", "666", "667", "668", "669", 
"67", "670", "671", "672", "673", "674", "675", "676", "677", 
"678", "679", "68", "680", "681", "681;97", "682", "683", "684", 
"685", "686", "687", "688", "689", "69", "690", "691", "692", 
"693", "694", "695", "696", "697", "698", "699", "7", "7;25;5", 
"7;752", "7;752;24", "7;752;25;24;8", "70", "700", "701", "702", 
"703", "704", "705", "706", "707", "708", "709", "71", "710", 
"711", "712", "713", "714", "715", "716", "717", "718", "719", 
"72", "72;746;944", "72;746;944;772", "72;772", "72;927", "720", 
"721", "722", "723", "724", "725", "726", "727", "728", "729", 
"73", "730", "731", "732", "733", "734", "735", "735;522", "735;665", 
"735;665;522", "735;665;876", "735;876", "735;876;522", "736", 
"737", "738", "739", "74", "740", "741", "742", "743", "744", 
"745", "746", "746;944", "746;944;772", "747", "748", "749", 
"75", "750", "751", "752", "752;24", "753", "754", "755", "756", 
"757", "758", "759", "76", "76;313", "76;313;293", "760", "761", 
"762", "763", "764", "765", "766", "767", "768", "769", "77", 
"770", "771", "772", "773", "774", "775", "776", "777", "778", 
"779", "78", "780", "781", "782", "783", "784", "785", "786", 
"787", "788", "789", "79", "790", "790;552", "791", "792", "793", 
"793;863", "794", "795", "796", "797", "798", "799", "8", "80", 
"800", "801", "802", "803", "804", "805", "806", "807", "808", 
"808;21", "809", "81", "810", "811", "812", "813", "814", "815", 
"815;413", "815;777", "815;777;339", "815;777;838", "815;838", 
"816", "817", "818", "818;7;752", "818;7;752;23;25;17;8", "819", 
"82", "820", "821", "822", "823", "824", "824;957", "825", "826", 
"827", "828", "829", "83", "830", "831", "832", "833", "834", 
"835", "836", "837", "838", "839", "84", "840", "841", "842", 
"843", "844", "845", "846", "847", "847;560;590", "848", "849", 
"85", "850", "850;817", "851", "852", "853", "853;420", "854", 
"855", "856", "857", "858", "858;638", "858;638;409", "859", 
"86", "860", "861", "861;593", "862", "863", "864", "865", "866", 
"867", "868", "869", "869;614", "87", "870", "871", "872", "873", 
"874", "875", "876", "877", "878", "879", "88", "880", "881", 
"882", "883", "884", "884;6", "884;6;9", "885", "886", "887", 
"888", "888;189", "889", "89", "890", "890;904", "891", "891;953", 
"892", "892;941", "893", "894", "895", "896", "897", "898", "899", 
"9", "90", "900", "901", "901;224", "902", "903", "904", "905", 
"906", "907", "908", "909", "91", "910", "911", "912", "913", 
"914", "915", "916", "917", "918", "918;947", "919", "92", "920;530;589", 
"920;530;589;934", "921", "922", "923", "924", "924;576", "925", 
"926", "927", "928", "929", "93", "930", "931", "932", "933", 
"934", "935", "936", "937", "938", "939", "94", "940", "941", 
"942", "943", "944", "945", "946", "947", "948", "949", "95", 
"950", "951", "952", "953", "954", "955", "956", "957", "958", 
"959", "96", "960", "961", "962", "963", "964", "965", "966", 
"967", "97", "98", "99", "99;392"), class = "factor"), Mod..Peptide.ID = c(23L, 
24L, 25L, 26L, 27L, 29L, 30L, 31L, 32L, 33L)), .Names = c("Protein.Group.IDs", 
"Mod..Peptide.ID"), row.names = c(318L, 344L, 380L, 406L, 409L, 
417L, 436L, 462L, 494L, 505L), class = "data.frame")

Kind Regards Mads

like image 839
user2938867 Avatar asked Oct 30 '13 20:10

user2938867


People also ask

What does rowwise () do in R?

rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Most dplyr verbs preserve row-wise grouping.

How do you append rows in R?

You can quickly append one or more rows to a data frame in R by using one of the following methods: Method 1: Use rbind() to append data frames. Method 2: Use nrow() to append a row.

Can you subset rows in R?

Subsetting both rows and columnsIt is possible to subset both rows and columns using the subset function. The select argument lets you subset variables (columns).

How do I combine rows in a Dataframe in R?

To merge two data frames (datasets) horizontally, use the merge() function in the R language. To bind or combine rows in R, use the rbind() function. The rbind() stands for row binding.


1 Answers

I've grown to really love data.table for this kind of task. It is so very simple. But first, let's make some sample data (which you should provide idealy!)

#  Sample data
set.seed(1)
df = data.frame( pep = replicate( 3 , paste( sample(999,3) , collapse=";") ) , pro = sample(3) , stringsAsFactors = FALSE )

Now we use the data.table package to do the reshaping in a couple of lines...

#  Load data.table package
require(data.table)

#  Turn data.frame into data.table, which looks like..
dt <- data.table(df)
#           pep pro
#1: 266;372;572   1
#2: 908;202;896   3
#3: 944;660;628   2

# Transform it in one line like this...
dt[ , list( pep = unlist( strsplit( pep , ";" ) ) ) , by = pro ]
#   pro pep
#1:   1 266
#2:   1 372
#3:   1 572
#4:   3 908
#5:   3 202
#6:   3 896
#7:   2 944
#8:   2 660
#9:   2 628
like image 83
Simon O'Hanlon Avatar answered Nov 10 '22 00:11

Simon O'Hanlon