
How do I use Ruby to combine several CSV files into one big CSV file?

I have been using SmarterCSV to convert BED-format files to CSV files and to rename the columns.

Now I have collected several CSV files, and want to combine them into one big CSV file.

In test3.csv, only three columns (chromosome, start_site and end_site) will be used; the other three (binding_site_pattern, score and strand) should be removed.

Three new columns should be added to the test3.csv data, each holding the same value in every row: Cmyc in the transcription_factor column, PWM in the cell_type column, and JASPAR in the project_name column.

Does anyone have any ideas on how to do this?

test1.csv

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE  
Cmyc,GM12878,11,6704236,6704683,ENCODE  

test2.csv

transcription_factor,cell_type,chromosome,start_site,end_site,project_name  
Cmyc,H1ESC,19,9710417,9710587,ENCODE  
Cmyc,H1ESC,11,541754,542137,ENCODE  

test3.csv

chromosome,start_site,end_site,binding_site_pattern,score,strand  
chr1,21942,21953,AAGCACGTGGT,1752,+    
chr1,21943,21954,AACCACGTGCT,1335,-  

Desired combined result:

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE  
Cmyc,GM12878,11,6704236,6704683,ENCODE  
Cmyc,H1ESC,19,9710417,9710587,ENCODE    
Cmyc,H1ESC,11,541754,542137,ENCODE   
Cmyc,PWM,1,21942,21953,JASPAR  
Cmyc,PWM,1,21943,21954,JASPAR
Michael asked Feb 12 '23 10:02

1 Answer

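require 'csv'

# Column order of the combined file.
hs = %w{ transcription_factor cell_type chromosome start_site end_site project_name }

CSV.open('result.csv','w') do |csv|
  csv << hs
  # test1.csv and test2.csv already have the target columns; copy them in header order.
  CSV.foreach('test1.csv', headers: true) {|row| csv << row.values_at(*hs) }
  CSV.foreach('test2.csv', headers: true) {|row| csv << row.values_at(*hs) }
  # test3.csv: keep the coordinate columns, reduce "chr1" to "1", and fill in the constant values.
  CSV.foreach('test3.csv', headers: true) do |row|
    csv << ['Cmyc', 'PWM', row['chromosome'].match(/\d+/).to_s] + row.values_at('start_site', 'end_site') + ['JASPAR']
  end
end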
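
If the real files differ slightly from the samples, a more defensive variant may be useful. The sketch below is only one possible approach, and it rests on assumptions that the question does not confirm: that headers and values may carry stray trailing spaces (as the pasted samples do), and that chromosome names such as chrX can occur, for which match(/\d+/) would yield an empty string. The names HEADERS, passthrough_row, jaspar_row and combine are hypothetical helpers introduced here for illustration.

```ruby
require 'csv'

HEADERS = %w[transcription_factor cell_type chromosome start_site end_site project_name].freeze

# Strips whitespace around header names so "project_name  " still matches.
STRIP_HEADER = ->(h) { h&.strip }

# Hypothetical helper: a row that already has the target columns,
# returned in HEADERS order with surrounding whitespace trimmed.
def passthrough_row(row)
  row.values_at(*HEADERS).map { |v| v.to_s.strip }
end

# Hypothetical helper: a test3.csv-style row mapped onto HEADERS.
# The constants Cmyc, PWM and JASPAR come from the question.
def jaspar_row(row)
  # Drop only a leading "chr", so "chr1" -> "1" and "chrX" -> "X".
  chromosome = row['chromosome'].to_s.strip.sub(/\Achr/, '')
  ['Cmyc', 'PWM', chromosome,
   row['start_site'].to_s.strip, row['end_site'].to_s.strip, 'JASPAR']
end

# Writes the header once, then appends rows from every input file.
def combine(output, passthrough_files:, jaspar_files:)
  CSV.open(output, 'w') do |csv|
    csv << HEADERS
    passthrough_files.each do |path|
      CSV.foreach(path, headers: true, header_converters: STRIP_HEADER) do |row|
        csv << passthrough_row(row)
      end
    end
    jaspar_files.each do |path|
      CSV.foreach(path, headers: true, header_converters: STRIP_HEADER) do |row|
        csv << jaspar_row(row)
      end
    end
  end
end
```

With the files from the question, this would be called as combine('result.csv', passthrough_files: %w[test1.csv test2.csv], jaspar_files: %w[test3.csv]). Keeping the row mapping in small methods means another source format can be added without touching the write loop.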
Jacob Brown answered Feb 15 '23 10:02
