I have been using SmarterCSV to convert bed format file to csv file and changing the column names.
Now I have collected several CSV files, and want to combine them into one big CSV file.
In test3.csv, there are three columns, chromosome
, start_site
and end_site
that will be used, and the other three columns, binding_site_pattern
,score
and strand
that will be removed.
By adding three new columns to the test3.csv file, the data are all the same in the transcription_factor
column: Cmyc
, in the cell_type
column: PWM
, in the project_name
column: JASPAR
.
Anyone have any ideas on this one?
test1.csv
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
test2.csv
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
test3.csv
chromosome,start_site,end_site,binding_site_pattern,score,strand
chr1,21942,21953,AAGCACGTGGT,1752,+
chr1,21943,21954,AACCACGTGCT,1335,-
Desired combined result:
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
Cmyc,PWM,1,21942,21953,JASPAR
Cmyc,PWM,1,21943,21954,JASPAR
hs = %w{ transcription_factor cell_type chromosome start_site end_site project_name }
CSV.open('result.csv','w') do |csv|
csv << hs
CSV.foreach('test1.csv', headers: true) {|row| csv << row.values_at(*hs) }
CSV.foreach('test2.csv', headers: true) {|row| csv << row.values_at(*hs) }
CSV.foreach('test3.csv', headers: true) do |row|
csv << ['Cmyc', 'PWM', row['chromosome'].match(/\d+/).to_s] + row.values_at('start_site', 'end_site') + ['JASPAR']
end
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With