Ruby's CSV class makes it pretty easy to iterate over each row:
CSV.foreach(file) { |row| puts row }
However, this always includes the header row, so I'll get as output:
header1, header2
foo, bar
baz, yak
I don't want the headers though. Now, when I call …
CSV.foreach(file, :headers => true)
I get this result:
#<CSV::Row:0x10112e510
@header_row = false,
attr_reader :row = [
[0] [
[0] "header1",
[1] "foo"
],
[1] [
[0] "header2",
[1] "bar"
]
]
>
Of course, because the documentation says:
This setting causes #shift to return rows as CSV::Row objects instead of Arrays
But how can I skip the header row and still get each row back as a simple array? I don't want the complicated CSV::Row object to be returned.
I definitely don't want to do this:
first = true
CSV.foreach(file) do |row|
  if first
    puts row
    first = false
  else
    # code for other rows
  end
end
Look at #shift from CSV Class:
The primary read method for wrapped Strings and IOs, a single row is pulled from the data source, parsed and returned as an Array of fields (if header rows are not used)
An Example:
require 'csv'

# CSV FILE
# name, surname, location
# Mark, Needham, Sydney
# David, Smith, London

def parse_csv_file_for_names(path_to_csv)
  names = []
  csv_contents = CSV.read(path_to_csv)
  csv_contents.shift # drop the header row
  csv_contents.each do |row|
    names << row[0]
  end
  return names
end
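If the file is too large to load with CSV.read, the same #shift idea can be applied to an open CSV object, so rows are read one at a time. This is only a minimal sketch: each_data_row is a hypothetical helper name, and the sample file is made up for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper (not part of the CSV library): yields every data row
# as a plain Array after discarding the first line of the file.
def each_data_row(path)
  CSV.open(path, 'r') do |csv|
    csv.shift                     # read and discard the header row
    csv.each { |row| yield row }  # remaining rows come back as Arrays
  end
end

# Made-up sample data so the sketch can run on its own.
sample = Tempfile.new(['people', '.csv'])
sample.write("name,surname,location\nMark,Needham,Sydney\nDavid,Smith,London\n")
sample.close

rows = []
each_data_row(sample.path) { |row| rows << row }
p rows
```

Because no :headers option is passed, every row stays a plain Array and the header line is simply consumed before the loop starts.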
You might want to consider CSV.parse(csv_file, :headers => false) and passing a block, as mentioned here
A cool way to ignore the headers is to read it as an array and ignore the first row:
data = CSV.read("dataset.csv")[1 .. -1]
# => [["first_row", "with data"],
#     ["second_row", "and more data"],
#     ...
#     ["last_row", "finally"]]
The problem with the :headers => false approach is that CSV won't try to read the first row as a header, but will consider it part of the data. So, basically, you have a useless first row.
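Another option is to keep :headers => true, so the library consumes the header row itself, and then call CSV::Row#fields on each row to get a plain array back. Here is a small sketch using an inline string as made-up sample data; the same option and method should work inside a CSV.foreach block as well.

```ruby
require 'csv'

# Made-up sample data mirroring the shape of the file in the question.
data = "header1,header2\nfoo,bar\nbaz,yak\n"

# With headers: true the first line is treated as the header and is not
# yielded as data; CSV::Row#fields turns each CSV::Row into a plain Array.
rows = CSV.parse(data, headers: true).map(&:fields)
p rows
```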