read.csv, header on first line, skip second line [duplicate]


Tags: r, header, skip, read.csv

I have a CSV file with two header rows. I want the first row to be the header and the second row to be discarded. If I run the following command:

data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE)

The first row becomes the header and the second row of the file becomes the first row of my data frame:

  Xaaaaaaaaa       X X.1 Xbbbbbbbbbb     X.2 X.3
1       Date PX_LAST  NA        Date PX_LAST  NA
2 31/12/2002  38.855  NA  31/12/2002  19.547  NA
3 02/01/2003  38.664  NA  02/01/2003  19.547  NA
4 03/01/2003  40.386  NA  03/01/2003  19.547  NA
5 06/01/2003  40.386  NA  06/01/2003  19.609  NA
6 07/01/2003  40.195  NA  07/01/2003  19.609  NA

I want to skip this second row of the CSV file and just get

  X1.HK.Equity       X X.1 X2.HK.Equity     X.2 X.3
2   31/12/2002  38.855  NA   31/12/2002  19.547  NA
3   02/01/2003  38.664  NA   02/01/2003  19.547  NA
4   03/01/2003  40.386  NA   03/01/2003  19.547  NA
5   06/01/2003  40.386  NA   06/01/2003  19.609  NA
6   07/01/2003  40.195  NA   07/01/2003  19.609  NA

I tried data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE, skip = 1) but that returns:

        Date PX_LAST  X     Date.1 PX_LAST.1 X.1
1 31/12/2002  38.855 NA 31/12/2002    19.547  NA
2 02/01/2003  38.664 NA 02/01/2003    19.547  NA
3 03/01/2003  40.386 NA 03/01/2003    19.547  NA
4 06/01/2003  40.386 NA 06/01/2003    19.609  NA
5 07/01/2003  40.195 NA 07/01/2003    19.609  NA
6 08/01/2003  40.386 NA 08/01/2003    19.547  NA

The header row comes from the second line of my CSV file, not the first line.

Thank you.

315

asked Apr 07 '13 07:04

mchangun

1 Answer

This should do the trick:

all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)

The first step, readLines, reads the entire file into a character vector in which each element is one line of the file. Next, you discard the second line, using the fact that negative indexing in R means "select everything except this index". Finally, you feed the remaining lines to read.csv through textConnection, which parses them into a data.frame.
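As a rough, self-contained sketch of the same idea, the snippet below writes a small made-up CSV with two header rows to a temporary file and reads it back in two ways. The sample data, the temporary path, and the names dat2 and header_only are assumptions for illustration and are not part of the original question or answer. The first variant passes the lines to read.csv through its text argument instead of an explicit textConnection. The second is an alternative that avoids holding the whole file in memory: it takes the column names from the first line and then reads the data starting at the third line.

```r
# Hypothetical sample file with two header rows (made-up data).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "1 HK Equity,,,2 HK Equity,,",
  "Date,PX_LAST,,Date,PX_LAST,",
  "31/12/2002,38.855,,31/12/2002,19.547,",
  "02/01/2003,38.664,,02/01/2003,19.547,"
), path)

# Variant 1: drop line 2, then parse the remaining lines.
# read.csv forwards `text` to read.table, which opens a text connection itself.
all_content <- readLines(path)
dat <- read.csv(text = all_content[-2], header = TRUE, stringsAsFactors = FALSE)

# Variant 2: take the column names from line 1 only, then read from line 3 on.
header_only <- read.csv(path, header = TRUE, nrows = 1, stringsAsFactors = FALSE)
dat2 <- read.csv(path, header = FALSE, skip = 2, stringsAsFactors = FALSE,
                 col.names = names(header_only))

print(dat)
print(dat2)
```

Both variants should give a data frame whose names come from the first line of the file and whose first data row comes from the third line. For very large files the second variant may be preferable, since it does not need the readLines copy of the file.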

156

answered Sep 21 '22 09:09

Paul Hiemstra
