
read.csv, header on first line, skip second line [duplicate]

I have a CSV file with two header rows. I want the first row to be the header and the second row to be discarded. If I run the following command:

data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE) 

The first row becomes the header and the second row of the file becomes the first row of my data frame:

  Xaaaaaaaaa       X X.1 Xbbbbbbbbbb     X.2 X.3
1       Date PX_LAST  NA        Date PX_LAST  NA
2 31/12/2002  38.855  NA  31/12/2002  19.547  NA
3 02/01/2003  38.664  NA  02/01/2003  19.547  NA
4 03/01/2003  40.386  NA  03/01/2003  19.547  NA
5 06/01/2003  40.386  NA  06/01/2003  19.609  NA
6 07/01/2003  40.195  NA  07/01/2003  19.609  NA

I want to skip this second row of the CSV file and just get

  X1.HK.Equity       X X.1 X2.HK.Equity     X.2 X.3
2   31/12/2002  38.855  NA   31/12/2002  19.547  NA
3   02/01/2003  38.664  NA   02/01/2003  19.547  NA
4   03/01/2003  40.386  NA   03/01/2003  19.547  NA
5   06/01/2003  40.386  NA   06/01/2003  19.609  NA
6   07/01/2003  40.195  NA   07/01/2003  19.609  NA

I tried data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE, skip = 1) but that returns:

        Date PX_LAST  X     Date.1 PX_LAST.1 X.1
1 31/12/2002  38.855 NA 31/12/2002    19.547  NA
2 02/01/2003  38.664 NA 02/01/2003    19.547  NA
3 03/01/2003  40.386 NA 03/01/2003    19.547  NA
4 06/01/2003  40.386 NA 06/01/2003    19.609  NA
5 07/01/2003  40.195 NA 07/01/2003    19.609  NA
6 08/01/2003  40.386 NA 08/01/2003    19.547  NA

The header row comes from the second line of my CSV file, not the first line.

Thank you.

mchangun asked Apr 07 '13 07:04



1 Answer

This should do the trick:

all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)

The first step uses readLines to read the entire file into a character vector, where each element is one line of the file. Next, the second line is discarded, using the fact that a negative index in R selects everything except that index. Finally, the remaining lines are passed to read.csv through a textConnection, which parses them into a data.frame.
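For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same idea. The sample rows and the temporary file are made up for illustration (loosely modelled on the question's data), and it uses the text argument of read.csv as a shorthand for wrapping the lines in textConnection yourself.

```r
# Hypothetical sample file: real header on line 1, unwanted sub-header on line 2.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "1 HK Equity,,,2 HK Equity,,",
  "Date,PX_LAST,,Date,PX_LAST,",
  "31/12/2002,38.855,,31/12/2002,19.547,",
  "02/01/2003,38.664,,02/01/2003,19.547,"
), path)

# Read every line as text, then drop the second line with a negative index.
all_lines <- readLines(path)
kept_lines <- all_lines[-2]

# Parse the remaining lines; the first kept line supplies the column names.
dat <- read.csv(text = kept_lines, header = TRUE, stringsAsFactors = FALSE)

names(dat)  # column names derived from line 1 of the file
nrow(dat)   # number of data rows left after dropping the sub-header
```

Because the sub-header line is removed before parsing, the numeric columns are read as numbers rather than as text mixed with "PX_LAST".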

Paul Hiemstra answered Sep 21 '22 09:09