I have a CSV file with two header rows. I want the first row to be the header and the second row to be discarded. If I run the following command:
data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE)
The first row becomes the header and the second row of the file becomes the first row of my data frame:
  Xaaaaaaaaa       X X.1 Xbbbbbbbbbb     X.2 X.3
1       Date PX_LAST  NA        Date PX_LAST  NA
2 31/12/2002  38.855  NA  31/12/2002  19.547  NA
3 02/01/2003  38.664  NA  02/01/2003  19.547  NA
4 03/01/2003  40.386  NA  03/01/2003  19.547  NA
5 06/01/2003  40.386  NA  06/01/2003  19.609  NA
6 07/01/2003  40.195  NA  07/01/2003  19.609  NA
I want to skip this second row of the CSV file and just get
  X1.HK.Equity      X X.1 X2.HK.Equity    X.2 X.3
2   31/12/2002 38.855  NA   31/12/2002 19.547  NA
3   02/01/2003 38.664  NA   02/01/2003 19.547  NA
4   03/01/2003 40.386  NA   03/01/2003 19.547  NA
5   06/01/2003 40.386  NA   06/01/2003 19.609  NA
6   07/01/2003 40.195  NA   07/01/2003 19.609  NA
I tried:
data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE, skip = 1)
but that returns:
        Date PX_LAST  X     Date.1 PX_LAST.1 X.1
1 31/12/2002  38.855 NA 31/12/2002    19.547  NA
2 02/01/2003  38.664 NA 02/01/2003    19.547  NA
3 03/01/2003  40.386 NA 03/01/2003    19.547  NA
4 06/01/2003  40.386 NA 06/01/2003    19.609  NA
5 07/01/2003  40.195 NA 07/01/2003    19.609  NA
6 08/01/2003  40.386 NA 08/01/2003    19.547  NA
The header row comes from the second line of my CSV file, not the first line.
Thank you.
This should do the trick:
all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)
The first step, readLines, reads the entire file into a character vector, where each element represents one line of the file. Next, you discard the second line, using the fact that a negative index in R means "select everything except this index". Finally, the remaining lines are fed to read.csv, which processes them into a data.frame.
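To try the steps above without the original file, here is a minimal, self-contained sketch. The sample lines are an assumption: the first header line is reconstructed from the column names in the desired output, and the real file's contents are not shown in the question. It also passes the lines through the `text` argument of read.csv instead of an explicit textConnection, which should behave the same way here.

```r
# Hypothetical sample file with two header rows, modelled on the
# output shown in the question (the real file is not available).
csv_lines <- c(
  "1 HK Equity,,,2 HK Equity,,",
  "Date,PX_LAST,,Date,PX_LAST,",
  "31/12/2002,38.855,,31/12/2002,19.547,",
  "02/01/2003,38.664,,02/01/2003,19.547,"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Read every line of the file into a character vector.
all_content <- readLines(path)

# Drop the second line (the unwanted second header row).
skip_second <- all_content[-2]

# Parse the remaining lines; the first line supplies the column names.
dat <- read.csv(text = skip_second, header = TRUE,
                stringsAsFactors = FALSE)

print(names(dat))
print(dat)

# Remove the temporary file.
unlink(path)
```

Because the whole file is read into memory before parsing, this is simplest for files of modest size.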