I am using pandas to read an Excel file. The file doesn't have column names, but pandas keeps reading the first row as the column names.
The following is the Excel file being read:
data1 0.994676
data2 0.994588
data3 0.99488
data4 0.994483
data5 0.994312
data6 0.993823
data7 0.993575
data8 0.994231
data9 0.993838
data10 0.994007
data11 0.994328
data12 0.993503
data13 0.99342
data14 0.992729
data15 0.993013
data16 0.993049
data17 0.993133
data18 0.99262
I'm reading the 2nd column using the following code:
import pandas as pd
df=pd.ExcelFile('C:/Users/JohnDoe/Desktop/080718_output.xlsx', header=None, index_col=False).parse('Data_sheet')
y=df.iloc[0:17,1]
The following is y:
In[38]:y
Out[38]:
0 0.994588
1 0.994880
2 0.994483
3 0.994312
4 0.993823
5 0.993575
6 0.994231
7 0.993838
8 0.994007
9 0.994328
10 0.993503
11 0.993420
12 0.992729
13 0.993013
14 0.993049
15 0.993133
16 0.992620
Name: 0.994676, dtype: float64
It skips the first data point because the first row is being used as the column name. Any idea how I can fix this?
Edit: changed 'header=False' to 'header=None'. Both give the same outcome.
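A likely cause, going by the pandas API: in the question's code, header=None is handed to the pd.ExcelFile constructor instead of to .parse(), so parse() keeps its default header=0 (the pandas version in use appears to have silently ignored the extra keyword; newer versions are likely to reject it). Passing it where parse() can see it, as in pd.ExcelFile(path).parse('Data_sheet', header=None), should keep the first row as data. The sketch below shows the header=0 versus header=None difference on the first rows of the sample data. It uses pd.read_csv on an in-memory string only so it runs without an Excel engine, on the assumption that header means the same thing in both readers.

```python
import io

import pandas as pd

# First three rows of the sample sheet, as whitespace-separated text.
# read_csv stands in for read_excel here; the header parameter means the same thing.
sample = "data1 0.994676\ndata2 0.994588\ndata3 0.99488\n"

# Default header=0: the first row is consumed as column labels.
with_header = pd.read_csv(io.StringIO(sample), sep=r"\s+")
# header=None: every row is data and the columns get integer labels 0, 1.
no_header = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

print(list(with_header.columns))  # ['data1', '0.994676']
print(len(with_header), len(no_header))  # 2 3
```

The '0.994676' label on with_header mirrors the "Name: 0.994676" in the question's output, where the first value ended up as the column name.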
You can use df.columns = df.iloc[0] to set the column labels from the first row. In pandas, positional indexing starts at 0, so 0 means the first row.
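This goes in the opposite direction from the question: it helps when a file does have a header row that was read in as data (for example with header=None). A minimal sketch on made-up data; the labels name and value are illustrative only.

```python
import io

import pandas as pd

# Hypothetical file whose first line holds the real column names.
raw = "name value\nalpha 1\nbeta 2\n"
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

df.columns = df.iloc[0]  # position 0 is the first row; its values become the labels
df = df.iloc[1:].reset_index(drop=True)  # drop the promoted row from the data
print(df)
```

Because the column was read with the header text mixed in, the remaining values stay as strings; convert them afterwards (e.g. with pd.to_numeric) if numbers are needed.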
Use the pandas.read_excel() function to read an Excel sheet into a pandas DataFrame. By default it loads the first sheet of the file and parses the first row as the DataFrame's column names.
Use pandas.read_csv() to read specific columns from a CSV file: call pd.read_csv(file_name, usecols=cols_list), where file_name is the name of the CSV file and cols_list is the list of columns to read. If the file uses a separator other than a comma, pass it with sep (or its alias delimiter).
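A small sketch of usecols on in-memory CSV text (the columns id, name, and score are made up for illustration). usecols accepts either column labels or column positions, and sep handles files that aren't comma-separated. read_excel accepts usecols as well.

```python
import io

import pandas as pd

csv_text = "id,name,score\n1,a,0.5\n2,b,0.7\n"
# Select columns by label.
by_label = pd.read_csv(io.StringIO(csv_text), usecols=["name", "score"])

semi_text = "id;name;score\n1;a;0.5\n2;b;0.7\n"
# Semicolon-separated file; select columns by position (0 and 2).
by_position = pd.read_csv(io.StringIO(semi_text), sep=";", usecols=[0, 2])

print(list(by_label.columns))     # ['name', 'score']
print(list(by_position.columns))  # ['id', 'score']
```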
Method 1: Skip one specific row (e.g. import the DataFrame and skip the 2nd row).
Method 2: Skip several specific rows (e.g. skip the 2nd and 4th rows).
Method 3: Skip the first N rows (e.g. skip the first 2 rows).
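One way to cover all three cases is the skiprows parameter: a list of 0-based line numbers skips those specific lines, while an integer skips that many lines from the top. The sketch uses read_csv on six made-up lines; read_excel also accepts skiprows. The helper load is hypothetical, added only to keep the three calls short.

```python
import io

import pandas as pd

# Six hypothetical lines: row1 ... row6.
text = "\n".join(f"row{i}" for i in range(1, 7))

def load(**kwargs):
    """Read the lines as a single unnamed column, forwarding any extra options."""
    return pd.read_csv(io.StringIO(text), header=None, **kwargs)

skip_one = load(skiprows=[1])         # Method 1: skip the 2nd line (index 1)
skip_several = load(skiprows=[1, 3])  # Method 2: skip the 2nd and 4th lines
skip_first_n = load(skiprows=2)       # Method 3: skip the first 2 lines

print(skip_one[0].tolist())      # ['row1', 'row3', 'row4', 'row5', 'row6']
print(skip_several[0].tolist())  # ['row1', 'row3', 'row5', 'row6']
print(skip_first_n[0].tolist())  # ['row3', 'row4', 'row5', 'row6']
```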
You can use read_excel with header=None so the columns get default integer labels (a RangeIndex):
import pandas as pd
df = pd.read_excel('file.xlsx',
                   sheet_name='Data_sheet',
                   header=None,
                   index_col=False)
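Once the sheet is read with header=None, the first data row stays in the frame, so selecting the second column by position returns every value. Using iloc[:, 1] instead of the question's iloc[0:17, 1] also avoids dropping the last of the 18 rows. The sketch uses read_csv on the first four sample rows as a stand-in for the Excel file.

```python
import io

import pandas as pd

# First four rows of the question's sample data.
sample = """data1 0.994676
data2 0.994588
data3 0.99488
data4 0.994483
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

y = df.iloc[:, 1]  # all rows, second column (position 1)
print(y)
```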
Create a list of column names and pass it to read_excel through names, while also specifying header=None:
import pandas as pd
names = ['Column1', 'Column2']
df = pd.read_excel(r"/Users/JohnDoe/Desktop/080718_output.xlsx", header=None, names=names)
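A sketch of the same idea with read_csv on two sample rows, standing in for the Excel file: header=None keeps the first row as data, and names supplies the labels.

```python
import io

import pandas as pd

sample = "data1 0.994676\ndata2 0.994588\n"
names = ["Column1", "Column2"]

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=names)
print(df["Column2"].tolist())
```

read_csv would infer header=None on its own once names is given, but read_excel's header defaults to 0, and the pandas documentation says to pass header=None explicitly when the file has no header row.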