I wrote a Python script that pulls Excel files from a folder and writes them into a SQL table. The code works, but only if I delete the first line of the Excel file, which contains the headers. I'm new to Python, so this is probably something simple, but I looked at a lot of different techniques and couldn't figure out how to fit any of them into my code. Any ideas would be greatly appreciated!
# Import modules
from xlrd import open_workbook ,cellname
import arcpy
import pyodbc as p
# Database Connection Info
server = "myServer"
database = "my_Tables"
connStr = ('DRIVER={SQL Server Native Client 10.0};SERVER=' + server + ';DATABASE=' + database + ';' + 'Trusted_Connection=yes')
# Assign path to Excel file
file_to_import = '\\\\Location\\Report_Test.xls'
# Assign column count
column_count=10
# Open entire workbook
book = open_workbook(file_to_import)
# Use first sheet
sheet = book.sheet_by_index(0)
# Open connection to SQL Server Table
conn = p.connect(connStr)
# Get cursor
cursor = conn.cursor()
# Assign the query string without values once, outside the loop
query = "INSERT INTO HED_EMPLOYEE_DATA (Company, Contact, Email, Name, Address, City, CentralCities, EnterpriseZones, NEZ, CDBG) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
# Iterate through each row
for row_index in range(sheet.nrows):
    row_num = row_index
    Company = sheet.cell(row_index,0).value
    Contact = sheet.cell(row_index,1).value
    Email = sheet.cell(row_index,2).value
    Name = sheet.cell(row_index,3).value
    Address = sheet.cell(row_index,4).value
    City = sheet.cell(row_index,5).value
    CentralCities = sheet.cell(row_index,6).value
    EnterpriseZones = sheet.cell(row_index,7).value
    NEZ = sheet.cell(row_index,8).value
    CDBG = sheet.cell(row_index,9).value
    values = (Company, Contact, Email, Name, Address, City, CentralCities, EnterpriseZones, NEZ, CDBG)
    cursor.execute(query, values)
# Close cursor
cursor.close()
# Commit transaction
conn.commit()
# Close SQL server connection
conn.close()
You can start the iteration at the second row, so that the header row (index 0) is skipped. Try the following:
for row_index in range(1,sheet.nrows):
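To see the effect of starting the range at 1 without needing a real workbook, here is a minimal sketch. `FakeSheet`, `Cell`, and `iter_data_rows` are hypothetical names made up for this illustration; `FakeSheet` only imitates the two things the question's code uses from the xlrd sheet (`sheet.nrows` and `sheet.cell(row, col).value`).

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic the small part of the sheet interface
# used in the question: sheet.nrows and sheet.cell(row, col).value
Cell = namedtuple("Cell", "value")


class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell(self, row_index, col_index):
        return Cell(self._rows[row_index][col_index])


def iter_data_rows(sheet, column_count, header_rows=1):
    """Yield every row after the header rows as a tuple of cell values."""
    for row_index in range(header_rows, sheet.nrows):
        yield tuple(sheet.cell(row_index, col).value
                    for col in range(column_count))


sheet = FakeSheet([
    ["Company", "Contact"],  # header row at index 0, skipped
    ["Acme", "Ann"],
    ["Globex", "Bob"],
])
rows = list(iter_data_rows(sheet, column_count=2))
print(rows)  # [('Acme', 'Ann'), ('Globex', 'Bob')]
```

With a real sheet, each yielded tuple could be passed as `values` to `cursor.execute(query, values)`, with `column_count` set to 10 as in the question.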
Edit: If you need to iterate over a list of .xls files, as you asked in the comments, the basic idea is to add an outer loop over the files. Here are some hints:
# You need to import the os library at the beginning of your code
import os
...
# Part of your code here
...
# Assign path to the folder that contains the Excel files
#file_to_import = '\\\\Location\\Report_Test.xls'
folder_to_import = '\\\\Location'
l_files_to_import = os.listdir(folder_to_import)
for file_name in l_files_to_import:
    if file_name.endswith('.xls'):
        # os.listdir returns bare file names, so build the full path
        file_to_import = os.path.join(folder_to_import, file_name)
        # The rest of your code here. Be careful with the indentation!
        column_count=10
        ...
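The file-listing step can also be pulled into a small helper and tried on a throwaway folder. This is only a sketch: `list_xls_files` is a hypothetical name, and the temporary folder stands in for the real `'\\\\Location'` share. Note that `os.listdir` returns bare file names, so each one is joined with the folder before it is handed to something like `open_workbook`.

```python
import os
import tempfile


def list_xls_files(folder):
    """Return the full paths of the .xls files directly inside folder."""
    paths = []
    for file_name in sorted(os.listdir(folder)):
        if file_name.endswith('.xls'):
            # Join with the folder so the path works from any working directory
            paths.append(os.path.join(folder, file_name))
    return paths


# Demonstration on a temporary folder containing two .xls files and one other file
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.xls", "notes.txt", "b.xls"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(path) for path in list_xls_files(folder)]

print(found)  # ['a.xls', 'b.xls']
```

In the script from the question, the outer loop would then be `for file_to_import in list_xls_files(folder_to_import):`, with the workbook-reading and insert code indented inside it.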