I have CSV files which I read into pandas with:
#!/usr/bin/env python
import pandas as pd
import sys

filename = sys.argv[1]
df = pd.read_csv(filename)
Unfortunately, the last line of these files is often corrupt (has the wrong number of commas). Currently I open each file in a text editor and remove the last line.
Is it possible to remove the last line in the same python/pandas script that loads the CSV to save having to take this extra non-automated step?
Using the drop() method to delete the last row of a pandas DataFrame: pass the last index label through the index parameter and set inplace=True to apply the change to the existing DataFrame.
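A minimal sketch of dropping the last row, assuming the file has already loaded into a DataFrame (this does not help if read_csv itself raises on the corrupt line). The column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data; the column names are invented for this sketch.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Drop the row whose label is the last entry of the index, modifying df in place.
df.drop(index=df.index[-1], inplace=True)

# Positional alternative: returns a new frame without the final row of df.
trimmed = df.iloc[:-1]
```

Note that drop() works by label, so with a non-unique index it removes every row sharing the last label; iloc[:-1] is purely positional.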
Reading the last line of a file with readlines() in Python: file.readlines() reads all the lines of a file and returns them as a list. The last line of the file is then the last element of that list, which can be accessed with the index -1.
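The same idea can be turned around to discard the last line before pandas sees it. This is only a sketch: the sample content and temporary file are hypothetical stand-ins for a real CSV, and it reads the whole file into memory.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample file whose last line has too many fields.
sample = "a,b\n1,2\n3,4\n5,6,7,8\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path) as fh:
    lines = fh.readlines()  # every line, newline characters included

last_line = lines[-1]             # the corrupt trailing line
clean_text = "".join(lines[:-1])  # everything except that line

# read_csv accepts a file-like object, so wrap the cleaned text in StringIO.
df_clean = pd.read_csv(io.StringIO(clean_text))

os.remove(path)  # tidy up the temporary file
```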
Method 1: Using the tail() method. DataFrame.tail(n) returns the last n rows of the DataFrame. It takes one optional argument n (the number of rows to return from the end). By default n = 5, so it returns the last 5 rows if n is not passed.
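A short sketch of tail(), which can be handy for inspecting what the last row of a loaded file looks like. The data here is invented for illustration.

```python
import pandas as pd

# Illustrative frame with ten rows, x = 0..9.
df = pd.DataFrame({"x": range(10)})

last_five = df.tail()   # default n=5: the rows where x is 5..9
last_row = df.tail(1)   # a one-row DataFrame holding only the final row
```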
If you pass read_csv an open file, it will keep it open (reading from the current position); if you pass a string, read_csv will open and close the file itself.
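To illustrate the "current position" behaviour, here is a small sketch using an in-memory buffer in place of a real file; the leading comment line in the data is hypothetical.

```python
import io

import pandas as pd

# In-memory stand-in for an open file; the first line is not part of the CSV.
buffer = io.StringIO("# generated by some tool\na,b\n1,2\n3,4\n")

buffer.readline()  # consume the first line; the position is now at the header

# Parsing starts from the current position, so "a,b" becomes the header row.
df_from_handle = pd.read_csv(buffer)

# The handle was supplied by the caller, so pandas leaves it open.
still_open = not buffer.closed
```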
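Pass error_bad_lines=False and the malformed line will be skipped automatically (this option was deprecated in pandas 1.3 in favour of on_bad_lines='skip' and removed in pandas 2.0):
df = pd.read_csv(filename, error_bad_lines=False)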
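A sketch of the same approach on recent pandas, assuming version 1.3 or later where the option is spelled on_bad_lines. The sample data is invented, with a last line that has too many fields.

```python
import io

import pandas as pd

# Hypothetical CSV content: the final line has four fields instead of two.
data = "a,b\n1,2\n3,4\n5,6,7,8\n"

# Lines with too many fields are dropped instead of raising a parser error.
df_skipped = pd.read_csv(io.StringIO(data), on_bad_lines="skip")
```

A line with too few fields is not treated as bad; it is kept and padded with NaN, so this option may not catch every kind of corrupt last line.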
The advantage of error_bad_lines is that it will skip any erroneous lines rather than fail on them, but if the last line is always bad then skipfooter=1 is better.
Thanks to @DexterMorgan for pointing out that the skipfooter option forces pandas to use the Python parsing engine, which is slower than the C engine for parsing a CSV.
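A minimal sketch of the skipfooter route, with made-up sample data standing in for the real file. Passing engine="python" explicitly avoids the warning pandas otherwise emits when it falls back from the C engine.

```python
import io

import pandas as pd

# Hypothetical CSV content whose last line is corrupt.
data = "a,b\n1,2\n3,4\n5,6,7,8\n"

# skipfooter=1 discards the final line; it requires the Python engine.
df_footer = pd.read_csv(io.StringIO(data), skipfooter=1, engine="python")
```

In the script from the question, the equivalent call would be pd.read_csv(filename, skipfooter=1, engine="python").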