A few methods to do this:
1. Use df.tail.
2. Use the nrows argument to read.
3. Use skiprows and read the required number of rows.

Can it be done in some easier way? If not, which of these three should be preferred, and why?
Method 1: Using the tail() method. Use DataFrame.tail(n) to get the last n rows of the DataFrame. It takes one optional argument, n (the number of rows to return from the end). By default n = 5, so it returns the last 5 rows if n is not passed.
pandas.DataFrame.tail(): In pandas, the DataFrame class provides a tail() function to fetch the bottom rows of a DataFrame, i.e. it returns the last n rows.
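As a quick illustration of tail(), here is a minimal sketch on a small made-up DataFrame (the column names and values are assumptions, chosen only for the demo):

```python
import pandas as pd

# Made-up data: ten rows with the default integer index 0..9.
df = pd.DataFrame({"a": range(1, 11), "b": range(11, 21)})

last_three = df.tail(3)   # the rows at index 7, 8 and 9
default_tail = df.tail()  # n is omitted, so the last 5 rows are returned

print(last_three)
```

Note that this only helps once the whole file is already in memory, so it does not avoid reading the full CSV.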
We can remove the last n rows using the drop() method, for example by passing it the index labels of those rows. drop() accepts an inplace argument, which takes a boolean value. If inplace is set to True, the DataFrame itself is updated (the last n rows are removed from it) instead of a new DataFrame being returned.
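A minimal sketch of dropping the last n rows, again on made-up data. It assumes the index labels are unique; with duplicate labels, drop() would remove every row sharing a label, so the positional iloc slice is shown as an alternative:

```python
import pandas as pd

# Made-up data with a unique default index.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})
n = 2

# Returns a new DataFrame and leaves df unchanged.
trimmed = df.drop(df.tail(n).index)

# Positional alternative that does not depend on index labels (needs n > 0).
sliced = df.iloc[:-n]

# Modifies df itself; with inplace=True the call returns None.
df.drop(df.tail(n).index, inplace=True)

print(df)
```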
I don't think pandas offers a way to do this in read_csv.
Perhaps the neatest (in one pass) is to use collections.deque:
    from collections import deque
    from io import StringIO  # on Python 2 this was: from StringIO import StringIO

    import pandas as pd

    with open(fname, 'r') as f:
        q = deque(f, 2)  # replace 2 with n (number of lines to keep from the end)

    In [12]: q
    Out[12]: deque(['7,8,9\n', '10,11,12'], maxlen=2)  # these are the last two lines of my csv

    In [13]: pd.read_csv(StringIO(''.join(q)), header=None)
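The deque snippet treats the file as headerless. If the CSV has a header row, one possible variant reads that line first and then keeps only the last n lines. This is a sketch under assumptions: the function name tail_csv and the demo data are hypothetical, and it assumes no quoted field contains an embedded newline.

```python
import os
import tempfile
from collections import deque
from io import StringIO

import pandas as pd


def tail_csv(path, n):
    """Return the last n data rows of a CSV whose first line is a header."""
    with open(path, "r") as f:
        header = f.readline()            # keep the column names
        last_lines = deque(f, maxlen=n)  # one pass; holds at most n lines
    return pd.read_csv(StringIO(header + "".join(last_lines)))


# Small demo file (made-up data) so the sketch can be run as-is.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n")
    demo_path = tmp.name

last_two = tail_csv(demo_path, 2)
os.remove(demo_path)
print(last_two)
```

Only n lines plus the header are held in memory at any time, although the whole file is still read once from start to end.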
Another option worth trying is to get the number of lines in a first pass, then read the file again and skip that number of rows (minus n) using read_csv...
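The two-pass idea can be sketched roughly as follows. The function name read_last_rows and the demo data are hypothetical, and the sketch assumes the file has a header line and that no quoted field spans several physical lines (otherwise the line count would not match the row count).

```python
import os
import tempfile

import pandas as pd


def read_last_rows(path, n):
    """Read the last n data rows of a CSV with a header row, in two passes."""
    # First pass: count physical lines, including the header line.
    with open(path, "r") as f:
        total_lines = sum(1 for _ in f)
    # Line 0 is the header; data lines before first_kept are not wanted.
    first_kept = max(1, total_lines - n)
    # Second pass: keep the header and skip lines 1 .. first_kept - 1.
    return pd.read_csv(path, skiprows=range(1, first_kept))


# Small demo file (made-up data) so the sketch can be run as-is.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n")
    two_pass_path = tmp.name

two_pass_tail = read_last_rows(two_pass_path, 2)
os.remove(two_pass_path)
print(two_pass_tail)
```

Compared with the deque approach, this reads the file twice but lets read_csv parse the original file directly, so options such as dtype or parse_dates apply in the usual way.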
Here's a handy way to do it. It works well for what I need:
    import io

    import pandas as pd
    import tailer

    with open(filename) as file:
        last_lines = tailer.tail(file, 15)  # last 15 lines of the file

    df = pd.read_csv(io.StringIO('\n'.join(last_lines)), header=None)
You need to install tailer for this to work:
pip install --user tailer