Use pandas. To select the rows, the syntax is df.loc[start:stop:step], where start is the label of the first row to take, stop is the label of the last row to take, and step is the number of indices to advance after each extraction; for example, you can use it to select alternate rows.
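A minimal sketch of how that looks, on a hypothetical DataFrame with string labels (note that loc slices include both endpoints):

import pandas as pd

df = pd.DataFrame({'value': range(6)}, index=list('abcdef'))
df.loc['a':'e':2]  # rows labelled 'a', 'c', 'e' -- every other row between the two labels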
When it comes to selecting rows and columns of a pandas DataFrame, loc and iloc are two commonly used functions. Here is the subtle difference between the two functions: loc selects rows and columns with specific labels. iloc selects rows and columns at specific integer positions.
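To make the distinction concrete, here is a small sketch (the labels are made up and deliberately differ from the integer positions):

import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30]}, index=[2, 0, 1])
df.loc[0]   # row whose label is 0      -> a = 20
df.iloc[0]  # row at integer position 0 -> a = 10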
I'd use iloc, which takes a row/column slice, both based on integer position and following normal Python syntax. If you want every 5th row:
df.iloc[::5, :]
Though @chrisb's accepted answer does answer the question, I would like to add to it the following.
A simple method I use to select every nth row, or to drop every nth row, is the following:
df1 = df[df.index % 3 != 0] # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0] # Selects every 3rd row starting from 0
This arithmetic-based sampling enables even more complex row selections.
This assumes, of course, that you have an index of ordered, consecutive integers starting at 0.
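For instance (still assuming that 0-based, consecutive integer index), modulo conditions can be combined to keep several positions out of each block:

df3 = df[(df.index % 5).isin([1, 2])]  # keeps rows at positions 1 and 2 of every block of 5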
There is an even simpler solution to the accepted answer that involves directly invoking df.__getitem__.
import pandas as pd

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
For example, to get every 2nd row, you can do
df[::2]
a b c
0 x x x
2 x x x
4 x x x
There's also GroupBy.first/GroupBy.head; you group on the index:
df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')
df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)
a b c
0 x x x
1 x x x
2 x x x
The index is floor-divved by the stride (2, in this case). If the index is non-numeric, instead do
# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()
a b c
0 x x x
1 x x x
2 x x x
Adding reset_index() to metastableB's answer means you only need to assume that the rows are ordered and consecutive.
df1 = df[df.reset_index().index % 3 != 0] # Excludes every 3rd row starting from 0
df2 = df[df.reset_index().index % 3 == 0] # Selects every 3rd row starting from 0
df.reset_index().index will create an index that starts at 0 and increments by 1, allowing you to use the modulo easily.
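A quick sketch of the difference on a hypothetical frame whose index is neither 0-based nor consecutive:

df = pd.DataFrame({'a': [10, 20, 30, 40]}, index=[7, 9, 11, 14])
df.index % 3 == 0                # depends on the labels: [False, True, False, False]
df.reset_index().index % 3 == 0  # depends only on row order: [True, False, False, True]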
I had a similar requirement, but I wanted the nth item in a particular group. This is how I solved it.
# 'index_col' is assumed to hold each row's position within its group
groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)  # True for every 3rd item per group
subset = data[selection]  # boolean mask aligned on the original index
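If your data does not already carry a per-group position column, GroupBy.cumcount can generate one on the fly; a minimal self-contained sketch with made-up column names:

import pandas as pd

data = pd.DataFrame({'group_key': list('aaaabbb'), 'value': [1, 2, 3, 4, 5, 6, 7]})
# cumcount() numbers the rows 0, 1, 2, ... within each group,
# so this keeps every 3rd row of each group
subset = data[data.groupby('group_key').cumcount() % 3 == 0]
#   group_key  value
# 0         a      1
# 3         a      4
# 4         b      5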