Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slicing multiple column ranges from a dataframe using iloc

I have a df with 32 columns

df.shape
(568285, 32)

I am trying to rearrange the columns in a specific way, and drop the first column using iloc

 df = df.iloc[:,[31,[1:23],24,25,26,28,27,29,30]]
                         ^
SyntaxError: invalid syntax

is this the right way to do it?

like image 424
novicebioinforesearcher Avatar asked Aug 31 '17 16:08

novicebioinforesearcher


2 Answers

You could use the np.r_ indexer.

class RClass(AxisConcatenator)
 |  Translates slice objects to concatenation along the first axis.
 |  
 |  This is a simple way to build up arrays quickly. There are two use cases.

df = df.iloc[:, np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]]

df

     0     1     2     3     4     5     6     7     8     9   ...     40  \
A  33.0  44.0  68.0  31.0   NaN  87.0  66.0   NaN  72.0  33.0  ...   71.0   
B   NaN   NaN  77.0  98.0   NaN  48.0  91.0  43.0   NaN  89.0  ...   38.0   
C  45.0  55.0   NaN  72.0  61.0  87.0   NaN  99.0  96.0  75.0  ...   83.0   
D   NaN   NaN   NaN  58.0   NaN  97.0  64.0  49.0  52.0  45.0  ...   63.0   

     41    42    43    44    45    46    47    48    49  
A   NaN  87.0  31.0  50.0  48.0  73.0   NaN   NaN  81.0  
B  79.0  47.0  51.0  99.0  59.0   NaN  72.0  48.0   NaN  
C  93.0   NaN  95.0  97.0  52.0  99.0  71.0  53.0  69.0  
D   NaN  41.0   NaN   NaN  55.0  90.0   NaN   NaN  92.0

out = df.iloc[:, np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]]
out 
     31    1     2     3     4     5     6     7     8     9   ...     20  \
A  99.0  44.0  68.0  31.0   NaN  87.0  66.0   NaN  72.0  33.0  ...   66.0   
B  42.0   NaN  77.0  98.0   NaN  48.0  91.0  43.0   NaN  89.0  ...    NaN   
C  77.0  55.0   NaN  72.0  61.0  87.0   NaN  99.0  96.0  75.0  ...   76.0   
D  95.0   NaN   NaN  58.0   NaN  97.0  64.0  49.0  52.0  45.0  ...   71.0   

     21    22    24    25    26    28    27    29    30  
A   NaN  40.0  66.0  87.0  97.0  68.0   NaN  68.0   NaN  
B  95.0   NaN  47.0  79.0  47.0   NaN  83.0  81.0  57.0  
C   NaN  75.0  46.0  84.0   NaN  50.0  41.0  38.0  52.0  
D   NaN  74.0  41.0  55.0  60.0   NaN   NaN  84.0   NaN  
like image 194
cs95 Avatar answered Oct 16 '22 21:10

cs95


Here's a custom solution using explicit indexing:
Side note, np.r_ wasn't working for me, which is why I built this solution.

import numpy as np
import pandas as pd

# Make a sample df of 1_000 rows & 100 cols
data = np.zeros(shape=(1_000,100))
df = pd.DataFrame(data)


# Create a custom function for indexing
def all_nums_in_range(*tuple_pairs, len_df):
    """
    Input pairs of tuples for index slicing

    Include `len_df` to ensure length of array matches indexed df
    """

    # Create an array with values to use as an index
    num_range = np.zeros(shape=(len_df,), dtype=bool)

    # Update
    for (start, end) in tuple_pairs:
        num_range[start:end] = True

    return num_range


# Now apply
num_range = all_nums_in_range((0,50), (75, 80), len_df=100)
df.iloc[:, num_range]
like image 1
Yaakov Bressler Avatar answered Oct 16 '22 21:10

Yaakov Bressler