keep/slice specific columns in pandas

Tags:

python

pandas

I know about these column slice methods:

df2 = df[["col1", "col2", "col3"]] and df2 = df.ix[:,0:2]

but I'm wondering if there is a way to slice columns from the front/middle/end of a dataframe in the same slice without specifically listing each one.

For example, a dataframe df with columns: col1, col2, col3, col4, col5 and col6.

Is there a way to do something like this?

df2 = df.ix[:, [0:2, "col5"]]

I'm in the situation where I have hundreds of columns and routinely need to slice specific ones for different requests. I've checked through the documentation and haven't seen something like this. Have I overlooked something?

768

asked Feb 25 '13 16:02

bdiamante

2 Answers

IIUC, the simplest way I can think of would be something like this:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randn(5, 10))
>>> df[list(df.columns[:2]) + [7]]
          0         1         7
0  0.210139  0.533249  1.780426
1  0.382136  0.083999 -0.392809
2 -0.237868  0.493646 -1.208330
3  1.242077 -0.781558  2.369851
4  1.910740 -0.643370  0.982876

where the list call isn't optional: without it, the Index object would try to add the 7 to each of its labels element-wise instead of appending it to the column list.
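The same idea applied to the column names from the question might look like the sketch below. The frame and its values are made up for illustration; only the names col1 to col6 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the six column names from the question.
df = pd.DataFrame(
    np.arange(18).reshape(3, 6),
    columns=["col1", "col2", "col3", "col4", "col5", "col6"],
)

# Turn the Index slice into a plain list so that "+" concatenates
# the labels instead of being applied to them element-wise.
wanted = list(df.columns[:2]) + ["col5"]
df2 = df[wanted]

print(df2.columns.tolist())
```

Building the label list first also makes it easy to reuse the same selection for several requests.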

It would be possible to special-case something like numpy's r_ so that

df[col_[:2, "col5", 3:6]]

would work, although I don't know if it would be worth the trouble.
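Until something like col_ exists, numpy's r_ itself can be combined with positional indexing to get close. This is only a sketch, assuming a pandas version that provides .iloc and unique column names (so that columns.get_loc returns a single integer); the frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the six column names from the question.
df = pd.DataFrame(
    np.arange(18).reshape(3, 6),
    columns=["col1", "col2", "col3", "col4", "col5", "col6"],
)

# np.r_ concatenates slices and scalars into one integer array.
# A label is mixed in by first converting it to its position.
positions = np.r_[0:2, df.columns.get_loc("col5")]

# Select by position: the first two columns plus "col5".
df2 = df.iloc[:, positions]

print(df2.columns.tolist())
```

Note that overlapping pieces (for example 0:2 together with 1:3) would select the shared column twice, since r_ simply concatenates.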

173

answered Oct 05 '22 22:10

DSM

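If your column names contain a pattern that you can filter on, you could use df.filter(regex='name'). Note that the argument is a regular expression searched for within each column name, not a shell-style glob, so a trailing * is not needed (in 'name*' it would only mean "zero or more e"). I am using this to filter among my 189 data channels, named a1_01 to b3_21, and it works fine.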
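As a rough illustration of the filter approach, assuming channel names that follow the a1_01 ... b3_21 pattern mentioned above (the four columns and their values here are invented):

```python
import pandas as pd

# Hypothetical subset of channels named like those in the answer.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["a1_01", "a1_02", "b3_20", "b3_21"],
)

# Keep only the columns whose name starts with "a1_".
# The pattern is a regular expression, so "^" anchors it at the start.
a_channels = df.filter(regex=r"^a1_")

# Keep columns whose name ends in "_21", whatever the prefix.
last_channel = df.filter(regex=r"_21$")

print(a_channels.columns.tolist())
print(last_channel.columns.tolist())
```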

answered Oct 05 '22 22:10

K.-Michael Aye
