Selecting multiple columns in a Pandas dataframe

Tags: python, select, pandas, dataframe

I have data in different columns, but I don't know how to extract it to save it in another variable.

index  a   b   c
1      2   3   4
2      3   4   5

How do I select columns 'a' and 'b' and save them into df1?

I tried

df1 = df['a':'b']
df1 = df.ix[:, 'a':'b']

Neither seems to work.


asked Jul 01 '12 21:07

user1234440

2 Answers

The column names (which are strings) cannot be sliced in the manner you tried.

Here you have a couple of options. If you know from context which columns you want, you can select only those columns by passing a list of their names into the __getitem__ syntax (the []'s).

df1 = df[['a', 'b']]

Alternatively, if it matters to index them numerically and not by their name (say your code should automatically do this without knowing the names of the first two columns) then you can do this instead:

df1 = df.iloc[:, 0:2] # Remember that Python does not slice inclusive of the ending index.
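As a minimal sketch of the two options above, the following compares selection by name with selection by position. The DataFrame construction is an assumption reconstructed from the table in the question, and by_name and by_position are illustrative names only:

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame(
    {"a": [2, 3], "b": [3, 4], "c": [4, 5]},
    index=pd.Index([1, 2], name="index"),
)

# Option 1: select by name with a list of column labels.
by_name = df[["a", "b"]]

# Option 2: select by position; the stop position 2 is excluded,
# so this picks the columns at positions 0 and 1.
by_position = df.iloc[:, 0:2]

# Both selections should hold the same columns and values.
print(by_name.equals(by_position))
```

Selecting by name is usually the more readable choice when the column names are known; the positional form is useful when only the column order is known.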

Additionally, you should familiarize yourself with the idea of a view into a Pandas object vs. a copy of that object. The first of the above methods will return a new copy in memory of the desired sub-object (the desired slices).

Sometimes, however, indexing conventions in Pandas don't do this and instead give you a new variable that refers to the same chunk of memory as the sub-object or slice in the original object. This can happen with the second way of indexing, so you can append the .copy() method to get an independent copy. Without the copy, changing what you think is the sliced object can sometimes alter the original object. It is always good to be on the lookout for this.

df1 = df.iloc[:, 0:2].copy() # To avoid the case where changing df1 also changes df
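A small sketch of why the explicit copy helps, assuming the same example data as in the question (the DataFrame construction here is an illustration, not the asker's code). After .copy(), an edit to df1 should leave df untouched:

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Select all rows of the first two columns and take an explicit copy,
# so df1 owns its own data.
df1 = df.iloc[:, 0:2].copy()

# Modify the copy only.
df1.loc[1, "a"] = 99

print(df.loc[1, "a"])   # value in the original frame
print(df1.loc[1, "a"])  # value in the copy
```

Whether the selection without .copy() would share memory with df depends on the pandas version and settings, which is why the explicit copy is the safer habit when df1 will be modified.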

To use iloc, you need to know the column positions (or indices). As the column positions may change, instead of hard-coding indices you can use iloc together with the get_loc method of the DataFrame's columns attribute to obtain column positions from column names.

{c: df.columns.get_loc(c) for c in df.columns}

Now you can use this dictionary to look up a column's position by its name and pass that position to iloc.
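One way this lookup might be used, sketched under the assumption that the column names are unique (get_loc returns a plain integer only in that case). The names col_pos and wanted are hypothetical, and the DataFrame is reconstructed from the question:

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Map each column name to its integer position (assumes unique names).
col_pos = {name: df.columns.get_loc(name) for name in df.columns}

# Look up positions by name, then index positionally with iloc.
wanted = [col_pos["a"], col_pos["b"]]
df1 = df.iloc[:, wanted]

print(col_pos)
print(list(df1.columns))
```

Because the positions are computed from the names at run time, the selection keeps working if the columns are later reordered.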

answered Sep 28 '22 01:09

ely

As of pandas version 0.11.0, columns can be sliced in the manner you tried using the .loc indexer:

df.loc[:, 'C':'E']

is equivalent to

df[['C', 'D', 'E']]  # or df.loc[:, ['C', 'D', 'E']]

and returns columns C through E.
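Applied to the data in the question, a minimal sketch might look like this. The DataFrame construction is an assumption reconstructed from the question's table:

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Label slices with .loc include both endpoints, so 'b' is part of the result.
df1 = df.loc[:, "a":"b"]

print(list(df1.columns))
# The slice should match an explicit list of the same column names.
print(df1.equals(df[["a", "b"]]))
```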

A demo on a randomly generated DataFrame:

import pandas as pd
import numpy as np
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(100, 6)),
                  columns=list('ABCDEF'),
                  index=['R{}'.format(i) for i in range(100)])
df.head()
Out:
     A   B   C   D   E   F
R0  99  78  61  16  73   8
R1  62  27  30  80   7  76
R2  15  53  80  27  44  77
R3  75  65  47  30  84  86
R4  18   9  41  62   1  82

To get the columns from C to E (note that unlike integer slicing, 'E' is included in the columns):

df.loc[:, 'C':'E']
Out:
      C   D   E
R0   61  16  73
R1   30  80   7
R2   80  27  44
R3   47  30  84
R4   41  62   1
R5    5  58   0
...

The same works for selecting rows based on labels. Get the rows 'R6' to 'R10' from those columns:

df.loc['R6':'R10', 'C':'E']
Out:
      C   D   E
R6   51  27  31
R7   83  19  18
R8   11  67  65
R9   78  27  29
R10   7  16  94

.loc also accepts a Boolean array so you can select the columns whose corresponding entry in the array is True. For example, df.columns.isin(list('BCD')) returns array([False, True, True, True, False, False], dtype=bool) - True if the column name is in the list ['B', 'C', 'D']; False, otherwise.

df.loc[:, df.columns.isin(list('BCD'))]
Out:
      B   C   D
R0   78  61  16
R1   27  30  80
R2   53  80  27
R3   65  47  30
R4    9  41  62
R5   78   5  58
...
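The same Boolean-mask idea can be sketched on the question's data. The mask can also be inverted with ~ to select every column that is not in the list. The DataFrame construction and the names mask, selected and rest are assumptions for illustration:

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# One Boolean per column: True where the column name is in the list.
mask = df.columns.isin(["a", "b"])

# Keep the columns flagged True.
selected = df.loc[:, mask]

# Invert the mask to keep the remaining columns instead.
rest = df.loc[:, ~mask]

print(list(selected.columns))
print(list(rest.columns))
```

Unlike a list of labels, isin does not raise an error for names that are missing from the DataFrame; those names are simply ignored.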

answered Sep 28 '22 01:09

ayhan
