
Get Pandas DataFrame first column

Tags: python, pandas

This question is odd, since I know HOW to do something, but I don't know WHY I can't do it another way.

Suppose a simple data frame:

import pandas as pd
a = pd.DataFrame([[0,1], [2,3]])

I can slice this data frame very easily: the first column is a[[0]], the second is a[[1]]. Simple, isn't it?

Now, let's take a more complex data frame. This is part of my code:

var_vec = [i for i in range(100)]
num_of_sites = 100
row_names = ["_".join(["loc", str(i)]) for i in 
             range(1,num_of_sites + 1)]
frame = pd.DataFrame(var_vec, columns = ["Variable"], index = row_names)
spec_ab = [i**3 for i in range(100)]
frame[1] = spec_ab

frame is also a pandas DataFrame, just like a. I can get the second column very easily as frame[[1]]. But when I try frame[[0]] I get an error:

Traceback (most recent call last):

  File "<ipython-input-55-0c56ffb47d0d>", line 1, in <module>
    frame[[0]]

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 1991, in __getitem__
    return self._getitem_array(key)

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2035, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\indexing.py", line 1184, in _convert_to_indexer
    indexer = labels._convert_list_indexer(objarr, kind=self.name)

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\indexes\base.py", line 1112, in _convert_list_indexer
    return maybe_convert_indices(indexer, len(self))

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\indexing.py", line 1856, in maybe_convert_indices
    raise IndexError("indices are out-of-bounds")

IndexError: indices are out-of-bounds

I can still use frame.iloc[:,0], but I don't understand why I can't use simple slicing with [[]]. I use WinPython with Spyder 3, if that helps.

asked Jan 31 '17 10:01 by Bobesh




1 Answer

Using your code:

import pandas as pd

var_vec = [i for i in range(100)]
num_of_sites = 100
row_names = ["_".join(["loc", str(i)]) for i in 
             range(1,num_of_sites + 1)]
frame = pd.DataFrame(var_vec, columns = ["Variable"], index = row_names)
spec_ab = [i**3 for i in range(100)]
frame[1] = spec_ab

If you print frame, you get:

    Variable    1
loc_1   0       0
loc_2   1       1
loc_3   2       8
loc_4   3       27
loc_5   4       64
loc_6   5       125
......

So the cause of your problem becomes obvious: you have no column labelled 0. First you define a list called var_vec. You then build a DataFrame from that list, but you specify both the index values and the column name (which is usually good practice). The integer column labels 0, 1, ... from your first example are only generated when you don't specify column names. They are labels, not positions, so frame[[0]] looks for a column labelled 0 rather than for the first column. frame[[1]] works only because frame[1] = spec_ab created a column whose label is the integer 1.
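To make the label/position distinction concrete, here is a minimal sketch that rebuilds both frames from the question and inspects their column labels. The variable names a_labels, frame_labels and lookup_failed are only illustrative. The exception type raised by frame[[0]] depends on the pandas version (the traceback in the question shows IndexError; newer releases are expected to raise KeyError), so the sketch catches both.

```python
import pandas as pd

# First example from the question: no column names given,
# so pandas generates the integer labels 0 and 1.
a = pd.DataFrame([[0, 1], [2, 3]])

# Second example: one named column, then a column labelled with the integer 1.
row_names = ["loc_{}".format(i) for i in range(1, 101)]
frame = pd.DataFrame(list(range(100)), columns=["Variable"], index=row_names)
frame[1] = [i**3 for i in range(100)]

a_labels = list(a.columns)          # [0, 1]
frame_labels = list(frame.columns)  # ['Variable', 1]

# frame[[0]] is a lookup by label, and no column carries the label 0.
try:
    frame[[0]]
    lookup_failed = False
except (KeyError, IndexError):      # which one is raised depends on the pandas version
    lookup_failed = True

print(a_labels, frame_labels, lookup_failed)
```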

If you want to access columns by their position, you can:

df[df.columns[0]]

What happens then is that df.columns gives you the column labels of df; you take the label at position 0 and pass it back to df as the key.

Hope that helps you understand.
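As a small illustration of that lookup, assuming a three-row stand-in for the frame in the question (the names first_label, as_series and as_frame are made up for this sketch): single brackets return a Series, while double brackets return a one-column DataFrame, which is the [[ ]] behaviour the question was after.

```python
import pandas as pd

# Shortened stand-in for the question's frame: same column labels, three rows.
frame = pd.DataFrame({"Variable": [0, 1, 2], 1: [0, 1, 8]},
                     index=["loc_1", "loc_2", "loc_3"])

first_label = frame.columns[0]    # the label stored at position 0
as_series = frame[first_label]    # single brackets: a Series
as_frame = frame[[first_label]]   # double brackets: a one-column DataFrame

print(first_label)
print(type(as_series).__name__, type(as_frame).__name__)
```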

Edit:

Another (better) way would be:

df.iloc[:,0]

where ":" selects all rows. (With iloc, rows are also addressed by position, from 0 up to the number of rows minus 1.)
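A short sketch of the positional approach, again on a three-row stand-in for the question's frame (col_series, col_frame and first_two_rows are illustrative names). Passing a list of positions, [0], keeps the result a DataFrame, while a bare 0 gives a Series.

```python
import pandas as pd

# Shortened stand-in for the question's frame.
frame = pd.DataFrame({"Variable": [0, 1, 2], 1: [0, 1, 8]},
                     index=["loc_1", "loc_2", "loc_3"])

col_series = frame.iloc[:, 0]       # all rows, first column, as a Series
col_frame = frame.iloc[:, [0]]      # all rows, first column, as a DataFrame
first_two_rows = frame.iloc[:2, 0]  # rows at positions 0 and 1 only

print(col_series.tolist())
print(list(col_frame.columns))
print(first_two_rows.tolist())
```

Because iloc ignores labels entirely, this works the same way whatever the columns happen to be called.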

answered Oct 10 '22 08:10 by epattaro