
Get first row value of a given column

This seems like a ridiculously easy question... but I'm not seeing the easy answer I was expecting.

So, how do I get the value at the nth row of a given column in Pandas? (I am particularly interested in the first row, but would like to know the general approach as well.)

For example, let's say I want to pull the 1.2 value in Btime as a variable.

What's the right way to do this?

>>> df_test
   ATime  X   Y   Z  Btime   C   D   E
0    1.2  2  15   2    1.2  12  25  12
1    1.4  3  12   1    1.3  13  22  11
2    1.5  1  10   6    1.4  11  20  16
3    1.6  2   9  10    1.7  12  29  12
4    1.9  1   1   9    1.9  11  21  19
5    2.0  0   0   0    2.0   8  10  11
6    2.4  0   0   0    2.4  10  12  15
Ahmed Haque asked Aug 11 '14 23:08



2 Answers

To select the ith row, use iloc:

In [31]: df_test.iloc[0]
Out[31]:
ATime     1.2
X         2.0
Y        15.0
Z         2.0
Btime     1.2
C        12.0
D        25.0
E        12.0
Name: 0, dtype: float64

To select the ith value in the Btime column you could use:

In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2
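As a minimal sketch of the general "nth row" case, the lookup can be wrapped in a small function. The helper name nth_value is hypothetical (it is not part of pandas), and the frame below rebuilds only three of the question's columns:

```python
import pandas as pd

# Rebuild part of the question's df_test (values copied from the question).
df_test = pd.DataFrame({
    "ATime": [1.2, 1.4, 1.5, 1.6, 1.9, 2.0, 2.4],
    "X": [2, 3, 1, 2, 1, 0, 0],
    "Btime": [1.2, 1.3, 1.4, 1.7, 1.9, 2.0, 2.4],
})


def nth_value(df, column, n):
    """Hypothetical helper: value at positional row n of the named column."""
    # Select the column first, then index by position.
    return df[column].iloc[n]


first_btime = nth_value(df_test, "Btime", 0)   # first row
fourth_btime = nth_value(df_test, "Btime", 3)  # fourth row (position 3)
print(first_btime, fourth_btime)
```

Because iloc is purely positional, the same call works whatever the index labels happen to be.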

There is a difference between df_test['Btime'].iloc[0] (recommended) and df_test.iloc[0]['Btime']:

DataFrames store data in column-based blocks (where each block has a single dtype). If you select by column first, a view can be returned (which is quicker than returning a copy) and the original dtype is preserved. In contrast, if you select by row first, and if the DataFrame has columns of different dtypes, then Pandas copies the data into a new Series of object dtype. So selecting columns is a bit faster than selecting rows. Thus, although df_test.iloc[0]['Btime'] works, df_test['Btime'].iloc[0] is a little bit more efficient.
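The dtype effect described above can be observed directly. This is a rough sketch on small made-up frames (the names df_mixed and df_numeric are illustrative only), and it checks dtypes, not timing:

```python
import pandas as pd

# A frame whose columns have different dtypes (text and integers).
df_mixed = pd.DataFrame({"foo": list("ABC"), "bar": [100, 100, 100]})

column_dtype = df_mixed["bar"].dtype  # column first: integer dtype is kept
row_dtype = df_mixed.iloc[0].dtype    # row first: values share one object Series

# An all-numeric frame: the row is upcast to a common numeric dtype instead,
# which is why X shows as 2.0 in the df_test.iloc[0] output above.
df_numeric = pd.DataFrame({"ATime": [1.2, 1.4], "X": [2, 3]})
numeric_row_dtype = df_numeric.iloc[0].dtype

print(column_dtype, row_dtype, numeric_row_dtype)
```

Selecting the column first avoids this conversion, so the value comes back with the column's own dtype.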

There is a big difference between the two when it comes to assignment. df_test['Btime'].iloc[0] = x affects df_test, but df_test.iloc[0]['Btime'] = x may not. See below for an explanation of why. Because a subtle difference in the order of indexing makes a big difference in behavior, it is better to assign with a single indexing operation:

df.iloc[0, df.columns.get_loc('Btime')] = x 

df.iloc[0, df.columns.get_loc('Btime')] = x (recommended):

The recommended way to assign new values to a DataFrame is to avoid chained indexing, and instead use the method shown by andrew,

df.loc[df.index[n], 'Btime'] = x 

or

df.iloc[n, df.columns.get_loc('Btime')] = x 

The latter method is a bit faster, because df.loc has to convert the row and column labels to positional indices, so there is a little less conversion necessary if you use df.iloc instead.
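Both single-indexing forms can be tried side by side. The sketch below reuses the small foo/bar frame from this answer, with a deliberately unsorted integer index; the values 99 and 98 are arbitrary:

```python
import pandas as pd

# Index labels are intentionally not in positional order.
df = pd.DataFrame({"foo": list("ABC"), "bar": 100}, index=[0, 2, 1])

# Label-based: translate position 1 into its label (2) via df.index.
df.loc[df.index[1], "bar"] = 99

# Position-based: translate the column label into its position via get_loc.
df.iloc[2, df.columns.get_loc("bar")] = 98

print(df)
```

Each statement is one indexing operation on df itself, so there is no intermediate object that might be a copy.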


df['Btime'].iloc[0] = x works, but is not recommended:

Although this works, it takes advantage of the way DataFrames are currently implemented. There is no guarantee that Pandas will keep working this way in the future. In particular, it relies on the fact that (currently) df['Btime'] always returns a view (not a copy), so df['Btime'].iloc[n] = x can be used to assign a new value at the nth location of the Btime column of df.

Since Pandas makes no explicit guarantees about when indexers return a view versus a copy, assignments that use chained indexing always raise a SettingWithCopyWarning, even though in this case the assignment succeeds in modifying df:

In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

In [26]: df
Out[26]:
  foo  bar
0   A   99  <-- assignment succeeded
2   B  100
1   C  100

df.iloc[0]['Btime'] = x does not work:

In contrast, assignment with df.iloc[0]['bar'] = 123 does not work because df.iloc[0] is returning a copy:

In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [67]: df
Out[67]:
  foo  bar
0   A   99  <-- assignment failed
2   B  100
1   C  100

Warning: I had previously suggested df_test.ix[i, 'Btime']. But this is not guaranteed to give you the ith value, since ix tries to index by label before trying to index by position. So if the DataFrame has an integer index which is not in sorted order starting at 0, then using ix[i] will return the row labeled i rather than the ith row. (ix was later deprecated in pandas 0.20 and removed in pandas 1.0, so the session below only runs on older versions.) For example,

In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])

In [2]: df
Out[2]:
  foo
0   A
2   B
1   C

In [4]: df.ix[1, 'foo']
Out[4]: 'C'
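The label-versus-position distinction can be reproduced without ix by using the explicit indexers. A short sketch on the same frame (the variable names by_label and by_position are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"foo": list("ABC")}, index=[0, 2, 1])

# loc looks up the row LABELED 1, which is the last row here.
by_label = df.loc[1, "foo"]

# iloc looks up the row at POSITION 1, which carries the label 2.
by_position = df["foo"].iloc[1]

print(by_label, by_position)
```

Here loc returns the same 'C' that ix produced above, while iloc returns the value actually stored in the second row.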
unutbu answered Sep 29 '22 23:09


Note that the answer from @unutbu is correct for reading a value. Once you want to set the value to something new, however, it will not work if your DataFrame is a view.

In [4]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [5]: df['bar'] = 100
In [6]: df['bar'].iloc[0] = 99
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.16.0_19_g8d2818e-py2.7-macosx-10.9-x86_64.egg/pandas/core/indexing.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

Another approach that works consistently for both getting and setting is:

In [7]: df.loc[df.index[0], 'foo']
Out[7]: 'A'
In [8]: df.loc[df.index[0], 'bar'] = 99
In [9]: df
Out[9]:
  foo  bar
0   A   99
2   B  100
1   C  100
andrew answered Sep 30 '22 00:09