
I can't pull data by column name

I just started working with the pandas library and, despite some research, I still haven't figured this out. I want to pull the data for the column named q, but the code below raises an error. How can I do that?

import pandas as pd
data = pd.read_excel('test1.xlsx')
df = pd.DataFrame(data)
print(df.loc[df['q']])  

Error:

Traceback (most recent call last):
  File "c:/Users/sabca/visual studio code projects/webscraping/pandastest.py", line 11, in <module>
    print(df.loc[df['q']])
  File "C:\Users\sabca\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\sabca\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1099, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\Users\sabca\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1037, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "C:\Users\sabca\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "C:\Users\sabca\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1298, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['qwe1', 'asdf1', 'adfs4', 'wer7', 'tyu1', 'ghfhg5'], dtype='object')] are in the [index]"
sabcan asked Dec 22 '22 16:12



1 Answer

Fixing the data/df confusion

First, there's no need for the line

df = pd.DataFrame(data)

because data is already a pandas DataFrame, as returned by pd.read_excel.

Instead, I would suggest omitting this line and simply using the following (for the remainder of this answer, df refers to the DataFrame returned by this call):

df = pd.read_excel('test1.xlsx')
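To illustrate why the extra pd.DataFrame(...) call adds nothing, here is a minimal sketch. Since test1.xlsx isn't available, it assumes a small in-memory DataFrame as a stand-in for the result of pd.read_excel; the values of q are copied from the error message, and everything else about the file is unknown.

```python
import pandas as pd

# Stand-in for pd.read_excel('test1.xlsx') (an assumption: the real file is
# not available). The values of q are taken from the KeyError message.
data = pd.DataFrame({"q": ["qwe1", "asdf1", "adfs4", "wer7", "tyu1", "ghfhg5"]})

# Wrapping an existing DataFrame in pd.DataFrame() gives back an equal DataFrame.
df = pd.DataFrame(data)

print(type(data).__name__)  # the stand-in is already a DataFrame
print(df.equals(data))      # the wrapped copy holds the same data
```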

Returning a Pandas Series from column q

Assuming that q is the name of a column in your df, then

df['q']

will return a pandas Series representing the column q.

If you want to use df.loc, note that it takes a row selection as the first item and an optional column selection as the second. Your original df.loc[df['q']] passed the values of column q as row labels, which is why the KeyError reports that none of them are in the index. Assuming you want all rows of the column q, you could use:

df.loc[:, 'q']
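As a rough sketch of the difference, the snippet below reproduces the failing lookup and the two working alternatives. It again assumes an in-memory stand-in for the Excel file with a default integer index; the values of q come from the error message.

```python
import pandas as pd

# Stand-in for the Excel data (assumption); default integer index 0..5.
df = pd.DataFrame({"q": ["qwe1", "asdf1", "adfs4", "wer7", "tyu1", "ghfhg5"]})

# df.loc[df['q']] treats the values of q as row labels. None of those
# strings are row labels here, so pandas raises a KeyError.
try:
    df.loc[df["q"]]
    raised = False
except KeyError:
    raised = True

# Both of these select the whole column q as a Series.
col_a = df["q"]
col_b = df.loc[:, "q"]

print(raised)
print(col_a.equals(col_b))
```

The slice : in the row position means "all rows", so df.loc[:, 'q'] and df['q'] select the same data.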

Returning a NumPy array of values from column q

You could use

df['q'].values

to return a NumPy array containing the values stored in the q column.
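A small sketch of this, using the same assumed in-memory stand-in for the Excel data. It also shows Series.to_numpy(), which the pandas documentation recommends over .values because .values can return an array-like extension type for some dtypes, while to_numpy() returns a NumPy ndarray.

```python
import numpy as np
import pandas as pd

# Stand-in for the Excel data (assumption); values of q from the error message.
df = pd.DataFrame({"q": ["qwe1", "asdf1", "adfs4", "wer7", "tyu1", "ghfhg5"]})

values = df["q"].values        # ndarray or ndarray-like, depending on the dtype
as_numpy = df["q"].to_numpy()  # a NumPy ndarray

print(isinstance(as_numpy, np.ndarray))
print(list(values) == list(as_numpy))
```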

JPI93 answered Dec 25 '22 04:12
