
KeyError when extracting data from a pandas.core.series.Series

Tags: python, pandas

In the following ipython3 session, I read two differently formatted tables and compute the sum of the values found in one column of each:

In [278]: F = pd.read_table("../RNA_Seq_analyses/mapping_worm_number_tests/hisat2/mapped_C_elegans/feature_count/W100_1_on_C_elegans/protein_coding_fwd_counts.txt",
     ...:                   skiprows=2, usecols=[6]).sum()

In [279]: S = pd.read_table("../RNA_Seq_analyses/mapping_worm_number_tests/hisat2/mapped_C_elegans/intersect_count/W100_1_on_C_elegans/protein_coding_fwd_counts.txt",
     ...:                   usecols=[6], header=None).sum()

In [280]: S
Out[280]: 
6    3551266
dtype: int64

In [281]: F
Out[281]: 
72    3164181
dtype: int64

In [282]: type(F)
Out[282]: pandas.core.series.Series

In [283]: type(S)
Out[283]: pandas.core.series.Series

In [284]: F[0]
Out[284]: 3164181

In [285]: S[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-285-5a4339994a41> in <module>()
----> 1 S[0]

/home/bli/.local/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    601             result = self.index.get_value(self, key)
    602 
--> 603             if not is_scalar(result):
    604                 if is_list_like(result) and not isinstance(result, Series):
    605 

/home/bli/.local/lib/python3.6/site-packages/pandas/indexes/base.py in get_value(self, series, key)

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3323)()

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3026)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)()

KeyError: 0

Why do the F and S objects behave differently if they result from the same operation (sum) and are of the same type (pandas.core.series.Series)?

What is the correct way to extract the value I want (the sum of a column)?

Edit: Trying solutions:

In [297]: F["72"]
Out[297]: 3164181

In [298]: S["6"]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8125)()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-298-0127424036a0> in <module>()
----> 1 S["6"]

/home/bli/.local/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    601             result = self.index.get_value(self, key)
    602 
--> 603             if not is_scalar(result):
    604                 if is_list_like(result) and not isinstance(result, Series):
    605 

/home/bli/.local/lib/python3.6/site-packages/pandas/indexes/base.py in get_value(self, series, key)

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3323)()

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3026)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4075)()

KeyError: '6'

Further investigating:

In [306]: print(S.index)
Int64Index([6], dtype='int64')

In [307]: print(F.index)
Index(['72'], dtype='object')

In [308]: S[6]
Out[308]: 3551266

So the two objects ended up having different types of indices. This kind of behaviour reminds me of R...

It seems that header=None resulted in columns indexed by numbers for S, whereas the absence of header=None combined with skiprows=2 resulted in the index being generated from data read on the third row. (And this revealed a bug in the way I parsed the data in pandas...)
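A minimal sketch of that hypothesis, using made-up inline data instead of the real count files (the `text` contents and the column position 2 are assumptions for illustration only):

```python
import io
import pandas as pd

# Hypothetical tab-separated data: two lines to skip, then the actual rows.
text = "# comment\n# another comment\na\t1\t72\nb\t2\t10\nc\t3\t20\n"

# Without header=None, the first line left after skiprows becomes the header,
# so the column is named '72' (a string) and that row is not part of the sum.
F = pd.read_table(io.StringIO(text), skiprows=2, usecols=[2]).sum()

# With header=None, columns are named by integer position and every row is data.
S = pd.read_table(io.StringIO(text), skiprows=2, usecols=[2], header=None).sum()

print(F.index.tolist(), F.iloc[0])
print(S.index.tolist(), S.iloc[0])
```

If this matches what happens with the real files, the first data row is silently consumed as a header in the F case, which would explain both the string label and a sum that is too small.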


asked Aug 29 '17 14:08

bli

1 Answer

I think you need to select the value by position, which works whatever the index label is:

# select the first value of a one-element Series
f = F.iat[0]
# alternative
# f = F.iloc[0]

Or:

# convert to a NumPy array and select the first value
f = F.values[0]

Or:

f = F.item()
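As a quick check, here is a small sketch that rebuilds two one-element Series with the labels and values shown in the question (the construction with `pd.Series` is my assumption, not how the originals were produced) and confirms that the positional accessors agree:

```python
import pandas as pd

# One-element Series with different index labels (values from the question).
F = pd.Series([3164181], index=["72"])
S = pd.Series([3551266], index=[6])

for series in (F, S):
    # All of these select by position, so the label type does not matter.
    assert series.iat[0] == series.iloc[0] == series.to_numpy()[0] == series.item()

print(F.iloc[0], S.iloc[0])
```

Note that `item()` only works when the Series holds exactly one element; `iat[0]` and `iloc[0]` return the first element of a Series of any length.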

You get the error because S has no index label 0.

As IanS commented, selecting by the actual index labels should work, that is 6 for S and '72' (a string) for F:

f = F['72']
# f = F.loc['72']

s = S[6]
# s = S.loc[6]

Sample (with an integer label, as in S):

F = pd.Series([3164181], index=[72])

f = F[72]
print(f)
3164181

print(F.index)
Int64Index([72], dtype='int64')

print(F.index.tolist())
[72]

f = F[0]
print(f)

KeyError: 0

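S ends up with an integer index label because of the parameter header=None: pandas assigns default integer column names (0, 1, ...), and usecols=[6] keeps the column named 6. For F the column names are read from the file, so the column at position 6 is named '72', which is a string. That is the difference.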
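If the goal is only the sum of a single column, one option is to select that column by position before summing, so the result is a scalar and the column name never matters. A minimal sketch, where `column_sum` is a hypothetical helper and the two small frames stand in for the results of `read_table`:

```python
import pandas as pd

# Hypothetical single-column frames standing in for the two read_table results.
df_f = pd.DataFrame({"72": [10, 20]})   # column name taken from the file
df_s = pd.DataFrame({6: [72, 10, 20]})  # header=None, integer column name


def column_sum(df):
    """Return the sum of the first column as a scalar, whatever its name."""
    return df.iloc[:, 0].sum()


print(column_sum(df_f))
print(column_sum(df_s))
```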


answered Sep 30 '22 13:09

jezrael
