In the following ipython3 session, I read differently-formatted tables and compute the sum of the values found in one of the columns:
In [278]: F = pd.read_table("../RNA_Seq_analyses/mapping_worm_number_tests/hisat2/mapped_C_elegans/feature_count/W100_1_on_C_elegans/protein_coding_fwd_counts.txt", skiprows=2, usecols=[6]).sum()
In [279]: S = pd.read_table("../RNA_Seq_analyses/mapping_worm_number_tests/hisat2/mapped_C_elegans/intersect_count/W100_1_on_C_elegans/protein_coding_fwd_counts.txt", usecols=[6], header=None).sum()
In [280]: S
Out[280]:
6 3551266
dtype: int64
In [281]: F
Out[281]:
72 3164181
dtype: int64
In [282]: type(F)
Out[282]: pandas.core.series.Series
In [283]: type(S)
Out[283]: pandas.core.series.Series
In [284]: F[0]
Out[284]: 3164181
In [285]: S[0]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-285-5a4339994a41> in <module>()
----> 1 S[0]
/home/bli/.local/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
601 result = self.index.get_value(self, key)
602
--> 603 if not is_scalar(result):
604 if is_list_like(result) and not isinstance(result, Series):
605
/home/bli/.local/lib/python3.6/site-packages/pandas/indexes/base.py in get_value(self, series, key)
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3323)()
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3026)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)()
KeyError: 0
How come the F and S objects behave differently if they result from a similar operation (sum) and are of the same type (pandas.core.series.Series)?
What is the correct way to extract the value I want (the sum of a column)?
In [297]: F["72"]
Out[297]: 3164181
In [298]: S["6"]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8125)()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-298-0127424036a0> in <module>()
----> 1 S["6"]
/home/bli/.local/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
601 result = self.index.get_value(self, key)
602
--> 603 if not is_scalar(result):
604 if is_list_like(result) and not isinstance(result, Series):
605
/home/bli/.local/lib/python3.6/site-packages/pandas/indexes/base.py in get_value(self, series, key)
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3323)()
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3026)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4075)()
KeyError: '6'
Further investigating:
In [306]: print(S.index)
Int64Index([6], dtype='int64')
In [307]: print(F.index)
Index(['72'], dtype='object')
In [308]: S[6]
Out[308]: 3551266
So the two objects ended up having different types of indices. This kind of behaviour reminds me of R...
It seems that header=None resulted in columns indexed by numbers for S, whereas the absence of header=None combined with skiprows=2 resulted in the index being generated from data read on the third row. (And this revealed a bug in the way I parsed the data in pandas...)
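The effect described above can be reproduced on a small in-memory table. This is only a sketch: the tab-separated text is made up for illustration and is not the real count file, and pd.read_csv with sep="\t" is used in place of pd.read_table, which should behave the same way here.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real count files: two tab-separated columns.
text = "a\t10\nb\t72\nc\t5\nd\t7\n"

# header=None: pandas assigns integer column labels (0, 1, ...).
s = pd.read_csv(io.StringIO(text), sep="\t", usecols=[1], header=None).sum()

# skiprows=1 without header=None: the first remaining row ("b\t72")
# is consumed as the header, so the column label is the string '72'.
f = pd.read_csv(io.StringIO(text), sep="\t", skiprows=1, usecols=[1]).sum()

print(s.index.tolist())  # [1]
print(f.index.tolist())  # ['72']
print(s.iloc[0], f.iloc[0])  # 94 12
```

Note that the second sum is missing both the skipped row and the row that was taken as the header, which is the kind of parsing bug mentioned above.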
Accessing elements of a Pandas Series. Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). Labels need not be unique but must be a hashable type. An element in the series can be accessed similarly to that in an ndarray.
Since there is no ‘point’ column in our DataFrame, we receive a KeyError. The way to fix this error is to simply make sure we spell the column name correctly.
This error occurs when you attempt to access a column in a pandas DataFrame that does not exist. Typically it happens when you misspell a column name or include an accidental space before or after the column name. The following example shows how to fix this error in practice.
The following code shows how to get the value in the third position of a pandas Series using the index value:
import pandas as pd
#define Series
my_series = pd.Series(['A', 'B', 'C', 'D', 'E'])
#get third value in Series
print(my_series[2])
C
I think you need:
#select first value of one element series
f = F.iat[0]
#alternative
#f = F.iloc[0]
Or:
#convert to numpy array and select first value
f = F.values[0]
Or:
f = F.item()
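As a quick check, the positional accessors listed above should all return the same value whatever the type of the index label. The two Series below are rebuilt by hand from the values shown in the question, as an assumption about what F and S contain:

```python
import pandas as pd

# Hand-built copies of the two one-element Series from the question
F = pd.Series([3164181], index=["72"])  # string label
S = pd.Series([3551266], index=[6])     # integer label

for series in (F, S):
    # All four accessors ignore the label and return the single element
    assert series.iat[0] == series.iloc[0] == series.values[0] == series.item()

print(F.iat[0], S.iat[0])  # 3164181 3551266
```

Series.item() only works here because each Series holds exactly one element; it raises a ValueError otherwise.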
I think you get the error because there is no index value 0.
As IanS commented, selecting by the index labels should work, here the integer 6 for S and the string '72' for F:
f = F['72']
#f = F.loc['72']
s = S[6]
#s = S.loc[6]
Sample:
F = pd.Series([3164181], index=[72])
f = F[72]
print (f)
3164181
print (F.index)
Int64Index([72], dtype='int64')
print (F.index.tolist())
[72]
f = F[0]
print (f)
KeyError: 0
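S gets an integer index because of the parameter header=None: pandas adds default integer column labels (0, 1, ...), so the selected column is labelled 6. For F, the column at position 6 takes its label from the row read as the header, '72', which is a string. That is the difference.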
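One possible way to avoid the mismatch is to decide explicitly whether a header row exists. The sketch below uses a made-up table with one comment line and one header line; whether the real files look like this is an assumption, so the skiprows value would need checking against the actual data.

```python
import io
import pandas as pd

# Hypothetical file layout: one comment line, one header line, then data.
text = "# comment\nname\tcount\na\t72\nb\t5\n"

# Option 1: skip both leading lines and pass header=None,
# so no data row is consumed as a header and labels are integers.
total = pd.read_csv(io.StringIO(text), sep="\t", skiprows=2,
                    usecols=[1], header=None).sum()
print(total.index.tolist())  # [1]
print(total.iloc[0])         # 77

# Option 2: skip only the comment line and select the column by name,
# which returns a scalar directly.
named = pd.read_csv(io.StringIO(text), sep="\t", skiprows=1,
                    usecols=["count"])["count"].sum()
print(named)  # 77
```

Summing a single named column returns a plain scalar, which sidesteps the question of how the resulting Series is indexed.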