Why is this float conversion made?

Tags: python, pandas, dataframe

I have this dataframe:

Python 3.9.0 (v3.9.0:9cf6752276, Oct  5 2020, 11:29:23) 
[Clang 6.0 (clang-600.0.57)] on darwin
>>> import pandas as pd  
>>> import datetime as datetime
>>> pd.__version__
'1.3.5'
>>> dates = [datetime.datetime(2012, 2, 3) , datetime.datetime(2012, 2, 4)]
>>> x = pd.DataFrame({'Time': dates, 'Selected': [0, 0], 'Nr': [123.4, 25.2]})
>>> x.set_index('Time', inplace=True)
>>> x
            Selected     Nr
Time                       
2012-02-03         0  123.4
2012-02-04         0   25.2

In the example below, an integer value from an integer column is converted to a float, but I do not see the reason for this conversion. In both cases I assume I am picking the value from the 'Selected' column of the first row. What is going on?

>>> x['Selected'].iloc[0]
0
>>> x.iloc[0]['Selected']
0.0
>>> x['Selected'].dtype 
dtype('int64')

asked Dec 16 '21 20:12

Elmex80s

1 Answer

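x.iloc[0] selects a single "row", and to return it pandas creates a new Series object. A Series has exactly one dtype, so pandas has to pick a dtype that can hold the values from every column in that row: int64 from "Selected" and float64 from "Nr". It picks a floating point type, since that does not lose information in the "Nr" column; the integer 0 from "Selected" therefore becomes 0.0.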
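
A minimal sketch that rebuilds the frame from the question and inspects the row Series directly (it assumes pandas and NumPy are installed). np.result_type is used here only as an analogy for the int64/float64 promotion; it is not claimed to be the exact code path pandas follows internally.

```python
import datetime

import numpy as np
import pandas as pd

# Same frame as in the question: one int64 column, one float64 column.
dates = [datetime.datetime(2012, 2, 3), datetime.datetime(2012, 2, 4)]
x = pd.DataFrame({'Time': dates, 'Selected': [0, 0], 'Nr': [123.4, 25.2]})
x = x.set_index('Time')

row = x.iloc[0]            # a brand-new Series holding one value per column
print(type(row).__name__)  # Series
print(row.dtype)           # float64: one dtype shared by every value in the row
print(row['Selected'])     # 0.0

# NumPy's promotion rule yields the same common type for these two column dtypes.
print(np.result_type(np.int64, np.float64))  # float64

# Contrast: if every selected column is int64, the row Series stays int64.
print(x[['Selected']].iloc[0].dtype)  # int64
```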

On the other hand, x['Selected'].iloc[0] first selects a column. Selecting a column always preserves its dtype (int64 here), so the element you then pick with .iloc[0] is still an integer.
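
If the goal is only to read one cell, scalar accessors that look the value up through its column avoid building a mixed row at all. The sketch below assumes the frame from the question; .at, .iat and columns.get_loc are standard pandas API, and in the pandas versions I know of they return the column's own scalar type, but treat that as something to verify on your version.

```python
import datetime

import numpy as np
import pandas as pd

dates = [datetime.datetime(2012, 2, 3), datetime.datetime(2012, 2, 4)]
x = pd.DataFrame({'Time': dates, 'Selected': [0, 0], 'Nr': [123.4, 25.2]})
x = x.set_index('Time')

col = x.columns.get_loc('Selected')  # integer position of the 'Selected' column

via_column = x['Selected'].iloc[0]     # column first (int64 Series), then one element
via_at = x.at[x.index[0], 'Selected']  # label-based scalar lookup
via_iat = x.iat[0, col]                # position-based scalar lookup
via_row = x.iloc[0]['Selected']        # row first: the mixed row Series is upcast

# Report the scalar type each approach returns and whether it is still an integer.
for name, value in [('via_column', via_column), ('via_at', via_at),
                    ('via_iat', via_iat), ('via_row', via_row)]:
    print(name, type(value).__name__, isinstance(value, np.integer))
```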

pandas is fundamentally "column oriented". You can think of a dataframe as a dictionary of columns. It isn't one, although I believe it used to work essentially like that under the hood; it now uses a more complex "block manager" approach, but these are internal implementation details.
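
The column orientation also shows up when iterating over rows. The pandas documentation for DataFrame.iterrows notes that, because each row is returned as a Series, dtypes are not preserved across the row, and it suggests itertuples when they need to be. A small sketch contrasting the two, again assuming the frame from the question:

```python
import datetime

import pandas as pd

dates = [datetime.datetime(2012, 2, 3), datetime.datetime(2012, 2, 4)]
x = pd.DataFrame({'Time': dates, 'Selected': [0, 0], 'Nr': [123.4, 25.2]})
x = x.set_index('Time')

# Each column keeps its own dtype: Selected is int64, Nr is float64.
print(x.dtypes)

# iterrows() builds a Series per row, so the mixed row is upcast to float64.
_, first_row = next(x.iterrows())
print(first_row.dtype, first_row['Selected'])

# itertuples() reads the values column by column, so 'Selected' stays an integer.
first_tuple = next(x.itertuples())
print(type(first_tuple.Selected).__name__, first_tuple.Selected)
```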

answered Oct 24 '22 19:10

juanpa.arrivillaga
