Pandas df.itertuples renaming dataframe columns when printing

I know that pandas' itertuples() normally returns each row's values together with the column names, as follows:

import numpy as np
import pandas as pd

ab = pd.DataFrame(np.random.random([3, 3]), columns=['hi', 'low', 'med'])
for i in ab.itertuples():
    print(i)

and the output is as follows:

Pandas(Index=0, hi=0.05421443, low=0.2456833, med=0.491185)
Pandas(Index=1, hi=0.28670429, low=0.5828551, med=0.279305)
Pandas(Index=2, hi=0.53869406, low=0.3427290, med=0.750075)

However, I have no idea why it doesn't show the column names as I expected for another DataFrame of mine, shown below:

            us qqq equity  us spy equity
date                                    
2017-06-19            0.0            1.0
2017-06-20            0.0           -1.0
2017-06-21            0.0            0.0
2017-06-22            0.0            0.0
2017-06-23            1.0            0.0
2017-06-26            0.0            0.0
2017-06-27           -1.0            0.0
2017-06-28            1.0            0.0
2017-06-29           -1.0            0.0
2017-06-30            0.0            0.0

The above is a pandas DataFrame with a Timestamp index, float64 values, and the list of strings ['us qqq equity','us spy equity'] as its columns.

When I do this:

for row in data.itertuples():
    print(row)

It shows the columns as _1 and _2 as follows:

Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0)
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0)
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-22 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-23 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-26 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-27 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-28 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-29 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-30 00:00:00'), _1=0.0, _2=0.0)

Does anyone have any clue about what I have done wrong? Does it have to do with some variable referencing issue when creating the original dataframe? (Also, as a side question: I learnt from the community that itertuples() should generate tuples, but, as shown above, the returned objects appear to be of a type named Pandas when I check them with type()?)

Thank you for all your patience as I am still trying to master the application of DataFrame.

635

asked Jul 25 '17 15:07

user7786493

2 Answers

This seems to be an issue with handling column names having spaces in them. If you replace the column names with different ones without spaces, it will work:

df.columns = ['us_qqq_equity', 'us_spy_equity']
# df.columns = df.columns.str.replace(r'\s+', '_', regex=True)  # Courtesy @MaxU
for r in df.head().itertuples():
    print(r)

# Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
# Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
# ...

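Named tuple field names must be valid Python identifiers, so column names containing spaces cannot be used as they are. itertuples() replaces such names with positional ones (_1, _2, ...) when it builds each tuple, not merely when printing.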
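To illustrate the renaming, here is a minimal sketch. It assumes that itertuples() builds its rows with collections.namedtuple and rename=True, and it uses a small made-up frame (the values and the name Row are only for illustration). It also shows one way to recover the original column names by position without renaming the columns.

```python
from collections import namedtuple

import pandas as pd

# With rename=True, namedtuple swaps invalid field names for positional ones.
Row = namedtuple("Pandas", ["Index", "us qqq equity", "us spy equity"], rename=True)
print(Row._fields)  # ('Index', '_1', '_2')

# Toy data, made up for this example.
df = pd.DataFrame(
    {"us qqq equity": [0.0, 1.0], "us spy equity": [1.0, 0.0]},
    index=pd.to_datetime(["2017-06-19", "2017-06-23"]),
)

for row in df.itertuples():
    # Fields after Index line up with df.columns, so zip restores the names.
    values = dict(zip(df.columns, row[1:]))
    print(row.Index.date(), values)
```

If attribute access matters more than the original names, renaming the columns as above is probably the simpler route.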

128

answered Oct 18 '22 21:10

cs95

Interesting observation: of DataFrame.iterrows(), DataFrame.iteritems() (called DataFrame.items() in current pandas versions) and DataFrame.itertuples(), only the last one renames columns that contain spaces:

In [140]: df = df.head(3)

In [141]: list(df.iterrows())
Out[141]:
[(Timestamp('2017-06-19 00:00:00'), us qqq equity    0.0
  us spy equity    1.0
  Name: 2017-06-19 00:00:00, dtype: float64),
 (Timestamp('2017-06-20 00:00:00'), us qqq equity    0.0
  us spy equity   -1.0
  Name: 2017-06-20 00:00:00, dtype: float64),
 (Timestamp('2017-06-21 00:00:00'), us qqq equity    0.0
  us spy equity    0.0
  Name: 2017-06-21 00:00:00, dtype: float64)]

In [142]: list(df.iteritems())
Out[142]:
[('us qqq equity', date
  2017-06-19    0.0
  2017-06-20    0.0
  2017-06-21    0.0
  Name: us qqq equity, dtype: float64), ('us spy equity', date
  2017-06-19    1.0
  2017-06-20   -1.0
  2017-06-21    0.0
  Name: us spy equity, dtype: float64)]

In [143]: list(df.itertuples())
Out[143]:
[Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0),
 Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0),
 Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)]
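
On the side question about the return type: as far as I can tell, the rows yielded by itertuples() are namedtuple instances, which are still tuples. A small check on a made-up frame (the data are hypothetical) could look like this:

```python
import pandas as pd

# Toy data, made up for this example.
df = pd.DataFrame({"us qqq equity": [0.0, 1.0], "us spy equity": [1.0, 0.0]})

row = next(df.itertuples())
print(type(row).__name__)      # Pandas
print(isinstance(row, tuple))  # True

# name=None skips the namedtuple, so no field names exist to be renamed.
plain = next(df.itertuples(name=None))
print(type(plain) is tuple)    # True
print(plain)
```

With name=None the values are only reachable by position, so this fits best when the column order is known.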

answered Oct 18 '22 21:10

MaxU - stop WAR against UA

Tags: python, iteration, pandas, dataframe
