Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas df.itertuples renaming dataframe columns when printing

I know that normally pandas' itertuples() will return the values of each including the column names as follows:

ab=pd.DataFrame(np.random.random([3,3]),columns=['hi','low','med'])
for i in ab.itertuples():
    print(i)

and the output is as follows:

Pandas(Index=0, hi=0.05421443, low=0.2456833, med=0.491185)
Pandas(Index=1, hi=0.28670429, low=0.5828551, med=0.279305)
Pandas(Index=2, hi=0.53869406, low=0.3427290, med=0.750075)

However, I have no idea why it doesn't shows the columns as I expected for my another set of code as below:

            us qqq equity  us spy equity
date                                    
2017-06-19            0.0            1.0
2017-06-20            0.0           -1.0
2017-06-21            0.0            0.0
2017-06-22            0.0            0.0
2017-06-23            1.0            0.0
2017-06-26            0.0            0.0
2017-06-27           -1.0            0.0
2017-06-28            1.0            0.0
2017-06-29           -1.0            0.0
2017-06-30            0.0            0.0

the above is a Pandas Dataframe with Timestamp as index, float64 as the values in the list, and a list of string ['us qqq equity','us spy equity'] as the columns.

When I do this:

for row in data.itertuples():
    print (row)

It shows the columns as _1 and _2 as follows:

Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0)
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0)
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-22 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-23 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-26 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-27 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-28 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-29 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-30 00:00:00'), _1=0.0, _2=0.0)

Does anyone has any clue about what have I done wrong? Does it have to do with some variable referencing issue when creating the original dataframe? (Also, as a side question, I learnt from the community that the type of data generated from itertuples() should be tuples, but it seems (as shown above), the return type is as I verified from the type statement?)

Thank you for all your patience as I am still trying to master the application of DataFrame.

like image 635
user7786493 Avatar asked Jul 25 '17 15:07

user7786493


People also ask

How do I fix column names in Pandas?

One way to rename columns in Pandas is to use df. columns from Pandas and assign new names directly. For example, if you have the names of columns in a list, you can assign the list to column names directly. This will assign the names in the list as column names for the data frame “gapminder”.

Why is Itertuples faster than Iterrows?

The reason iterrows() is slower than itertuples() is due to iterrows() doing a lot of type checks in the lifetime of its call.

How do I print DataFrame column names?

You can get the column names from pandas DataFrame using df. columns. values , and pass this to python list() function to get it as list, once you have the data you can print it using print() statement.

How do I change the default column name in Pandas?

We can use pandas DataFrame rename() function to rename columns and indexes. It supports the following parameters. mapper: dictionary or a function to apply on the columns and indexes.


2 Answers

This seems to be an issue with handling column names having spaces in them. If you replace the column names with different ones without spaces, it will work:

df.columns = ['us_qqq_equity', 'us_spy_equity'] 
# df.columns = df.columns.str.replace(r'\s+', '_')  # Courtesy @MaxU  
for r in df.head().itertuples():
    print(r)

# Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
# Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
# ...

Column names with spaces cannot effectively be represented in named tuples, so they are renamed automatically when printing.

like image 128
cs95 Avatar answered Oct 18 '22 21:10

cs95


Interesting observation: out of DataFrame.iterrows(), DataFrame.iteritems(), DataFrame.itertuples() only the last one renames the columns, containing spaces:

In [140]: df = df.head(3)

In [141]: list(df.iterrows())
Out[141]:
[(Timestamp('2017-06-19 00:00:00'), us qqq equity    0.0
  us spy equity    1.0
  Name: 2017-06-19 00:00:00, dtype: float64),
 (Timestamp('2017-06-20 00:00:00'), us qqq equity    0.0
  us spy equity   -1.0
  Name: 2017-06-20 00:00:00, dtype: float64),
 (Timestamp('2017-06-21 00:00:00'), us qqq equity    0.0
  us spy equity    0.0
  Name: 2017-06-21 00:00:00, dtype: float64)]

In [142]: list(df.iteritems())
Out[142]:
[('us qqq equity', date
  2017-06-19    0.0
  2017-06-20    0.0
  2017-06-21    0.0
  Name: us qqq equity, dtype: float64), ('us spy equity', date
  2017-06-19    1.0
  2017-06-20   -1.0
  2017-06-21    0.0
  Name: us spy equity, dtype: float64)]

In [143]: list(df.itertuples())
Out[143]:
[Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0),
 Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0),
 Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)]
like image 22
MaxU - stop WAR against UA Avatar answered Oct 18 '22 21:10

MaxU - stop WAR against UA