I know that pandas' itertuples() normally returns the values of each row together with the column names, as follows:
import numpy as np
import pandas as pd
ab = pd.DataFrame(np.random.random([3, 3]), columns=['hi', 'low', 'med'])
for i in ab.itertuples():
    print(i)
and the output is as follows:
Pandas(Index=0, hi=0.05421443, low=0.2456833, med=0.491185)
Pandas(Index=1, hi=0.28670429, low=0.5828551, med=0.279305)
Pandas(Index=2, hi=0.53869406, low=0.3427290, med=0.750075)
However, I have no idea why it doesn't show the column names as I expected for another DataFrame of mine, shown below:
            us qqq equity  us spy equity
date
2017-06-19            0.0            1.0
2017-06-20            0.0           -1.0
2017-06-21            0.0            0.0
2017-06-22            0.0            0.0
2017-06-23            1.0            0.0
2017-06-26            0.0            0.0
2017-06-27           -1.0            0.0
2017-06-28            1.0            0.0
2017-06-29           -1.0            0.0
2017-06-30            0.0            0.0
The above is a pandas DataFrame with Timestamps as the index, float64 values, and the list of strings ['us qqq equity','us spy equity'] as the columns.
When I do this:
for row in data.itertuples():
    print(row)
It shows the columns as _1 and _2 as follows:
Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0)
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0)
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-22 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-23 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-26 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-27 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-28 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-29 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-30 00:00:00'), _1=0.0, _2=0.0)
Does anyone have any clue about what I have done wrong? Does it have to do with some variable referencing issue when creating the original DataFrame? (Also, as a side question: I learnt from the community that itertuples() should generate tuples, but, as shown above, each row prints as Pandas(...) rather than as a plain tuple, and type() does not report plain tuple either. Why is that?)
Thank you for all your patience as I am still trying to master the application of DataFrame.
One way to rename columns in pandas is to assign new names directly to df.columns. For example, if you have the column names in a list, you can assign that list to df.columns. This sets the names in the list as the column names of the data frame “gapminder”.
iterrows() is slower than itertuples() largely because it builds a Series for every row, which involves a lot of type checks on each iteration.
You can get the column names of a pandas DataFrame with df.columns.values and pass the result to Python's list() function to get them as a list. Once you have the list, you can print it with print().
We can use pandas DataFrame rename() function to rename columns and indexes. It supports the following parameters. mapper: dictionary or a function to apply on the columns and indexes.
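As a rough illustration of the function form of rename(), here is a minimal sketch that replaces spaces in column names with underscores. The small frame and the variable names are made up for the example; only the column labels come from the question.

```python
import pandas as pd

# Hypothetical one-row frame using the column labels from the question.
df = pd.DataFrame({'us qqq equity': [0.0], 'us spy equity': [1.0]})

# rename() applies the function to every column label and returns a new frame.
renamed = df.rename(columns=lambda name: name.replace(' ', '_'))
print(list(renamed.columns))

# With valid identifiers as labels, itertuples() keeps the names.
row = next(renamed.itertuples())
print(row.us_spy_equity)
```

Because rename() returns a new DataFrame by default, the original df keeps its labels with spaces.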
This seems to be an issue with handling column names having spaces in them. If you replace the column names with different ones without spaces, it will work:
df.columns = ['us_qqq_equity', 'us_spy_equity']
# df.columns = df.columns.str.replace(r'\s+', '_', regex=True) # Courtesy @MaxU
for r in df.head().itertuples():
    print(r)
# Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
# Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
# ...
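Column names with spaces are not valid Python identifiers, so they cannot be used as field names in a named tuple. itertuples() therefore replaces them with positional names (_1, _2, ...) when it builds each row, not merely when printing.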
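The renaming can be reproduced without pandas. The sketch below assumes that itertuples() builds its row type with collections.namedtuple and rename=True; the field list is copied from the question and the variable names are only illustrative. It also addresses the side question: the row object is a tuple subclass.

```python
from collections import namedtuple

# Field names as itertuples() would see them: the index plus the two columns.
fields = ['Index', 'us qqq equity', 'us spy equity']

# rename=True replaces every invalid identifier with a positional name.
Row = namedtuple('Pandas', fields, rename=True)
row = Row('2017-06-19', 0.0, 1.0)

print(Row._fields)             # ('Index', '_1', '_2')
print(row._1, row[2])          # 0.0 1.0
print(isinstance(row, tuple))  # True: a namedtuple is a tuple subclass
```

Positional access such as row[2] works regardless of how the fields were renamed.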
Interesting observation: of DataFrame.iterrows(), DataFrame.iteritems() (removed in pandas 2.0 in favor of DataFrame.items()) and DataFrame.itertuples(), only the last one renames columns that contain spaces:
In [140]: df = df.head(3)
In [141]: list(df.iterrows())
Out[141]:
[(Timestamp('2017-06-19 00:00:00'), us qqq equity    0.0
  us spy equity    1.0
  Name: 2017-06-19 00:00:00, dtype: float64),
 (Timestamp('2017-06-20 00:00:00'), us qqq equity    0.0
  us spy equity   -1.0
  Name: 2017-06-20 00:00:00, dtype: float64),
 (Timestamp('2017-06-21 00:00:00'), us qqq equity    0.0
  us spy equity    0.0
  Name: 2017-06-21 00:00:00, dtype: float64)]
In [142]: list(df.iteritems())
Out[142]:
[('us qqq equity', date
  2017-06-19    0.0
  2017-06-20    0.0
  2017-06-21    0.0
  Name: us qqq equity, dtype: float64), ('us spy equity', date
  2017-06-19    1.0
  2017-06-20   -1.0
  2017-06-21    0.0
  Name: us spy equity, dtype: float64)]
In [143]: list(df.itertuples())
Out[143]:
[Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0),
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0),
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)]
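If renaming the columns is not an option, one possible workaround is to ask itertuples() for plain tuples and pair the values with the original labels yourself. This is only a sketch: the two-row frame is rebuilt from the question's first rows, and records is a hypothetical name.

```python
import pandas as pd

# Small frame mirroring the first two rows of the question's data.
df = pd.DataFrame(
    {'us qqq equity': [0.0, 0.0], 'us spy equity': [1.0, -1.0]},
    index=pd.to_datetime(['2017-06-19', '2017-06-20']),
)

# name=None yields plain tuples; index=False leaves the index out of each tuple.
records = [
    dict(zip(df.columns, values))
    for values in df.itertuples(index=False, name=None)
]

# The original labels, spaces included, are now usable as dictionary keys.
print(records[0]['us spy equity'])
```

This gives up attribute-style access but keeps the column names exactly as they appear in the DataFrame.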