Pandas: Changing a specific row to percentages

I have a row in a Pandas DataFrame that contains the sales rate of my items.

A look at my data:

block_combine
Out[78]: 
END_MONTH         1    2    3   4    5
Total Listings  168  219  185  89  112
Total Sales      85   85   84  41   46

I can easily calculate the sales % by doing the following:

block_combine.loc["Total Sales Rate"] = block_combine.iloc[1, :] / block_combine.iloc[0, :]
block_combine

Out[79]: 
END_MONTH                  1           2           3          4           5
Total Listings    168.000000  219.000000  185.000000  89.000000  112.000000
Total Sales        85.000000   85.000000   84.000000  41.000000   46.000000
Total Sales Rate    0.505952    0.388128    0.454054   0.460674    0.410714

Now I am trying to change the "Total Sales Rate" row to whole-number percentages. I can do this when the values are in a column, but I run into issues when they are in a row.

Here is what I attempted:

block_combine.loc["Total Sales Rate"] = pd.Series(["{0:.0f}%".format(val * 100) for val in block_combine.loc["Total Sales Rate"]])


block_combine

Out[81]: 
END_MONTH           1    2    3    4      5
Total Listings    168  219  185   89  112.0
Total Sales        85   85   84   41   46.0
Total Sales Rate  39%  45%  46%  41%    NaN

The results are shifted one column to the left. The sales rate shown for month 1 (39%) is actually the sales rate for month 2, and month 5 is NaN!

Kevin asked Jun 29 '16 18:06


1 Answer

You could use .apply('{:.0%}'.format):

import pandas as pd

df = pd.DataFrame([(168,219,185,89,112), (85,85,84,41,46)], 
                  index=['Total Listings', 'Total Sales'], columns=list(range(1,6)))
df.loc['Total Sales Rate'] = ((df.loc['Total Sales']/df.loc['Total Listings'])
                              .apply('{:.0%}'.format))

print(df)

yields

                    1    2    3    4    5
Total Listings    168  219  185   89  112
Total Sales        85   85   84   41   46
Total Sales Rate  51%  39%  45%  46%  41%

Notice that the Python str.format method has a built-in % format which multiplies the number by 100 and displays in fixed ('f') format, followed by a percent sign.
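As a quick check of that format spec, here is a minimal sketch comparing the built-in % presentation type with the manual multiply-by-100 form used in the question. The variable names are illustrative only.

```python
# Month 1 from the question: Total Sales / Total Listings
rate = 85 / 168

# The % presentation type multiplies by 100 and appends the percent sign
as_percent = "{:.0%}".format(rate)

# The manual equivalent used in the question
by_hand = "{0:.0f}%".format(rate * 100)

# The precision part of the spec controls the number of decimals
one_decimal = "{:.1%}".format(rate)

print(as_percent, by_hand, one_decimal)
```

Both forms give the same string, so the % type mainly saves the explicit multiplication.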


It is important to be aware that Pandas DataFrame columns must have a single dtype. Changing one value to a string forces the entire column to change its dtype to the generic object dtype. Thus the int64s or int32s in the Total Listings and Total Sales rows get recast as plain Python ints. This prevents Pandas from taking advantage of fast NumPy-based numerical operations which only work on native NumPy dtypes (like int64 or float64 -- not object).
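The dtype change can be observed directly. The following is a small sketch that rebuilds the example frame and records the column dtypes before and after the string row is added; the names dtypes_before and dtypes_after are only for illustration, and the exact integer dtype may differ by platform.

```python
import pandas as pd

df = pd.DataFrame([(168, 219, 185, 89, 112), (85, 85, 84, 41, 46)],
                  index=['Total Listings', 'Total Sales'],
                  columns=list(range(1, 6)))

# Every column starts out with an integer dtype
dtypes_before = df.dtypes.copy()

# Adding a row of strings puts a string into every column
df.loc['Total Sales Rate'] = ((df.loc['Total Sales'] / df.loc['Total Listings'])
                              .apply('{:.0%}'.format))

# Each column now mixes ints and strings, so it falls back to object
dtypes_after = df.dtypes

print(dtypes_before)
print(dtypes_after)
```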

So while the above code achieves the desired look, it isn't advisable to use this if further computation is to be done on the DataFrame. Instead, only convert to strings at the end if you need to do so for presentation.
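One way to follow that advice, sketched under the assumption that the counts and the rate can live in separate objects until display time: keep the rate as a numeric Series and build a string-formatted copy only for printing. The names counts, rate and report are hypothetical.

```python
import pandas as pd

counts = pd.DataFrame([(168, 219, 185, 89, 112), (85, 85, 84, 41, 46)],
                      index=['Total Listings', 'Total Sales'],
                      columns=list(range(1, 6)))

# Numeric result: stays float64 and remains usable for further computation
rate = counts.loc['Total Sales'] / counts.loc['Total Listings']

# Presentation copy: cast to object first, then add the formatted row
report = counts.astype(object)
report.loc['Total Sales Rate'] = rate.map('{:.0%}'.format)

print(report)
```

Here counts keeps its integer dtypes and rate can still be averaged or compared, while report is used only for output.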

Or, alternatively, transpose your DataFrame so the Total Sales Rate strings are in a column, not a row:

import pandas as pd

df = pd.DataFrame([(168,219,185,89,112), (85,85,84,41,46)], 
                  index=['Total Listings', 'Total Sales'], columns=list(range(1,6))).T

df['Total Sales Rate'] = ((df['Total Sales']/df['Total Listings'])
                              .apply('{:.0%}'.format))

print(df)

yields

   Total Listings  Total Sales Total Sales Rate
1             168           85              51%
2             219           85              39%
3             185           84              45%
4              89           41              46%
5             112           46              41%

The reason why

block_combine.loc["Total Sales Rate"] = pd.Series(["{0:.0f}%".format(val * 100) for val in block_combine.loc["Total Sales Rate"]])

shifted the values to the left by one column is that the new Series has a default index which starts at 0, whereas the columns of block_combine start at 1. Pandas aligns the index of the Series on the right-hand side with the index of block_combine.loc["Total Sales Rate"] before assigning values to it. Column labels 1 through 4 therefore receive the Series values labeled 1 through 4 (the rates for months 2 through 5), and column label 5 has no match in the Series, so it becomes NaN.

Thus, you could alternatively have used:

block_combine.loc["Total Sales Rate"] = pd.Series(["{0:.0f}%".format(val * 100) 
    for val in block_combine.loc["Total Sales Rate"]], 
    index=block_combine.columns)
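
The alignment behaviour can be reproduced on a smaller frame. This is a minimal sketch with made-up numbers; the frame is cast to object up front (an assumption made here so that assigning strings does not conflict with the numeric column dtypes), and the names misaligned and aligned are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [0.5, 0.25, 0.75]],
                  index=['count', 'rate'], columns=[1, 2, 3])
labels = ['{:.0%}'.format(v) for v in df.loc['rate']]

# Default index 0, 1, 2: only labels 1 and 2 match the columns 1, 2, 3
misaligned = df.astype(object)
misaligned.loc['rate'] = pd.Series(labels)

# Explicit index taken from the columns: every label matches
aligned = df.astype(object)
aligned.loc['rate'] = pd.Series(labels, index=df.columns)

print(misaligned)
print(aligned)
```

In the first frame the values move one column to the left and the last column is missing, which is the same symptom as in the question.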

unutbu answered Sep 30 '22 12:09