I am missing spontaneous and easy conversion from long to wide and vice versa in Python. Imagine, I have a large tidy dataframe with a lot of property-columns and a single column that contains all the actual values like
PropA ... PropZ Value
green Saturn 400
green Venus 3
red Venus 2
.
.
The data itself is very nicely handled by keeping it tidy. But sometimes I have to perform some actions across certain properties (for instance it might be interesting to compare beeing red vs beeing green (for all the items that are similar w.r.t the other properties)). So the straight-forward way would be to keep it tidy as much as possible and only untidy the certain property which i am interested in (PropA). Subsequently, I could perform a row-wise map with whatever function I desire and I could create an additional PropA-Entry which contains the function-ouput.
However, keeping all the other properties tidy is not as easy in Python as I was used to by using R. The reason is, all the not-pivotal properties are surrendered to the index with all the pd-methods I found. Thats a whole mess if I have some more columns I want to keep.
So how do you get along with this problem. Is there some neat other way of dealing with those type of problems?
I have written a spread-method myself, but it is awefully slow. Maybe, you ahve some ideas how I can improve.
#the idea is to group by the remaining properties, which should be left in the long format.
#then i spread the small tidy data table for any group
@staticmethod
def spread(df, propcol, valcol):
def flip(data,pc,vc):
data = data.reset_index(drop=True)
return {data[pc][i]:[data[vc][i]] for i in range(0,len(data))}
#index columns are all which are not affected
indcols = list(df.columns)
indcols.remove(propcol)
indcols.remove(valcol)
tmpdf=pd.DataFrame()
for key, group in df.groupby(indcols):
dc1 = {a:[b] for (a,b) in zip(indcols,key)}
dc2 = flip(group,propcol,valcol)
tmpdf = pd.concat([tmpdf,pd.concat([pd.DataFrame(dc1),pd.DataFrame(dc2)],axis=1)])
return tmpdf.reset_index(drop = True)
The process of stacking pivots a level of column labels to the row index. Unstacking performs the opposite, pivoting a level of the row index into the column index.
You can use the following basic syntax to convert a pandas DataFrame from a wide format to a long format: df = pd. melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]) In this scenario, col1 is the column we use as an identifier and col2, col3, etc.
with help of the hint, i've created a simplier version. i am still a little confused with the index mechanic, but time will help me get a better understanding.
def spread(df, propcol, valcol):
indcol = list(df.columns.drop(valcol))
df = df.set_index(indcol).unstack(propcol).reset_index()
df.columns = [i[1] if i[0] == valcol else i[0] for i in df.columns]
return df
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With