python pandas pivot: How to do a proper tidyr-like spread?

Tags:

I am missing spontaneous and easy conversion from long to wide and vice versa in Python. Imagine, I have a large tidy dataframe with a lot of property-columns and a single column that contains all the actual values like

PropA ... PropZ    Value
green     Saturn   400
green     Venus    3
red       Venus    2
.
.

The data itself is very nicely handled by keeping it tidy. But sometimes I have to perform some actions across certain properties (for instance it might be interesting to compare beeing red vs beeing green (for all the items that are similar w.r.t the other properties)). So the straight-forward way would be to keep it tidy as much as possible and only untidy the certain property which i am interested in (PropA). Subsequently, I could perform a row-wise map with whatever function I desire and I could create an additional PropA-Entry which contains the function-ouput.

However, keeping all the other properties tidy is not as easy in Python as I was used to by using R. The reason is, all the not-pivotal properties are surrendered to the index with all the pd-methods I found. Thats a whole mess if I have some more columns I want to keep.

So how do you get along with this problem. Is there some neat other way of dealing with those type of problems?

I have written a spread-method myself, but it is awefully slow. Maybe, you ahve some ideas how I can improve.

#the idea is to group by the remaining properties, which should be left in the long format.
#then i spread the small tidy data table for any group
    @staticmethod
    def spread(df, propcol, valcol):
        def flip(data,pc,vc):
            data = data.reset_index(drop=True)
            return {data[pc][i]:[data[vc][i]] for i in range(0,len(data))}

        #index columns are all which are not affected
        indcols = list(df.columns)
        indcols.remove(propcol)
        indcols.remove(valcol)

        tmpdf=pd.DataFrame()
        for key, group in df.groupby(indcols):
            dc1 = {a:[b] for (a,b) in zip(indcols,key)}
            dc2 = flip(group,propcol,valcol)
            tmpdf = pd.concat([tmpdf,pd.concat([pd.DataFrame(dc1),pd.DataFrame(dc2)],axis=1)])

        return tmpdf.reset_index(drop = True)

244

asked Jul 20 '17 16:07

Antalagor

1 Answers

with help of the hint, i've created a simplier version. i am still a little confused with the index mechanic, but time will help me get a better understanding.

def spread(df, propcol, valcol):
    indcol = list(df.columns.drop(valcol))
    df = df.set_index(indcol).unstack(propcol).reset_index()
    df.columns = [i[1] if i[0] == valcol else i[0] for i in df.columns]
    return df

113

answered Oct 17 '22 00:10

Antalagor

Related questions
                            
                                How to programatically group lap times into teams to minimize difference?
                            
                                Python Export Excel Sheet Range as Image
                            
                                numpy replace array elements with average of 2*2 blocks
                            
                                Solving a BVP with scipy's solve_bvp
                            
                                How do I train multiple neural nets simultaneously in keras?
                            
                                How can I Group By Month, Day from a Date field using Python/Pandas
                            
                                pytest: environment variable to specify pytest.ini location
                            
                                Cannot find a file in my tempfile.TemporaryDirectory() for Python3
                            
                                Collecting results from python coroutines before loop finishes
                            
                                One line solution for editing a numpy array of counts? (python)
                            
                                Fetch data from form and display in template
                            
                                Python: How to update a value in Google BigQuery in less than 40 seconds?
                            
                                Python .loc confusion
                            
                                Maxvalue in cv2.minMaxLoc()?
                            
                                Handle 1000 concurrent requests for Flask/Gunicorn web service
                            
                                Iterating over all notes in Music21
                            
                                Fill a matrix from a matrix of indices
                            
                                Python define function inside if block or vice versa
                            
                                Python: interpolating in a triangular mesh
                            
                                Formatting an entire pandas dataframe as a string, row by row

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

python pandas pivot: How to do a proper tidyr-like spread?

Tags:

python

pandas

pivot

spread

Antalagor

People also ask

1 Answers

Antalagor

Recent Activity

Donate For Us