Create columns based on unique column values and fill

Tags: python, pandas

I have the following dataframe:

    Timestamp   id  lat         long
0   665047      a   30.508420   -84.372882
1   665047      b   30.491882   -84.372938
2   2058714     b   30.492026   -84.372938
3   665348      a   30.508420   -84.372882
4   2055292     b   30.491899   -84.372938
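
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table, and the dtypes (integer timestamps, string ids, float coordinates) are assumed rather than confirmed.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are an assumption.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})
```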

My desired result is to have:

    Timestamp                        a                       b     
0   665047     [30.508420,  -84.372882] [30.491882, -84.372938]
1   665348     [30.508420,  -84.372882]                    NaN
2   2055292                        NaN  [30.491899, -84.372938]
3   2058714                        NaN  [30.492026, -84.372938]

The unique values found in df.id should become column headers (there can be several thousand of these), with each cell holding that id's latitude and longitude at the given timestamp.

The closest I have come is using:

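from collections import defaultdict

dct = defaultdict(list)  # Timestamp -> list of [id, lat, long] rows
for i, r in df.iterrows():
    dct[r.Timestamp].append([r.id, r.lat, r.long])

pd.DataFrame.from_dict(dct, orient='index')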


                                0                                   1
2055292 [b, 30.491899, -84.372938]                               None
2058714 [b, 30.492026, -84.372938]                               None
665348  [a, 30.50842, -84.37288199999999]                        None
665047  [a, 30.50842, -84.37288199999999]   [b, 30.491882, -84.372938]

But I know that row-by-row iteration is discouraged in pandas (and this is nowhere close to my desired result), so I'm sure there is a much easier way.


asked May 16 '18 19:05

user3483203

2 Answers

I think this does it with unstack:

(df.groupby(['Timestamp', 'id'])
 .apply(lambda x: x[['lat', 'long']].values.flatten())
 .unstack(level='id'))

id                              a                        b
Timestamp                                                 
665047     [30.50842, -84.372882]  [30.491882, -84.372938]
665348     [30.50842, -84.372882]                     None
2055292                      None  [30.491899, -84.372938]
2058714                      None  [30.492026, -84.372938]


answered Oct 23 '22 03:10

sacuL

Option 1

Set the index then pipe

df.set_index(['Timestamp', 'id']).pipe(
    lambda d: pd.Series(d.values.tolist(), d.index).unstack()
)

id                                      a                        b
Timestamp                                                         
665047     [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348     [30.50842, -84.37288199999999]                     None
2055292                              None  [30.491899, -84.372938]
2058714                              None  [30.492026, -84.372938]
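
To see what the pipe is doing, Option 1 can be unrolled into three named steps. This is a sketch on the sample data from the question (rebuilt here with assumed dtypes); the intermediate names `indexed`, `pairs`, and `wide` are only illustrative.

```python
import pandas as pd

# Sample data from the question; dtypes are an assumption.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

# Step 1: make (Timestamp, id) the row index, so lat/long are the only columns.
indexed = df.set_index(["Timestamp", "id"])

# Step 2: collapse each row into a [lat, long] list, keeping the MultiIndex.
pairs = pd.Series(indexed.to_numpy().tolist(), index=indexed.index)

# Step 3: move the "id" level into the columns; absent pairs come out as nulls.
wide = pairs.unstack()
```

Note that this relies on every (Timestamp, id) pair being unique, as it is in the sample. With duplicate pairs, `unstack` should raise a ValueError about duplicate index entries, so such rows would need to be aggregated or dropped first.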

Option 2

cols = ['Timestamp', 'id', 'lat', 'long']
pd.Series({
    t[:2]: list(t[2:])
    for t in df[cols].itertuples(index=False)
}).unstack()

                                      a                        b
665047   [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348   [30.50842, -84.37288199999999]                     None
2055292                            None  [30.491899, -84.372938]
2058714                            None  [30.492026, -84.372938]
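
Option 2 works because a dict keyed by tuples turns into a Series with a two-level MultiIndex, which `unstack` can then pivot. The sketch below splits it into two steps on the sample data (rebuilt with assumed dtypes); `pairs` and `wide` are illustrative names.

```python
import pandas as pd

# Sample data from the question; dtypes are an assumption.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

cols = ["Timestamp", "id", "lat", "long"]

# Each row tuple is split into a (Timestamp, id) key and a [lat, long] value.
pairs = pd.Series({
    t[:2]: list(t[2:])
    for t in df[cols].itertuples(index=False)
})

# The tuple keys became a two-level MultiIndex; unstack() pivots the last level.
wide = pairs.unstack()
```

Because the intermediate is a plain dict, a repeated (Timestamp, id) pair would silently keep only the last row seen instead of raising an error.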

answered Oct 23 '22 03:10

piRSquared
