Create columns based on unique column values and fill

Tags: python, pandas

I have the following dataframe:

    Timestamp   id  lat         long
0   665047      a   30.508420   -84.372882
1   665047      b   30.491882   -84.372938
2   2058714     b   30.492026   -84.372938
3   665348      a   30.508420   -84.372882
4   2055292     b   30.491899   -84.372938

My desired result is to have:

    Timestamp                        a                       b     
0   665047     [30.508420,  -84.372882] [30.491882, -84.372938]
1   665348     [30.508420,  -84.372882]                    NaN
2   2055292                        NaN  [30.491899, -84.372938]
3   2058714                        NaN  [30.492026, -84.372938]

Where the unique values found in df.id become column headers (there can be several thousand of these), with their latitude and longitude as values.

The closest I have come is using:

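from collections import defaultdict

dct = defaultdict(list)  # assumed setup: Timestamp -> list of [id, lat, long]
for i, r in df.iterrows():
    dct[r.Timestamp].append([r.id, r.lat, r.long])

pd.DataFrame.from_dict(dct, orient='index')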


                                0                                   1
2055292 [b, 30.491899, -84.372938]                               None
2058714 [b, 30.492026, -84.372938]                               None
665348  [a, 30.50842, -84.37288199999999]                        None
665047  [a, 30.50842, -84.37288199999999]   [b, 30.491882, -84.372938]

I know that row-by-row iteration is discouraged in pandas (and this is nowhere close to my desired result), so I'm sure there is a much easier way.

user3483203 asked May 16 '18 19:05


2 Answers

I think this does it with unstack:

(df.groupby(['Timestamp', 'id'])
 .apply(lambda x: x[['lat', 'long']].values.flatten())
 .unstack(level='id'))

id                              a                        b
Timestamp                                                 
665047     [30.50842, -84.372882]  [30.491882, -84.372938]
665348     [30.50842, -84.372882]                     None
2055292                      None  [30.491899, -84.372938]
2058714                      None  [30.492026, -84.372938]
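
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same groupby/unstack idea. The names `pairs` and `wide` are only illustrative. One assumption worth stating: each (Timestamp, id) pair occurs at most once. If a pair repeats, `flatten()` would concatenate all of its rows into one longer array.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

# One flat [lat, long] array per (Timestamp, id) group.
pairs = (
    df.groupby(["Timestamp", "id"])
      .apply(lambda x: x[["lat", "long"]].values.flatten())
)

# Move the id level of the index into the columns.
wide = pairs.unstack(level="id")
print(wide)
```

Missing combinations may show up as None or NaN depending on the pandas version, so `pd.isna` is probably the safest way to test for them.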
sacuL answered Oct 23 '22 03:10


Option 1

Set the index then pipe

df.set_index(['Timestamp', 'id']).pipe(
    lambda d: pd.Series(d.values.tolist(), d.index).unstack()
)

id                                      a                        b
Timestamp                                                         
665047     [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348     [30.50842, -84.37288199999999]                     None
2055292                              None  [30.491899, -84.372938]
2058714                              None  [30.492026, -84.372938]
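
The pipe in Option 1 packs three steps into one expression. Here is a rough, self-contained sketch that spells them out on the question's sample data. The intermediate names `indexed`, `pairs` and `wide` are made up for illustration. As far as I know, `unstack` raises a ValueError when the index has duplicate entries, so this assumes every (Timestamp, id) pair is unique.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

# Step 1: key every row by (Timestamp, id); only lat and long remain as columns.
indexed = df.set_index(["Timestamp", "id"])

# Step 2: turn each remaining row into a plain [lat, long] list.
pairs = pd.Series(indexed.values.tolist(), index=indexed.index)

# Step 3: pivot the innermost index level (id) into the columns.
wide = pairs.unstack()
print(wide)
```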

Option 2

cols = ['Timestamp', 'id', 'lat', 'long']
pd.Series({
    t[:2]: list(t[2:])
    for t in df[cols].itertuples(index=False)
}).unstack()

                                      a                        b
665047   [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348   [30.50842, -84.37288199999999]                     None
2055292                            None  [30.491899, -84.372938]
2058714                            None  [30.492026, -84.372938]
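
Option 2 relies on pandas turning tuple dictionary keys into a MultiIndex. Below is a small sketch of the same idea with the tuples unpacked into named variables. The variable names are illustrative, not part of the original answer. Keep in mind that a dict keeps only the last value for a repeated key, so a duplicated (Timestamp, id) pair would be overwritten silently instead of raising an error.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

cols = ["Timestamp", "id", "lat", "long"]

# Tuple keys (Timestamp, id) become a two-level MultiIndex on the Series.
pairs = pd.Series({
    (ts, key): [lat, lon]
    for ts, key, lat, lon in df[cols].itertuples(index=False, name=None)
})

# Pivot the id level into the columns.
wide = pairs.unstack()
print(wide)
```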
piRSquared answered Oct 23 '22 03:10