I have the following dataframe:
   Timestamp id        lat       long
0     665047  a  30.508420 -84.372882
1     665047  b  30.491882 -84.372938
2    2058714  b  30.492026 -84.372938
3     665348  a  30.508420 -84.372882
4    2055292  b  30.491899 -84.372938
My desired result is:
   Timestamp                        a                        b
0     665047  [30.508420, -84.372882]  [30.491882, -84.372938]
1     665348  [30.508420, -84.372882]                      NaN
2    2055292                      NaN  [30.491899, -84.372938]
3    2058714                      NaN  [30.492026, -84.372938]
Here the unique values found in df.id become column headers (there can be several thousand of these), with their latitude and longitude as values.
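The closest I have come is using:
from collections import defaultdict

# assumed setup: dct maps each Timestamp to a list of [id, lat, long] rows
dct = defaultdict(list)
for i, r in df.iterrows():
    dct[r.Timestamp].append([r.id, r.lat, r.long])
pd.DataFrame.from_dict(dct, orient='index')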
                                         0                           1
2055292         [b, 30.491899, -84.372938]                        None
2058714         [b, 30.492026, -84.372938]                        None
665348   [a, 30.50842, -84.37288199999999]                        None
665047   [a, 30.50842, -84.37288199999999]  [b, 30.491882, -84.372938]
But I know using any sort of iteration is bad in pandas (and it's nowhere close to my desired result), and I'm sure there is a much easier way.
You can get the unique values of a single column with Series.unique(), for example df['id'].unique(). To collect the unique values across several columns, pass the flattened values to pd.unique(). To count the unique values in a column, use Series.nunique().
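As a small illustration of those two calls on the question's data, the sketch below rebuilds the sample frame (values copied from the question) and checks how many column headers and rows the reshaped result should have. The variable names are only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

# Distinct ids, in order of first appearance; these become the column headers.
unique_ids = df["id"].unique()

# Number of distinct ids, i.e. how many columns the reshaped frame will have.
n_ids = df["id"].nunique()

# Number of distinct timestamps, i.e. how many rows the reshaped frame will have.
n_timestamps = df["Timestamp"].nunique()

print(list(unique_ids), n_ids, n_timestamps)
```

This only inspects the data; the reshaping itself still needs one of the unstack-based approaches.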
I think this does it with unstack:
(df.groupby(['Timestamp', 'id'])
   .apply(lambda x: x[['lat', 'long']].values.flatten())
   .unstack(level='id'))
id                              a                        b
Timestamp
665047     [30.50842, -84.372882]  [30.491882, -84.372938]
665348     [30.50842, -84.372882]                     None
2055292                      None  [30.491899, -84.372938]
2058714                      None  [30.492026, -84.372938]
Set the index, then pipe:
df.set_index(['Timestamp', 'id']).pipe(
    lambda d: pd.Series(d.values.tolist(), d.index).unstack()
)
id                                      a                        b
Timestamp
665047     [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348     [30.50842, -84.37288199999999]                     None
2055292                              None  [30.491899, -84.372938]
2058714                              None  [30.492026, -84.372938]
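To try the set_index approach end to end, here is a self-contained sketch that rebuilds the question's sample frame and applies the same steps. The name `wide` is just an illustrative choice, and whether missing cells show up as None or NaN may depend on the pandas version, so the check uses pd.isna.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

# Move Timestamp and id into the index so only lat/long remain as values.
indexed = df.set_index(["Timestamp", "id"])

# One [lat, long] list per row, keyed by the (Timestamp, id) MultiIndex.
pairs = pd.Series(indexed.values.tolist(), index=indexed.index)

# Pivot the innermost index level (id) into columns.
wide = pairs.unstack()

print(wide)
print(pd.isna(wide.loc[665348, "b"]))  # missing (Timestamp, id) pairs are null
```

Note that unstack needs each (Timestamp, id) pair to appear at most once, which holds for the sample data.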
Or build a Series from a dict keyed by (Timestamp, id) tuples and unstack it:
cols = ['Timestamp', 'id', 'lat', 'long']
pd.Series({
    t[:2]: list(t[2:])
    for t in df[cols].itertuples(index=False)
}).unstack()
                                      a                        b
665047   [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348   [30.50842, -84.37288199999999]                     None
2055292                            None  [30.491899, -84.372938]
2058714                            None  [30.492026, -84.372938]
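One more possible variant, not taken from the answers above: the same reshape can probably be written with DataFrame.pivot by first storing each [lat, long] pair in a helper column. The column name `coords` and the variable `wide` are hypothetical names introduced for this sketch.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

# Hypothetical helper column holding one [lat, long] list per row.
coords = pd.Series(df[["lat", "long"]].values.tolist(), index=df.index)
with_coords = df.assign(coords=coords)

# One row per Timestamp, one column per id, list-valued cells.
wide = with_coords.pivot(index="Timestamp", columns="id", values="coords")

print(wide)
```

Like unstack, pivot expects each (Timestamp, id) pair to be unique and raises an error when it finds duplicates.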