Is there a quick way to achieve the output below, please?
Input:
Code Items
123 eq-hk
456 ca-eu; tp-lbe
789 ca-us
321 go-ch
654 ca-au; go-au
987 go-jp
147 co-ml; go-ml
258 ca-us
369 ca-us; ca-my
741 ca-us
852 ca-eu
963 ca-ml; co-ml; go-ml
Output:
Code  eq  ca     go  co  tp
123   hk
456       eu             lbe
789       us
321              ch
654       au     au
987              jp
147              ml  ml
258       us
369       us,my
741       us
852       eu
963       ml     ml  ml
I am again running into loops and very ugly code to make it work. Is there an elegant way to achieve this, please?
Thank you!
This is a little bit complicated:
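(df.set_index('Code')
   .Items.str.split('; ', expand=True)   # the items are separated by '; '
   .stack()
   .str.split('-', expand=True)          # column 0 = key, column 1 = value
   .set_index(0, append=True)[1]
   .unstack()
   .fillna('')
   .groupby(level=0).sum())              # .sum(level=0) was removed in pandas 2.0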
0     ca    co  eq  go  tp
Code
123             hk
147         ml      ml
258   us
321                 ch
369   usmy
456   eu                lbe
654   au            au
741   us
789   us
852   eu
963   ml    ml      ml
987                 jp
# Use str.split to un-nest the Items column and stack the pieces into rows,
# then use str.split again on '-' and move the first column (the key) into the index.
# Unstacking the key then yields the result.
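To see what each step of the chain above produces, here is a minimal, self-contained sketch that rebuilds the sample data from the question and names the intermediate results. The variable names (`items`, `pairs`, `wide`, `result`) are illustrative only, and the extra `dropna()` is an assumption added as a safeguard in case `stack()` keeps the padding values on a given pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Code': [123, 456, 789, 321, 654, 987, 147, 258, 369, 741, 852, 963],
    'Items': ['eq-hk', 'ca-eu; tp-lbe', 'ca-us', 'go-ch', 'ca-au; go-au',
              'go-jp', 'co-ml; go-ml', 'ca-us', 'ca-us; ca-my', 'ca-us',
              'ca-eu', 'ca-ml; co-ml; go-ml'],
})

# Step 1: one row per item, indexed by (Code, position within the row).
# dropna() discards the padding that expand=True adds for shorter rows.
items = (df.set_index('Code')['Items']
           .str.split('; ', expand=True)
           .stack()
           .dropna())

# Step 2: split each item into its key (column 0) and value (column 1).
pairs = items.str.split('-', expand=True)

# Step 3: move the key into the index, then pivot it out into columns.
wide = pairs.set_index(0, append=True)[1].unstack().fillna('')

# Step 4: collapse the position level; summing strings concatenates them,
# which is why Code 369 ends up with 'usmy' rather than 'us,my'.
result = wide.groupby(level=0).sum()
print(result)
```

Printing `items`, `pairs` and `wide` one at a time is an easy way to follow how the index levels change between steps.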
List comprehensions work better (read: much faster) for string problems like this which require multiple levels of splitting.
df2 = pd.DataFrame([
    dict(y.split('-') for y in x.split('; '))
    for x in df.Items]).fillna('')
df2.insert(0, 'Code', df.Code)
print(df2)
    Code  ca  co  eq  go  tp
0   123           hk
1   456   eu              lbe
2   789   us
3   321               ch
4   654   au          au
5   987               jp
6   147       ml      ml
7   258   us
8   369   my                   # Should be "us,my"... see below.
9   741   us
10  852   eu
11  963   ml  ml      ml
This does not handle the situation where multiple items with the same key can be present in a row. For that, a slightly more involved solution is needed.
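from itertools import chain

v = [x.split('; ') for x in df.Items]
X = pd.Series(df.Code.values.repeat([len(x) for x in v]))
Y = pd.DataFrame([x.split('-') for x in chain.from_iterable(v)])
df2 = pd.concat([X, Y], axis=1, ignore_index=True)  # columns: 0=Code, 1=key, 2=value
# The concat above does not create a column 3. It is assumed here to be a
# per-(Code, key) counter, so that repeated keys do not collide in unstack.
df2[3] = df2.groupby([0, 1]).cumcount()

(df2.set_index([0, 1, 3])[2]
    .unstack(1)
    .fillna('')
    .groupby(level=0)
    .agg(lambda x: ','.join(x).strip(',')))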
1     ca     co  eq  go  tp
0
123              hk
147          ml      ml
258   us
321                  ch
369   us,my
456   eu                 lbe
654   au             au
741   us
789   us
852   eu
963   ml     ml      ml
987                  jp
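For comparison, the same duplicate-key handling can be sketched more compactly with `Series.explode` (available since pandas 0.25) followed by a groupby. This is a different technique from the one above, not a restatement of it. The column names `key` and `value` and the variables `items`, `pairs` and `wide` are made up for the example, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Code': [123, 456, 789, 321, 654, 987, 147, 258, 369, 741, 852, 963],
    'Items': ['eq-hk', 'ca-eu; tp-lbe', 'ca-us', 'go-ch', 'ca-au; go-au',
              'go-jp', 'co-ml; go-ml', 'ca-us', 'ca-us; ca-my', 'ca-us',
              'ca-eu', 'ca-ml; co-ml; go-ml'],
})

# One row per "key-value" item; the Code index repeats for multi-item rows.
items = df.set_index('Code')['Items'].str.split('; ').explode()

# Split "ca-us" into key "ca" and value "us".
pairs = items.str.split('-', n=1, expand=True)
pairs.columns = ['key', 'value']

# Join repeated keys within a Code with commas, then pivot keys to columns.
wide = (pairs.groupby(['Code', 'key'])['value']
             .agg(','.join)
             .unstack(fill_value=''))
print(wide)
```

Because the values are joined per (Code, key) group before pivoting, there are no stray commas to strip afterwards.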