I'm stuck on how to split one row of a pandas DataFrame into several rows.
I have a DataFrame with a column where several values, separated by \r\n, sit in a single cell:
Color Shape Price
0 Green Rectangle\r\nTriangle\r\nOctangle 10
1 Blue Rectangle\r\nTriangle 15
I need to split this cell into several rows, repeating the values of the other columns in each new row, e.g.
Color Shape Price
0 Green Rectangle 10
1 Green Triangle 10
2 Green Octangle 10
3 Blue Rectangle 15
4 Blue Triangle 15
How do I do it well?
How do you split a row in a data frame? Using iloc to split a DataFrame in Python: the iloc indexer can slice a DataFrame into smaller DataFrames. It selects elements by the integer position of rows and columns.
Both Series and DataFrame define an .explode() method that expands list-like values into separate rows. See the docs section on Exploding a list-like column. If the column holds delimited strings (comma-separated, for example), split each string on the delimiter to get a list of elements, then call explode on that column.
Slicing rows and columns by index position: when slicing by index position in pandas, the start index is included in the output but the stop index is excluded, so the stop must be one step beyond the last row you want to select. A slice such as 0:2 therefore returns row 0 and row 1, but does not return row 2. A second slice of [:] indicates that all columns are required.
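To make the positional slicing rule concrete, here is a minimal sketch. The third row ("Red") is made up for illustration and is not part of the question's data.

```python
import pandas as pd

# Small illustrative frame; the "Red" row is hypothetical.
df = pd.DataFrame(
    {
        "Color": ["Green", "Blue", "Red"],
        "Shape": ["Rectangle", "Triangle", "Circle"],
        "Price": [10, 15, 20],
    }
)

# Rows at positions 0 and 1, all columns: the stop position (2) is excluded.
first_two = df.iloc[0:2, :]

# Everything from position 2 onward, all columns.
rest = df.iloc[2:, :]

print(first_two)
print(rest)
```

Note that this splits a DataFrame into groups of rows; it does not split the contents of a single cell, which is what the question asks for.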
str.split(): pandas provides a method to split strings around a given separator/delimiter. The pieces can be stored as a list in each element of a Series, or expanded into multiple DataFrame columns from a single delimited string.
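The two result shapes described above can be sketched as follows, using the Shape values from the question. The variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(["Rectangle\r\nTriangle\r\nOctangle", "Rectangle\r\nTriangle"])

# Default: each element becomes a list of the pieces.
as_lists = s.str.split("\r\n")

# expand=True: each piece gets its own column; shorter rows are padded
# with missing values.
as_columns = s.str.split("\r\n", expand=True)

print(as_lists)
print(as_columns)
```

The list form is the one that explode() works on; the expanded form is useful when the pieces should become separate columns instead of separate rows.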
You can do:
df["Shape"]=df["Shape"].str.split("\r\n")
print(df.explode("Shape").reset_index(drop=True))
Output:
Color Shape Price
0 Green Rectangle 10
1 Green Triangle 10
2 Green Octangle 10
3 Blue Rectangle 15
4 Blue Triangle 15
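For reference, a self-contained variant of the same idea that leaves the original df untouched might look like the sketch below. It assumes pandas 1.1 or later, where explode accepts ignore_index; on older versions, reset_index(drop=True) as shown above does the same job.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Color": ["Green", "Blue"],
        "Shape": ["Rectangle\r\nTriangle\r\nOctangle", "Rectangle\r\nTriangle"],
        "Price": [10, 15],
    }
)

# assign() returns a copy with Shape replaced by lists, so df itself is
# not modified; explode() then emits one row per list element.
result = (
    df.assign(Shape=df["Shape"].str.split("\r\n"))
      .explode("Shape", ignore_index=True)  # ignore_index needs pandas >= 1.1
)

print(result)
```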
This might not be the most efficient way to do it but I can confirm that it works with the sample df:
import pandas as pd

data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]
df = pd.DataFrame(data, columns=['Color', 'Shape', 'Price'])

# Collect one record per shape in a list; DataFrame.append was removed in pandas 2.0
rows = []
for index, row in df.iterrows():
    # Split the cell on the \r\n separator and repeat the other columns
    for shape in row['Shape'].split('\r\n'):
        rows.append({'Color': row['Color'], 'Shape': shape, 'Price': row['Price']})
new_df = pd.DataFrame(rows, columns=['Color', 'Shape', 'Price'])
print(new_df)
Output:
Color Shape Price
0 Green Rectangle 10
1 Green Triangle 10
2 Green Octangle 10
3 Blue Rectangle 15
4 Blue Triangle 15