split rows in pandas dataframe

Tags: python, pandas

I'm stuck on how to split the rows of a pandas DataFrame.

I have a dataframe like the one below, where one column holds several values in a single cell, separated by \r\n:

    Color                              Shape  Price
0  Green  Rectangle\r\nTriangle\r\nOctangle     10
1   Blue              Rectangle\r\nTriangle     15

I need to split each such cell into separate rows, repeating the values of the other columns, e.g.

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15

What is a good way to do this?


asked Oct 23 '19 12:10

George

2 Answers

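You can split each string into a list and then use explode (available since pandas 0.25) to give every list element its own row:

# Replace each Shape string with a list of its pieces.
df["Shape"] = df["Shape"].str.split("\r\n")
# One row per list element; the other columns are repeated.
print(df.explode("Shape").reset_index(drop=True))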

Output:

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15
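For a version that runs on its own, here is a minimal sketch that rebuilds the sample frame from the question and applies the same split-then-explode steps. It assumes pandas 0.25 or newer, where `DataFrame.explode` was introduced. The names `as_lists` and `result` are only illustrative, and `assign` is used so the original `df` is left unmodified.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Color": ["Green", "Blue"],
        "Shape": ["Rectangle\r\nTriangle\r\nOctangle", "Rectangle\r\nTriangle"],
        "Price": [10, 15],
    }
)

# Split each Shape cell into a list of pieces.
# assign() returns a new frame, so df itself is not changed.
as_lists = df.assign(Shape=df["Shape"].str.split("\r\n"))

# explode() emits one row per list element and repeats Color and Price.
result = as_lists.explode("Shape").reset_index(drop=True)
print(result)
```

If the separators in real data are mixed (for example some cells use only \n), the split pattern would need to be adjusted accordingly; the sketch only covers the \r\n case shown in the question.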


answered Oct 11 '22 11:10

Sociopath

This might not be the most efficient way to do it, but I can confirm that it works with the sample df:

import pandas as pd

data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]
df = pd.DataFrame(data, columns=['Color', 'Shape', 'Price'])

# Collect one dict per output row and build the frame once at the end.
# (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.)
rows = []
for index, row in df.iterrows():
    # Split the multi-line cell into its individual shapes.
    for shape in row['Shape'].split('\r\n'):
        rows.append({'Color': row['Color'], 'Shape': shape, 'Price': row['Price']})

new_df = pd.DataFrame(rows, columns=['Color', 'Shape', 'Price'])
print(new_df)

Output:

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15
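If the row-by-row loop becomes slow on a large frame, the same result can be built column-wise. The following is a sketch rather than a benchmarked solution: it counts the pieces in each cell, repeats every original row that many times with `Index.repeat`, and then overwrites `Shape` with the flattened pieces. The names `pieces` and `counts` are illustrative only.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame(
    [
        ["Green", "Rectangle\r\nTriangle\r\nOctangle", 10],
        ["Blue", "Rectangle\r\nTriangle", 15],
    ],
    columns=["Color", "Shape", "Price"],
)

# One list of shapes per original row.
pieces = df["Shape"].str.split("\r\n")
# Number of output rows each original row expands into.
counts = pieces.str.len()

# Repeat each row label counts[i] times, then replace Shape with the
# flattened pieces, which come out in the same row order.
new_df = df.loc[df.index.repeat(counts)].reset_index(drop=True)
new_df["Shape"] = list(chain.from_iterable(pieces))
print(new_df)
```

This avoids `iterrows` entirely and does not rely on `explode`, at the cost of being a little harder to read than the loop.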

answered Oct 11 '22 13:10

MBA Coder
