Python - splitting dataframe into multiple dataframes based on column values and naming them with those values [duplicate]

Tags: python, pandas, dataframe

I have a large dataset listing competitor products on sale in different regions across the country. I want to split this dataframe into several smaller ones, one per region, using a loop that names each new dataframe after its region value. That way I can work with each region separately (e.g. sort the products in each region by price to understand what the market looks like there). A simplified version of the data is below:

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp1       B       £11       £16
Comp1       C       £11       £15
Comp2       A       £9        £16
Comp2       B       £12       £14
Comp2       C       £14       £17
Comp3       A       £11       £16
Comp3       B       £10       £15
Comp3       C       £12       £15
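To experiment with this data, the simplified table can be rebuilt as a pandas DataFrame. This is a minimal sketch that assumes the prices are whole pounds stored as text such as '£10', as displayed. Because '£10' sorts before '£9' when compared as strings, it strips the '£' and converts the prices to integers so that any later sort by price is numeric.

```python
import pandas as pd

# Simplified example data from the question; prices are text like '£10'.
df = pd.DataFrame({
    'Competitor': ['Comp1'] * 3 + ['Comp2'] * 3 + ['Comp3'] * 3,
    'Region': ['A', 'B', 'C'] * 3,
    'ProductA': ['£10', '£11', '£11', '£9', '£12', '£14', '£11', '£10', '£12'],
    'ProductB': ['£15', '£16', '£15', '£16', '£14', '£17', '£16', '£15', '£15'],
})

# Assumption: prices are whole pounds. Strip the '£' and convert to int so
# that sorting is numeric ('£10' < '£9' when compared as strings).
for col in ['ProductA', 'ProductB']:
    df[col] = df[col].str.lstrip('£').astype(int)

print(df.dtypes)
```

If the real data contains pence (e.g. '£10.50'), converting with astype(float) instead of astype(int) would be needed.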

I can create a list of the regions with:

region_list=df['Region'].unique().tolist()

I was hoping to use this list in a loop that produces one dataframe per region, e.g.

df_A:

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp2       A       £9        £16
Comp3       A       £11       £16

I could do this manually for each region with:

df_A = df.loc[df['Region'] == 'A']

but this dataset has a large number of regions, which would make this code tedious to write. Is there a way to create a loop that does this automatically? There is a similar question about splitting dataframes, but its answer does not show how to label the outputs based on each column value.

I'm quite new to Python and still learning, so if there is actually a different, more sensible method of approaching this problem I'm very open to suggestions.
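A common alternative to creating separate variables named df_A, df_B and so on is to keep one dictionary keyed by the region value. The sketch below reuses region_list and the manual filter from the question inside a loop; the name region_dfs is illustrative, not from the question. Each subset is then reached as region_dfs['A'] rather than df_A. Creating variable names dynamically (for example through globals()) is possible but generally discouraged, because the code that follows cannot easily know which names exist.

```python
import pandas as pd

# Simplified example data from the question.
df = pd.DataFrame({
    'Competitor': ['Comp1'] * 3 + ['Comp2'] * 3 + ['Comp3'] * 3,
    'Region': ['A', 'B', 'C'] * 3,
    'ProductA': ['£10', '£11', '£11', '£9', '£12', '£14', '£11', '£10', '£12'],
    'ProductB': ['£15', '£16', '£15', '£16', '£14', '£17', '£16', '£15', '£15'],
})

region_list = df['Region'].unique().tolist()

# Hypothetical container: one entry per region instead of df_A, df_B, ...
region_dfs = {}
for region in region_list:
    # Same filter as df.loc[df['Region'] == 'A'], applied to every region.
    region_dfs[region] = df.loc[df['Region'] == region]

print(region_dfs['A'])
```

Looping over region_dfs.items() then gives each region's name together with its dataframe.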


asked Nov 09 '16 00:11

Sarah

1 Answer

Subsetting by distinct values is called a groupby. If you simply want to iterate through the groups with a for loop, the syntax is:

for region, df_region in df.groupby('Region'):
    print(df_region)

  Competitor Region ProductA ProductB
0      Comp1      A      £10      £15
3      Comp2      A       £9      £16
6      Comp3      A      £11      £16
  Competitor Region ProductA ProductB
1      Comp1      B      £11      £16
4      Comp2      B      £12      £14
7      Comp3      B      £10      £15
  Competitor Region ProductA ProductB
2      Comp1      C      £11      £15
5      Comp2      C      £14      £17
8      Comp3      C      £12      £15
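The groupby loop extends naturally to the use case in the question. This sketch assumes the prices are whole pounds stored as '£'-prefixed text: it converts them to integers, collects each group into a dictionary keyed by region with a dict comprehension over df.groupby('Region'), and then sorts each region by ProductA price. The names region_dfs and by_price are illustrative.

```python
import pandas as pd

# Simplified example data from the question.
df = pd.DataFrame({
    'Competitor': ['Comp1'] * 3 + ['Comp2'] * 3 + ['Comp3'] * 3,
    'Region': ['A', 'B', 'C'] * 3,
    'ProductA': ['£10', '£11', '£11', '£9', '£12', '£14', '£11', '£10', '£12'],
    'ProductB': ['£15', '£16', '£15', '£16', '£14', '£17', '£16', '£15', '£15'],
})

# Assumption: whole-pound prices; strip '£' so sorting is numeric.
for col in ['ProductA', 'ProductB']:
    df[col] = df[col].str.lstrip('£').astype(int)

# One DataFrame per region, keyed by the region value ('A', 'B', 'C').
region_dfs = {region: df_region for region, df_region in df.groupby('Region')}

# Sort each region's competitors from cheapest to most expensive ProductA.
by_price = {region: d.sort_values('ProductA') for region, d in region_dfs.items()}

for region, d in by_price.items():
    print(f'Region {region}:')
    print(d[['Competitor', 'ProductA', 'ProductB']])
```

If one sorted table is enough, df.sort_values(['Region', 'ProductA']) gives the same per-region ordering without splitting the DataFrame, and df.groupby('Region').get_group('A') retrieves a single region on demand.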


answered Sep 17 '22 10:09

maxymoo

Related questions
How to find the last row in a column using openpyxl normal workbook?
Anyone using Django in the "Enterprise"
Writing a help for python script
What's wrong with my except? [duplicate]
Quadratic Program (QP) Solver that only depends on NumPy/SciPy?
How to upload a file using an ajax call in flask
How to display all label values in matplotlib
Hide Axis in Bokeh
Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`
Trailing slash in Flask route
Do datetime objects need to be deep-copied?
Python Pandas Dataframe, remove all rows where 'None' is the value in any column
Python: How to drop a row whose particular column is empty/NaN?
Getting No loop matching the specified signature and casting error
How do I specify multiple types for a parameter using type-hints? [duplicate]
from __future__ import annotations
Django BigInteger auto-increment field as primary key?
Is there a way to hide the csrf label while looping through form using Flask and Flask-WTForms?
Python Serial: How to use the read or readline function to read more than 1 character at a time
ExcelFile Vs. read_excel in pandas
