Python - splitting dataframe into multiple dataframes based on column values and naming them with those values [duplicate]

I have a large dataset listing competitor products on sale in different regions across the country. I would like to split this dataframe into several smaller ones, one per region, using a loop that names each new dataframe after its region value. That way I can work with each region separately, e.g. sorting the products in each region by price to understand what the market looks like there. A simplified version of the data is below:

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp1       B       £11       £16
Comp1       C       £11       £15
Comp2       A       £9        £16
Comp2       B       £12       £14
Comp2       C       £14       £17
Comp3       A       £11       £16
Comp3       B       £10       £15
Comp3       C       £12       £15

I can create a list of the regions using the below:

region_list=df['Region'].unique().tolist()

I was hoping to use this list in a loop that produces one dataframe per region, e.g.

df_A :

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp2       A       £9        £16
Comp3       A       £11       £16

I could do this manually for each region, with the code

df_A = df.loc[df['Region'] == 'A']

but this dataset has a large number of regions, which would make writing this out by hand tedious. Is there a way to do the same thing in a loop? There is a similar question about splitting dataframes, but its answer does not show how to label each output with the corresponding column value.
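A minimal sketch of that loop, reusing `region_list` and the `.loc` filter above. Rather than creating separately named variables such as `df_A`, a common approach is to store the results in a dict keyed by region value, so that `region_dfs['A']` plays the role of `df_A`. The name `region_dfs` is illustrative, and the sample DataFrame is rebuilt from the table in the question.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Competitor": ["Comp1", "Comp1", "Comp1", "Comp2", "Comp2", "Comp2",
                   "Comp3", "Comp3", "Comp3"],
    "Region": ["A", "B", "C"] * 3,
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

# Distinct regions, in order of first appearance
region_list = df["Region"].unique().tolist()

# One entry per region: key = region value, value = the filtered rows
# (the equivalent of df_A, df_B, ... written out by hand)
region_dfs = {region: df.loc[df["Region"] == region] for region in region_list}

print(region_dfs["A"])
```

Looping over `region_dfs.items()` then gives each region name alongside its dataframe, however many regions the full dataset has.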

I'm quite new to Python and still learning, so if there is actually a different, more sensible method of approaching this problem I'm very open to suggestions.

asked Nov 09 '16 00:11 by Sarah



1 Answer

Subsetting by distinct values is called a groupby. If you simply want to iterate through the groups with a for loop, the syntax is:

for region, df_region in df.groupby('Region'):
    print(df_region)

  Competitor Region ProductA ProductB
0      Comp1      A      £10      £15
3      Comp2      A       £9      £16
6      Comp3      A      £11      £16

  Competitor Region ProductA ProductB
1      Comp1      B      £11      £16
4      Comp2      B      £12      £14
7      Comp3      B      £10      £15

  Competitor Region ProductA ProductB
2      Comp1      C      £11      £15
5      Comp2      C      £14      £17
8      Comp3      C      £12      £15
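A minimal sketch building on this groupby loop, assuming prices are stored as strings like "£10" as in the question's table. It keeps each region's dataframe in a dict keyed by region value (the name `region_dfs` is illustrative) and sorts each one by `ProductA`. The prices are first converted to integers, because as strings "£10" sorts before "£9".

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Competitor": ["Comp1", "Comp1", "Comp1", "Comp2", "Comp2", "Comp2",
                   "Comp3", "Comp3", "Comp3"],
    "Region": ["A", "B", "C"] * 3,
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

# Assumption: every price is "£" followed by a whole number.
# Strip the currency sign and convert so sorting is numeric.
for col in ["ProductA", "ProductB"]:
    df[col] = df[col].str.lstrip("£").astype(int)

# groupby yields (region, sub-DataFrame) pairs; keep each one under its
# region value, sorted from cheapest to most expensive ProductA
region_dfs = {
    region: df_region.sort_values("ProductA")
    for region, df_region in df.groupby("Region")
}

for region, df_region in region_dfs.items():
    print(region)
    print(df_region)
```

Keeping the price columns numeric also makes per-region summaries such as `df.groupby("Region")["ProductA"].mean()` possible without a separate loop.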
answered Sep 17 '22 10:09 by maxymoo