I've been trying to assign a value to every row of a dataframe and haven't been able to do so (I'm new to pandas), so if anyone could help, I'd be super grateful!
I've got two dataframes. In the input dataframe, I have brands:
brand_raw.head()
   brand_name
0        Nike
1     Lacoste
2      Adidas
And then, on the output dataset, I have objects:
object_raw.head()
   category_id  object_name
0           24      T-shirt
1           45       Shorts
2           32        Dress
What I need is a dataframe with every object combined with every brand:
to_raw.head()
   category_id  object_name  brand_name
0           24      T-shirt        Nike
1           45       Shorts        Nike
2           32        Dress        Nike
3           24      T-shirt     Lacoste
4           45       Shorts     Lacoste
5           32        Dress     Lacoste
6           24      T-shirt      Adidas
7           45       Shorts      Adidas
8           32        Dress      Adidas
I've been trying to do it with the apply function, iterating over the rows, but I end up overwriting the values, so only the last brand gets written:
0           24      T-shirt        Nike
1           45       Shorts        Nike
2           32        Dress        Nike
This is my code:
def insert_value_in_every_row(input_df, output_df, column_name):
    for row in input_df.values:
        row = row[0].rstrip()
        output_df[column_name] = output_df[column_name].apply(lambda x: row)
    return output_df

insert_value_in_every_row(brand_raw, to_raw, 'brand_name')
Could someone give me a hint on how to deal with this, please? Thanks a lot in advance!
The pandas.DataFrame.duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean Series that marks each row as duplicate or unique.
pandas DataFrame.assign(): the assign() method adds new columns to a DataFrame. It returns a new object with all the original columns in addition to the new ones; existing columns that are re-assigned are overwritten. The new column names are passed as keyword arguments.
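As a small illustration of that behaviour, here is a sketch using the brand data from the question. The column name key is just a placeholder chosen for this example.

```python
import pandas as pd

# Brand data from the question.
brand_raw = pd.DataFrame({"brand_name": ["Nike", "Lacoste", "Adidas"]})

# assign() returns a new DataFrame with the extra column;
# "key" is a placeholder name, passed as a keyword argument.
with_key = brand_raw.assign(key=0)

print(with_key)
print(brand_raw.columns.tolist())  # the original is left unchanged
```

Because assign() does not modify the original, it can be chained with other calls without leaving a helper column behind in brand_raw.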
You can get the unique values of a single column with Series.unique(). To get the unique values across multiple columns, the top-level pandas.unique() function can be applied to their combined values.
You're looking for the Cartesian product of the two dataframes. One way to get it in pandas is to add a common key column with the same constant value to both dataframes and merge on it (any join type works, since every row matches on the key):
brand_raw.assign(key=0).merge(object_raw.assign(key=0), on='key').drop(['key'], axis=1)
   brand_name  category_id  object_name
0        Nike           24      T-shirt
1        Nike           45       Shorts
2        Nike           32        Dress
3     Lacoste           24      T-shirt
4     Lacoste           45       Shorts
5     Lacoste           32        Dress
6      Adidas           24      T-shirt
7      Adidas           45       Shorts
8      Adidas           32        Dress
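As an alternative sketch, and assuming pandas 1.2 or later, merge() also accepts how='cross', which should give the same Cartesian product without the temporary key column. The dataframes below are rebuilt from the samples in the question, and the final column selection is only there to match the column order asked for.

```python
import pandas as pd

# Sample data rebuilt from the question.
brand_raw = pd.DataFrame({"brand_name": ["Nike", "Lacoste", "Adidas"]})
object_raw = pd.DataFrame({
    "category_id": [24, 45, 32],
    "object_name": ["T-shirt", "Shorts", "Dress"],
})

# Cross join: every brand row is paired with every object row.
# Assumes pandas >= 1.2, where how="cross" is available.
to_raw = brand_raw.merge(object_raw, how="cross")

# Reorder the columns to match the layout asked for in the question.
to_raw = to_raw[["category_id", "object_name", "brand_name"]]

print(to_raw)
```

On older pandas versions the constant-key merge above remains the way to go.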