I loop over the CSV files in a directory and read them with pandas. Each CSV file has a category and a marketplace, and I need to get the category id and the marketplace id from the database; both are valid for the whole CSV file.
finalDf is a DataFrame containing all the products from all the CSV files, and I need to append the data from the current CSV to it.
The products of the current CSV are retrieved using:
df['PRODUCT']
I need to append them to finalDf, so I used:
finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
This seems to work fine, and I now have to insert catid and marketid into the corresponding columns of finalDf. Because catid and marketid are constant across the current CSV file, I just need to repeat each of them once per row of the df DataFrame. That is what I am trying to accomplish in the code below.
finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
finalDf['PRODUCT'] = finalDf.PRODUCT.astype('category')
df = pd.read_csv(filename, header=None,
                 names=['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME', 'SNAPDATE',
                        'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL'], sep='\t')
finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
# Here I have a single value to add n times, n corresponding to the number of rows in the dataframe df
catid = 2113
marketid = 13
catids = pd.Series([catid]*len(df.index))
marketids = pd.Series([marketid]*len(df.index))
finalDf['CAT_ID'] = finalDf['CAT_ID'].append(catids, ignore_index=True)
finalDf['MARKET_ID'] = finalDf['MARKET_ID'].append(marketids, ignore_index=True)
print(finalDf.head())
PRODUCT CAT_ID MARKET_ID
0 ABC NaN NaN
1 ABB NaN NaN
2 ABE NaN NaN
3 DCB NaN NaN
4 EFT NaN NaN
As you can see, I only get NaN values instead of the actual values. Expected output:
PRODUCT CAT_ID MARKET_ID
0 ABC 2113 13
1 ABB 2113 13
2 ABE 2113 13
3 DCB 2113 13
4 EFT 2113 13
finalDf after processing several CSV files would look like:
PRODUCT CAT_ID MARKET_ID
0 ABC 2113 13
1 ABB 2113 13
2 ABE 2113 13
3 DCB 2113 13
4 EFT 2113 13
5 SDD 2114 13
6 ERT 2114 13
7 GHJ 2114 13
8 MOD 2114 13
9 GTR 2114 13
10 WLY 2114 13
11 WLO 2115 13
12 KOP 2115 13
Any idea?
Thanks
The append() method appends a DataFrame-like object to the end of the current DataFrame. It returns a new DataFrame object and leaves the original unchanged. Note that DataFrame.append() and Series.append() were deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is the replacement.
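A likely reason the code in the question ends up with NaN is index alignment. The appended result is assigned back to a column, and pandas matches a Series to the frame by index label rather than by position. The sketch below reproduces this on a small made-up frame; it uses pd.concat in place of Series.append (so it runs on current pandas), and the snake_case names and sample values are illustrative only.

```python
import pandas as pd

# Empty frame with the same columns as in the question
final_df = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])

# Assigning a Series to an empty frame makes the frame adopt the Series index
# (0, 1, 2 here); the other columns are filled with NaN.
products = pd.Series(['ABC', 'ABB', 'ABE'])
final_df['PRODUCT'] = products

# Stand-in for finalDf['CAT_ID'].append(catids, ignore_index=True):
# the existing column (3 NaN) followed by the new ids, relabelled 0..5.
cat_ids = pd.Series([2113] * len(products))
combined = pd.concat([final_df['CAT_ID'], cat_ids], ignore_index=True)

# Column assignment aligns on index labels, so only labels 0..2 are kept,
# and those are exactly the NaN entries that were already in the column.
final_df['CAT_ID'] = combined
print(final_df)

# Assigning the scalar instead fills every existing row with the same value.
fixed = final_df.assign(CAT_ID=2113)
print(fixed)
```

The ids sit at labels 3 to 5 of the combined Series, which the frame does not have, so they are silently dropped during the assignment.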
I finally found a solution, although I don't know why the first approach didn't work. This one is simpler:
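# Build a frame for the current CSV only
tempDf = pd.DataFrame(columns=['PRODUCT','CAT_ID','MARKET_ID'])
tempDf['PRODUCT'] = df['PRODUCT']
# Assigning a scalar fills every row of the column
tempDf['CAT_ID'] = catid
tempDf['MARKET_ID'] = marketid
# ignore_index=True renumbers the rows 0..n-1 across files
finalDf = pd.concat([finalDf, tempDf], ignore_index=True)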
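For the full loop over several files, one possible variation is to collect one small frame per CSV in a list and call pd.concat once at the end, which avoids re-copying finalDf on every iteration. This is only a sketch: the in-memory strings stand in for the real tab-separated files, only the PRODUCT column is read, and the ids are hard-coded where the real code would query the database.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the files: (file contents, catid, marketid).
# In the real code the contents come from disk and the ids from the database.
sources = [
    ("ABC\nABB\nABE\n", 2113, 13),
    ("SDD\nERT\n", 2114, 13),
]

frames = []
for text, catid, marketid in sources:
    df = pd.read_csv(io.StringIO(text), header=None, names=['PRODUCT'], sep='\t')
    # assign() broadcasts each scalar id to every row of this file's frame
    frames.append(df[['PRODUCT']].assign(CAT_ID=catid, MARKET_ID=marketid))

# A single concat at the end; ignore_index=True gives one continuous 0..n-1 index
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

Because every per-file frame already has all three columns filled in, nothing has to be aligned afterwards and no NaN values appear.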