Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding values to existing columns in pandas

I loop into csv files in a directory and read them with pandas. For each csv files I have a category and a marketplace. Then I need to get the id of the category and the id of the marketplace from the database which will be valid for this csv file.

the finalDf is a dataframe containing all the products for all the csv files and I need to append it with data fron the current csv.

The list of the products of the current CSV are retrived using:

df['PRODUCT']

I need to append them to the finalDf and I used:

finalDf['PRODUCT'] =  finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)

This seems to work fine, and I now have to insert catid and marketid to the corresponding columns of the finalDf. because catid and marketid are consitent accross the current csv file I just need to add them as much time as there are rows in the df dataframe, this is what I'm trying to accomplish in the code below.

finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
finalDf['PRODUCT'] = finalDf.PRODUCT.astype('category')

df = pd.read_csv(filename, header=None,
                             names=['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME', 'SNAPDATE',
                                    'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL'], sep='\t')

finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
# Here I have a single value to add n times, n corresponding to the number of rows in the dataframe df
catid = 2113
marketid = 13
catids = pd.Series([catid]*len(df.index))
marketids = pd.Series([marketid]*len(df.index))
finalDf['CAT_ID'] = finalDf['CAT_ID'].append(catids, ignore_index=True)
finalDf['MARKET_ID'] = finalDf['MARKET_ID'].append(marketids, ignore_index=True)

print finalDf.head()

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       NaN    NaN
    1    ABB       NaN    NaN
    2    ABE       NaN    NaN
    3    DCB       NaN    NaN
    4    EFT       NaN    NaN

As you can see, I just have NaN values instead of the actual values. expected output:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13

finalDF containing several csv would look like:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13
    5    SDD       2114    13
    6    ERT       2114    13
    7    GHJ       2114    13
    8    MOD       2114    13
    9    GTR       2114    13
   10    WLY       2114    13
   11    WLO       2115    13
   12    KOP       2115    13

Any idea?

Thanks

like image 220
Cyrille MODIANO Avatar asked Apr 27 '18 16:04

Cyrille MODIANO


People also ask

How do I add values to a column in pandas?

sum() function is used to return the sum of the values for the requested axis by the user. If the input value is an index axis, then it will add all the values in a column and works same for all the columns. It returns a series that contains the sum of all the values in each column.

How do I add values to multiple columns in pandas?

By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.

How do I add data to an existing DataFrame pandas?

Pandas DataFrame append() Method The append() method appends a DataFrame-like object at the end of the current DataFrame. The append() method returns a new DataFrame object, no changes are done with the original DataFrame.


1 Answers

I finally found the solution, don't know why the other one didn't work though. But this one is simpler:

tempDf = pd.DataFrame(columns=['PRODUCT','CAT_ID','MARKET_ID'])
tempDf['PRODUCT'] = df['PRODUCT']
tempDf['CAT_ID'] = catid
tempDf['MARKET_ID'] = 13

finalDf = pd.concat([finalDf,tempDf])
like image 52
Cyrille MODIANO Avatar answered Oct 17 '22 07:10

Cyrille MODIANO