
AttributeError: 'Series' object has no attribute 'notna'

I have a CSV file with multiple columns containing empty strings. When I read the CSV into a pandas DataFrame, the empty strings are converted to NaN.

Now I want to prepend the string tag- to the values already present in the columns, but only to cells that hold a value, not to those that are NaN.

This is what I tried:

with open('file1.csv','r') as file:
    for chunk in pd.read_csv(file,chunksize=1000, header=0, names=['A','B','C','D']):
        if len(chunk) >=1:
            if chunk['A'].notna:
                chunk['A'] = "tag-"+chunk['A'].astype(str)
            if chunk['B'].notna:
                chunk['B'] = "tag-"+chunk['B'].astype(str)
            if chunk['C'].notna:
                chunk['C'] = "tag-"+chunk['C'].astype(str)
            if chunk['D'].notna:
                chunk['D'] = "tag-"+chunk['D'].astype(str)

And this is the error I'm getting:

AttributeError: 'Series' object has no attribute 'notna'

The final output that I want should look something like this:

A,B,C,D
tag-a,tag-b,tag-c,
tag-a,tag-b,,
tag-a,,,
,,tag-c,
,,,tag-d
,tag-b,,tag-d
Aman Singh asked Dec 12 '17 09:12



1 Answer

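I believe you need mask to add tag- to all columns at once. An if on a whole column cannot pick out individual cells, whereas mask replaces only the cells where a boolean condition is True: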

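import pandas as pd

for chunk in pd.read_csv('file1.csv', chunksize=2, header=0, names=['A','B','C','D']):
    if len(chunk) >= 1:
        # True where a cell holds a value, False where it is NaN
        m1 = chunk.notna()
        # replace only the True cells with their tagged string
        chunk = chunk.mask(m1, "tag-" + chunk.astype(str))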
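
To check the approach end to end, here is a minimal sketch that runs without a file on disk. The sample data (held in an in-memory buffer instead of file1.csv), the helper name tag_chunks, and the use of dtype=str are my assumptions for illustration, not part of the question:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for file1.csv
CSV_TEXT = """A,B,C,D
a,b,c,
a,b,,
a,,,
,,c,
,,,d
,b,,d
"""

def tag_chunks(source, prefix="tag-", chunksize=2):
    """Read a CSV in chunks and prefix every non-missing cell."""
    tagged = []
    for chunk in pd.read_csv(source, chunksize=chunksize, header=0,
                             names=["A", "B", "C", "D"], dtype=str):
        # Boolean DataFrame: True where a value is present
        present = chunk.notna()
        # Swap in the prefixed string only where present is True
        tagged.append(chunk.mask(present, prefix + chunk.astype(str)))
    return pd.concat(tagged)

result = tag_chunks(io.StringIO(CSV_TEXT))
csv_out = result.to_csv(index=False)
print(csv_out)
```

Reading with dtype=str keeps every column as text, so a chunk whose column is entirely empty is not read as float. Missing cells stay NaN and are written back out as empty fields.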
 

You need to upgrade to pandas 0.21.0 or later, the version in which notna() was added. On older versions, notnull() is the equivalent method.
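
If upgrading is not an option, the same masking can be done with notnull(), which exists in older releases too. A minimal sketch, where the helper name tag_present and the small sample frame are hypothetical:

```python
import numpy as np
import pandas as pd

def tag_present(frame, prefix="tag-"):
    """Prefix non-missing cells, whether or not notna() is available."""
    # notnull() predates notna() and returns the same boolean mask
    present = frame.notna() if hasattr(frame, "notna") else frame.notnull()
    return frame.mask(present, prefix + frame.astype(str))

# Hypothetical sample frame with one missing cell per column
df = pd.DataFrame({"A": ["a", np.nan], "B": [np.nan, "b"]})
out = tag_present(df)
print(out)
```

The hasattr check is only there to make the fallback explicit. Calling notnull() unconditionally would behave the same way.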

You can check the docs:

In order to promote more consistency among the pandas API, we have added additional top-level functions isna() and notna() that are aliases for isnull() and notnull(). The naming scheme is now more consistent with methods like .dropna() and .fillna(). Furthermore in all cases where .isnull() and .notnull() methods are defined, these have additional methods named .isna() and .notna(), these are included for classes Categorical, Index, Series, and DataFrame. (GH15001).

The configuration option pd.options.mode.use_inf_as_null is deprecated, and pd.options.mode.use_inf_as_na is added as a replacement.
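
The aliasing described in the quoted notes can be confirmed on pandas 0.21.0 or later with a quick check. The sample Series is an assumption for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample Series with one missing value
s = pd.Series(["a", np.nan, "c"])

# Method aliases on a Series give identical masks
same_methods = s.notna().equals(s.notnull()) and s.isna().equals(s.isnull())
# The top-level functions are aliases as well
same_functions = pd.notna(s).equals(pd.notnull(s)) and pd.isna(s).equals(pd.isnull(s))

print(same_methods, same_functions)  # True True
print(s.notna().tolist())            # [True, False, True]
```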

jezrael answered Sep 21 '22 21:09
