I have a CSV file with multiple columns containing empty strings. When I read the CSV into a pandas DataFrame, the empty strings are converted to NaN.
Now I want to prepend the string tag- to the values already present in the columns, but only to cells that have a value, not to those that are NaN.
This is what I was trying to do:
with open('file1.csv','r') as file:
    for chunk in pd.read_csv(file,chunksize=1000, header=0, names=['A','B','C','D']):
        if len(chunk) >=1:
            if chunk['A'].notna:
                chunk['A'] = "tag-"+chunk['A'].astype(str)
            if chunk['B'].notna:
                chunk['B'] = "tag-"+chunk['B'].astype(str)
            if chunk['C'].notna:
                chunk['C'] = "tag-"+chunk['C'].astype(str)
            if chunk['D'].notna:
                chunk['D'] = "tag-"+chunk['D'].astype(str)
And this is the error I'm getting:
AttributeError: 'Series' object has no attribute 'notna'
The final output I want should look something like this:
A,B,C,D
tag-a,tag-b,tag-c,
tag-a,tag-b,,
tag-a,,,
,,tag-c,
,,,tag-d
,tag-b,,tag-d
I believe you need mask to add tag- to all columns at once:
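for chunk in pd.read_csv('file1.csv',chunksize=2, header=0, names=['A','B','C','D']):
    if len(chunk) >=1:
        # True where the cell has a value, False where it is NaN
        m1 = chunk.notna()
        # replace only the non-NaN cells with their tagged version
        chunk = chunk.mask(m1, "tag-" + chunk.astype(str))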
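To see the whole round trip, here is a minimal sketch that runs the same mask approach on an in-memory copy of the sample data from the question. The names csv_text, tagged_chunks, result and csv_out are only illustrative, and io.StringIO stands in for file1.csv; dtype=str is an added assumption so every column is read as text.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for file1.csv, built from the question's sample rows.
csv_text = """A,B,C,D
a,b,c,
a,b,,
a,,,
,,c,
,,,d
,b,,d
"""

tagged_chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2, dtype=str):
    # True where a cell holds a value, False where it is NaN.
    has_value = chunk.notna()
    # Replace only the non-missing cells with their tagged version.
    tagged_chunks.append(chunk.mask(has_value, "tag-" + chunk.astype(str)))

# Stitch the processed chunks back together and serialise them as CSV text.
result = pd.concat(tagged_chunks, ignore_index=True)
csv_out = result.to_csv(index=False)
print(csv_out)
```

Collecting the chunks in a list is fine for small files; for a large file it may be preferable to write each processed chunk out with to_csv(..., mode='a') instead of keeping them all in memory.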
You need to upgrade pandas to version 0.21.0 or later.
You can check the docs:
In order to promote more consistency among the pandas API, we have added additional top-level functions isna() and notna() that are aliases for isnull() and notnull(). The naming scheme is now more consistent with methods like .dropna() and .fillna(). Furthermore in all cases where .isnull() and .notnull() methods are defined, these have additional methods named .isna() and .notna(), these are included for classes Categorical, Index, Series, and DataFrame. (GH15001). The configuration option pd.options.mode.use_inf_as_null is deprecated, and pd.options.mode.use_inf_as_na is added as a replacement.
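If upgrading is not possible, the older notnull() method that the quoted passage mentions should work the same way, since notna() is described there as its alias. A small sketch on a single made-up Series (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value.
s = pd.Series(["a", np.nan, "c"])

# notnull() is the older spelling; on pandas >= 0.21.0 notna() gives the same mask.
has_value = s.notnull()

# Tag only the entries that hold a value; the NaN entry is left untouched.
tagged = s.mask(has_value, "tag-" + s.astype(str))
print(tagged)
```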