
Pandas replace nan depending on type

With DataFrame.to_csv, I managed to write CSV files without nan values by using

df = df.replace('None','')
df = df.replace('nan','')

but my issue with this approach is that every nan value is replaced with quotes: ''

Is it possible to replace nan values depending on the column type?

if the nan is in an int column, don't add quotes
if str, set to ''
if float, set to 0.0

etc.

I tried this code, but it failed:

df['myStringColumn'].replace('None', '')

Edit: here is the sample data frame I have

      aTest    Vendor     name     price    qty
 0    y        NewVend             21.20    nan
 1    y        OldMakes            11.20    3
 2    nan      nan        sample   9.20     1
 3    n        nan        make     nan      0

Here is my goal:

'y','NewVend','',21.20,,     
'y','OldMakes','',11.20,3,
'','','sample',9.20,1,
'n','','make',0.0,0,

Here is the full script:

import csv
import os
import sys

import pandas as pd

# `d` is defined elsewhere in the original script (not shown)
dtype_dic = {'price': float, 'qty': float}
df = pd.read_excel(os.path.join(sys.path[0], d.get('csv')), dtype=str)
for col, col_type in dtype_dic.items():
    df[col] = df[col].astype(col_type)
df = df.replace('None','')
df = df.replace('nan','')
df.to_csv('test.csv', index=False, header=False, quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
Led asked Dec 13 '22 15:12



2 Answers

You can select the columns of each type with select_dtypes and then call fillna on them. This works when the missing values are real np.nan (it also works for None), not the literal string 'nan'.

float_cols = df.select_dtypes(include=['float64']).columns
str_cols = df.select_dtypes(include=['object']).columns

df.loc[:, float_cols] = df.loc[:, float_cols].fillna(0)
df.loc[:, str_cols] = df.loc[:, str_cols].fillna('')

You get

    aTest   Vendor      name    price   qty
0   y       NewVend             21.2    0.0
1   y       OldMakes            11.2    3.0
2                       sample  9.2     1.0
3   n                   make    0.0     0.0
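The approach above can be sketched end to end, including the CSV step from the question. This is a minimal sketch under assumptions: the frame is an illustrative reconstruction of the question's sample, one 'nan' string is planted to mimic data read with dtype=str, and the names fill_values, filled, and buffer are hypothetical. It fills per column with a dict instead of select_dtypes, which is a variation on the answer, not the answer's exact code.

```python
import csv
import io

import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_numeric_dtype

# Illustrative frame modelled on the question's sample; "Vendor" carries a
# literal 'nan' string to mimic data that was read with dtype=str.
df = pd.DataFrame({
    "aTest": ["y", "y", np.nan, "n"],
    "Vendor": ["NewVend", "OldMakes", "nan", np.nan],
    "name": [np.nan, np.nan, "sample", "make"],
    "price": [21.20, 11.20, 9.20, np.nan],
    "qty": [np.nan, 3.0, 1.0, 0.0],
})

# Turn placeholder strings into real missing values so fillna can see them.
df = df.replace(["nan", "None"], np.nan)

# Build one fill value per column: 0.0 for floats, '' for non-numeric columns.
fill_values = {}
for col in df.columns:
    if is_float_dtype(df[col]):
        fill_values[col] = 0.0
    elif not is_numeric_dtype(df[col]):
        fill_values[col] = ""

filled = df.fillna(value=fill_values)

# Write to an in-memory buffer; strings are quoted, numbers are not.
buffer = io.StringIO()
filled.to_csv(buffer, index=False, header=False, quoting=csv.QUOTE_NONNUMERIC)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that, like the answer, this fills qty with 0.0 rather than leaving it empty as in the question's goal.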
Vaishali answered Jan 01 '23 14:01



Try this:

data = pd.read_csv("dataset/pokemon.csv")
data.head(7)


    #   Name              Type 1  Type 2  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0   1   Bulbasaur         Grass   Poison  45  49      49       65       65       45     1           False
1   2   Ivysaur           Grass   Poison  60  62      63       80       80       60     1           False
2   3   Venusaur          Grass   Poison  80  82      83       100      100      80     1           False
3   4   Mega Venusaur     Grass   Poison  80  100     123      122      120      80     1           False
4   5   Charmander        Fire    NaN     39  52      43       60       50       65     1           False
5   6   Charmeleon        Fire    NaN     58  64      58       80       65       80     1           False
6   7   Charizard         Fire    Flying  78  84      78       109      85       100    1           False

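To find the string-type columns:

types = list(data.iloc[0])  # values of the first row

str_types = []
for i, a in enumerate(types):
    # note: a column whose first-row value is NaN is not detected as a string column
    if isinstance(a, str):
        str_types.append(i)

print(str_types)
[1, 2, 3] #columns with string values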

Replace NaN with " ":

for a in str_types:
    # assign back instead of fillna(inplace=True) on an iloc selection, which may act on a copy
    data.iloc[:, a] = data.iloc[:, a].fillna(" ")

data


    #   Name              Type 1  Type 2  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0   1   Bulbasaur         Grass   Poison  45  49      49       65       65       45     1           False
1   2   Ivysaur           Grass   Poison  60  62      63       80       80       60     1           False
2   3   Venusaur          Grass   Poison  80  82      83       100      100      80     1           False
3   4   Mega Venusaur     Grass   Poison  80  100     123      122      120      80     1           False
4   5   Charmander        Fire            39  52      43       60       50       65     1           False
5   6   Charmeleon        Fire            58  64      58       80       65       80     1           False
6   7   Charizard         Fire    Flying  78  84      78       109      85       100    1           False
7   8   Mega Charizard X  Fire    Dragon  78  130     111      130      85       100    1           False
8   9   Mega Charizard Y  Fire    Flying  78  104     78       159      115      100    1           False
9   10  Squirtle          Water           44  48      65       50       64       43     1           False
10  11  Wartortle         Water           59  63      80       65       80       58     1           False
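Because the detection above looks only at the first row, a column whose first value is NaN would be skipped. A possible variation, sketched here under assumptions, checks every value in each column instead. The small frame is a hypothetical miniature of the Pokemon table, and string_columns is an illustrative helper, not a pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Pokemon table; the first row has NaN in "Type 2",
# so a first-row check would miss that column.
data = pd.DataFrame({
    "Name": ["Charmander", "Charizard", "Squirtle"],
    "Type 1": ["Fire", "Fire", "Water"],
    "Type 2": [np.nan, "Flying", np.nan],
    "HP": [39, 78, 44],
})


def string_columns(frame):
    """Return the labels of columns that hold at least one str value."""
    return [
        col for col in frame.columns
        if frame[col].map(lambda v: isinstance(v, str)).any()
    ]


str_cols = string_columns(data)

# Fill NaN only in the string columns; numeric columns are left untouched.
data[str_cols] = data[str_cols].fillna(" ")

print(str_cols)
print(data)
```

Selecting by label rather than by position also keeps the fill step readable when columns are reordered.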
vijay athithya answered Jan 01 '23 14:01
