This is easy to do in R, and I am wondering whether it is straightforward in Python and I am just missing something: how do you create a vector of NaN or null values in Python? I am trying to do this with the np.full function.
R Code:
vec <- vector("character", 15)
vec[1:15] <- NA
vec
Python Code:
unknowns = np.full(shape = 5, fill_value = ???, dtype = 'str')
# test whether the fill value worked
np.random.seed(1177)
categories = np.random.choice(['web', 'software', 'hardware', 'biotech'], size = 15, replace = True)
categories = np.concatenate([categories, unknowns])
example = pd.DataFrame(data = {'categories': categories})
example['transformed'] = [ x if pd.isna(x) == False else 'unknown' for x in example['categories']]
print(example['transformed'].value_counts())
This should lead to 5 counts of 'unknown' in the value counts. Ideally, I would like to know how to write this fill_value for NaN and null values, and whether it differs by variable type. I have tried np.nan with and without the string data type. I have tried None and Null, with and without quotes. I cannot think of anything else to try, and I am starting to wonder whether it is possible at all. Thank you in advance, and I apologize if this question has already been addressed and for my lack of knowledge in this area.
You can use either None or np.nan to create an array of only missing values in Python, like so:
np.full(shape=5, fill_value=None)
np.full(shape=5, fill_value=np.nan)
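On whether this differs by variable type: the two fill values give arrays of different dtypes, and a fixed-width string dtype cannot hold a missing value at all. The following is a minimal sketch of that behavior. The variable names and the 'U4' string width are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# None can only be stored in an object array
none_arr = np.full(shape=5, fill_value=None)
print(none_arr.dtype)   # object

# np.nan is a float, so the array is float64
nan_arr = np.full(shape=5, fill_value=np.nan)
print(nan_arr.dtype)    # float64

# pd.isna recognizes both kinds of missing value
print(pd.isna(none_arr).all(), pd.isna(nan_arr).all())

# With a string dtype the fill value is converted to text,
# so nothing in the array counts as missing any more
str_arr = np.full(shape=5, fill_value=np.nan, dtype='U4')
print(pd.isna(str_arr).any())
```

This would explain why passing dtype='str' in the question never produced values that pd.isna detects.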
Back to your example: if you drop dtype='str' and fill with None, this works just fine:
import numpy as np
import pandas as pd
unknowns = np.full(shape=5, fill_value=None)
categories = np.random.choice(['web', 'software', 'hardware', 'biotech'], size = 15, replace = True)
categories = np.concatenate([categories, unknowns])
example = pd.DataFrame(data = {'categories': categories})
example['transformed'] = [ x if pd.isna(x) == False else 'unknown' for x in example['categories']]
print(example['transformed'].value_counts())
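As a quick check of why this works, here is a sketch that fixes the seed (1177, borrowed from the question) and inspects the combined array. The names combined and n_missing are introduced only for this illustration. Concatenating a string array with an object array yields an object array, so the None entries survive intact.

```python
import numpy as np
import pandas as pd

np.random.seed(1177)  # seed NumPy's generator so the sample is repeatable

unknowns = np.full(shape=5, fill_value=None)
categories = np.random.choice(
    ['web', 'software', 'hardware', 'biotech'], size=15, replace=True
)

# string array + object array -> object array; None is kept as None
combined = np.concatenate([categories, unknowns])
print(combined.dtype)

example = pd.DataFrame(data={'categories': combined})

# count the entries pandas treats as missing
n_missing = int(example['categories'].isna().sum())
print(n_missing)
```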
Lastly, this line is inefficient:
example['transformed'] = [ x if pd.isna(x) == False else 'unknown' for x in example['categories']]
You generally want to avoid explicit loops and list comprehensions when using pandas.
On large data, a vectorized call such as fillna should run much faster, and it treats both None and np.nan as missing:
example['transformed'] = example['categories'].fillna('unknown')
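One caveat when replacing missing values: a truthiness test such as s if s else 'unknown' is not the same as a missing-value test. np.nan is truthy and an empty string is falsy, so the two can disagree. Here is a small sketch comparing them, where the sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

# object Series holding a real value, both kinds of missing value, and ''
s = pd.Series(['web', None, np.nan, ''], dtype=object)

# missing-value test: replaces None and np.nan, leaves '' alone
filled = s.fillna('unknown')

# truthiness test: replaces None and '', but np.nan slips through
truthy = s.apply(lambda v: v if v else 'unknown')

print(filled.tolist())
print(truthy.tolist())
```

If the data can contain np.nan or empty strings, the fillna version is the one that matches what pd.isna reports.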