How to create an array of NA or Null values in Python?

Tags:

python

numpy

This is easy to do in R, and I am wondering whether it is just as straightforward in Python and I am simply missing something: how do you create a vector of NaN or Null values in Python? I am trying to do this with the np.full function.

R Code:

vec <- vector("character", 15)
vec[1:15] <- NA
vec

Python Code:

unknowns = np.full(shape = 5, fill_value = ???, dtype = 'str')

# test whether the fill value worked

np.random.seed(1177)
categories = np.random.choice(['web', 'software', 'hardware', 'biotech'], size = 15, replace = True)
categories = np.concatenate([categories, unknowns])

example = pd.DataFrame(data = {'categories': categories})
example['transformed'] = [ x if pd.isna(x) == False else 'unknown' for x in example['categories']]

print(example['transformed'].value_counts())

This should produce a count of 5 for 'unknown' in the value counts. Ideally, I would like to know how to write this fill_value for NaN and Null, and whether it differs by variable type. I have tried np.nan with and without the string data type, and None and Null with and without quotes. I cannot think of anything else to try, and I am starting to wonder whether it is possible at all. Thank you in advance, and I apologize if this question has already been addressed.

Pearl asked Apr 29 '26 11:04


1 Answer

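You can use either None or np.nan as the fill value to create an array of only missing values in Python. Leave out dtype = 'str', because a string dtype stores the fill value as text rather than as a missing value: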

np.full(shape=5, fill_value=None)
np.full(shape=5, fill_value=np.nan)
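
To address the "does it differ by variable type" part of the question, here is a small sketch that inspects what each call produces. The variable names are made up for illustration, and the exact text stored in the string-typed array is deliberately not shown, since it depends on how NumPy sizes the string dtype.

```python
import numpy as np
import pandas as pd

# None is stored as a Python object, so the array gets dtype=object.
none_filled = np.full(shape=5, fill_value=None)

# np.nan is a float, so the array gets a floating-point dtype.
nan_filled = np.full(shape=5, fill_value=np.nan)

# Forcing a string dtype casts the fill value to text,
# so pandas no longer sees it as missing.
str_filled = np.full(shape=5, fill_value=np.nan, dtype='str')

for name, arr in [('None', none_filled), ('np.nan', nan_filled), ('np.nan as str', str_filled)]:
    print(name, arr.dtype, pd.isna(arr).all())
```

This is likely why the attempts with dtype = 'str' did not register as missing: pd.isna only recognizes real None/NaN values, not their text form.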

Back to your example, this works just fine:

import numpy as np
import pandas as pd

unknowns = np.full(shape=5, fill_value=None)
categories = np.random.choice(['web', 'software', 'hardware', 'biotech'], size = 15, replace = True)
categories = np.concatenate([categories, unknowns])
example = pd.DataFrame(data = {'categories': categories})
example['transformed'] = [ x if pd.isna(x) == False else 'unknown' for x in example['categories']]

print(example['transformed'].value_counts())

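Lastly, this line is inefficient:

example['transformed'] = [ x if pd.isna(x) == False else 'unknown' for x in example['categories']]

You generally want to avoid explicit loops and list comprehensions when using pandas. On large data, a vectorized call should run much faster:

example['transformed'] = example['categories'].fillna('unknown')

example.categories.apply(lambda s: s if s else 'unknown') also gives the right result for None, but it relies on truthiness: it would leave np.nan untouched (NaN is truthy) and would replace empty strings. It is also still a Python-level loop under the hood.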
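
As a rough sketch of a vectorized replacement for the row-by-row line above, assuming the missing entries are real None/NaN values (not the strings 'None' or 'nan'), `Series.fillna` can do the substitution in one call. The seed and the `known` variable name are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable (seed value is arbitrary).
rng = np.random.default_rng(1177)
known = rng.choice(['web', 'software', 'hardware', 'biotech'], size=15)

# Object array holding five None values.
unknowns = np.full(shape=5, fill_value=None)

example = pd.DataFrame({'categories': np.concatenate([known, unknowns])})

# Replace every missing value in one vectorized call.
example['transformed'] = example['categories'].fillna('unknown')

counts = example['transformed'].value_counts()
print(counts)
```

Unlike a truthiness check, `fillna` treats both None and np.nan as missing and leaves empty strings alone.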

govordovsky answered May 02 '26 01:05


