I have an np.where problem with pandas that is driving me crazy, and I can't seem to solve it through Google, the documentation, etc.
I'm hoping someone has insight. I'm sure it isn't complex.
I have a df where I'm checking the value in one column and, if that value is 'n/a' (as a string, not as in .isnull()), changing it to another value.
Full_Names_Test_2['MarketCap'] == 'n/a'
returns:
70 True
88 False
90 True
145 True
156 True
181 True
191 True
200 True
219 True
223 False
Name: MarketCap, dtype: bool
So that part works.
But this:
Full_Names_Test_2['NewColumn'] = np.where(Full_Names_Test_2['MarketCap'] == 'n/a', 7)
returns:
ValueError: either both or neither of x and y should be given
What is going on?
NumPy is an open-source Python library that facilitates efficient numerical operations on large quantities of data. There are a few functions that exist in NumPy that we use on pandas DataFrames. For us, the most important part about NumPy is that pandas is built on top of it. So, NumPy is a dependency of Pandas.
We can nest np.where() calls to check several conditions in sequence (much like a CASE WHEN ... THEN expression in SQL).
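As a rough sketch of what nesting looks like, the example below grades a made-up "score" column into three labels. The DataFrame, the column names, and the thresholds are all assumptions for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only to illustrate nesting.
df = pd.DataFrame({"score": [35, 62, 88, 50]})

# The outer np.where handles the first condition; its "else" branch
# is another np.where that handles the second condition.
df["grade"] = np.where(
    df["score"] >= 80, "high",
    np.where(df["score"] >= 50, "medium", "low"),
)

print(df)
```

With more than two or three conditions, nesting gets hard to read, and np.select (which takes a list of conditions and a list of choices) may be a cleaner fit.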
The values property returns all values in the DataFrame. The return value is a 2-dimensional NumPy array with one inner array for each row.
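A small illustration of that shape, using a made-up two-column DataFrame (the column names "a" and "b" are assumptions):

```python
import pandas as pd

# Hypothetical DataFrame with two rows and two columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = df.values          # 2-dimensional NumPy array
first_row = arr[0]       # one inner array per row

print(arr.shape)
print(first_row)
```

The pandas documentation recommends DataFrame.to_numpy() over .values for new code; for a simple frame like this one, both give the same array.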
Reportedly, NumPy performs better than pandas for 50K rows or fewer, while pandas performs better for 500K rows or more. Between 50K and 500K rows, which one is faster depends on the type of operation.
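np.where takes either the condition alone or the condition plus both values. You need to pass the boolean mask, the value to use where the mask is True, and the value to use where it is False:
np.where(Full_Names_Test_2['MarketCap'] == 'n/a', 7)
# should be
np.where(Full_Names_Test_2['MarketCap'] == 'n/a', 7, Full_Names_Test_2['MarketCap'])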
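To see the difference between the call forms, here is a minimal sketch on a small stand-in for Full_Names_Test_2. The sample values ("1.2B", "350M") and the index labels are invented for illustration. The second argument is what you get where the mask is True, and the third is what you get where it is False, so replacing 'n/a' with 7 and keeping everything else puts 7 second and the original column third.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Full_Names_Test_2; the values are made up.
df = pd.DataFrame(
    {"MarketCap": ["n/a", "1.2B", "n/a", "350M"]},
    index=[70, 88, 90, 223],
)
mask = df["MarketCap"] == "n/a"

# Condition only: returns the positions where the mask is True.
positions = np.where(mask.to_numpy())[0]

# Condition plus one value: this is the call that raises the ValueError.
try:
    np.where(mask, 7)
    two_arg_failed = False
except ValueError:
    two_arg_failed = True

# Condition plus both values: 7 where True, original value where False.
df["NewColumn"] = np.where(mask, 7, df["MarketCap"])

print(positions)
print(df)
```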
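See the np.where docs.
Alternatively, use the Series where method. It keeps values where the condition is True and replaces the rest, so the condition has to be inverted:
Full_Names_Test_2['MarketCap'].where(Full_Names_Test_2['MarketCap'] != 'n/a', 7)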
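One thing that is easy to trip over: Series.where keeps the original value where the condition is True and substitutes the other value where it is False, which is the reverse of how the second argument of np.where reads. Series.mask does the opposite and replaces where the condition is True. The sketch below, on invented sample values, shows both giving the same result when the goal is to replace 'n/a' with 7.

```python
import pandas as pd

# Hypothetical sample values; dtype=object so strings and ints can mix.
s = pd.Series(["n/a", "1.2B", "n/a", "350M"], name="MarketCap", dtype=object)
is_na = s == "n/a"

# where: keep values where the condition is True, replace the rest.
via_where = s.where(~is_na, 7)

# mask: replace values where the condition is True, keep the rest.
via_mask = s.mask(is_na, 7)

print(via_where.tolist())
print(via_mask.tolist())
```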