I have the following table, in which some values are NaN. Let's assume the columns are highly correlated. Comparing row 0 and row 5, I can say that the value in col2 of row 5 will be 4.0. The same applies to row 1 and row 4. In the case of row 6, however, there is no perfectly matching sample, so I should take the most similar row (in this case, row 0) and change the NaN to 3.0.
How should I approach it? Is there any pandas function that can do this?
import numpy as np
import pandas as pd

example = pd.DataFrame({"col1": [3, 2, 8, 4, 2, 3, np.nan],
                        "col2": [4, 3, 6, np.nan, 3, np.nan, 5],
                        "col3": [7, 8, 9, np.nan, np.nan, 7, 7],
                        "col4": [7, 8, 9, np.nan, np.nan, 7, 6]})
Output:
   col1  col2  col3  col4
0   3.0   4.0   7.0   7.0
1   2.0   3.0   8.0   8.0
2   8.0   6.0   9.0   9.0
3   4.0   NaN   NaN   NaN
4   2.0   3.0   NaN   NaN
5   3.0   NaN   7.0   7.0
6   NaN   5.0   7.0   6.0
Use the fillna() method: fillna() fills the missing (null) values in your dataset with a specified value. It accepts several optional arguments; note the following two. value: the value to insert in place of the missing entries. method: lets you fill missing values forward or backward.
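As a minimal sketch of the two fill styles described above, the snippet below uses a small made-up Series (not the question's table). It uses the dedicated ffill() and bfill() methods for the directional fills, since newer pandas releases deprecate the method argument of fillna() in favor of them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the fill styles.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

filled_const = s.fillna(0)  # replace every NaN with a fixed value
filled_fwd = s.ffill()      # carry the last valid value forward
filled_bwd = s.bfill()      # pull the next valid value backward

print(filled_const.tolist())  # [1.0, 0.0, 0.0, 4.0]
print(filled_fwd.tolist())    # [1.0, 1.0, 1.0, 4.0]
print(filled_bwd.tolist())    # [1.0, 4.0, 4.0, 4.0]
```

None of these fills looks at other columns, so on their own they do not solve the row-matching problem in the question.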
Using fillna() to fill values from another column: here, we apply fillna() to "Col1" of the DataFrame df and pass the Series df['Col2'] as the argument, that is, df['Col1'].fillna(df['Col2']). This fills the missing values in "Col1" with the corresponding values (matched on the index) from "Col2".
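The fill-from-another-column pattern can be sketched as follows. The frame and its values are hypothetical; only the column names "Col1" and "Col2" come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the column names mentioned in the text.
df = pd.DataFrame({"Col1": [1.0, np.nan, 3.0, np.nan],
                   "Col2": [10.0, 20.0, 30.0, np.nan]})

# Values are aligned on the index, so row 1 of Col1 receives row 1 of Col2.
df["Col1"] = df["Col1"].fillna(df["Col2"])

print(df["Col1"].tolist())  # [1.0, 20.0, 3.0, nan]
```

The last entry stays NaN because "Col2" is also missing in that row.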
Extract rows/columns with missing values in specific columns/rows: you can use the isnull() or isna() method of pandas.DataFrame and Series to check whether each element is a missing value. isnull() is an alias for isna(), so the usage is the same.
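Applied to the table from the question, a small sketch of this kind of extraction could look like the following. The variable names for the masks and results are illustrative only.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({"col1": [3, 2, 8, 4, 2, 3, np.nan],
                        "col2": [4, 3, 6, np.nan, 3, np.nan, 5],
                        "col3": [7, 8, 9, np.nan, np.nan, 7, 7],
                        "col4": [7, 8, 9, np.nan, np.nan, 7, 6]})

# Boolean mask: True where col2 is missing.
missing_col2 = example["col2"].isna()
rows_missing_col2 = example[missing_col2]

# Rows that have at least one missing value in any column.
rows_any_missing = example[example.isna().any(axis=1)]

# Columns that contain at least one missing value.
cols_any_missing = example.loc[:, example.isna().any(axis=0)]

print(rows_missing_col2.index.tolist())  # [3, 5]
print(rows_any_missing.index.tolist())   # [3, 4, 5, 6]
print(cols_any_missing.columns.tolist()) # ['col1', 'col2', 'col3', 'col4']
```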
This is a hard question. It involves numpy broadcasting and groupby + transform. I am using first here, since first picks up the first non-NaN value.
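df = example.copy()  # the question's frame is named example
s = df.values
# t[i, j] is True when row j agrees with row i in every column where row j is not NaN
t = np.all((s == s[:, None]) | np.isnan(s), -1)
# keep the (i, j) pairs where t is True; dropna() makes that filtering explicit
idx = pd.DataFrame(t).where(t).stack().dropna().index
# repeat each matching row j once per pair
df = df.reindex(idx.get_level_values(1))
# label the repeated rows with the (i, j) pairs so both levels can be grouped on
df.index = idx
# groupby with first twice: first non-NaN per column within each i, then within each j
df.groupby(level=[0]).transform('first').groupby(level=1).first()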
Out[217]:
   col1  col2  col3  col4
0   3.0   4.0   7.0   7.0
1   2.0   3.0   8.0   8.0
2   8.0   6.0   9.0   9.0
3   4.0   NaN   NaN   NaN
4   2.0   3.0   8.0   8.0
5   3.0   4.0   7.0   7.0
6   NaN   5.0   7.0   6.0
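The output above still leaves row 6 unfilled, because no other row matches it exactly. For the "most similar row" case from the question, one possible fallback is sketched below. It is not part of the answer above and not a built-in pandas function: fill_from_nearest_row is a hypothetical helper, and the similarity measure (mean absolute difference over the columns both rows have observed) is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

def fill_from_nearest_row(df):
    """Fill each NaN from the most similar row that has a value in that column.

    Assumption: similarity is the mean absolute difference over the
    columns that both rows have observed. Ties go to the earlier row.
    """
    values = df.to_numpy(dtype=float)
    filled = values.copy()
    for i, row in enumerate(values):
        for col in np.flatnonzero(np.isnan(row)):
            best_dist, best_val = np.inf, np.nan
            for j, other in enumerate(values):
                if j == i or np.isnan(other[col]):
                    continue  # a donor row must have the value we need
                shared = ~np.isnan(row) & ~np.isnan(other)
                if not shared.any():
                    continue  # nothing to compare the two rows on
                dist = np.abs(row[shared] - other[shared]).mean()
                if dist < best_dist:
                    best_dist, best_val = dist, other[col]
            filled[i, col] = best_val
    return pd.DataFrame(filled, index=df.index, columns=df.columns)

example = pd.DataFrame({"col1": [3, 2, 8, 4, 2, 3, np.nan],
                        "col2": [4, 3, 6, np.nan, 3, np.nan, 5],
                        "col3": [7, 8, 9, np.nan, np.nan, 7, 7],
                        "col4": [7, 8, 9, np.nan, np.nan, 7, 6]})

result = fill_from_nearest_row(example)
print(result.loc[6, "col1"])  # 3.0
```

Exact matches have distance 0, so rows 4 and 5 are filled the same way as in the output above. Note that this sketch also fills row 3, using whichever row is closest on col1 alone, which may or may not be what you want.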