I want to merge consecutive NaN
values into slices. Is there a simple way of doing this with numpy or pandas?
l = [
(996, np.nan), (997, np.nan), (998, np.nan),
(999, -47.3), (1000, -72.5), (1100, -97.7),
(1200, np.nan), (1201, np.nan), (1205, -97.8),
(1300, np.nan), (1302, np.nan), (1305, -97.9),
(1400, np.nan), (1405, -97.10), (1408, np.nan)
]
l = pd.Series(dict(l))
Expected result:
[
(slice(996, 999, None), array([nan, nan, nan])),
(999, -47.3),
(1000, -72.5),
(1100, -97.7),
(slice(1200, 1202, None), array([nan, nan])),
(1205, -97.8),
(slice(1300, 1301, None), array([nan])),
(slice(1302, 1303, None), array([nan])),
(1305, -97.9),
(slice(1400, 1401, None), array([nan])),
(1405, -97.1),
(slice(1408, 1409, None), array([nan]))
]
A numpy array with two dimensions would be OK as well, rather than a list of tuples
Update 2019/05/31: I have just realised that if I just use a dictionary instead of a Pandas Series the algorythm is much more efficient
To count the NaN values in a column in a Pandas DataFrame, we can use the isna() method with sum.
Counting NaN in the entire DataFrame : To count NaN in the entire dataset, we just need to call the sum() function twice – once for getting the count in each column and again for finding the total sum of all the columns.
What you want is full or corner cases, nan equality, first element of each pair being a slice or a single value, second being a np.array or a single value.
For so complex requirements, I would just rely on a plain Python non vectorized way:
def trans(ser):
def build(last, cur, val):
if cur == last + 1:
if np.isnan(val):
return (slice(last, cur), np.array([np.nan]))
else:
return (last, val)
else:
return (slice(last, cur), np.array([val] * (cur - last)))
last = ser.iloc[0]
old = last_index = ser.index[0]
resul = []
for i in ser.index[1:]:
val = ser[i]
if ((val != last) and not(np.isnan(val) and np.isnan(last))) \
or i != old + 1:
resul.append(build(last_index, old + 1, last))
last_index = i
last = val
old = i
resul.append(build(last_index, old+1, last))
return resul
It gives something close to the expected result:
[(slice(996, 999, None), array([nan, nan, nan])),
(999, -47.3),
(1000, -72.5),
(1100, -97.7),
(slice(1200, 1202, None), array([nan, nan])),
(1205, -97.8),
(slice(1300, 1301, None), array([nan])),
(slice(1302, 1303, None), array([nan])),
(1305, -97.9),
(slice(1400, 1401, None), array([nan])),
(1405, -97.1),
(slice(1408, 1409, None), array([nan]))]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With