
Missing data, insert rows in Pandas and fill with NAN

I'm new to Python and Pandas so there might be a simple solution which I don't see.

I have a number of discontinuous datasets which look like this:

    ind  A    B  C
    0    0.0  1  3
    1    0.5  4  2
    2    1.0  6  1
    3    3.5  2  0
    4    4.0  4  5
    5    4.5  3  3

I'm looking for a way to get the following:

    ind  A    B    C
    0    0.0  1    3
    1    0.5  4    2
    2    1.0  6    1
    3    1.5  NAN  NAN
    4    2.0  NAN  NAN
    5    2.5  NAN  NAN
    6    3.0  NAN  NAN
    7    3.5  2    0
    8    4.0  4    5
    9    4.5  3    3

The problem is that the gap in A varies from dataset to dataset, both in position and in length.

asked Sep 18 '14 at 10:09 by mati




1 Answer

set_index and reset_index are your friends.

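    from numpy import arange
    from pandas import DataFrame, Index

    # Sample data from the question: A has a gap between 1.0 and 3.5
    df = DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5], "B": [1, 4, 6, 2, 4, 3], "C": [3, 2, 1, 0, 5, 3]})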

First move column A to the index:

    In [64]: df.set_index("A")
    Out[64]:
         B  C
    A
    0.0  1  3
    0.5  4  2
    1.0  6  1
    3.5  2  0
    4.0  4  5
    4.5  3  3

Then reindex with a new index that covers the full range of A. Rows for values of A that are missing from the data are filled with NaN. We use an Index object since we can name it; the name will be used in the next step.

    In [66]: new_index = Index(arange(0,5,0.5), name="A")

    In [67]: df.set_index("A").reindex(new_index)
    Out[67]:
           B   C
    0.0    1   3
    0.5    4   2
    1.0    6   1
    1.5  NaN NaN
    2.0  NaN NaN
    2.5  NaN NaN
    3.0  NaN NaN
    3.5    2   0
    4.0    4   5
    4.5    3   3

Finally, move the index back to the columns with reset_index. Since we named the index A, it comes back as a column with that name:

    In [69]: df.set_index("A").reindex(new_index).reset_index()
    Out[69]:
         A    B    C
    0  0.0    1    3
    1  0.5    4    2
    2  1.0    6    1
    3  1.5  NaN  NaN
    4  2.0  NaN  NaN
    5  2.5  NaN  NaN
    6  3.0  NaN  NaN
    7  3.5    2    0
    8  4.0    4    5
    9  4.5    3    3
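Since the gap moves around between datasets, the hard-coded arange(0,5,0.5) can be replaced by a grid derived from the data itself. The following is only a sketch under a few assumptions: fill_gaps is a hypothetical helper name, the step of 0.5 is known in advance, and the values in A are unique (reindex raises on duplicate labels). Matching is exact, so for steps that are not exactly representable as floats (0.1, for example) you would probably need to round both the column and the grid first.

```python
import numpy as np
import pandas as pd


def fill_gaps(df, column="A", step=0.5):
    """Return a copy of df with one row per grid value of `column`.

    Hypothetical helper: grid values that are absent from df become NaN rows.
    """
    start, stop = df[column].min(), df[column].max()
    # Number of grid points, both ends included; round() absorbs float error.
    n_points = int(round((stop - start) / step)) + 1
    grid = pd.Index(start + step * np.arange(n_points), name=column)
    # Same three steps as above: set_index, reindex, reset_index.
    return df.set_index(column).reindex(grid).reset_index()


df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})
filled = fill_gaps(df)
print(filled)
```

Calling fill_gaps on each dataset then gives the same layout regardless of where the gap sits or how long it is. Note that B and C become float columns, because NaN cannot be stored in a plain integer column.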
answered Sep 30 '22 at 18:09 by cronos