Missing data, insert rows in Pandas and fill with NAN

Tags: python, pandas, numpy

I'm new to Python and Pandas so there might be a simple solution which I don't see.

I have a number of discontinuous datasets which look like this:

    ind A    B  C
    0   0.0  1  3
    1   0.5  4  2
    2   1.0  6  1
    3   3.5  2  0
    4   4.0  4  5
    5   4.5  3  3

I'm now looking for a way to get the following:

    ind A    B    C
    0   0.0  1    3
    1   0.5  4    2
    2   1.0  6    1
    3   1.5  NAN  NAN
    4   2.0  NAN  NAN
    5   2.5  NAN  NAN
    6   3.0  NAN  NAN
    7   3.5  2    0
    8   4.0  4    5
    9   4.5  3    3

The problem is that the gap in A varies from dataset to dataset, both in position and in length...

948

asked Sep 18 '14 10:09

mati

1 Answer

set_index and reset_index are your friends.

    from numpy import arange
    from pandas import DataFrame, Index

    df = DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                    "B": [1, 4, 6, 2, 4, 3],
                    "C": [3, 2, 1, 0, 5, 3]})

First move column A to the index:

    In [64]: df.set_index("A")
    Out[64]:
         B  C
    A
    0.0  1  3
    0.5  4  2
    1.0  6  1
    3.5  2  0
    4.0  4  5
    4.5  3  3

Then reindex with a new index that covers the full range of A; rows that have no match in the original data are filled with NaN. We build the new index as an Index object because it can be given a name, which is used in the next step.

    In [66]: new_index = Index(arange(0,5,0.5), name="A")

    In [67]: df.set_index("A").reindex(new_index)
    Out[67]:
           B    C
    0.0    1    3
    0.5    4    2
    1.0    6    1
    1.5  NaN  NaN
    2.0  NaN  NaN
    2.5  NaN  NaN
    3.0  NaN  NaN
    3.5    2    0
    4.0    4    5
    4.5    3    3
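One side effect of this step is worth knowing: NaN is a floating-point value, so integer columns that receive NaN are converted to float. The sketch below shows this on the same data and, as an optional alternative that is an assumption about your pandas version (the nullable `Int64` dtype needs a reasonably recent release), keeps the columns as integers with `<NA>` in the gaps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})
new_index = pd.Index(np.arange(0, 5, 0.5), name="A")

# Plain reindex: B and C become float64 because the gaps hold NaN.
reindexed = df.set_index("A").reindex(new_index)
print(reindexed.dtypes)

# Optional: convert to the nullable integer dtype first, so the
# inserted rows hold <NA> and the columns stay integer-typed.
nullable = df.set_index("A").astype("Int64").reindex(new_index)
print(nullable.dtypes)
```

Whether the float conversion matters depends on what you do with the data afterwards; for plotting or numeric work the default float columns are usually fine.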

Finally, move the index back to the columns with reset_index. Because the index is named "A", it comes back as column A without any extra renaming:

    In [69]: df.set_index("A").reindex(new_index).reset_index()
    Out[69]:
         A    B    C
    0  0.0    1    3
    1  0.5    4    2
    2  1.0    6    1
    3  1.5  NaN  NaN
    4  2.0  NaN  NaN
    5  2.5  NaN  NaN
    6  3.0  NaN  NaN
    7  3.5    2    0
    8  4.0    4    5
    9  4.5    3    3
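Since the gap differs in position and length from dataset to dataset, the new index can be derived from the data instead of being hard-coded. The following is a minimal sketch, assuming the values in A lie exactly on a regular grid with a known step; `fill_gaps` and its parameters are hypothetical names chosen for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

def fill_gaps(df, column="A", step=0.5):
    """Insert NaN rows so `column` runs from its min to its max in `step` increments.

    Hypothetical helper: assumes the existing values sit exactly on the grid.
    """
    start, stop = df[column].min(), df[column].max()
    # Number of steps between the first and last value, rounded to absorb float error.
    n_steps = int(round((stop - start) / step))
    # Build the grid from integer multiples of the step instead of repeated addition.
    grid = start + step * np.arange(n_steps + 1)
    new_index = pd.Index(grid, name=column)
    return df.set_index(column).reindex(new_index).reset_index()

df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})
filled = fill_gaps(df)
print(filled)
```

Note that reindex matches index labels exactly. A step such as 0.5 is exactly representable in binary floating point, but with a step like 0.1 the generated grid may differ from the stored values in the last digits, in which case rounding both sides to a fixed number of decimals before reindexing may be needed.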

145

answered Sep 30 '22 18:09

cronos
