
Pandas version of rbind

In R, you can combine two dataframes by sticking the columns of one onto the bottom of the columns of the other using rbind. In pandas, how do you accomplish the same thing? It seems bizarrely difficult.

Using append results in a horrible mess including NaNs and things for reasons I don't understand. I'm just trying to "rbind" two identical frames that look like this:

EDIT: I was creating the DataFrames in a stupid way, which was causing the issues. append is equivalent to rbind to all intents and purposes. See the answer below.

        0         1       2        3          4          5        6                    7
0   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3  AMEC.L  20130220  1030.0  1040.00  1024.0000  1035.0000  1972517  2013-02-20 18:47:43
4   AAL.L  20130220  1998.0  2014.50  1942.4999  1951.0000  3666033  2013-02-20 18:47:44
5  ANTO.L  20130220  1093.0  1097.00  1064.7899  1068.0000  2183931  2013-02-20 18:47:44
6   ARM.L  20130220   941.5   965.10   939.4250   951.5001  2994652  2013-02-20 18:47:45

But I'm getting something horrible a la this:

        0         1        2        3          4         5        6                    7       0         1       2        3          4          5        6                    7
0     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN  AMEC.L  20130220  1030.0  1040.00  1024.0000  1035.0000  1972517  2013-02-20 18:47:43
4     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AAL.L  20130220  1998.0  2014.50  1942.4999  1951.0000  3666033  2013-02-20 18:47:44
5     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN  ANTO.L  20130220  1093.0  1097.00  1064.7899  1068.0000  2183931  2013-02-20 18:47:44
6     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ARM.L  20130220   941.5   965.10   939.4250   951.5001  2994652  2013-02-20 18:47:45
0     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN

And I don't understand why. I'm starting to miss R :(

N. McA. asked Feb 20 '13 19:02


2 Answers

Ah, this is to do with how I created the DataFrame, not with how I was combining them. The long and the short of it is, if you are creating a frame using a loop and a statement that looks like this:

Frame = Frame.append(pandas.DataFrame(data = SomeNewLineOfData)) 

then you must ignore the index:

Frame = Frame.append(pandas.DataFrame(data = SomeNewLineOfData), ignore_index=True) 

Or you will have issues later when combining data.
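For anyone on a recent pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the loop above needs pd.concat instead. Here is a minimal sketch of the same idea, assuming each new line of data arrives as a plain list. The rows list and the variable names are made up for illustration and are not from the original code.

```python
import pandas as pd

# Hypothetical records standing in for SomeNewLineOfData
rows = [
    ["ADN.L", 20130220, 437.4],
    ["ADM.L", 20130220, 1279.0],
    ["AGK.L", 20130220, 1717.0],
]

# One single-row frame per record; each one gets the row label 0
pieces = [pd.DataFrame([row]) for row in rows]

# Without ignore_index the row labels are carried over and repeat
dup = pd.concat(pieces)
print(dup.index.tolist())    # [0, 0, 0]

# With ignore_index=True the result is relabelled 0..n-1
frame = pd.concat(pieces, ignore_index=True)
print(frame.index.tolist())  # [0, 1, 2]
```

Collecting the pieces in a list and calling pd.concat once is also usually cheaper than growing the frame inside the loop, since every concatenation copies the data.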

N. McA. answered Oct 09 '22 22:10


pd.concat will serve the purpose of rbind in R.

import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [5, 6], 'col2': [7, 8]})
print(df1)
print(df2)
print(pd.concat([df1, df2]))

The output looks like this:

   col1  col2
0     1     3
1     2     4
   col1  col2
0     5     7
1     6     8
   col1  col2
0     1     3
1     2     4
0     5     7
1     6     8

If you read the documentation carefully, it also explains other operations such as cbind.
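As a rough illustration of the cbind case, pd.concat takes an axis argument, and axis=1 places the frames side by side, matching rows on the index. The second frame here (df3, with columns col3 and col4) is invented for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
# Hypothetical frame with different column names, used only for this sketch
df3 = pd.DataFrame({'col3': [5, 6], 'col4': [7, 8]})

# axis=1 binds columns (like cbind); rows are aligned on the index labels
wide = pd.concat([df1, df3], axis=1)
print(wide)
```

Because rows are aligned by index label rather than by position, two frames whose indexes differ will produce NaNs in the non-matching rows.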

B.Mr.W. answered Oct 09 '22 22:10