Pandas equivalent rbind operation

Tags: python, pandas

I am looping through a bunch of CSV files and, in the end, would like to append all the dataframes into one. All I really need is an rbind-type function. I did some searching and followed the guide, but I still could not get the result I want.

Sample code is attached below. For instance, the shape of data1 is always 47 by 42, but the shape of data_out_final becomes (47, 42), (47, 84), and (47, 126) after the first three files. Ideally, it should be (141, 42). In addition, I checked the index of data1, which is RangeIndex(start=0, stop=47, step=1). I'd appreciate any suggestions!

My pandas version is 0.18.1.

code

appended_data = []
for csv_each in csv_pool:
    data1 = pd.read_csv(csv_each, header=0)
    # do something here (this step produces data2)
    appended_data.append(data2)
data_out_final = pd.concat(appended_data, axis=1)

If I instead use data_out_final = pd.concat(appended_data, axis=0), the shape of data_out_final becomes (141, 94).

PS

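I have more or less figured it out: you have to standardize the column names before calling pd.concat. Otherwise the columns are aligned by label and any names that do not match end up as extra columns.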
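For anyone hitting the same thing, here is a minimal sketch of what "standardizing column names" could look like. The frames, the header spellings, and the `standardize_columns` helper are all made up for illustration, and stripping plus lower-casing is only one assumed convention; the real files may need a different mapping.

```python
import pandas as pd

# Hypothetical frames: same layout, but inconsistent header spellings.
df_a = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df_b = pd.DataFrame({"A ": [5, 6], "B": [7, 8]})

# Without standardizing, concat aligns on column labels and keeps their union,
# so the result has more columns than either input.
mismatched = pd.concat([df_a, df_b])
print(mismatched.shape)  # (4, 4)


def standardize_columns(df):
    """Return a copy with stripped, lower-cased column names (assumed convention)."""
    out = df.copy()
    out.columns = [str(c).strip().lower() for c in out.columns]
    return out


# After standardizing, the labels match and the rows stack as in rbind.
frames = [standardize_columns(df) for df in (df_a, df_b)]
stacked = pd.concat(frames, ignore_index=True)
print(stacked.shape)  # (4, 2)
```

If the files are known to share the same column order, assigning one fixed list of names to every frame before concatenating should work just as well.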

411

asked Aug 08 '16 20:08

TTT

2 Answers

>>> df1
          a         b
0 -1.417866 -0.828749
1  0.212349  0.791048
2 -0.451170  0.628584
3  0.612671 -0.995330
4  0.078460 -0.322976
5  1.244803  1.576373
6  1.169629 -1.135926
7 -0.652443  0.506388
8  0.549604 -0.691054
9 -0.512829 -0.959398

>>> df2
          a         b
0 -0.652161  0.940932
1  2.495067  0.004833
2 -2.187792  1.692402
3  1.900738  0.372425
4  0.245976  1.894527
5  0.627297  0.029331
6 -0.828628 -1.600014
7 -0.991835 -0.061202
8  0.543389  0.703457
9 -0.755059  1.239968

>>> pd.concat([df1, df2])
          a         b
0 -1.417866 -0.828749
1  0.212349  0.791048
2 -0.451170  0.628584
3  0.612671 -0.995330
4  0.078460 -0.322976
5  1.244803  1.576373
6  1.169629 -1.135926
7 -0.652443  0.506388
8  0.549604 -0.691054
9 -0.512829 -0.959398
0 -0.652161  0.940932
1  2.495067  0.004833
2 -2.187792  1.692402
3  1.900738  0.372425
4  0.245976  1.894527
5  0.627297  0.029331
6 -0.828628 -1.600014
7 -0.991835 -0.061202
8  0.543389  0.703457
9 -0.755059  1.239968

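Unless I'm misinterpreting the question, pd.concat with its default axis=0 is what you need: it stacks the rows of df1 and df2, just like rbind. Note that the original index values are kept, so 0 through 9 appear twice in the result.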
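Applied to the loop from the question, a rough sketch might look like the following. The in-memory CSV strings are hypothetical stand-ins for the files in csv_pool, and `ignore_index=True` is an optional extra that renumbers the rows instead of repeating each file's index.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV texts standing in for the files in csv_pool.
csv_texts = [
    "a,b\n1,2\n3,4\n",
    "a,b\n5,6\n7,8\n",
    "a,b\n9,10\n11,12\n",
]

frames = []
for text in csv_texts:
    df = pd.read_csv(io.StringIO(text), header=0)
    frames.append(df)

# axis=0 is the default: rows are stacked and columns are aligned by name.
stacked = pd.concat(frames, ignore_index=True)
print(stacked.shape)  # (6, 2)

# axis=1 places the frames side by side, which is what makes the column
# count grow with every file.
side_by_side = pd.concat(frames, axis=1)
print(side_by_side.shape)  # (2, 6)
```

Collecting the frames in a list and calling pd.concat once at the end, as the question already does, avoids copying the accumulated data on every iteration.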

160

answered Sep 19 '22 07:09

Asish M.

Try: http://pandas.pydata.org/pandas-docs/stable/10min.html?highlight=concat#concat

"pandas provides various facilities for easily combining together Series, DataFrame, and Panel objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations."

answered Sep 18 '22 07:09

Jon
