Basically, I am looping through a bunch of CSV files and, in the end, would like to append each dataframe into one. All I really need is an rbind-type function. I searched and followed the guide, but I still could not get the ideal solution.
Sample code is attached below. For instance, the shape of data1 is always 47 by 42, but the shape of data_out_final becomes (47, 42), (47, 84), and (47, 126) after the first three files. Ideally, it should be (141, 42). In addition, I checked the index of data1, which is RangeIndex(start=0, stop=47, step=1). I appreciate any suggestions!
My pandas version is 0.18.1.
import pandas as pd

appended_data = []
for csv_each in csv_pool:  # csv_pool: iterable of CSV file paths
    data1 = pd.read_csv(csv_each, header=0)
    # do something here that produces data2
    appended_data.append(data2)
data_out_final = pd.concat(appended_data, axis=1)
If using data_out_final = pd.concat(appended_data, axis=0), the shape of data_out_final becomes (141, 94).
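To make the shapes above concrete, here is a minimal sketch of how the axis argument changes the result. The three frames and their column labels (col0 to col41) are made up for illustration; they only mimic the 47 by 42 shape from the question and are assumed to share identical column names.

```python
import numpy as np
import pandas as pd

# Three hypothetical frames with the shape from the question, sharing column labels.
columns = ["col{}".format(i) for i in range(42)]
frames = [pd.DataFrame(np.zeros((47, 42)), columns=columns) for _ in range(3)]

side_by_side = pd.concat(frames, axis=1)  # columns are placed next to each other
stacked = pd.concat(frames, axis=0)       # rows are stacked, like rbind

print(side_by_side.shape)  # (47, 126)
print(stacked.shape)       # (141, 42)
```

With matching column labels, axis=0 gives the rbind-style result; axis=1 reproduces the growing column count seen in the question.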
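I kind of figured it out: you have to standardize the column names before calling pd.concat. Otherwise pd.concat aligns the frames on their column labels, and any names that differ between files end up as extra columns.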
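A small sketch of that fix, under the assumption that every file holds the same columns in the same order and only the labels differ. The frames df_a and df_b and their column names are hypothetical stand-ins for data read from two CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for frames read from different CSV files:
# same layout, but the column labels differ between files.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df_b = pd.DataFrame({"X": [5, 6], "Y": [7, 8]})

# Without standardizing, concat aligns on labels and keeps the union of columns.
mismatched = pd.concat([df_a, df_b], axis=0)
print(mismatched.shape)  # (4, 4), padded with NaN where a label is missing

# Standardize the labels (here: reuse the first frame's names), then stack rows.
frames = [df_a, df_b]
expected_columns = list(frames[0].columns)
standardized = []
for frame in frames:
    frame = frame.copy()              # leave the original frame untouched
    frame.columns = expected_columns  # assumes identical column order in every file
    standardized.append(frame)

stacked = pd.concat(standardized, axis=0, ignore_index=True)
print(stacked.shape)  # (4, 2)
```

Assigning labels by position is only safe when the column order really is the same in every file; if it is not, renaming with an explicit mapping is the more careful choice.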
Method 1: Use the rbind() function with equal columns. In R, this combines the rows of data frames that share the same columns.
To join two data frames (datasets) vertically in R, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frame A has variables that data frame B does not, one option is to delete the extra variables in data frame A.
In pandas, there are parameters to perform left, right, inner, or outer merges and joins on two DataFrames or Series. Versions before pandas 1.2, including 0.18.1, have no how="cross" option for performing a cross join with merge.
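For pandas versions without how="cross", one common workaround is to merge on a constant helper column. The sketch below is illustrative only: the frames left and right and the helper column name _key are assumptions, not part of the original thread.

```python
import pandas as pd

# Hypothetical frames to combine in every possible pairing.
left = pd.DataFrame({"letter": ["a", "b"]})
right = pd.DataFrame({"number": [1, 2, 3]})

# Give both frames the same constant key, merge on it, then drop the helper column.
crossed = (
    left.assign(_key=1)
    .merge(right.assign(_key=1), on="_key")
    .drop("_key", axis=1)
)

print(crossed.shape)  # (6, 2): every letter paired with every number
```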
Use the pandas.concat() method to concatenate two DataFrames by rows, that is, to append one DataFrame to another. By default, it behaves like a union of the columns and brings all rows from both DataFrames into a single DataFrame.
>>> df1
a b
0 -1.417866 -0.828749
1 0.212349 0.791048
2 -0.451170 0.628584
3 0.612671 -0.995330
4 0.078460 -0.322976
5 1.244803 1.576373
6 1.169629 -1.135926
7 -0.652443 0.506388
8 0.549604 -0.691054
9 -0.512829 -0.959398
>>> df2
a b
0 -0.652161 0.940932
1 2.495067 0.004833
2 -2.187792 1.692402
3 1.900738 0.372425
4 0.245976 1.894527
5 0.627297 0.029331
6 -0.828628 -1.600014
7 -0.991835 -0.061202
8 0.543389 0.703457
9 -0.755059 1.239968
>>> pd.concat([df1, df2])
a b
0 -1.417866 -0.828749
1 0.212349 0.791048
2 -0.451170 0.628584
3 0.612671 -0.995330
4 0.078460 -0.322976
5 1.244803 1.576373
6 1.169629 -1.135926
7 -0.652443 0.506388
8 0.549604 -0.691054
9 -0.512829 -0.959398
0 -0.652161 0.940932
1 2.495067 0.004833
2 -2.187792 1.692402
3 1.900738 0.372425
4 0.245976 1.894527
5 0.627297 0.029331
6 -0.828628 -1.600014
7 -0.991835 -0.061202
8 0.543389 0.703457
9 -0.755059 1.239968
Unless I'm misinterpreting what you need, this is what you need.
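Note that the concatenated result above keeps each frame's original index, so the labels 0 to 9 appear twice. If a fresh running index is preferred, passing ignore_index=True is one option. The sketch below rebuilds two frames shaped like df1 and df2 with random values, so the numbers will not match the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical frames shaped like df1 and df2 above (random values, columns a and b).
rng = np.random.RandomState(0)
df1 = pd.DataFrame(rng.randn(10, 2), columns=["a", "b"])
df2 = pd.DataFrame(rng.randn(10, 2), columns=["a", "b"])

kept = pd.concat([df1, df2])                      # index runs 0..9 twice
fresh = pd.concat([df1, df2], ignore_index=True)  # index runs 0..19

print(kept.index.is_unique)   # False
print(fresh.index.is_unique)  # True
```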
Try: http://pandas.pydata.org/pandas-docs/stable/10min.html?highlight=concat#concat
"pandas provides various facilities for easily combining together Series, DataFrame, and Panel objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations."