I have multiple (more than 100) dataframes. How can I concat all of them?
The problem is that I have too many dataframes to write them out manually in a list, like this:
>>> cluster_1 = pd.DataFrame([['a', 1], ['b', 2]],
...                          columns=['letter', 'number'])
>>> cluster_1
  letter  number
0      a       1
1      b       2
>>> cluster_2 = pd.DataFrame([['c', 3], ['d', 4]],
...                          columns=['letter', 'number'])
>>> cluster_2
  letter  number
0      c       3
1      d       4
>>> pd.concat([cluster_1, cluster_2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
The names of my N dataframes are cluster_1, cluster_2, cluster_3,..., cluster_N. The number N can be very high.
How can I concat N dataframes?
To concatenate DataFrames, use the pd.concat() function; to remove duplicate rows from the result, call drop_duplicates() on it.
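As a minimal sketch of that combination, assuming two small made-up frames (`left` and `right` are illustrative names, not from the question) that share one identical row:

```python
import pandas as pd

# Two illustrative frames; the row ('b', 2) appears in both.
left = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
right = pd.DataFrame([['b', 2], ['c', 3]], columns=['letter', 'number'])

# Stack the rows of both frames, numbering the result 0..n-1.
combined = pd.concat([left, right], ignore_index=True)

# Drop rows that are exact duplicates of an earlier row, then renumber.
deduplicated = combined.drop_duplicates().reset_index(drop=True)
```

By default drop_duplicates() compares all columns and keeps the first occurrence of each row.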
The concat() function can be used to concatenate two or more DataFrames by appending the rows of one to the other. The merge() function is the equivalent of the SQL JOIN clause; 'left', 'right', 'inner' and 'outer' joins are all possible.
The same approach works for any number of DataFrames. Alternatively, you can chain DataFrame.merge() calls to join multiple pandas DataFrames on a shared key.
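If a key-based join rather than row stacking is what you need, one possible way to merge a whole list of frames is to fold pd.merge over it with functools.reduce. This is only a sketch: the frames `a`, `b`, `c` and the column name `key` are hypothetical, and an inner join is assumed.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that all share a 'key' column.
a = pd.DataFrame({'key': [1, 2, 3], 'x': [10, 20, 30]})
b = pd.DataFrame({'key': [1, 2, 4], 'y': ['p', 'q', 'r']})
c = pd.DataFrame({'key': [2, 3, 1], 'z': [0.1, 0.2, 0.3]})

# Merge pairwise from left to right: (a merge b) merge c.
merged = reduce(
    lambda left, right: pd.merge(left, right, on='key', how='inner'),
    [a, b, c],
)
```

Only keys present in every frame survive an inner join, so here the result holds the rows for keys 1 and 2.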
I think you can just put them into a list, and then concat the list. When you read a file in chunks in pandas (for example with the chunksize argument of read_csv), you already get the pieces one by one in much the same way, and this is what I personally do in that case.
pdList = [df1, df2, ...]  # List of your dataframes
new_df = pd.concat(pdList)
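For the chunked-reading case, a small sketch might look like the following. It assumes the data comes from a CSV; an in-memory string (`csv_text`, made up here) stands in for a real file.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = "letter,number\na,1\nb,2\nc,3\nd,4\ne,5\n"

# With chunksize set, read_csv yields DataFrames of at most 2 rows each.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)
chunks = list(reader)

# The chunks are ordinary DataFrames, so they concatenate like any list.
whole = pd.concat(chunks, ignore_index=True)
```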
To create pdList automatically, assuming the names of your dataframes always start with "cluster_":
pdList = []
pdList.extend(value for name, value in locals().items() if name.startswith('cluster_'))
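A slightly more defensive variant of the same idea is sketched below. The helper name `collect_frames` is made up for illustration; it takes any name-to-object mapping (such as the result of globals() in a script), keeps only DataFrames whose name is the prefix followed by digits, and orders them by that number so that cluster_10 comes after cluster_2.

```python
import pandas as pd


def collect_frames(namespace, prefix='cluster_'):
    """Return DataFrames named <prefix><number>, ordered by that number."""
    found = []
    # Iterate over a snapshot, since a live namespace can change while looping.
    for name, value in list(namespace.items()):
        suffix = name[len(prefix):]
        if (name.startswith(prefix) and suffix.isdigit()
                and isinstance(value, pd.DataFrame)):
            found.append((int(suffix), value))
    found.sort(key=lambda pair: pair[0])
    return [frame for _, frame in found]


# Hypothetical stand-in for a real namespace; in a script, pass globals().
example_namespace = {
    'cluster_10': pd.DataFrame([['j', 10]], columns=['letter', 'number']),
    'cluster_2': pd.DataFrame([['b', 2]], columns=['letter', 'number']),
    'cluster_1': pd.DataFrame([['a', 1]], columns=['letter', 'number']),
    'unrelated': 42,
}

result = pd.concat(collect_frames(example_namespace), ignore_index=True)
```

If you control the code that creates the frames, storing them in a list or dict from the start avoids the name lookup entirely.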
Generally it goes like:
frames = [df1, df2, df3]
result = pd.concat(frames)
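Note: By default concat keeps the original index of each frame, so index values can repeat in the result; pass ignore_index=True to get a fresh 0..n-1 index. The pandas documentation on merging covers the different ways of combining frames in more detail.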
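A quick check of how concat treats the index, reusing the two frames from the question:

```python
import pandas as pd

cluster_1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
cluster_2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])

# Default behaviour: each frame keeps its own index labels (0, 1, 0, 1).
kept = pd.concat([cluster_1, cluster_2])

# With ignore_index=True the result is renumbered 0, 1, 2, 3.
renumbered = pd.concat([cluster_1, cluster_2], ignore_index=True)
```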
For a large number of data frames: If you have hundreds of data frames, then whether they are on disk or in memory, you can still build the list ("frames" in the code snippet) with a for loop. If they are on disk, this is easily done by saving all the dataframes in one single folder and then reading all the files from that folder.
If you are generating the dataframes in memory, maybe try saving them as .pkl files first.
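The folder approach could be sketched as follows. The helper name `concat_from_folder` and the file pattern `cluster_*.pkl` are assumptions made for illustration; a temporary directory stands in for your real folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def concat_from_folder(folder, pattern='cluster_*.pkl'):
    """Read every pickle in folder that matches pattern and stack the rows."""
    # sorted() orders the paths as strings, so cluster_10 sorts before cluster_2.
    paths = sorted(Path(folder).glob(pattern))
    frames = [pd.read_pickle(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Demo: write three small frames to a temporary folder, then read them back.
with tempfile.TemporaryDirectory() as folder:
    for i in range(1, 4):
        frame = pd.DataFrame({'letter': [chr(96 + i)], 'number': [i]})
        frame.to_pickle(Path(folder) / f'cluster_{i}.pkl')
    combined = concat_from_folder(folder)
```

The same loop works for CSV files by swapping read_pickle for read_csv and adjusting the pattern. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.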