Recently I've been building pandas DataFrames by iterating through multiple files, rows, etc. (I know there are other tools, such as apply() and iterrows(), for stepping through rows and transforming or filtering data row by row; that is not the topic of this question.) My approach is to add items to a dictionary and then convert it to a DataFrame:
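import pandas as pd

# df is an existing numeric DataFrame
new_data_dict = {}
for r in df.index:
    new_data = df.loc[r] ** 2        # transform one row (a Series)
    new_data_dict[r] = new_data      # key = original row label
new_df = pd.DataFrame.from_dict(new_data_dict, orient='index')  # dict keys become the row index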
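For an elementwise operation like `** 2`, the loop above can usually be dropped entirely, because pandas applies arithmetic to the whole frame at once. A minimal sketch, assuming a small all-numeric `df` (the column names `a` and `b` are hypothetical, since the question does not show its data):

```python
import pandas as pd

# Hypothetical numeric input; the question does not show its df.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# The question's approach: one Series per row, collected in a dict.
new_data_dict = {}
for r in df.index:
    new_data_dict[r] = df.loc[r] ** 2
loop_df = pd.DataFrame.from_dict(new_data_dict, orient="index")

# Vectorized equivalent: the operation is applied to the whole frame at once.
vector_df = df ** 2

print(loop_df)
print(vector_df)
```

The values match, but note the dtypes: each row Series holds both an int and a float, so the loop version upcasts column `a` to float64, while the vectorized version keeps it as int64.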
Is this the most efficient way to build a pandas DataFrame? I haven't compared it to pandas.DataFrame.append. I have two conflicting thoughts about append. On one hand, it seems unnecessarily heavy to create a DataFrame or Series (of a single row) only to append it. On the other hand, everything built into pandas is super fast, such as apply() and iterrows() mentioned above, as well as groupby() etc.
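On the `append` question: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, with `pd.concat` as the suggested replacement. Growing a frame inside a loop copies all existing data on every call, so the usual pattern is to collect the pieces in a list and concatenate once. A minimal sketch, assuming each iteration yields a single-row DataFrame (the `file` and `rows` columns are hypothetical):

```python
import pandas as pd

# Hypothetical single-row frames, e.g. one per parsed file.
pieces = [
    pd.DataFrame({"file": [f"part_{i}.csv"], "rows": [i * 10]})
    for i in range(3)
]

# Concatenate once at the end instead of growing a frame inside the loop;
# each concat copies all existing data, so doing it per row is quadratic.
combined = pd.concat(pieces, ignore_index=True)
print(combined)
```

`ignore_index=True` renumbers the rows 0..n-1; without it, every piece keeps its own index label 0.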
What is the 'pandamic' way to build a dataframe row by row?
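One common pattern for genuinely row-by-row construction is to accumulate plain Python records (dicts) in a list and build the DataFrame once at the end; this avoids creating a Series or DataFrame per row. A minimal sketch, where `make_record` is a hypothetical stand-in for whatever produces one row (reading a file, parsing a line, ...):

```python
import pandas as pd


def make_record(i):
    # Hypothetical row producer; replace with real per-file or per-line logic.
    return {"id": i, "value": i ** 2, "label": f"row{i}"}


records = []
for i in range(5):
    records.append(make_record(i))  # cheap: appending to a Python list

# Build the DataFrame once; columns follow the order in which keys first appear.
df = pd.DataFrame(records)
print(df)
```

Missing keys in some dicts simply become NaN in the corresponding columns.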
Instead of using a for loop to repeat the rows, you can use NumPy's repeat():
import pandas as pd
import numpy as np

df = pd.DataFrame({'Location': ['New York', 'Florida', 'California', 'Nevada', 'Georgia'],
                   'Owner': ['John', 'Gary', 'Mike', 'Kate', 'Lucy'],
                   'Score': [50, 80, 70, 90, 80]})
print(df)

# Repeat every row of the underlying array twice along axis 0 (rows)
new_df = pd.DataFrame(np.repeat(df.values, 2, axis=0))
print(new_df)
Original dataframe:
     Location Owner  Score
0    New York  John     50
1     Florida  Gary     80
2  California  Mike     70
3      Nevada  Kate     90
4     Georgia  Lucy     80
New dataframe with repeated rows:
            0     1   2
0    New York  John  50
1    New York  John  50
2     Florida  Gary  80
3     Florida  Gary  80
4  California  Mike  70
5  California  Mike  70
6      Nevada  Kate  90
7      Nevada  Kate  90
8     Georgia  Lucy  80
9     Georgia  Lucy  80
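Because `df.values` on a mixed-type frame is an object array, the result above loses the column names (hence the `0 1 2` headers) and `Score` ends up with object dtype. A pandas-only alternative that keeps both is to repeat the index labels and select them with `.loc`. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Location': ['New York', 'Florida', 'California', 'Nevada', 'Georgia'],
                   'Owner': ['John', 'Gary', 'Mike', 'Kate', 'Lucy'],
                   'Score': [50, 80, 70, 90, 80]})

# Repeat each index label twice, then select those labels;
# column names and dtypes are preserved.
new_df = df.loc[df.index.repeat(2)].reset_index(drop=True)
print(new_df)
print(new_df.dtypes)
```

`reset_index(drop=True)` replaces the duplicated labels (0, 0, 1, 1, ...) with a fresh 0..9 index, matching the numbering in the output above.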