I understand there are other tools such as apply() and iterrows() to step through rows and apply or screen data by row. That is not the topic of this question.
I've recently been building pandas DataFrames while iterating through multiple files, rows, etc. I collect the items in a dictionary and then convert it to a DataFrame:
new_data_dict = {}
for r in df.index:
    # square every value in row r
    new_data = df.loc[r] ** 2
    new_data_dict[r] = new_data
# build the DataFrame once, using the dict keys as the index
new_df = pd.DataFrame.from_dict(new_data_dict, orient='index')
Is this the most efficient way to build a pandas DataFrame? I haven't compared it to pandas.DataFrame.append. I have two thoughts about append. On the one hand, it seems unnecessarily heavy to create a DataFrame or Series (of a single row) only to append it. On the other hand, everything built into pandas is super fast, such as the apply() and iterrows() methods mentioned above, as well as groupby(), etc.
What is the 'pandamic' way to build a dataframe row by row?
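One pattern often suggested for this situation is to collect plain Python records in a list and construct the DataFrame a single time after the loop. The sketch below assumes a small, made-up input frame standing in for `df`; the names `rows` and `labels` are illustrative only. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the choice is between building once from collected records, as here, or combining existing frames with pd.concat.

```python
import pandas as pd

# Hypothetical input frame, standing in for the question's `df`.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

rows = []    # one dict per output row
labels = []  # matching index labels

for r in df.index:
    # Same per-row computation as in the question: square the row.
    rows.append((df.loc[r] ** 2).to_dict())
    labels.append(r)

# Construct the DataFrame once, after the loop, instead of growing it.
new_df = pd.DataFrame(rows, index=labels)
print(new_df)
```

Appending to a Python list is cheap, whereas growing a DataFrame inside the loop copies the existing data on every iteration. For this particular computation the loop is not needed at all (`df ** 2` gives the same values), but the pattern carries over to cases where each row comes from a different file or source.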
Instead of using a for loop to repeat the rows, you can use numpy's repeat function.
import pandas as pd
import numpy as np

df = pd.DataFrame({'Location': ['New York', 'Florida', 'California', 'Nevada', 'Georgia'],
                   'Owner': ['John', 'Gary', 'Mike', 'Kate', 'Lucy'],
                   'Score': [50, 80, 70, 90, 80]})
print(df)

# repeat every row twice along axis 0
new_df = pd.DataFrame(np.repeat(df.values, 2, axis=0))
print(new_df)
Original dataframe:
     Location Owner  Score
0    New York  John     50
1     Florida  Gary     80
2  California  Mike     70
3      Nevada  Kate     90
4     Georgia  Lucy     80
New Dataframe with repeated rows:
            0     1   2
0    New York  John  50
1    New York  John  50
2     Florida  Gary  80
3     Florida  Gary  80
4  California  Mike  70
5  California  Mike  70
6      Nevada  Kate  90
7      Nevada  Kate  90
8     Georgia  Lucy  80
9     Georgia  Lucy  80
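As the output above shows, going through df.values drops the column names (they become 0, 1, 2), and because the array mixes strings and integers the resulting columns generally end up with object dtype. A possible alternative, sketched here with the same example data, is to repeat the index labels instead and select rows by label; the variable name `repeated` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["New York", "Florida", "California", "Nevada", "Georgia"],
    "Owner": ["John", "Gary", "Mike", "Kate", "Lucy"],
    "Score": [50, 80, 70, 90, 80],
})

# Repeat each index label twice, select those rows by label,
# then renumber the result 0..n-1.
repeated = df.loc[df.index.repeat(2)].reset_index(drop=True)
print(repeated)
```

This keeps the Location, Owner and Score column names and leaves Score as an integer column. It assumes the index labels are unique, as they are with the default RangeIndex here.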