This is my .csv file:
Choco_Type,ID,Cocoa,Milk,Sugar,ID,Cocoa,Milk,Sugar
Dark,Batch_11,80,0,16,Batch_12,78,0,14
Milk,Batch_72,35,25,25,Batch_73,32,27,22
Swiss,Batch_52,30,30,20,Batch_53,28,33,18
This is my code:
for row in reader_in:
    type_chocolate = row[0]
    a = [(type_chocolate, row[1], row[2], row[3], row[4]),
         (type_chocolate, row[5], row[6], row[7], row[8])]
    df = DataFrame.from_records(a)
This should be my output DataFrame:
Choco_Type  ID        Cocoa  Milk  Sugar
Dark        Batch_11  80     0     16
Dark        Batch_12  78     0     14
Milk        Batch_72  35     25    25
Milk        Batch_73  32     27    22
Swiss       Batch_52  30     30    20
Swiss       Batch_53  28     33    18
I am unable to understand how to update the DataFrame 'df' with new rows on each pass of the 'for' loop; the rows are built with the 'from_records' function, which takes a list of tuples from reader_in as its input.
First use read_csv to create the DataFrame from the csv file.
Then strip the '.1' suffix that read_csv appends to avoid duplicate column names.
Call set_index with the first column and use concat, selecting the first and last 4 columns with iloc:
import pandas as pd
from io import StringIO   # pandas.compat.StringIO was removed in newer pandas versions
temp=u"""Choco_Type,ID,Cocoa,Milk,Sugar,ID,Cocoa,Milk,Sugar
Dark,Batch_11,80,0,16,Batch_12,78,0,14
Milk,Batch_72,35,25,25,Batch_73,32,27,22
Swiss,Batch_52,30,30,20,Batch_53,28,33,18"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp))
print (df)
   Choco_Type        ID  Cocoa  Milk  Sugar      ID.1  Cocoa.1  Milk.1  Sugar.1
0        Dark  Batch_11     80     0     16  Batch_12       78       0       14
1        Milk  Batch_72     35    25     25  Batch_73       32      27       22
2       Swiss  Batch_52     30    30     20  Batch_53       28      33       18
df.columns = df.columns.str.replace('.1', '')
df = df.set_index('Choco_Type')
df = pd.concat([df.iloc[:, :4], df.iloc[:, 4:]]).reset_index()
print (df)
   Choco_Type        ID  Cocoa  Milk  Sugar
0        Dark  Batch_11     80     0     16
1        Milk  Batch_72     35    25     25
2       Swiss  Batch_52     30    30     20
3        Dark  Batch_12     78     0     14
4        Milk  Batch_73     32    27     22
5       Swiss  Batch_53     28    33     18
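If it is enough that rows of the same Choco_Type end up next to each other, a stable sort on this result gives the same order as the desired output (a minimal sketch, not part of the recipe above; note it orders the types alphabetically):

# stable mergesort keeps Batch_11 before Batch_12 within each Choco_Type
# 'out' is just an illustrative name for the sorted result
out = df.sort_values('Choco_Type', kind='mergesort').reset_index(drop=True)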
To get the desired order directly from the reshape itself, pass keys to concat and use stack:
df.columns = df.columns.str.replace('.1', '')
df = df.set_index('Choco_Type')
df = (pd.concat([df.iloc[:, :4], df.iloc[:, 4:]], keys=(1, 2), axis=1)
        .stack(0)
        .reset_index(level=1, drop=True)
        .reset_index())
print (df)
   Choco_Type        ID  Cocoa  Milk  Sugar
0        Dark  Batch_11     80     0     16
1        Dark  Batch_12     78     0     14
2        Milk  Batch_72     35    25     25
3        Milk  Batch_73     32    27     22
4       Swiss  Batch_52     30    30     20
5       Swiss  Batch_53     28    33     18
Another solution uses pd.lreshape with a dict built by a dict comprehension over the column names that do not contain '.1'; Choco_Type also has to be excluded:
cols = df.columns[~((df.columns.str.contains('.1')) | (df.columns == 'Choco_Type'))]
print (cols)
Index(['ID', 'Cocoa', 'Milk', 'Sugar'], dtype='object')
d = {x: df.columns[df.columns.str.contains(x)].tolist() for x in cols}
print (d)
{'Milk': ['Milk', 'Milk.1'],
'Sugar': ['Sugar', 'Sugar.1'],
'ID': ['ID', 'ID.1'],
'Cocoa': ['Cocoa', 'Cocoa.1']}
df = pd.lreshape(df, d)
print (df)
   Choco_Type  Milk  Sugar        ID  Cocoa
0        Dark     0     16  Batch_11     80
1        Milk    25     25  Batch_72     35
2       Swiss    30     20  Batch_52     30
3        Dark     0     14  Batch_12     78
4        Milk    27     22  Batch_73     32
5       Swiss    33     18  Batch_53     28
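As for the csv.reader loop in the question: DataFrame.from_records builds a brand-new frame on every call, so df is overwritten in each iteration. The usual fix is to collect all the tuples first and call from_records once, passing the column names. A minimal sketch, assuming the file is named 'chocolate.csv' (hypothetical name) and the header row is skipped before the loop:

import csv
from pandas import DataFrame

records = []
with open('chocolate.csv', newline='') as f:   # hypothetical filename
    reader_in = csv.reader(f)
    next(reader_in)                            # skip the header row
    for row in reader_in:
        type_chocolate = row[0]
        # each csv row holds two batches -> append two output records
        records.append((type_chocolate, row[1], row[2], row[3], row[4]))
        records.append((type_chocolate, row[5], row[6], row[7], row[8]))

df = DataFrame.from_records(records,
                            columns=['Choco_Type', 'ID', 'Cocoa', 'Milk', 'Sugar'])
print (df)

Unlike read_csv, this keeps every value as a string, so convert the numeric columns afterwards (e.g. with astype(int)) if needed.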