I am opening a csv called Remarks_Drug.csv which contains product names and mapped
filenames in consecutive columns. I am doing some operations on the product column
to remove all string content after + character. After stripping the string from +
characters, I am storing the result in a variable called product_patterns.
Now I am opening a new csv and I want to write the output from the for loop into
two columns, the first one containing the product_patterns and the second one containing
the corresponding filenames.
What I am getting as output now is only the last row of the output csv that I am looking for.
I think I am not looping properly so that each row of product_patterns and filename gets
appended in the output csv file.
Can someone please help me with this.
Attaching code below:
import csv
with open('Remarks_Drug.csv', newline='', encoding ='utf-8') as myFile:
reader = csv.reader(myFile)
for row in reader:
product = row[0].lower()
#print('K---'+ product)
filename = row[1]
product_patterns = ', '.join([i.split("+")[0].strip() for i in product.split(",")])
#print(product_patterns, filename)
with open ('drug_output100.csv', 'a') as csvfile:
fieldnames = ['product_patterns', 'filename']
print(fieldnames)
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
print(writer)
#writer.writeheader()
writer.writerow({'product_patterns':product_patterns, 'filename':filename})
Sample input:
Film-coated tablet + TERIFLUNOMIDE, 2011-07-18 - Received approval letter_EN.txt
Film-coated tablet + VANDETANIB, 2013-12-14 RECD Eudralink_Caprelsa II-28 - RSI - 14.12.2017.txt
Solution for injection + MenQuadTT, 395_EU001930-PIP01-16_2016-02-22.txt
Solution for injection + INSULIN GLARGINE, 2017-11-4 Updated PR.txt
Solution for injection + INSULIN GLARGINE + LIXISENATIDE, 2017 12 12 Email Approval Texts - SA1006-.txt
import csv
import pandas as pd
with open('Remarks_Drug.csv', newline='', encoding ='utf-8') as myFile:
reader = csv.reader(myFile)
mydrug = []
for row in reader:
product = row[0].lower()
#print('K---'+ product)
filename = row[1]
product_patterns = ', '.join([i.split("+")[0].strip() for i in product.split(",")])
mydrug.append([product_patterns, filename])
# print(mydrug)
df = pd.DataFrame(mydrug, columns=['product_patterns', 'filename'])
print(df)
df.to_csv('drug_output100.csv', sep=',', index=False)
This utilizes pandas library. If you're to deal with large csv files using pandas will be handy and efficient in terms of performance and memory. This is just an alternative solution for the above.
I hope this is the right way for you, if is not, tell me and we check.
import csv
with open('Remarks_Drug.csv') as myFile:
reader = csv.reader(myFile)
products_list = list()
filenames_list = list()
for row in reader:
products_list.append(row[0].lower().split("+")[0].strip())
filenames_list.append(row[1])
for index, product in enumerate(products_list):
with open ('drug_output100.csv', 'a') as csvfile:
fieldnames = ['product_patterns', 'filename']
print(fieldnames)
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
print(writer)
writer.writerow({'product_patterns':product, 'filename':filenames_list[index]})
You can also use pandas to elaborate csv files, faster and in a smart way.
Here the pandas solution:
import pandas as pd
def select_real_product(string_to_elaborate):
return string_to_elaborate.split('+')[0].strip()
df = pd.read_csv("Remarks_Drug.csv", delimiter=',', names=("product", "filename"))
df['product'] = df['product'].apply(select_real_product)
df.to_csv("drug_output100.csv", sep=',', na_rep='empty',index_label=False, index=False)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With