This is a question that usually appears in interviews.
I know how to read CSV files using Pandas.
However, I am struggling to find a way to read files without using external libraries.
Does Python come with any module that would help read csv files?
To read CSV data into a NumPy array, you can use NumPy's genfromtxt() function with its delimiter parameter set to a comma. genfromtxt() is used quite frequently to load data from text files in Python.
To read a CSV file without a header in Pandas, set the header parameter of read_csv() to None. Loading the file with the defaults treats the first row as the header: dataFrame = pd.read_csv(r"C:\Users\amit_\Desktop\SalesData.csv"). (The r prefix makes the path a raw string; without it, Python rejects "\U" as an invalid escape.) To load the same file without a header, pass header=None.
In this article, we will discuss how to read CSV files with NumPy in Python. To import data from a text file, we will use the NumPy loadtxt() method. This function requires every line of the text file to contain the same number of entries.
In Pandas, read_csv() uses a comma as its default delimiter, while read_table() uses a tab (\t). You can export a file to CSV from any modern office suite, including Google Sheets.
A simple way to store big data sets is to use CSV files (comma-separated values files). CSV files contain plain text in a well-known format that can be read by most tools, including Pandas. In our examples we will be using a CSV file called 'data.csv'.
You will most likely want a library to read a CSV file. While you could open and parse the data yourself, this would be tedious and time-consuming. Luckily, Python comes with a standard csv module that you won't have to pip install! You can read your file in like this:
import csv

# newline='' is what the csv module documentation recommends when opening files
with open('file.csv', 'r', newline='') as file:
    my_reader = csv.reader(file, delimiter=',')
    for row in my_reader:
        print(row)
This will show you that each row is being read in as a list, which you can then process by index. There are other ways to read in the data too, as described at https://docs.python.org/3/library/csv.html, one of which will create a dictionary instead of a list!
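As a rough sketch of the dictionary-based reader mentioned above: csv.DictReader uses the header row as keys, so fields can be looked up by name rather than by index. The inline sample (fed through io.StringIO) is only there so the snippet runs without a file on disk; with a real file you would pass the object returned by open('file.csv', newline='') instead.

```python
import csv
import io

# Inline sample so the sketch is self-contained; a real script would use
# open('file.csv', newline='') instead of io.StringIO.
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "28985,Michigan Organic Kale,83,4\n"
)

rows = list(csv.DictReader(sample))  # each row is a dict keyed by header name

for row in rows:
    # Values are always strings; convert explicitly where numbers are needed.
    print(row["product_name"], int(row["aisle_id"]))
```

Note that list(...) loads every row at once, which is fine for small files; iterating over the reader directly avoids that.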
You linked your GitHub for the project, so I took this snippet from it:
product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
17668,Unsweetened Chocolate Almond Breeze Almond Milk,91,16
28985,Michigan Organic Kale,83,4
32665,Organic Ezekiel 49 Bread Cinnamon Raisin,112,3
33120,Organic Egg Whites,86,16
45918,Coconut Butter,19,13
46667,Organic Ginger Root,83,4
46842,Plain Pre-Sliced Bagels,93,3
I saved it as file.csv and ran it with the code I posted above. Result:
['product_id', 'product_name', 'aisle_id', 'department_id']
['9327', 'Garlic Powder', '104', '13']
['17461', 'Air Chilled Organic Boneless Skinless Chicken Breasts', '35', '12']
['17668', 'Unsweetened Chocolate Almond Breeze Almond Milk', '91', '16']
['28985', 'Michigan Organic Kale', '83', '4']
['32665', 'Organic Ezekiel 49 Bread Cinnamon Raisin', '112', '3']
['33120', 'Organic Egg Whites', '86', '16']
['45918', 'Coconut Butter', '19', '13']
['46667', 'Organic Ginger Root', '83', '4']
['46842', 'Plain Pre-Sliced Bagels', '93', '3']
This does what you asked in your question. I am not going to do your project for you; you should be able to work it out from here.
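As a footnote to the remark above that parsing by hand gets tedious, here is a small sketch of where a plain str.split(",") goes wrong and csv.reader does not. The sample row is made up for illustration (it is not from the asker's file) and contains a comma inside a quoted field.

```python
import csv
import io

# Made-up row with a comma inside a quoted field.
line = '101,"Bread, Whole Wheat",112,3'

naive = line.split(",")                       # also splits inside the quotes
parsed = next(csv.reader(io.StringIO(line)))  # respects the quoting

print(naive)   # five pieces, with stray quote characters left in
print(parsed)  # four fields, quotes removed
```

Quoting, embedded newlines, and escaped quote characters are the kind of details the csv module takes care of for you.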
Recently I was asked a very similar but more complicated question about building a data structure without using pandas, and this is the only relevant question I have found so far. Applied to this question's data, what I was asked was: use the product ids as the keys of a dictionary, and a list of tuples of aisle and department ids as the values (in Python). The dictionary is the required "dataframe". Of course I could not do it in 15 minutes (it took me more like 2 hours). It is hard for me to think outside of NumPy and pandas.
I have the following solution, which also answers the original question. It is probably not ideal, but it got me what I needed.
Hopefully this helps too.
import csv

# Assumes data.csv has its columns in the order:
# product_id, aisle_id, department_id, product_name.
# For the file.csv layout shown earlier (product_name in the second column),
# the aisle and department ids would be at indexes 2 and 3 instead.

items = []          # every row of the CSV, header included
aisle_dept_id = []  # (aisle_id, department_id) tuples
mydict = {}         # product id as key, list holding the tuple above as value
product_id, aisle_id, department_id, product_name = [], [], [], []

with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        items.append(row)

for i in range(1, len(items)):  # start at 1 to skip the header row
    product_id.append(items[i][0])
    aisle_id.append(items[i][1])
    department_id.append(items[i][2])
    product_name.append(items[i][3])

for item1, item2 in zip(aisle_id, department_id):
    aisle_dept_id.append((item1, item2))

for item1, item2 in zip(product_id, aisle_dept_id):
    mydict.update({item1: [item2]})
With the output,
mydict:
{'9327': [('104', '13')],
'17461': [('35', '12')],
'17668': [('91', '16')],
'28985': [('83', '4')],
'32665': [('112', '3')],
'33120': [('86', '16')],
'45918': [('19', '13')],
'46667': [('83', '4')],
'46842': [('93', '3')]}
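A more compact variant of the same idea, offered as a sketch rather than as the expected interview answer: reading by column name with csv.DictReader means the column order no longer matters, and dict.setdefault lets a repeated product id extend its list instead of being overwritten (mydict.update above keeps only the last tuple for a given key). The helper name build_lookup and the inline sample are made up for illustration.

```python
import csv
import io


def build_lookup(csvfile):
    """Map product_id -> list of (aisle_id, department_id) tuples."""
    lookup = {}
    for row in csv.DictReader(csvfile):
        pair = (row["aisle_id"], row["department_id"])
        # setdefault creates the list the first time a key is seen,
        # then the tuple is appended to it.
        lookup.setdefault(row["product_id"], []).append(pair)
    return lookup


# Inline sample; with a real file use: open('data.csv', newline='')
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12\n"
)
result = build_lookup(sample)
print(result)
```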
When one's production environment is limited by memory, being able to read and manage data without importing additional libraries may be helpful.
To achieve that, the built-in csv module does the work.
import csv
There are at least two ways one might do that: using csv.reader() or using csv.DictReader().
csv.reader() allows you to access CSV data using indexes and is ideal for simple CSV files (Source).
csv.DictReader(), on the other hand, is friendlier and easier to use, especially when working with large CSV files (Source).
Here's how to do it with csv.reader()
>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
Here's how to do it with csv.DictReader()
>>> import csv
>>> with open('names.csv', newline='') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...
Eric Idle
John Cleese
>>> print(row)
{'first_name': 'John', 'last_name': 'Cleese'}
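On the memory point raised at the start of this answer: both readers yield one row at a time, so an aggregate can be computed without first collecting the whole file into a list. A small sketch, where the function name count_by_column and the inline sample data are assumptions made for illustration:

```python
import csv
import io
from collections import Counter


def count_by_column(csvfile, column):
    """Count how often each value appears in `column`, one row at a time."""
    counts = Counter()
    for row in csv.DictReader(csvfile):  # rows are produced lazily
        counts[row[column]] += 1
    return counts


# Inline sample; with a real file use: open('names.csv', newline='')
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "28985,Michigan Organic Kale,83,4\n"
    "46667,Organic Ginger Root,83,4\n"
    "45918,Coconut Butter,19,13\n"
)
dept_counts = count_by_column(sample, "department_id")
print(dept_counts)
```

Only the running counts are kept in memory here, not the rows themselves.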
For another example, check Real Python's page here.