This is a question that often comes up in interviews. I know how to read CSV files using pandas. However, I am struggling to find a way to read them without using external libraries. Does Python come with a module that helps read CSV files?


How to read a CSV file without using external libraries (such as Numpy, Pandas)?

3 Answers

You will most likely need a library to read a CSV file. You could open the file and parse the data yourself, but that would be tedious and time-consuming. Luckily, Python ships with a standard csv module, so there is nothing to pip install. You can read your file like this:

import csv

with open('file.csv', newline='') as file:
    my_reader = csv.reader(file, delimiter=',')
    for row in my_reader:
        print(row)

This shows that each row is read in as a list of strings, which you can then process by index. There are other ways to read the data too, described at https://docs.python.org/3/library/csv.html; one of them, csv.DictReader, yields a dictionary per row instead of a list.
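
As a rough illustration of processing rows by index, the sketch below takes the header off with next() and converts the id columns to int. It reads from an in-memory string so that it runs on its own; the sample rows and the names sample, header and products are assumptions made for the example, not part of the original project.

```python
import csv
import io

# Hypothetical in-memory sample; with a real file, use
# open('file.csv', newline='') instead of io.StringIO(...).
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "28985,Michigan Organic Kale,83,4\n"
)

reader = csv.reader(sample)
header = next(reader)  # the first row holds the column names

products = []
for row in reader:
    # every field arrives as a string, so convert the ids by position
    products.append((int(row[0]), row[1], int(row[2]), int(row[3])))

print(header)
print(products)
```

Indexing by position is brittle if the column order ever changes, which is the main reason to consider csv.DictReader for anything beyond a quick script.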

update

You linked your GitHub repository for the project, so I took this snippet from it:

product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
17668,Unsweetened Chocolate Almond Breeze Almond Milk,91,16
28985,Michigan Organic Kale,83,4
32665,Organic Ezekiel 49 Bread Cinnamon Raisin,112,3
33120,Organic Egg Whites,86,16
45918,Coconut Butter,19,13
46667,Organic Ginger Root,83,4
46842,Plain Pre-Sliced Bagels,93,3

I saved it as file.csv and ran the code I posted above. Result:

['product_id', 'product_name', 'aisle_id', 'department_id']
['9327', 'Garlic Powder', '104', '13']
['17461', 'Air Chilled Organic Boneless Skinless Chicken Breasts', '35', '12']
['17668', 'Unsweetened Chocolate Almond Breeze Almond Milk', '91', '16']
['28985', 'Michigan Organic Kale', '83', '4']
['32665', 'Organic Ezekiel 49 Bread Cinnamon Raisin', '112', '3']
['33120', 'Organic Egg Whites', '86', '16']
['45918', 'Coconut Butter', '19', '13']
['46667', 'Organic Ginger Root', '83', '4']
['46842', 'Plain Pre-Sliced Bagels', '93', '3']

This does what you asked in your question. I am not going to do your project for you; you should be able to work it out from here.


answered Oct 25 '22 05:10

Reedinationer

Recently I was asked a very similar but more complicated question about building a data structure without using pandas, and this is the only relevant question I have found so far. Applied to this question's data, the task was: use the product id as the key of a dictionary and a list of (aisle id, department id) tuples as the value (in Python). The dictionary stands in for the dataframe. Of course I could not do it in 15 minutes (it took me about 2 hours). It is hard for me to think outside of NumPy and pandas.

I have the following solution, which also answers the question asked at the top. It is probably not ideal, but it got me what I needed.
Hopefully this helps too.

import csv

# put the rows of the csv in a list; 'with' closes the file afterwards
with open('data.csv', newline='') as file:
    items = list(csv.reader(file))

aisle_dept_id = []  # to have a tuple of aisle and dept ids
mydict = {}  # product id as keys and list of above tuple as values in a dictionary

product_id, aisle_id, department_id, product_name = [], [], [], []

# The indices assume the column order product_id, aisle_id, department_id,
# product_name; adjust them if your file orders its columns differently.
for i in range(1, len(items)):  # start at 1 to skip the header row
    product_id.append(items[i][0])
    aisle_id.append(items[i][1])
    department_id.append(items[i][2])
    product_name.append(items[i][3])

for item1, item2 in zip(aisle_id, department_id):
    aisle_dept_id.append((item1, item2))
for item1, item2 in zip(product_id, aisle_dept_id):
    mydict.update({item1: [item2]})

With the output,

mydict:
{'9327': [('104', '13')],
 '17461': [('35', '12')],
 '17668': [('91', '16')],
 '28985': [('83', '4')],
 '32665': [('112', '3')],
 '33120': [('86', '16')],
 '45918': [('19', '13')],
 '46667': [('83', '4')],
 '46842': [('93', '3')]}
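
A more compact variant, offered only as a sketch: csv.DictReader looks fields up by header name, so the result does not depend on the column order, and dict.setdefault appends to the list rather than overwriting it if a product id appears more than once. The in-memory sample and the name by_product are assumptions made for illustration.

```python
import csv
import io

# Hypothetical in-memory sample; with a real file, use
# open('data.csv', newline='') instead of io.StringIO(...).
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12\n"
)

by_product = {}
for row in csv.DictReader(sample):
    pair = (row['aisle_id'], row['department_id'])
    # create the list on first sight of a product id, then append to it
    by_product.setdefault(row['product_id'], []).append(pair)

print(by_product)
```

The values stay strings here, as in the output above; wrap them in int() if numeric ids are needed.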

answered Oct 25 '22 03:10

Manjit P.

When one's production environment is limited by memory, being able to read and manage data without importing additional libraries may be helpful.

The built-in csv module does that job.

import csv

There are at least two ways to do that: using csv.reader() or using csv.DictReader().

csv.reader() allows you to access CSV data using indexes and is ideal for simple CSV files (Source).

csv.DictReader(), on the other hand, is friendlier and easier to use because it lets you access each field by its column name, especially when working with large CSV files (Source).

Here's how to do it with csv.reader():

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam

Here's how to do it with csv.DictReader()

>>> import csv
>>> with open('names.csv', newline='') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...
Eric Idle
John Cleese

>>> print(row)
{'first_name': 'John', 'last_name': 'Cleese'}
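
To connect this with the memory concern raised at the start of this answer: both readers yield one row at a time, so a file can be summarised without holding every row in memory. A minimal sketch, assuming a hypothetical in-memory sample that borrows the department_id column from the question's data; counts is an illustrative name.

```python
import csv
import io
from collections import Counter

# Hypothetical in-memory sample; with a real file, use
# open('file.csv', newline='') instead of io.StringIO(...).
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "28985,Michigan Organic Kale,83,4\n"
    "45918,Coconut Butter,19,13\n"
    "46667,Organic Ginger Root,83,4\n"
    "46842,Plain Pre-Sliced Bagels,93,3\n"
)

counts = Counter()
for row in csv.DictReader(sample):
    # only the running tally is kept; each row is discarded after use
    counts[row['department_id']] += 1

print(counts.most_common())
```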

For another example, check Real Python's page here.

answered Oct 25 '22 04:10

Gonçalo Peres


Tags:

python

pandas

dataframe

csv

excel

Mosali HarshaVardhan Reddy
