How to read a CSV file without using external libraries (such as Numpy, Pandas)?

This is a question that usually appears in interviews.

I know how to read CSV files using Pandas.

However, I am struggling to find a way to read them without using external libraries.

Does Python come with any module that would help read CSV files?

Mosali HarshaVardhan Reddy asked Mar 28 '19 18:03



3 Answers

You will most likely want a library to read a CSV file. While you could open the file and parse the data yourself, that would be tedious and time-consuming. Luckily, Python comes with a standard csv module, so there is nothing to pip install. You can read your file like this:

import csv

with open('file.csv', 'r', newline='') as file:
    my_reader = csv.reader(file, delimiter=',')
    for row in my_reader:
        print(row)

This shows that each row is read in as a list of strings, which you can then process by index. The documentation at https://docs.python.org/3/library/csv.html describes other ways to read the data, one of which (csv.DictReader) produces a dictionary per row instead of a list.

Update

You linked the GitHub repository for your project, so I took this snippet from it:
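As a minimal sketch of processing rows by index: the first row that csv.reader yields is the header, and every field arrives as a string, so numeric columns need converting. The sample text and the helper name parse_products are assumptions made for illustration, and io.StringIO stands in for an open file.

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for open('file.csv', newline='')
SAMPLE = """product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
28985,Michigan Organic Kale,83,4
"""

def parse_products(file_obj):
    """Return (header, rows) with the numeric columns converted to int."""
    reader = csv.reader(file_obj)
    header = next(reader)  # the first row holds the column names
    rows = []
    for row in reader:
        # index 0 = product_id, 1 = product_name, 2 = aisle_id, 3 = department_id
        rows.append((int(row[0]), row[1], int(row[2]), int(row[3])))
    return header, rows

header, rows = parse_products(io.StringIO(SAMPLE))
print(header)
for row in rows:
    print(row)
```

Calling next() on the reader once before the loop is a common way to keep the header out of the data rows.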

product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
17668,Unsweetened Chocolate Almond Breeze Almond Milk,91,16
28985,Michigan Organic Kale,83,4
32665,Organic Ezekiel 49 Bread Cinnamon Raisin,112,3
33120,Organic Egg Whites,86,16
45918,Coconut Butter,19,13
46667,Organic Ginger Root,83,4
46842,Plain Pre-Sliced Bagels,93,3

I saved it as file.csv and ran the code I posted above. Result:

['product_id', 'product_name', 'aisle_id', 'department_id']
['9327', 'Garlic Powder', '104', '13']
['17461', 'Air Chilled Organic Boneless Skinless Chicken Breasts', '35', '12']
['17668', 'Unsweetened Chocolate Almond Breeze Almond Milk', '91', '16']
['28985', 'Michigan Organic Kale', '83', '4']
['32665', 'Organic Ezekiel 49 Bread Cinnamon Raisin', '112', '3']
['33120', 'Organic Egg Whites', '86', '16']
['45918', 'Coconut Butter', '19', '13']
['46667', 'Organic Ginger Root', '83', '4']
['46842', 'Plain Pre-Sliced Bagels', '93', '3']

This does what you asked in your question. I am not going to do your project for you; you should be able to work it out from here.

Reedinationer answered Oct 25 '22 05:10


I was recently asked a very similar, but more complicated, question about building a data structure without pandas, and this is the only relevant question I have found so far. Applied to the data in this question, the task was: use the product ids as the keys of a dictionary, and store a list of (aisle id, department id) tuples as each value (in Python). That dictionary plays the role of the dataframe. Of course, I could not do it in 15 minutes (it took me about 2 hours). It is hard for me to think outside of NumPy and pandas.

I ended up with the following solution, which also answers the original question. It is probably not ideal, but it got me what I needed.
Hopefully this helps too.

import csv

items = []  # every row of the CSV, header included
aisle_dept_id = []  # (aisle id, department id) tuples
mydict = {}  # product id -> list holding the tuple above

product_id, product_name, aisle_id, department_id = [], [], [], []

# the with block closes the file once the rows have been read
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        items.append(row)

# start at 1 to skip the header row; the indexes assume the column order
# product_id, product_name, aisle_id, department_id from the question's sample
for i in range(1, len(items)):
    product_id.append(items[i][0])
    product_name.append(items[i][1])
    aisle_id.append(items[i][2])
    department_id.append(items[i][3])

for item1, item2 in zip(aisle_id, department_id):
    aisle_dept_id.append((item1, item2))
for item1, item2 in zip(product_id, aisle_dept_id):
    mydict.update({item1: [item2]})

With the output,

mydict:
{'9327': [('104', '13')],
 '17461': [('35', '12')],
 '17668': [('91', '16')],
 '28985': [('83', '4')],
 '32665': [('112', '3')],
 '33120': [('86', '16')],
 '45918': [('19', '13')],
 '46667': [('83', '4')],
 '46842': [('93', '3')]}
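
The same product-id-to-tuples mapping can be sketched more compactly with csv.DictReader, which looks columns up by name rather than by position. This is only an illustration under stated assumptions: the helper name build_lookup is hypothetical, io.StringIO stands in for the file, and the repeated 9327 row is made-up data added to show that setdefault accumulates tuples instead of overwriting them.

```python
import csv
import io

# Hypothetical sample; io.StringIO stands in for open('data.csv', newline='').
# The last row is invented to show a product id that appears twice.
SAMPLE = """product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
9327,Garlic Powder,19,13
"""

def build_lookup(file_obj):
    """Map each product_id to a list of (aisle_id, department_id) tuples."""
    lookup = {}
    for row in csv.DictReader(file_obj):
        pair = (row['aisle_id'], row['department_id'])
        # setdefault creates the list the first time an id is seen,
        # then later rows with the same id append to that list
        lookup.setdefault(row['product_id'], []).append(pair)
    return lookup

lookup = build_lookup(io.StringIO(SAMPLE))
print(lookup)
```

Because the columns are read by name, this version does not depend on the order in which they appear in the file.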

Manjit P. answered Oct 25 '22 03:10


When one's production environment is limited by memory, being able to read and manage data without importing additional libraries may be helpful.

The built-in csv module is enough to achieve that.

import csv

There are at least two ways one might do that: using csv.reader() or using csv.DictReader().

csv.reader() lets you access CSV data by index and is ideal for simple CSV files (Source).

csv.DictReader(), on the other hand, keys each row by column name, which makes it friendlier and easier to use, especially when working with large CSV files (Source).

Here's how to do it with csv.reader():

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam

Here's how to do it with csv.DictReader():

>>> import csv
>>> with open('names.csv', newline='') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...
Eric Idle
John Cleese

>>> print(row)
{'first_name': 'John', 'last_name': 'Cleese'}
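
Both readers yield one row at a time, which is what makes them useful when memory is limited: a summary can be computed without ever holding the whole file in a list. The following is a minimal sketch of that idea, assuming a hypothetical helper count_by_column and sample data borrowed from the question; io.StringIO stands in for an open file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; io.StringIO stands in for open('file.csv', newline='')
SAMPLE = """product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
28985,Michigan Organic Kale,83,4
46667,Organic Ginger Root,83,4
"""

def count_by_column(file_obj, column):
    """Count the values of one column, keeping only the current row in memory."""
    counts = Counter()
    for row in csv.DictReader(file_obj):
        counts[row[column]] += 1
    return counts

counts = count_by_column(io.StringIO(SAMPLE), 'department_id')
print(counts)
```

Only the running counts and the current row are kept, so memory use grows with the number of distinct values rather than with the number of rows.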

For another example, check Real Python's page here.

Gonçalo Peres answered Oct 25 '22 04:10