 

Python download large csv file from a url line by line for only 10 entries

Tags:

python

csv

I have a large CSV file from a client, shared via a URL for download, and I want to download it line by line (or in chunks of bytes), limited to only the first 10 entries.

I have the following code which downloads the file, but I only want the first 10 entries from it, not the whole file.

#!/usr/bin/env python
import requests
from contextlib import closing
import csv

url = "https://example.com.au/catalog/food-catalog.csv"

with closing(requests.get(url, stream=True)) as r:
    f = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        print(row)

I don't know much about contextlib or how it works with the with statement in Python.

Can anyone help me here? It would be really helpful. Thanks in advance.

asked Dec 17 '18 by chethi


People also ask

How do I read a 10 GB CSV file in Python?

Use read_csv(chunksize=...). One way to process large files is to read the entries in chunks of a reasonable size: each chunk is read into memory and processed before the next chunk is read. The chunksize parameter specifies the size of each chunk as a number of lines.
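A minimal sketch of that idea, assuming a local file named large-file.csv (the file name and chunk size here are placeholders):

import pandas as pd

# chunksize returns an iterator of DataFrames, each holding up to 100,000 rows,
# so only one chunk is in memory at a time
for chunk in pd.read_csv("large-file.csv", chunksize=100_000):
    print(chunk.shape)  # each chunk is an ordinary DataFrame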


3 Answers

The issue is not so much with contextlib as with generators. When your with block ends, the connection will be closed, fairly straightforwardly.

The part that actually does the download is for row in reader:, since reader is wrapped around f, which is a lazy generator. Each iteration of the loop will actually read a line from the stream, possibly with some internal buffering by Python.

The key then is to stop the loop after 10 lines. There are a couple of simple ways of doing that:

for count, row in enumerate(reader, start=1):
    print(row)

    if count == 10:
        break

Or

from itertools import islice

...

for row in islice(reader, 0, 10):
    print(row)
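Putting that together with the streaming code from the question, a minimal sketch might look like this (using the same example URL):

#!/usr/bin/env python
import csv
from contextlib import closing
from itertools import islice

import requests

url = "https://example.com.au/catalog/food-catalog.csv"

with closing(requests.get(url, stream=True)) as r:
    # Decode the raw byte stream lazily, one line at a time
    lines = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(lines, delimiter=',', quotechar='"')
    # Only the first 10 rows are ever pulled from the stream
    for row in islice(reader, 10):
        print(row)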
answered Oct 21 '22 by Mad Physicist


Pandas is another possible approach:

import pandas as pd

# Create a DataFrame from your original CSV, with "," as the separator,
# limiting the read to the first 10 rows and decoding it as UTF-8
your_csv = pd.read_csv("https://example.com.au/catalog/food-catalog.csv", sep=',', nrows=10, encoding='utf-8')

# You can now print it:
print(your_csv)

# And even save it (filePath is your output path):
your_csv.to_csv(filePath, sep=',', encoding='utf-8')
answered Oct 21 '22 by Pedro Martins de Souza


You can generalize the idea by making a generator that yields the next n lines on every call. The grouper recipe from the itertools documentation is useful for things like this.

import requests
import itertools
import csv
import contextlib

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

def stream_csv_download(chunk_size):
    url = 'https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2017-financial-year-provisional/Download-data/annual-enterprise-survey-2017-financial-year-provisional-csv.csv'
    with contextlib.closing(requests.get(url, stream=True)) as stream:
        # Lazily decode the response one line at a time
        lines = (line.decode('utf-8') for line in stream.iter_lines())
        reader = csv.reader(lines, delimiter=',', quotechar='"')
        chunker = grouper(reader, chunk_size, None)
        while True:
            try:
                # Drop the None padding that grouper adds to the final, shorter chunk
                yield [row for row in next(chunker) if row is not None]
            except StopIteration:
                return

csv_file = stream_csv_download(10)
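For example, to pull just the first chunk of 10 parsed rows from the generator above (first_ten_rows is a name introduced here for illustration):

# Nothing is downloaded until a chunk is actually requested from the generator
first_ten_rows = next(csv_file)
print(first_ten_rows)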

This does buffer some amount of data, since the calls return quickly, but I don't think it downloads the entire file. I'll have to test with a large file.

answered Oct 21 '22 by Austin Mackillop