
Python 3: How to upload a pandas dataframe as a csv stream without saving on disc?

I want to upload a pandas dataframe to a server as a CSV file without saving it to disc. Is there a way to create a more or less "fake csv" file which pretends to be a real file?

Here is some example code. First I get my data from a SQL query and store it as a dataframe. In the upload_ga_data function I want to have something with this logic:

 media = MediaFileUpload('df',
                      mimetype='application/octet-stream',
                      resumable=False)

Full example:

from __future__ import print_function
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaFileUpload
import pymysql
import pandas as pd
con = x

ga_query = """
    SELECT XXXXX
    """

df = pd.read_sql_query(ga_query,con)

df.to_csv('ga_export.csv', sep=',', encoding='utf-8', index = False)

def upload_ga_data():
    try:
        media = MediaFileUpload('ga_export.csv',
                          mimetype='application/octet-stream',
                          resumable=False)
        daily_upload = service.management().uploads().uploadData(
                accountId=accountId,
                webPropertyId=webPropertyId,
                customDataSourceId=customDataSourceId,
                media_body=media).execute()
        print("Upload was successful")
    except TypeError as error:
        # Handle errors in constructing a query.
        print('There was an error in constructing your query : %s' % error)
brnccc asked Dec 29 '17 10:12


1 Answer

What you describe is possible with an in-memory stream:

to create a more or less "fake csv" file which pretends to be a real file

Python makes file objects (returned by open) and in-memory streams (io.StringIO) behave alike, so anywhere an API accepts a text file object you can usually pass a StringIO instead.

The easiest way to create a text stream is with open(), optionally specifying an encoding:

f = open("myfile.txt", "r", encoding="utf-8")

In-memory text streams are also available as StringIO objects:

f = io.StringIO("some initial text data")

The text stream API is described in detail in the documentation of TextIOBase.
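As a minimal sketch of that equivalence, the snippet below passes both a StringIO and a real file to the same function. The helper name write_rows and the sample rows are made up for illustration; only the standard library is used.

```python
import io
import os
import tempfile


def write_rows(handle, rows, sep=";"):
    """Write rows to any text stream that provides a write() method."""
    for row in rows:
        handle.write(sep.join(str(value) for value in row) + "\n")


rows = [("id", "value"), (1, "a"), (2, "b")]

# In-memory text stream: nothing touches the disk.
stream = io.StringIO()
write_rows(stream, rows)
in_memory = stream.getvalue()

# Real file: the same function works unchanged.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        write_rows(f, rows)
    with open(path, "r", encoding="utf-8", newline="") as f:
        on_disk = f.read()
```

Both in_memory and on_disk hold the same text, because write_rows only relies on the write() method that both kinds of stream share.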

In pandas you can do this with any function that has a path_or_buf argument in its signature, such as to_csv (the signature below is the one from the pandas version current when this answer was written; later releases changed some parameters):

DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')

The following code exports a dummy DataFrame in CSV format into a StringIO (an in-memory text stream, not a physical file):

import io
import pandas as pd

df = pd.DataFrame(list(range(10)))

stream = io.StringIO()
df.to_csv(stream, sep=";")

To access the stream content, call getvalue():

>>> stream.getvalue()
';0\n0;0\n1;1\n2;2\n3;3\n4;4\n5;5\n6;6\n7;7\n8;8\n9;9\n'

This returns the content without needing a real file.
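To connect this back to the upload in the question, here is a rough sketch rather than a tested integration. It assumes that MediaIoBaseUpload from googleapiclient.http can stand in for MediaFileUpload when the data lives in memory, and that it needs a binary stream, so the CSV text is encoded into an io.BytesIO. The helper names dataframe_to_csv_bytes and build_media and the sample DataFrame are hypothetical.

```python
import io

import pandas as pd


def dataframe_to_csv_bytes(df, encoding="utf-8"):
    """Serialize df to CSV and return it as an in-memory binary stream."""
    # With no path_or_buf argument, to_csv returns the CSV text as a str.
    csv_text = df.to_csv(sep=",", index=False)
    # An HTTP upload sends bytes, so encode the text and wrap it in BytesIO.
    return io.BytesIO(csv_text.encode(encoding))


def build_media(df):
    """Hypothetical helper: wrap the CSV bytes for use as media_body."""
    # Imported here so the rest of the sketch runs without the Google client.
    # Assumption: MediaIoBaseUpload accepts any binary file-like object.
    from googleapiclient.http import MediaIoBaseUpload

    return MediaIoBaseUpload(
        dataframe_to_csv_bytes(df),
        mimetype="application/octet-stream",
        resumable=False,
    )


# Stand-in for the DataFrame returned by pd.read_sql_query in the question.
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
buffer = dataframe_to_csv_bytes(df)
```

Under these assumptions, upload_ga_data would pass media_body=build_media(df) instead of creating a MediaFileUpload from 'ga_export.csv', and the df.to_csv('ga_export.csv', ...) call would no longer be needed.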

jlandercy answered Sep 20 '22 23:09
