
Loading a generic Google Spreadsheet in Pandas

Tags: python, pandas, gdata

When I try to load a Google Spreadsheet in pandas

from StringIO import StringIO
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=<some_long_code>&output=csv')
data = r.content
df = pd.read_csv(StringIO(data), index_col=0)

I get the following:

CParserError: Error tokenizing data. C error: Expected 1316 fields in line 73, saw 1386

Why? I would think that one could identify the spreadsheet's set of rows and columns with data and use the spreadsheet's rows and columns as the DataFrame index and columns respectively (with NaN for anything empty). Why does it fail?
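For context, the error message itself can be reproduced offline: pandas' C parser takes the expected number of fields from the header (first) row and raises as soon as a later row has more. The tiny CSV string below is made up for illustration; recent pandas versions raise this as `pandas.errors.ParserError` rather than `CParserError`.

```python
from io import StringIO

import pandas as pd
from pandas.errors import ParserError

# Made-up data: a header with 2 fields, then a row with 3 fields. This is the
# same kind of mismatch as "Expected 1316 fields in line 73, saw 1386".
ragged = "a,b\n1,2\n3,4,5\n"

try:
    pd.read_csv(StringIO(ragged))
    message = ""
except ParserError as err:
    message = str(err)

print(message)
```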


asked Jun 05 '14 15:06

Amelio Vazquez-Reina

2 Answers

This question of mine shows how: "Getting Google Spreadsheet CSV into A Pandas Dataframe".

As one of the commenters noted, you have not asked for the data in CSV format: you have the "edit" request at the end of the URL. You can use this code and see it work on the spreadsheet (which, by the way, needs to be public). It is possible to do private sheets as well, but that is another topic.

from StringIO import StringIO  # Python 2; it moved to io.StringIO in Python 3
import pandas as pd
import requests

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content  # the CSV text returned by the export URL

df = pd.read_csv(StringIO(data), index_col=0, parse_dates=['Quradate'])
df.head()

Output of df.head():
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4
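A minimal sketch of a guard against this failure mode, assuming the cause is the one described above: a URL that returns the HTML editor page instead of CSV text. The helper `csv_text_or_raise` is hypothetical, and the HTML check is only a simple heuristic on the `Content-Type` header and the first characters of the body.

```python
from io import StringIO

import pandas as pd


def csv_text_or_raise(body: str, content_type: str = "") -> str:
    """Hypothetical helper: reject response bodies that look like HTML, not CSV."""
    head = body.lstrip()[:100].lower()
    if "text/html" in content_type.lower() or head.startswith(("<!doctype", "<html")):
        raise ValueError("Got an HTML page instead of CSV; check the export URL")
    return body


# Made-up CSV body standing in for a real export response.
csv_body = "City,Res_Comm\nDothan,Residential\nFoley,Residential\n"
df = pd.read_csv(StringIO(csv_text_or_raise(csv_body, "text/csv")))
print(df.shape)
```

With `requests`, you would call it as `csv_text_or_raise(r.text, r.headers.get('Content-Type', ''))` before handing the text to `pd.read_csv`.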

The new Google Spreadsheet URL format for getting the CSV output is:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

Well, they changed the URL format slightly again; now you need:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=0 #for the 1st sheet
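Since the export URL only depends on the spreadsheet key and the sheet's `gid`, a small helper can derive it from the "edit" URL you see in the browser. This is a sketch assuming the `/spreadsheets/d/<key>/...` layout shown above; `sheet_export_url` is a hypothetical name, and Google may change the format again.

```python
import re


def sheet_export_url(sheet_url: str, gid: int = 0) -> str:
    """Hypothetical helper: turn a Google Sheets URL into its CSV export URL."""
    # The key is the path segment after /spreadsheets/d/ (letters, digits, _ and -).
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", sheet_url)
    if match is None:
        raise ValueError(f"no spreadsheet key found in {sheet_url!r}")
    key = match.group(1)
    return f"https://docs.google.com/spreadsheets/d/{key}/export?format=csv&gid={gid}"


edit_url = "https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/edit#gid=0"
print(sheet_export_url(edit_url))
```

For a public sheet, `pd.read_csv(sheet_export_url(edit_url, gid))` should then read that tab directly, because `read_csv` accepts URLs.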

For Python 3, I also found I needed a slight revision to the above:

from io import StringIO

and to get the file:

gid = 0  # gid of the sheet (tab) to export; 0 for the 1st sheet
act = requests.get('https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=%s' % gid)
dataact = act.content.decode('utf-8')  # convert bytes to str for StringIO
# DataFrame.sort() was removed from pandas; sort_index() sorts by the (date) index
actdf = pd.read_csv(StringIO(dataact), index_col=0, parse_dates=[0], thousands=',').sort_index()

actdf is now a full pandas DataFrame with headers (column names).
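To see what the `read_csv` arguments above do without network access, here is the same call on an in-memory string. The `Date`/`Sales` data is made up: `index_col=0` with `parse_dates=[0]` turns the first column into a `DatetimeIndex`, `thousands=','` parses quoted values such as `"1,250"` as integers, and `sort_index()` orders the rows by date.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for act.content.decode('utf-8'): dates out of order and
# numbers written with a thousands separator (quoted, since ',' is also the delimiter).
dataact = 'Date,Sales\n2014-02-01,"1,250"\n2014-01-01,"3,400"\n'

actdf = pd.read_csv(StringIO(dataact), index_col=0, parse_dates=[0], thousands=",").sort_index()
print(actdf)
```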


answered Oct 07 '22 03:10

dartdog

Warning: this solution will make your data accessible by anyone.

In Google Sheets, click File > Publish to web. Then select what you need to publish and choose .csv as the export format. You'll get a link like: https://docs.google.com/spreadsheets/d/<your sheets key here>/pub?gid=1317664180&single=true&output=csv

Then simply:

import pandas as pd

# read_csv can read the published CSV URL directly
pathtoCsv = r'https://docs.google.com/spreadsheets/d/<sheets key>/pub?gid=1317664180&single=true&output=csv'
dev = pd.read_csv(pathtoCsv)
print(dev)

answered Oct 07 '22 02:10

zhukovgreen
