Python Pandas 'apply' returns series; can't convert to dataframe

Tags: python, pandas, apply, geocode, geopy

OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function to take an input - country name - and return the latitude and longitude. I use apply to run the function and it returns a Pandas series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to python and still RTFMing. BTW, the geocoder function works great.

# Import libraries 
import os 
import pandas as pd 
import numpy as np
from geopy.geocoders import Nominatim

def locate(x):
    geolocator = Nominatim()
    # print(x) # debug
    try:
        #Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except:
        #didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
    # print(lat,lon) #debug
    return lat, lon # Note: also tried return { 'LAT': lat, 'LON': lon }

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index()    #works perfectly
df_geo_in['LAT'], df_geo_in['LON']  = df_geo_in.applymap(locate) 
# error: returns more than 2 values - default index + column with results

I also tried

df_geo_in['LAT','LON'] = df_geo_in.applymap(locate)

I get a single dataframe with no index and a single column with the series in it.

I've tried a number of other methods, including 'applymap':

source_cols = ['LAT','LON'] 
new_cols = [str(x) for x in source_cols]

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY']) 
df_geo_in[new_cols] = df_geo_in.applymap(locate)

which returned an error after a long time:

ValueError: Columns must be same length as key

I've also tried manually converting the series to a dataframe using the df.from_dict(df_geo_in) method without success.

The goal is to geocode 166 unique countries, then join it back to the 188K addresses in df_addr. I'm trying to be pandas-y in my code and not write loops if possible. But I haven't found the magic to convert series into dataframes and this is the first time I've tried to use apply.

Thanks in advance - ancient C programmer

asked Mar 31 '15 02:03

Harvey

1 Answer

I'm assuming that df_geo_in is a df with a COUNTRY column holding the names to geocode, so I believe the following should work.

Change:

return lat,  lon

to:

return pd.Series([lat, lon])

then apply the function to that column and assign the result like so:

df_geo_in[['LAT', 'LON']] = df_geo_in['COUNTRY'].apply(locate)
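
Here is a minimal sketch of that change that runs offline. The names `Location`, `FAKE_RESULTS` and `fake_geocode`, and the approximate coordinates, are hypothetical stand-ins for the real `geolocator.geocode` call, and `df_geo_in` is a small made-up frame, so treat it as an illustration rather than the exact code from the question.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Hypothetical stand-in for geopy's Location (only the two attributes used here).
Location = namedtuple("Location", ["latitude", "longitude"])

# Hypothetical, approximate coordinates used instead of a real geocoding request.
FAKE_RESULTS = {"France": Location(46.6, 2.2), "Japan": Location(36.6, 139.2)}


def fake_geocode(name):
    # Returns None when nothing is found, as geocode() does with exactly_one=True.
    return FAKE_RESULTS.get(name)


def locate(x, geocode=fake_geocode):
    location = geocode(x)
    if location is None:
        return pd.Series([np.nan, np.nan])
    return pd.Series([location.latitude, location.longitude])


df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Series.apply with a function that returns a Series produces a DataFrame:
# one row per input value, one column per returned element.
coords = df_geo_in["COUNTRY"].apply(locate)
df_geo_in[["LAT", "LON"]] = coords
print(df_geo_in)
```

Checking for `None` explicitly, instead of relying on a bare `except`, keeps real errors (such as a timeout) visible while still filling unknown countries with NaN.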

What you tried to do was assign the result of applymap to two new columns. That is incorrect here: applymap calls the function on every element of the df and returns a df with the same shape as its input, so unless the lhs has that same shape the assignment won't give the desired result.
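
To make the shape argument concrete, here is a small sketch with a made-up three-column frame. The column names and the `pair` helper are hypothetical. `DataFrame.map` is the newer name for `applymap` in recent pandas, so the sketch picks whichever one is available.

```python
import pandas as pd

# Hypothetical frame with three string columns.
df = pd.DataFrame({
    "COUNTRY": ["France", "Japan", "Peru"],
    "CITY": ["Paris", "Tokyo", "Lima"],
    "CODE": ["FR", "JP", "PE"],
})


def pair(value):
    # Placeholder for locate(): returns two values per input.
    return pd.Series([len(value), value[0]])


# Element-wise mapping touches every cell and keeps the input shape.
elementwise = getattr(df, "map", None) or df.applymap
cells = elementwise(len)
print(cells.shape)

# Series.apply on a single column gives one row per value and
# one column per element of the returned Series.
expanded = df["COUNTRY"].apply(pair)
print(expanded.shape)

# Assigning a three-column result to two column names is rejected.
raised = False
try:
    df[["LAT", "LON"]] = cells
except ValueError as err:
    raised = True
    print(err)
```

The element-wise result has three columns while only two names appear on the left, which is the kind of mismatch behind an error like the one quoted in the question.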

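Your last attempt is also incorrect: after dropping the duplicate countries and moving COUNTRY into the index, applymap runs locate on every remaining cell, which is slow, and the result has one column per original column rather than the two named on the lhs, so the shape is different.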

For large dfs it is probably quicker to build a lookup df of the non-duplicated countries, geocode that, and then merge it back into your larger df like so:

# one row per country; .copy() avoids assigning into a slice of df_addr
geo_lookup = df_addr[['COUNTRY']].drop_duplicates().copy()
geo_lookup[['LAT', 'LON']] = geo_lookup['COUNTRY'].apply(locate)
df_addr = df_addr.merge(geo_lookup, on='COUNTRY', how='left')

This creates a df of non-duplicated countries with their coordinates, and then performs a left merge back onto the master df.
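
As a rough illustration of the merge step on its own, the sketch below uses a made-up `df_addr` and a hand-written `geo_lookup` in place of real geocoding results. All values are hypothetical.

```python
import pandas as pd

# Hypothetical master table with repeated countries.
df_addr = pd.DataFrame({
    "ADDRESS": ["1 Rue A", "2 Rue B", "3 Chome C", "4 Nowhere"],
    "COUNTRY": ["France", "France", "Japan", "Atlantis"],
})

# Hypothetical lookup, as if produced by the geocoding step (one row per country).
geo_lookup = pd.DataFrame({
    "COUNTRY": ["France", "Japan"],
    "LAT": [46.6, 36.6],
    "LON": [2.2, 139.2],
})

# A left merge keeps every address row; countries missing from the lookup get NaN.
# validate="many_to_one" raises if the lookup contains duplicate COUNTRY keys.
merged = df_addr.merge(geo_lookup, on="COUNTRY", how="left", validate="many_to_one")
print(merged)
```

With this layout the geocoder is called once per unique country (166 times in the question) rather than once per address.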

answered Sep 21 '22 11:09

EdChum
