I load the data with:
import pandas as pd

"""Data taken from https://datos.gob.mx/busca/organization/conapo and
https://es.wikipedia.org/wiki/Anexo:Entidades_federativas_de_M%C3%A9xico_por_superficie,_poblaci%C3%B3n_y_densidad """
total_population_segmentation = pd.read_html('professional_segmentation_mexico.html')
population_segmentation = pd.read_html('population_segmentation.html')
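followed by:
# read_html returns a list of tables; index 2 is the third table on the page
total_population_segmentation = population_segmentation[2]
# keep only the sub-columns under the 'Población histórica de México' header
total_population_segmentation = total_population_segmentation['Población histórica de México']
total_population_segmentation = total_population_segmentation.drop('Pos', axis=1)
# sort by state name and renumber the rows from 0
total_population_segmentation = total_population_segmentation.sort_values('Entidad').reset_index(drop=True)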
This leaves me with the following DataFrame:
total_population_segmentation.head(5)
Checking the column types with
total_population_segmentation.dtypes
shows that every column is stored as object:
Entidad object
2010 object
2015 object
2020 object
2025 object
2030 object
dtype: object
To check whether the conversion works, I ran
pd.to_numeric(total_population_segmentation['2010'])
but got:
ValueError Traceback (most recent call last)
pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "1 195 787"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-202-28db64f185e1> in <module>()
----> 1 pd.to_numeric(total_population_segmentation['2010'])
~\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
148 try:
149 values = lib.maybe_convert_numeric(
--> 150 values, set(), coerce_numeric=coerce_numeric
151 )
152 except (ValueError, TypeError):
pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "1 195 787" at position 0
Looking at the individual values, the digit groups turn out to be separated by non-breaking spaces ('\xa0') rather than regular spaces:
In [1]: total_population_segmentation['2010'][4]
Out[1]: '4\xa0933\xa0755'
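A small check in plain Python illustrates why the conversion fails: '\xa0' (U+00A0, NO-BREAK SPACE) is a different character from the ASCII space, so replacing ' ' leaves the string unchanged. The value below is copied from the Out[1] cell; the rest is only an illustration.

```python
# Value copied from the Out[1] cell above; the digit groups are
# separated by U+00A0 (NO-BREAK SPACE), not an ASCII space.
value = '4\xa0933\xa0755'

# Replacing the ASCII space does nothing, because there is none in the string.
print(value.replace(' ', '') == value)  # True

# str.split() with no arguments splits on any Unicode whitespace,
# including '\xa0', so joining the pieces removes the separators.
cleaned = ''.join(value.split())
print(cleaned)         # 4933755
print(float(cleaned))  # 4933755.0
```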
How can I convert this type of data to float?
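When you read the data, pass the thousands separator so pandas parses the numbers directly. For a CSV file that would be:
total_population_segmentation = pd.read_csv('your_csv.csv', thousands=' ')
In this data, however, the separator is a non-breaking space ('\xa0') rather than a regular space. pd.read_html accepts the same thousands parameter, so reading the tables with
population_segmentation = pd.read_html('population_segmentation.html', thousands='\xa0')
should parse these columns as numbers directly.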
Then try again:
pd.to_numeric(total_population_segmentation['2010'])
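A minimal sketch of the thousands parameter with a non-breaking space, using a hypothetical two-row CSV held in memory instead of the question's files. It passes engine='python' on the assumption that the default C engine only accepts single-byte (ASCII) separators; the column names mirror the question.

```python
import io

import pandas as pd

# Hypothetical two-row sample mimicking the question's data: the digit
# groups are separated by '\xa0' (NO-BREAK SPACE).
csv_text = 'Entidad,2010\nA,1\xa0195\xa0787\nB,4\xa0933\xa0755\n'

# engine='python' is used because the default C engine is assumed to
# accept only single-byte (ASCII) thousands separators.
df = pd.read_csv(io.StringIO(csv_text), thousands='\xa0', engine='python')

print(df.dtypes)
print(df['2010'].tolist())  # [1195787, 4933755]
```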
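As per your updated question, and assuming every column other than the first is numeric, strip both kinds of space and convert each column:
for col in total_population_segmentation.columns[1:]:
    # remove non-breaking ('\xa0') and regular spaces, then convert to float
    total_population_segmentation[col] = total_population_segmentation[col].map(
        lambda s: float(s.replace('\xa0', '').replace(' ', '')))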
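The same cleanup can be written without a Python-level loop over the values, using the vectorized .str.replace and pd.to_numeric. The DataFrame below is a hypothetical sample shaped like the question's data (the 'A'/'B' labels and the 2015 figures are made up), and, as in the answer, every column after the first is assumed to be numeric.

```python
import pandas as pd

# Hypothetical sample shaped like the question's DataFrame.
df = pd.DataFrame({
    'Entidad': ['A', 'B'],
    '2010': ['1\xa0195\xa0787', '4\xa0933\xa0755'],
    '2015': ['1\xa0234\xa0567', '7\xa0654\xa0321'],
})

# Every column after the first is assumed to hold numbers.
num_cols = df.columns[1:]

# In a str pattern, \s matches any Unicode whitespace (including '\xa0'),
# so this removes ordinary and non-breaking spaces in one pass.
df[num_cols] = df[num_cols].apply(
    lambda col: pd.to_numeric(col.str.replace(r'\s', '', regex=True))
)

print(df.dtypes)
```

pd.to_numeric keeps whole numbers as integers; call .astype(float) afterwards if floats are specifically required.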