
Pandas.read_csv() with special characters (accents) in column names �

I have a csv file that contains some data with columns names:

  • "PERIODE"
  • "IAS_brut"
  • "IAS_lissé"
  • "Incidence_Sentinelles"

I have a problem with the third one, "IAS_lissé": pd.read_csv() misreads its last character and returns it as �.

What is that character?

This is causing a bug in my Flask application. Is there a way to read that column correctly without modifying the file?

In [1]: import pandas as pd

In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns
Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss�', u'Incidence_Sentinelles'], dtype='object')
farhawa asked Sep 22 '16 23:09


1 Answer

I ran into the same problem with Spanish text and solved it with the "latin1" encoding:

import pandas as pd

pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='latin1')
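To try this without the original file, here is a minimal sketch that builds a small in-memory stand-in. The header matches the question, but the data row is invented, and the Latin-1 encoding of the real file is an assumption (the fix above suggests it, the question does not confirm it).

```python
import io

import pandas as pd

# Hypothetical stand-in for Openhealth_S-Grippal.csv: the real header,
# one made-up data row, encoded as Latin-1 (assumed, not confirmed).
raw = (
    "PERIODE;IAS_brut;IAS_lissé;Incidence_Sentinelles\n"
    "2016-09-01;1.5;1.4;10\n"
).encode("latin1")

# Telling pandas the matching encoding lets it decode the accented header.
df = pd.read_csv(io.BytesIO(raw), delimiter=";", encoding="latin1")
columns = list(df.columns)
print(columns)
```

If the file turns out to be in a different encoding (for example cp1252 or utf-8), the same `encoding` parameter takes that name instead.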

Hope it helps!
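As for what the � character is: it is U+FFFD, the Unicode replacement character, which is shown in place of bytes that are not valid in the encoding being used to decode or display them. A small sketch with the standard library only, again assuming the header was stored as Latin-1:

```python
# "é" is the single byte 0xE9 in Latin-1 (assumed source encoding).
raw = "IAS_lissé".encode("latin-1")

# 0xE9 on its own is not valid UTF-8, so a lenient decoder substitutes U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the name.
as_latin1 = raw.decode("latin-1")

print(repr(raw), repr(as_utf8), repr(as_latin1))
```

This is why naming the right encoding fixes the column without touching the file: the bytes were fine all along, only the decoding was wrong.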

Francisco del Valle Bas answered Sep 20 '22 04:09