
python pandas not reading first column from csv file

Tags: python, pandas, csv

I have a simple 2 column csv file called st1.csv:

GRID    St1
1457    614
1458    657
1459    679
1460    732
1461    754
1462    811
1463    748

However, when I try to read the csv file, the first column is not loaded:

a = pandas.DataFrame.from_csv('st1.csv')
a.columns

outputs:

 Index([u'ST1'], dtype=object) 

Why is the first column not being read?

user308827 asked Feb 20 '14 08:02



2 Answers

Judging by your data, it looks like the delimiter you're using is a space (' ').

Try the following:

a = pandas.DataFrame.from_csv('st1.csv', sep=' ') 

The other issue is that it's assuming your first column is an index, which we can also disable:

a = pandas.DataFrame.from_csv('st1.csv', index_col=None) 

UPDATE:

In newer pandas versions, where DataFrame.from_csv is no longer available, use read_csv instead:

a = pandas.read_csv('st1.csv', index_col=False)
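
To see the index behaviour in isolation, here is a minimal sketch using pandas.read_csv on an in-memory copy of the first few rows from the question. The `raw` string is a stand-in for st1.csv and assumes the file is whitespace-separated as displayed. Passing index_col=0 is meant to imitate what from_csv did by default; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for st1.csv (assumed whitespace-separated, as displayed).
raw = """GRID    St1
1457    614
1458    657
1459    679
"""

# index_col=0 imitates the from_csv default: the first column becomes the index,
# so it no longer shows up in .columns.
as_index = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=0)
print(list(as_index.columns))    # ['St1']
print(as_index.index.name)       # GRID

# Without index_col, both columns are kept and a default integer index is used.
as_columns = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(list(as_columns.columns))  # ['GRID', 'St1']
```

In other words, the first column was read, but it ended up as the index rather than as a regular column.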
Ewan answered Oct 03 '22 20:10


For newer versions of pandas, pd.DataFrame.from_csv doesn't exist anymore, and index_col=None no longer does the trick with pd.read_csv. You'll want to use pd.read_csv with index_col=False instead:

pd.read_csv('st1.csv', index_col=False) 

Example:

(so) URSA-MattM-MacBook:stackoverflow mmessersmith$ cat input.csv
Date                        Employee        Operation        Order
2001-01-01 08:32:17         User1           Approved         #00045
2001-01-01 08:36:23         User1           Edited           #00045
2001-01-01 08:41:04         User1           Rejected         #00046
2001-01-01 08:42:56         User1           Deleted          #00046
2001-01-02 09:01:11         User1           Created          #00047
2019-10-03 17:23:45         User1           Approved         #72681

(so) URSA-MattM-MacBook:stackoverflow mmessersmith$ python
Python 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.25.1'
>>> df_bad_index = pd.read_csv('input.csv', delim_whitespace=True)
>>> df_bad_index
                Date Employee Operation   Order
2001-01-01  08:32:17    User1  Approved  #00045
2001-01-01  08:36:23    User1    Edited  #00045
2001-01-01  08:41:04    User1  Rejected  #00046
2001-01-01  08:42:56    User1   Deleted  #00046
2001-01-02  09:01:11    User1   Created  #00047
2019-10-03  17:23:45    User1  Approved  #72681
>>> df_bad_index.index
Index(['2001-01-01', '2001-01-01', '2001-01-01', '2001-01-01', '2001-01-02',
       '2019-10-03'],
      dtype='object')
>>> df_still_bad_index = pd.read_csv('input.csv', delim_whitespace=True, index_col=None)
>>> df_still_bad_index
                Date Employee Operation   Order
2001-01-01  08:32:17    User1  Approved  #00045
2001-01-01  08:36:23    User1    Edited  #00045
2001-01-01  08:41:04    User1  Rejected  #00046
2001-01-01  08:42:56    User1   Deleted  #00046
2001-01-02  09:01:11    User1   Created  #00047
2019-10-03  17:23:45    User1  Approved  #72681
>>> df_still_bad_index.index
Index(['2001-01-01', '2001-01-01', '2001-01-01', '2001-01-01', '2001-01-02',
       '2019-10-03'],
      dtype='object')
>>> df_good_index = pd.read_csv('input.csv', delim_whitespace=True, index_col=False)
>>> df_good_index
         Date  Employee Operation     Order
0  2001-01-01  08:32:17     User1  Approved
1  2001-01-01  08:36:23     User1    Edited
2  2001-01-01  08:41:04     User1  Rejected
3  2001-01-01  08:42:56     User1   Deleted
4  2001-01-02  09:01:11     User1   Created
5  2019-10-03  17:23:45     User1  Approved
>>> df_good_index.index
RangeIndex(start=0, stop=6, step=1)
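
The session above can be condensed into a self-contained sketch that reads from an in-memory string instead of input.csv. It uses sep=r"\s+" in place of delim_whitespace=True, since recent pandas releases deprecate that keyword, as far as I know. The last call, which supplies five explicit column names including a hypothetical "Time" column, is not part of the answer above; it is only one possible way to avoid dropping the last field, and it assumes the date and time really are two separate whitespace-delimited fields.

```python
import io

import pandas as pd

# Two rows in the same shape as input.csv: 4 header names, 5 fields per data row.
raw = """Date Employee Operation Order
2001-01-01 08:32:17 User1 Approved #00045
2001-01-01 08:36:23 User1 Edited #00045
"""

# Data rows have one more field than the header, so pandas treats the
# first field as the index.
implicit = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(implicit.index.tolist())   # ['2001-01-01', '2001-01-01']

# index_col=False forces a default integer index; the extra trailing field
# is dropped (newer pandas versions may emit a ParserWarning here).
forced = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=False)
print(forced["Order"].tolist())  # ['Approved', 'Edited']

# Assumption: naming all five fields keeps every value. "Time" is a
# hypothetical column name chosen for this illustration.
named = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    header=0,  # replace the existing header row with the names below
    names=["Date", "Time", "Employee", "Operation", "Order"],
)
print(named["Order"].tolist())   # ['#00045', '#00045']
```

Note that index_col=False fixes the index but shifts the values one column to the left of their headers, which is visible in df_good_index above.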
Matt Messersmith answered Oct 03 '22 22:10