
How to read a large text file in Python?

I am using Enthought Canopy (a bundle of Python libraries such as NumPy and pandas) for data analysis. I am trying to read a text file and create a DataFrame from it. The text file has 1180598 rows and 18 columns, and every column is numeric. I wrote the following code to read the data and name the columns:

from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt

import pandas as pd

print('Pandas Version ' + pd.__version__)
# Output: Pandas Version 0.12.0

location=r'C:\UMAIR\Directed Studies\US-101 Data\Main Data\US-101-Main-Data\vehicle-trajectory-data\0750am-0805am\tra.txt'

names = ['Vehicle ID', 'Frame ID', 'Total Frames', 'Global Time',
         'Local X', 'Local Y', 'Global X', 'Global Y',
         'Vehicle Length', 'Vehicle Width', 'Vehicle Class',
         'Vehicle Velocity', 'Vehicle Acceleration', 'Lane Identification',
         'Preceding Vehicle', 'Following Vehicle', 'Spacing', 'Headway']

df = read_csv(location, names=names)

df
Out[41]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1180598 entries, 0 to 1180597
Data columns (total 18 columns):
Vehicle ID              1180598  non-null values
Frame ID                0  non-null values
Total Frames            0  non-null values
Global Time             0  non-null values
Local X                 0  non-null values
Local Y                 0  non-null values
Global X                0  non-null values
Global Y                0  non-null values
Vehicle Length          0  non-null values
Vehicle Width           0  non-null values
Vehicle Class           0  non-null values
Vehicle Velocity        0  non-null values
Vehicle Acceleration    0  non-null values
Lane Identification     0  non-null values
Preceding Vehicle       0  non-null values
Following Vehicle       0  non-null values
Spacing                 0  non-null values
Headway                 0  non-null values
dtypes: float64(17), object(1) 

As Out[41] shows, the whole file was read into a single column: only Vehicle ID has non-null values, and the other 17 columns are empty. How can I tell pandas that the file has 18 columns so that it is parsed as intended?

beginagain asked Sep 03 '13 22:09


1 Answer

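read_csv splits on commas by default. Your file is most likely separated by whitespace instead, so each row ended up as a single field in the first column. Telling pandas to split on whitespace will import your dataset correctly (names is the list of 18 column names from your question):

df = pd.read_csv(location, names=names, header=None, delim_whitespace=True)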
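
On newer pandas releases, delim_whitespace is deprecated (since 2.2) in favor of sep=r'\s+', which also splits on any run of spaces or tabs. The following is a minimal sketch of that variant, not a tested fix for the original file: the in-memory sample is a hypothetical stand-in for tra.txt with made-up values, and it assumes the real file is whitespace-delimited with no header row.

```python
import io

import pandas as pd

# The 18 column names from the question.
names = [
    'Vehicle ID', 'Frame ID', 'Total Frames', 'Global Time',
    'Local X', 'Local Y', 'Global X', 'Global Y',
    'Vehicle Length', 'Vehicle Width', 'Vehicle Class',
    'Vehicle Velocity', 'Vehicle Acceleration', 'Lane Identification',
    'Preceding Vehicle', 'Following Vehicle', 'Spacing', 'Headway',
]

# Hypothetical stand-in for tra.txt: two rows of 18 made-up numbers,
# one separated by runs of spaces and one separated by tabs.
row1 = '   '.join(str(i) for i in range(1, 19))
row2 = '\t'.join(str(i * 0.5) for i in range(1, 19))
sample = io.StringIO(row1 + '\n' + row2 + '\n')

# sep=r'\s+' treats any run of whitespace as one delimiter,
# which is what delim_whitespace=True did.
df = pd.read_csv(sample, names=names, header=None, sep=r'\s+')

print(df.shape)                  # (rows, columns)
print(df.notnull().all().all())  # True only if no cell is missing
```

To apply this to the real data, pass location in place of sample. If the shape still reports a single populated column, the file probably uses some other delimiter, and it is worth inspecting the first few raw lines.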
elyase answered Sep 21 '22 21:09