
How do I read a fixed width format text file in pandas?

I just got my hands on pandas and am figuring out how I can read a file. The file is from the WRDS database and is the SP500 constituents list all the way back to the 1960s. I checked the file and no matter what I do to import it using read_csv, I still can't display the data correctly.

from pandas import read_csv

df = read_csv('sp500-sb.txt')

df

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1231 entries, 0 to 1230
Data columns: gvkeyx      from      thru     conm
                                        gvkey      co_conm
...(the column names)
dtypes: object(1)

What does the above chunk of output mean? Anything would be helpful.

asked Mar 15 '12 at 14:03 by user1234440

2 Answers

pandas.read_fwf() was added in pandas 0.7.3 (April 2012) to handle fixed-width files.

  1. API reference

  2. An example from another question
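The difference is easy to see on a small in-memory sample. The sketch below is hypothetical: the rows, company names, and `widths` values are made up for illustration and would have to be replaced with the real layout of the WRDS file, which isn't shown here. It shows why `read_csv` produced the single `object` column in the question (there are no commas, so each whole line becomes one field), and how `read_fwf` splits the same text by character position. Explicit `widths` are used instead of the default `colspecs='infer'`, because inference guesses boundaries from whitespace and can be misled by values that themselves contain spaces.

```python
import io

import pandas as pd

# Hypothetical sample laid out like a fixed-width export: each field starts
# at a fixed character position and is padded with spaces, not delimited.
raw = (
    "gvkeyx from     thru     conm\n"
    "000001 19650101 19881231 Alpha Corp\n"
    "000002 19700101 19991231 Beta Holdings Inc\n"
)

# read_csv splits on commas by default; there are none, so every line
# becomes a single string field (the "dtypes: object(1)" in the question).
as_csv = pd.read_csv(io.StringIO(raw))
print(as_csv.shape)  # (2, 1)

# read_fwf slices each line by character width instead and strips the
# space padding. These widths are assumptions matching the sample above.
as_fwf = pd.read_fwf(
    io.StringIO(raw),
    widths=[7, 9, 9, 20],
    dtype={"gvkeyx": str},  # keep leading zeros in the identifier
)
print(as_fwf.columns.tolist())  # ['gvkeyx', 'from', 'thru', 'conm']
print(as_fwf["conm"].tolist())  # ['Alpha Corp', 'Beta Holdings Inc']
```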

answered Sep 18 '22 at 19:09 by WoodChopper


Wes answered me in an email. Cheers.

This is a fixed-width-format file (not delimited by commas or tabs as usual). I realize that pandas does not have a fixed-width reader like R does, though one can be fashioned very easily. I'll see what I can do. In the meantime if you can export the data in another format (like csv--truly comma separated) you'll be able to read it with read_csv. I suspect with some unix magic you can transform a FWF file into a CSV file.

I recommend following the issue on github as your e-mail is about to disappear from my inbox :)

https://github.com/pydata/pandas/issues/920

best, Wes
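Before `read_fwf` existed, the suggested workaround was to convert the file to true CSV first. Below is a minimal standard-library sketch of that conversion, assuming the field widths are already known (for example from the data provider's layout documentation). The function name `fwf_to_csv`, the sample lines, and the widths are hypothetical and chosen only for illustration.

```python
import csv
import io


def fwf_to_csv(lines, widths):
    """Convert fixed-width text lines to CSV text.

    `widths` is the character width of each field; the layout is assumed
    to be known in advance.
    """
    # Turn the widths into (start, end) character positions.
    bounds = []
    start = 0
    for width in widths:
        bounds.append((start, start + width))
        start += width

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")
        # Slice out each field and drop its space padding.
        writer.writerow([line[a:b].strip() for a, b in bounds])
    return out.getvalue()


fwf = [
    "gvkeyx from     thru     conm\n",
    "000001 19650101 19881231 Alpha Corp\n",
    "000002 19700101 19991231 Beta Holdings, Inc\n",
]
print(fwf_to_csv(fwf, [7, 9, 9, 20]), end="")
# gvkeyx,from,thru,conm
# 000001,19650101,19881231,Alpha Corp
# 000002,19700101,19991231,"Beta Holdings, Inc"
```

The `csv` writer quotes any field that contains a comma, so a name like `Beta Holdings, Inc` stays in one column when the result is loaded back with `read_csv`.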

answered Sep 17 '22 at 19:09 by user1234440