
reading tab-delimited data without header in pandas

I'm having trouble using pandas to open tab-delimited data without headers.

My test data (actually contains 200 lines, of which I am showing the first 10):

Tag19184    CTAAC   hffef   1   a   36  -   chr1    10006   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10012   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10018   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10024   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10030   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10036   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10042   0   36M 36
Tag20198    CTAAC   hffef   1   a   36  -   chr1    10048   0   36M 36
Tag20198    CTAAC   hffef   1   a   36  -   chr1    10054   0   36M 36
Tag45093    CTAAC   hffef   1   a   36  -   chr1    10060   0   36M 36

My code:

import pandas as pd

# Read the tab-delimited file; header=None means the first line is data, not column names
df = pd.read_csv('in_test.txt', sep='\t', header=None)
print(df)

However, I get the following output, which I don't think I can use to further process data (?):

<class 'pandas.core.frame.DataFrame'>
Int64Index: 200 entries, 0 to 199
Data columns:
X.1     200  non-null values
X.2     200  non-null values
X.3     200  non-null values
X.4     200  non-null values
X.5     200  non-null values
X.6     200  non-null values
X.7     200  non-null values
X.8     200  non-null values
X.9     200  non-null values
X.10    200  non-null values
X.11    200  non-null values
X.12    200  non-null values
dtypes: int64(5), object(7)

The tutorial here suggests that print df should just give me the corresponding data frame. What am I doing wrong?

biohazard asked Jul 05 '14 01:07


1 Answer

I think the file is being read correctly; what you are seeing is only a summary of the DataFrame, not its contents. A few options:

  1. See: change pandas 0.13.0 "print dataframe" to print dataframe like in earlier versions. The summary view is what older versions of pandas print for a large DataFrame, so updating pandas will solve it.
  2. You can use the IPython Notebook, where DataFrames show up as HTML tables.
  3. You can use df.head(5) (similar to R's head) to look at the first few rows and make sure your DataFrame is correct.
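
As a rough check of the points above, here is a minimal sketch that parses three of the rows from the question and inspects the result. It assumes a reasonably recent pandas; the in-memory buffer and the names rows and sample are only for illustration and stand in for in_test.txt.

```python
import io

import pandas as pd

# Three rows taken from the question, joined with real tab characters.
# (Illustrative stand-in for the 200-line in_test.txt.)
rows = [
    ["Tag19184", "CTAAC", "hffef", "1", "a", "36", "-", "chr1", "10006", "0", "36M", "36"],
    ["Tag19184", "CTAAC", "hffef", "1", "a", "36", "-", "chr1", "10012", "0", "36M", "36"],
    ["Tag19184", "CTAAC", "hffef", "1", "a", "36", "-", "chr1", "10018", "0", "36M", "36"],
]
sample = "\n".join("\t".join(fields) for fields in rows)

# header=None tells pandas the first line is data, so columns get integer labels.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

print(df.shape)        # (3, 12): three rows, twelve columns
print(df.head(2))      # first two rows only
print(df.to_string())  # every row and column, regardless of display settings
```

If the shape and the first rows look right, the data was parsed correctly and only the printed representation differed.
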
CT Zhu answered Nov 11 '22 01:11