
Pandas read_table use first column as index

Tags: python, pandas

I have a small problem. I have a text file whose lines all have the following form (line 1 shown):

id1-a1-b1-c1

I want to load it into a DataFrame with pandas, with the ids as the index, the columns named 'A', 'B', 'C', and the values being the corresponding ai, bi, ci.

In the end I want the DataFrame to look like:

    'A'   'B'  'C'
id1  a1    b1   c1
id2  a2    b2   c2
...   ...   ...  ...

I may want to read it in chunks if the file is large, but let's assume I read it all at once:

with open('file.txt') as f:
    table = pd.read_table(f, sep='-', index_col=0, header=None, lineterminator='\n')

and then rename the columns:

table.columns = ['A','B','C']

My current output looks something like:

    'A'   'B'  'C'
0
id1  a1    b1   c1
id2  a2    b2   c2
...   ...   ...  ...

There is an extra row (the one containing 0) that I can't explain.

Thanks

EDIT

When I add the argument

chunksize=20

to the read_table call and then run:

for chunk in table:
    print(chunk)

I get the following error:

pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

fricadelle asked Jan 28 '15 19:01


1 Answer

If you know the column names before the file is read, pass them as a list via the names parameter of read_table:

with open('file.txt') as f:
    table = pd.read_table(f, sep='-', index_col=0, header=None, names=['A','B','C'],
                          lineterminator='\n')

This outputs:

      A   B   C
id1  a1  b1  c1
id2  a2  b2  c2
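
As for the extra row in the question: the 0 is most likely the index name rather than a row of data. With header=None the columns are labelled 0, 1, 2, 3, and index_col=0 carries the label 0 over as the name of the index, which pandas prints on its own line. The chunksize error is probably a separate issue: the reader is lazy, so looping over it after the with block has closed the file means it tries to read from a closed handle. Below is a minimal sketch of one way around both, assuming every line has exactly four '-'-separated fields. The sample data, the temporary file, and the name "id" for the first field are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up sample in the question's "id-a-b-c" layout, written to a temp file.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("id1-a1-b1-c1\nid2-a2-b2-c2\nid3-a3-b3-c3\n")

# "id" is an illustrative name for the first field, not from the question.
names = ["id", "A", "B", "C"]

# Read everything at once, with the first field as the index.
table = pd.read_table(path, sep="-", header=None, names=names, index_col="id")
table.index.name = None  # clear the index name so no label row is printed
print(table)

# Read lazily: pass the path so pandas keeps the file open while iterating.
reader = pd.read_table(path, sep="-", header=None, names=names,
                       index_col="id", chunksize=2)
chunks = []
for chunk in reader:
    chunk.index.name = None
    chunks.append(chunk)
    print(chunk)
```

Naming all four fields keeps the mapping between fields and columns explicit. If you prefer to keep passing an open file object, the for loop over the chunks has to stay inside the with block.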

Bryan answered Oct 05 '22 20:10