Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas read_table error

I am trying to read a tab delimited text file into a dataframe.

This is the how the file looks in Excel:

CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME
5/13/2016 0:00    13867666       6892372              S                 2026            CUSTOMER 1

Import into a df:

df = p.read_table("E:/FileLoc/ThisIsAFile.txt", encoding = "iso-8859-1")

Now it doesn't see the first 3 columns as part of the column index (df[0] = Transaction Type) and all of the headers shift over to reflect this.

                                CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER
5/13/2016 0:00 13867666 6892372       S             2026          CUSTOMER 1

I am trying to manipulate the text file and then import it to a mysql database as an end result.

like image 987
ctd25 Avatar asked May 25 '16 19:05

ctd25


1 Answers

You can use read_csv with separator 2 and more whitespaces:

import pandas as pd
import io

temp=u"""CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME
5/13/2016 0:00    13867666       6892372              S                 2026            CUSTOMER 1"""
#after testing replace io.StringIO(temp) to filename
df =pd.read_csv(io.StringIO(temp), sep=r'\s{2,}', engine='python', encoding = "iso-8859-1")
print (df)
    CALENDAR_DATE  ORDER_NUMBER  INVOICE_NUMBER TRANSACTION_TYPE  \
0  5/13/2016 0:00      13867666         6892372                S   

   CUSTOMER_NUMBER CUSTOMER_NAME  
0             2026    CUSTOMER 1  

If separator is tabulator, use sep='\t'.

EDIT:

I test it with your data and it works:

import pandas as pd

df = pd.read_csv('test/AnonymizedData.txt', sep='\t')
print (df)

   CUSTOMER_NUMBER CUSTOMER_NAME  CUSTOMER_BRANCH_CODE CUSTOMER_BRANCH_NAME  \
0             2026    CUSTOMER 1                    83       SALES BRANCH 1   
1             2359    CUSTOMER 2                    76       SALES BRANCH 2   
2           100662    CUSTOMER 3                    28       SALES BRANCH 3   
3             3245    CUSTOMER 4                    84       SALES BRANCH 4   
4             3179    CUSTOMER 5                    28       SALES BRANCH 5   
5            39881    CUSTOMER 6                    67       SALES BRANCH 6   
6            37020    CUSTOMER 7                    58       SALES BRANCH 7   
7             1239    CUSTOMER 8                    50       SALES BRANCH 8   
8             2379    CUSTOMER 9                    76       SALES BRANCH 9   

  CUSTOMER_CITY CUSTOMER_STATE     ...      PRICING_PRODUCT_TYPE_CODE  \
0        TOWN 1             CO     ...                             11   
1        TOWN 2             OH     ...                             11   
2        TOWN 3             ME     ...                             11   
3        TOWN 4             IL     ...                             11   
4        TOWN 5             NH     ...                             11   
5        TOWN 6             TX     ...                             11   
6        TOWN 7             NC     ...                             11   
7        TOWN 8             NY     ...                             11   
8        TOWN 9             OH     ...                             11   

  PRICING_PRODUCT_TYPE  ORGANIZATION_ID ORGANIZATION_NAME  PRODUCT_LINE_CODE  \
0          DISPOSABLES               83  ORGANIZATIONNAME                891   
1          DISPOSABLES               83  ORGANIZATIONNAME                891   
2          DISPOSABLES               83  ORGANIZATIONNAME                891   
3          DISPOSABLES               83  ORGANIZATIONNAME                891   
4          DISPOSABLES               83  ORGANIZATIONNAME                891   
5          DISPOSABLES               83  ORGANIZATIONNAME                891   
6          DISPOSABLES               83  ORGANIZATIONNAME                891   
7          DISPOSABLES               83  ORGANIZATIONNAME                891   
8          DISPOSABLES               83  ORGANIZATIONNAME                891   

  PRODUCT_LINE  ROBOTIC_FLAG  Unnamed: 52  Unnamed: 53  Unnamed: 54  
0  PRODUCTNAME             N            N          NaN            3  
1  PRODUCTNAME             N            N          NaN            3  
2  PRODUCTNAME             N            N          NaN            2  
3  PRODUCTNAME             N            N          NaN            7  
4  PRODUCTNAME             N            N          NaN            1  
5  PRODUCTNAME             N            N          NaN            4  
6  PRODUCTNAME             N            N          NaN            3  
7  PRODUCTNAME             N            N          NaN            5  
8  PRODUCTNAME             N            N          NaN            3  

[9 rows x 55 columns]
like image 131
jezrael Avatar answered Sep 29 '22 09:09

jezrael