Python Pandas - Read csv file containing multiple tables

Tags:

python

pandas

csv

excel

python-2.7

I have a single .csv file containing multiple tables.

Using Pandas, what would be the best strategy to get two DataFrames, inventory and HPBladeSystemRack, from this one file?

The input .csv looks like this:

Inventory       
System Name            IP Address    System Status
dg-enc05                             Normal
dg-enc05_vc_domain                   Unknown
dg-enc05-oa1           172.20.0.213  Normal

HP BladeSystem Rack         
System Name               Rack Name   Enclosure Name
dg-enc05                  BU40  
dg-enc05-oa1              BU40        dg-enc05
dg-enc05-oa2              BU40        dg-enc05

The best I've come up with so far is to convert this .csv file into an Excel workbook (xlsx), split the tables into sheets and use:

inventory = read_excel('path_to_file.xlsx', 'sheet1', skiprows=1)
HPBladeSystemRack = read_excel('path_to_file.xlsx', 'sheet2', skiprows=2)

However:

This approach requires the xlrd module.
Those log files have to be analyzed in real time, so it would be much better to parse them directly as they come from the logs.
The real logs have far more tables than those two.

asked Dec 09 '15 17:12

JahMyst

2 Answers

Pandas doesn't seem to support this out of the box, so I ended up writing my own split_csv function. It only needs the table names and writes one .csv file per table, named after that table.

import csv
import sys
from os.path import dirname  # parent folder of a path
from os.path import join  # concatenate path components

table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]

def split_csv(csv_path, table_names):
    """Write each table found in csv_path to its own '<table name>.csv' file."""
    tables_infos = detect_tables_from_csv(csv_path, table_names)
    for table_info in tables_infos:
        split_csv_by_indexes(csv_path, table_info)

def split_csv_by_indexes(csv_path, table_info):
    """Copy rows start_index..end_index (inclusive) to '<title>.csv'."""
    title, start_index, end_index = table_info
    print(title, start_index, end_index)
    dir_ = dirname(csv_path)
    output_path = join(dir_, title) + ".csv"
    # newline='' is what the csv module expects for files in Python 3
    with open(output_path, 'w', newline='') as output_file, \
            open(csv_path, newline='') as input_file:
        writer = csv.writer(output_file)
        reader = csv.reader(input_file)
        for i, line in enumerate(reader):
            if i < start_index:
                continue
            if i > end_index:
                break
            writer.writerow(line)

def detect_tables_from_csv(csv_path, table_names):
    """Return a list of (title, start_index, end_index) tuples, one per table."""
    output = []
    previous_match = None  # title of the table currently being scanned
    start_index = 0
    idx = -1  # stays -1 if the file is empty
    with open(csv_path, newline='') as csv_file:
        reader = csv.reader(csv_file)
        for idx, row in enumerate(reader):
            for col in row:
                match = [title for title in table_names if title in col]
                if match:
                    if previous_match is not None:
                        # a new title closes the previous table
                        output.append((previous_match, start_index, idx - 1))
                    print("Found new table", col)
                    start_index = idx
                    previous_match = match[0]  # first matching title
                    break  # one title per row is enough
    if previous_match is not None:
        # the last table runs to the end of the file
        output.append((previous_match, start_index, idx))
    return output


if __name__ == '__main__':
    csv_path = 'switch_records.csv'
    try:
        split_csv(csv_path, table_names)
    except IOError as e:
        print("This file doesn't exist. Aborting.")
        print(e)
        sys.exit(1)
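
Each file written by split_csv_by_indexes still starts with the table's title row, so the column names sit on the second line. A minimal sketch of loading one of those files back into pandas could look like the following. The helper name load_split_table and the sample file contents are assumptions made for illustration, and the input is assumed to be comma-separated.

```python
import os
import tempfile

import pandas as pd


def load_split_table(path):
    """Load one file written by split_csv into a DataFrame (hypothetical helper)."""
    # Row 0 is the table title, so the column names sit on row 1.
    df = pd.read_csv(path, skiprows=1)
    # Separator lines such as ",," come through as all-NaN rows; drop them.
    return df.dropna(how="all").reset_index(drop=True)


# Assumed sample of what split_csv might write for the Inventory table.
sample = (
    "Inventory,,\n"
    "System Name,IP Address,System Status\n"
    "dg-enc05,,Normal\n"
    "dg-enc05_vc_domain,,Unknown\n"
    "dg-enc05-oa1,172.20.0.213,Normal\n"
    ",,\n"
)
path = os.path.join(tempfile.mkdtemp(), "Inventory.csv")
with open(path, "w") as f:
    f.write(sample)

inventory = load_split_table(path)
print(inventory)
```

This goes through the disk twice, so for logs that need to be processed as they arrive it is probably only useful as a quick check that the split worked.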

answered Sep 27 '22 20:09

JahMyst

If you know the table names beforehand, then something like this:

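import pandas as pd
# Read everything as three unnamed columns so title rows and data rows line up
df = pd.read_csv("jahmyst2.csv", header=None, names=range(3))
table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
# A new group starts at every row whose first cell is a table name
groups = df[0].isin(table_names).cumsum()
# Key: the title cell of each group; value: the rows that follow it
tables = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}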

should work to produce a dictionary with keys as the table names and values as the subtables.

>>> list(tables)
['HP BladeSystem Rack', 'Inventory']
>>> for k,v in tables.items():
...     print("table:", k)
...     print(v)
...     print()
...     
table: HP BladeSystem Rack
              0          1               2
6   System Name  Rack Name  Enclosure Name
7      dg-enc05       BU40             NaN
8  dg-enc05-oa1       BU40        dg-enc05
9  dg-enc05-oa2       BU40        dg-enc05

table: Inventory
                    0             1              2
1         System Name    IP Address  System Status
2            dg-enc05           NaN         Normal
3  dg-enc05_vc_domain           NaN        Unknown
4        dg-enc05-oa1  172.20.0.213         Normal

Once you've got that, you can promote the first row of each subtable to its column names, etc.
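
That last step might look something like the sketch below. It is not part of the approach above as posted: the helper name promote_header is made up for illustration, and the inline sample assumes the file is comma-separated with a blank line between tables.

```python
import io

import pandas as pd

# Assumed sample mirroring the question's file (comma-separated).
raw = """Inventory,,
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal

HP BladeSystem Rack,,
System Name,Rack Name,Enclosure Name
dg-enc05,BU40,
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""

df = pd.read_csv(io.StringIO(raw), header=None, names=[0, 1, 2])
table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0, 0]: g.iloc[1:] for _, g in df.groupby(groups)}


def promote_header(sub):
    """Use the first row of `sub` as column names (hypothetical helper)."""
    out = sub.iloc[1:].copy()
    out.columns = sub.iloc[0].tolist()
    # Drop fully empty rows, if any, and restart the index at 0.
    return out.dropna(how="all").reset_index(drop=True)


clean = {name: promote_header(sub) for name, sub in tables.items()}
inventory = clean["Inventory"]
HPBladeSystemRack = clean["HP BladeSystem Rack"]
print(inventory)
print(HPBladeSystemRack)
```

Every column comes back as strings this way, since each one originally shared a column with title and header text, so numeric columns would need an explicit conversion afterwards.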

answered Sep 27 '22 18:09

DSM
