Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is the only way to add a column in PyTables to create a new table and copy?

I am searching for a persistent data storage solution that can handle heterogenous data stored on disk. PyTables seems like an obvious choice, but the only information I can find on how to append new columns is a tutorial example. The tutorial has the user create a new table with added column, copy the old table into the new table, and finally delete the old table. This seems like a huge pain. Is this how it has to be done?

If so, what are better alternatives for storing mixed data on disk that can accommodate new columns with relative ease? I have looked at sqlite3 as well and the column options seem rather limited there, too.

like image 419
Zelazny7 Avatar asked Apr 03 '13 20:04

Zelazny7


People also ask

What is PyTables package?

PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. You can download PyTables and use it for free. You can access documentation, some examples of use and presentations here.


2 Answers

Yes, you must create a new table and copy the original data. This is because Tables are a dense format. This gives it a huge performance benefits but one of the costs is that adding new columns is somewhat expensive.

like image 166
Anthony Scopatz Avatar answered Sep 18 '22 23:09

Anthony Scopatz


thanks for Anthony Scopatz's answer.

I search website and in github, I found someone has shown how to add columns in PyTables. Example showing how to add a column in PyTables

orginal version ,Example showing how to add a column in PyTables, but have some difficulty to migrate.

revised version, Isolated the copying logic, while some terms is deprecated, and it has some minor error in adding new columns.

based on their's contribution, I updated the code for adding new column in PyTables. (Python 3.6, windows)

# -*- coding: utf-8 -*-
"""
PyTables, append a column
    """
import tables as tb
pth='d:/download/'

# Describe a water class
class Water(tb.IsDescription):
    waterbody_name   = tb.StringCol(16, pos=1)   # 16-character String
    lati             = tb.Int32Col(pos=2)        # integer
    longi            = tb.Int32Col(pos=3)        # integer
    airpressure      = tb.Float32Col(pos=4)      # float  (single-precision)
    temperature      = tb.Float64Col(pos=5)      # double (double-precision)

# Open a file in "w"rite mode
# if don't include pth, then it will be in the same path as the code.
fileh = tb.open_file(pth+"myadd-column.h5", mode = "w")

# Create a table in the root directory and append data...
tableroot = fileh.create_table(fileh.root, 'root_table', Water,
                               "A table at root", tb.Filters(1))
tableroot.append([("Mediterranean", 10, 0, 10*10, 10**2),
              ("Mediterranean", 11, -1, 11*11, 11**2),
              ("Adriatic", 12, -2, 12*12, 12**2)])
print ("\nContents of the table in root:\n",
       fileh.root.root_table[:])

# Create a new table in newgroup group and append several rows
group = fileh.create_group(fileh.root, "newgroup")
table = fileh.create_table(group, 'orginal_table', Water, "A table", tb.Filters(1))
table.append([("Atlantic", 10, 0, 10*10, 10**2),
              ("Pacific", 11, -1, 11*11, 11**2),
              ("Atlantic", 12, -2, 12*12, 12**2)])
print ("\nContents of the original table in newgroup:\n",
       fileh.root.newgroup.orginal_table[:])
# close the file
fileh.close()

#%% Open it again in append mode
fileh = tb.open_file(pth+"myadd-column.h5", "a")
group = fileh.root.newgroup
table = group.orginal_table

# Isolated the copying logic
def append_column(table, group, name, column):
    """Returns a copy of `table` with an empty `column` appended named `name`."""
    description = table.description._v_colObjects.copy()
    description[name] = column
    copy = tb.Table(group, table.name+"_copy", description)

    # Copy the user attributes
    table.attrs._f_copy(copy)

    # Fill the rows of new table with default values
    for i in range(table.nrows):
        copy.row.append()
    # Flush the rows to disk
    copy.flush()

    # Copy the columns of source table to destination
    for col in descr:
        getattr(copy.cols, col)[:] = getattr(table.cols, col)[:]

    # choose wether remove the original table
#    table.remove()

    return copy

# Get a description of table in dictionary format
descr = table.description._v_colObjects
descr2 = descr.copy()

# Add a column to description
descr2["hot"] = tb.BoolCol(dflt=False)

# append orginal and added data to table2 
table2 = append_column(table, group, "hot", tb.BoolCol(dflt=False))
# Fill the new column
table2.cols.hot[:] = [row["temperature"] > 11**2 for row in table ]
# Move table2 to table, you can use the same name as original one.
table2.move('/newgroup','new_table')

# Print the new table
print ("\nContents of the table with column added:\n",
       fileh.root.newgroup.new_table[:])
# Finally, close the file
fileh.close()
like image 43
Renke Avatar answered Sep 16 '22 23:09

Renke