Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to determine type of nested data structures in Python?

Tags:

python

types

I am currently translating some Python to F#, specifically neural-networks-and-deep-learning .

To make sure the data structures are correctly translated the details of the nested types from Python are needed. The type() function is working for simple types but not for nested types.

For example in Python:

> data = ([[1,2,3],[4,5,6],[7,8,9]],["a","b","c"])
> type(data)
<type 'tuple'>

only gives the type of the first level. Nothing is known about the arrays in the tuple.

I was hoping for something like what F# does

> let data = ([|[|1;2;3|];[|4;5;6|];[|7;8;9|]|],[|"a";"b";"c"|]);;

val data : int [] [] * string [] =
  ([|[|1; 2; 3|]; [|4; 5; 6|]; [|7; 8; 9|]|], [|"a"; "b"; "c"|])

returning the signature independent of the value

int [] [] * string []

*         is a tuple item separator  
int [] [] is a two dimensional jagged array of int  
string [] is a one dimensional array of string

Can or how is this done in Python?

TLDR;

Currently I am using PyCharm with the debugger and in the variables window clicking the view option for an individual variable to see the details. The problem is that the output contains the values along with the types intermixed and I only need the type signature. When the variables are like (float[50000][784], int[50000]) the values get in the way. Yes I am resizing the variables for now, but that is a workaround and not a solution.

e.g.

Using PyCharm Community

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ...,     
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),
  array([7, 2, 1, ..., 4, 5, 6]))

Using Spyder

Using Spyder variable viewer

Using Visual Studio Community with Python Tools for Visual Studio

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],    
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        ...,   
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),  
  array([5, 0, 4, ..., 8, 4, 8], dtype=int64)) 

EDIT:

Since this question has been stared someone is apparently looking for more details, here is my modified version which can also handle numpy ndarray. Thanks to Vlad for the initial version.

Also because of the use of a variation of Run Length Encoding there is no more use of ? for heterogeneous types.

# Note: Typing for elements of iterable types such as Set, List, or Dict 
# use a variation of Run Length Encoding.

def type_spec_iterable(iterable, name):
    def iterable_info(iterable):
        # With an iterable for it to be comparable 
        # the identity must contain the name and length 
        # and for the elements the type, order and count.
        length = 0
        types_list = []
        pervious_identity_type = None
        pervious_identity_type_count = 0
        first_item_done = False
        for e in iterable:
            item_type = type_spec(e)
            if (item_type != pervious_identity_type):
                if not first_item_done:
                    first_item_done = True
                else:
                    types_list.append((pervious_identity_type, pervious_identity_type_count))
                pervious_identity_type = item_type
                pervious_identity_type_count = 1
            else:
                pervious_identity_type_count += 1
            length += 1
        types_list.append((pervious_identity_type, pervious_identity_type_count))
        return (length, types_list)
    (length, identity_list) = iterable_info(iterable)
    element_types = ""
    for (identity_item_type, identity_item_count) in identity_list:
        if element_types == "":
            pass
        else:
            element_types += ","
        element_types += identity_item_type
        if (identity_item_count != length) and (identity_item_count != 1):
            element_types += "[" + `identity_item_count` + "]"
    result = name + "[" + `length` + "]<" + element_types + ">"
    return result

def type_spec_dict(dict, name):
    def dict_info(dict):
        # With a dict for it to be comparable 
        # the identity must contain the name and length 
        # and for the key and value combinations the type, order and count.
        length = 0
        types_list = []
        pervious_identity_type = None
        pervious_identity_type_count = 0
        first_item_done = False
        for (k, v) in dict.iteritems():
            key_type = type_spec(k)
            value_type = type_spec(v)
            item_type = (key_type, value_type)
            if (item_type != pervious_identity_type):
                if not first_item_done:
                    first_item_done = True
                else:
                    types_list.append((pervious_identity_type, pervious_identity_type_count))
                pervious_identity_type = item_type
                pervious_identity_type_count = 1
            else:
                pervious_identity_type_count += 1
            length += 1
        types_list.append((pervious_identity_type, pervious_identity_type_count))
        return (length, types_list)
    (length, identity_list) = dict_info(dict)
    element_types = ""
    for ((identity_key_type,identity_value_type), identity_item_count) in identity_list:
        if element_types == "":
            pass
        else:
            element_types += ","
        identity_item_type = "(" + identity_key_type + "," + identity_value_type + ")"
        element_types += identity_item_type
        if (identity_item_count != length) and (identity_item_count != 1):
            element_types += "[" + `identity_item_count` + "]"
    result = name + "[" + `length` + "]<" + element_types + ">"
    return result

def type_spec_tuple(tuple, name):
    return name + "<" + ", ".join(type_spec(e) for e in tuple) + ">"

def type_spec(obj):
    object_type = type(obj)
    name = object_type.__name__
    if (object_type is int) or (object_type is long) or (object_type is str) or (object_type is bool) or (object_type is float):            
        result = name
    elif object_type is type(None):
        result = "(none)"
    elif (object_type is list) or (object_type is set):
        result = type_spec_iterable(obj, name)
    elif (object_type is dict):
        result = type_spec_dict(obj, name)
    elif (object_type is tuple):
        result = type_spec_tuple(obj, name)
    else:
        if name == 'ndarray':
            ndarray = obj
            ndarray_shape = "[" + `ndarray.shape`.replace("L","").replace(" ","").replace("(","").replace(")","") + "]"
            ndarray_data_type = `ndarray.dtype`.split("'")[1]
            result = name + ndarray_shape + "<" + ndarray_data_type + ">"
        else:
            result = "Unknown type: " , name
    return result

I would not consider it done, but it has worked on everything I needed thus far.

like image 561
Guy Coder Avatar asked Jan 06 '16 16:01

Guy Coder


1 Answers

As I commented, this is impossible in Python, because lists are untyped.

You can still pretend to do it:

def typ(something, depth=0):
    if depth > 63:
        return "..."
    if type(something) == tuple:
        return "<class 'tuple': <" + ", ".join(typ(ding, depth+1) for ding in something) + ">>"
    elif type(something) == list:
        return "<class 'list': " + (typ(something[0], depth+1) if something else '(empty)') + ">"
    else:
        return str(type(something))

That returns the string <class 'tuple': <<class 'list': <class 'list': <class 'int'>>>,<class 'list': <class 'str'>>>> for your example.

edit: To make it look more like F# you could do this instead:

def typ(something, depth=0):
    if depth > 63:
        return "..."
    if type(something) == tuple:
        return " * ".join(typ(ding, depth+1) for ding in something)
    elif type(something) == list:
        return (typ(something[0]) if something else 'empty') + " []"
    else:
        return str(type(something, depth+1)).split("'")[1]

which will return int [] [] * str [] in your example.

like image 51
L3viathan Avatar answered Sep 28 '22 22:09

L3viathan