
PyYAML and unusual tags

I am working on a project that uses the Unity3D game engine. For some of the pipeline requirements, it is best to be able to update some files from external tools using Python. Unity's meta and anim files are in YAML, so I thought this would be straightforward enough using PyYAML.

The problem is that Unity's format uses custom tags, and I am not sure how to work with them, as all the examples show the more common tags used by Python and Ruby.

Here is what the top lines of a file look like:

%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
  m_ObjectHideFlags: 0
  m_PrefabParentObject: {fileID: 0}
  ...

When I try to read the file I get this error:

could not determine a constructor for the tag 'tag:unity3d.com,2011:74'

Now, after looking through the other questions on this topic, this tag scheme does not seem to resemble the ones discussed there. For example, this file uses "!u!", and I was unable to figure out what it means or how something similar would behave (my wild, uneducated guess is that it looks like an alias or namespace).

I could hack around it by stripping the tags out, but that is not the ideal way to do this. I am looking for help on a solution that will properly handle the tags and let me parse and encode the data in a way that preserves the proper format.

Thanks, -R

asked Jan 31 '14 at 05:01 by renderbox





1 Answer

I also had this problem, and the internet was not very helpful. After bashing my head against this problem for 3 days, I was able to sort it out, or at least get a working solution. If anyone wants to add more info, please do. But here's what I got.

1) The documentation on Unity's YAML file format (they call it a "textual scene file" because it contains human-readable text): http://docs.unity3d.com/Manual/TextualSceneFormat.html

It is a YAML 1.1 compliant format. So you should be able to use PyYAML or any other Python YAML library to load up a YAML object.

Okay, great. But it doesn't work. Every YAML library has issues with this file.

2) The file is not correctly formed. It turns out the Unity file has some syntactical issues that make YAML parsers error out on it. Specifically:

2a) At the top, it uses a %TAG directive to define a tag handle, "!u!", as shorthand for the prefix "tag:unity3d.com,2011:". It looks like:

%TAG !u! tag:unity3d.com,2011:

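What this means is that anywhere you see "!u!", the parser substitutes "tag:unity3d.com,2011:". So "!u!74" expands to "tag:unity3d.com,2011:74", which is exactly the tag in the question's error message.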
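To see the expansion in action, here is a minimal sketch (assuming PyYAML is installed) that composes the question's sample into PyYAML's node graph without constructing Python objects, so the "could not determine a constructor" error is never triggered, and then reads the resolved tag off the root node.

```python
import yaml

# The header of the AnimationClip file from the question.
SAMPLE = """%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
  m_ObjectHideFlags: 0
  m_PrefabParentObject: {fileID: 0}
"""

# compose() builds nodes but skips construction, so no constructor is needed.
root = yaml.compose(SAMPLE, Loader=yaml.SafeLoader)
print(root.tag)              # tag:unity3d.com,2011:74
print(type(root).__name__)   # MappingNode
```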

2b) Then it goes on to use "!u!" all over the place, before each object's document. The problem is that, to be YAML 1.1 compliant, it would have to declare the tag handle again for each document (any time a new object starts with "--- "). A %TAG directive only applies to the document that follows it, so declaring it once at the top is only valid for the first document. The next document knows nothing about "!u!", so the parser errors out.

Also, this tag is useless. It basically prepends "tag:unity3d.com,2011:" to each class ID, which we don't care about. We already know it's a Unity YAML file. Why clutter the data?
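The per-document scope of %TAG can be reproduced with a made-up two-object file. This sketch runs only PyYAML's parser (no construction), so the only thing that can fail is the tag handle lookup; `first_parse_error` is a hypothetical helper name.

```python
import yaml

# Two documents, with the %TAG directive only in front of the first one,
# which is how Unity lays out its files (object contents are made up).
TWO_DOCS = """%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!1 &100000
GameObject:
  m_Name: Sword
--- !u!4 &400000
Transform:
  m_Father: {fileID: 0}
"""


def first_parse_error(text):
    """Run only the parser over text; return the first error message, or None."""
    try:
        for _event in yaml.parse(text, Loader=yaml.SafeLoader):
            pass
    except yaml.YAMLError as err:
        return str(err)
    return None


# The message names the undefined tag handle '!u!' in the second document.
print(first_parse_error(TWO_DOCS))
```

Cutting the text off before the second "--- " makes the error go away, which points at the missing directive rather than at the tag syntax itself.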

3) The object types are given by Unity's Class ID. Here is the documentation on that: http://docs.unity3d.com/Manual/ClassIDReference.html

Basically, each document defines a new object whose class corresponds to one of the IDs in that link. So a "GameObject" is "1", etc. The line looks like this:

--- !u!1 &100000

So the "--- " starts a new document. The "!u!" is shorthand for "tag:unity3d.com,2011:", and "&100000" is a YAML anchor that Unity uses as the file ID for this object (inside this file, if something references this object, it uses this ID; remember YAML is a node-based representation, so that ID is used to denote a node connection).
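If you only need the class ID and file ID from a header line, a regular expression is enough. A minimal sketch, assuming headers always look like the one above (`parse_document_header` is a hypothetical helper, and the optional minus sign is only there to be lenient):

```python
import re

# Matches Unity document headers such as "--- !u!1 &100000".
# Anything after the file ID on the same line is ignored.
HEADER_RE = re.compile(r'^--- !u!(\d+) &(-?\d+)')


def parse_document_header(line):
    """Return (class_id, file_id) for a Unity document header line, else None."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_document_header('--- !u!1 &100000'))  # (1, 100000)
print(parse_document_header('GameObject:'))       # None
```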

The next line is the root of the YAML object, which happens to be the name of the Unity class, for example "GameObject". So it turns out we don't actually need to translate from class ID to a human-readable node type: it's right there. If you ever need it, just take the root node's key. And if you need to construct a YAML object for Unity, just keep a dictionary around, based on that documentation link, to translate "GameObject" to "1", etc.
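A sketch of that dictionary, with only a few entries taken from Unity's Class ID Reference (GameObject = 1 and AnimationClip = 74 appear in this thread; check the rest against the linked page before relying on them). `unity_document_header` is a hypothetical helper that rebuilds a header line for writing.

```python
# A few class IDs from Unity's Class ID Reference; extend as needed.
CLASS_IDS = {
    'GameObject': 1,
    'Transform': 4,
    'Material': 21,
    'AnimationClip': 74,
    'MonoBehaviour': 114,
}

# Reverse lookup, for going from a "!u!<id>" tag back to a class name.
CLASS_NAMES = {class_id: name for name, class_id in CLASS_IDS.items()}


def unity_document_header(class_name, file_id):
    """Build a Unity document header line such as '--- !u!1 &100000'."""
    return '--- !u!{} &{}'.format(CLASS_IDS[class_name], file_id)


print(unity_document_header('GameObject', 100000))  # --- !u!1 &100000
```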

The other problem is that most YAML parsers (PyYAML is the one I tested) only handle the 3 kinds of YAML nodes, with their standard tags, out of the box:

  1. Scalar
  2. Sequence
  3. Mapping

You can define/extend custom constructors. But this amounts to hand-writing your own YAML parser, because you have to define EXPLICITLY how each tagged object is constructed and output. Why would I use a library like PyYAML, then go ahead and write my own parser to read these custom nodes? The whole point of using a library is to leverage previous work and get all that functionality from day one. I spent 2 days trying to make a new constructor for each class ID in Unity. It never worked, and I got into the weeds trying to build the constructors correctly.

THE GOOD NEWS/SOLUTION:

Turns out, all the Unity nodes I've run into so far are basic "Mapping" nodes in YAML. So you can throw away the custom tags and just let PyYAML auto-detect the node type. From there, everything works great!
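A quick check of that claim, as a minimal sketch on a made-up tag-free stream (the shape the pre-parser later in this answer produces): without explicit tags, PyYAML resolves each document root to the standard map tag, which its safe loader already knows how to construct.

```python
import yaml

# Tag-free Unity-style stream: headers keep only the file ID anchor.
STRIPPED = """%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- &100000
GameObject:
  m_Name: Sword
--- &400000
Transform:
  m_Father: {fileID: 0}
"""

roots = list(yaml.compose_all(STRIPPED, Loader=yaml.SafeLoader))
print([type(root).__name__ for root in roots])  # ['MappingNode', 'MappingNode']
print([root.tag for root in roots])             # ['tag:yaml.org,2002:map', 'tag:yaml.org,2002:map']
```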

In PyYAML, you can pass a file object or a string. So my solution was to write a simple pre-parser to strip out the bits that confuse PyYAML (the bits Unity got syntactically wrong) and feed the new string to PyYAML.

1) Remove line 2 entirely, or just ignore it:

%TAG !u! tag:unity3d.com,2011:

We don't care. We know it's a Unity file, and the tag does nothing for us.

2) For each document header, remove the tag handle ("!u!") and the class ID. Leave the file ID. Let PyYAML auto-detect the node as a Mapping node.

--- !u!1 &100000

becomes...

--- &100000

3) The rest, output as is.

The code for the pre-parser looks like this:

def removeUnityTagAlias(filepath):
    """
    Name:               removeUnityTagAlias()

    Description:        Loads a Unity textual scene file, which is in a pseudo YAML style, and strips the
                        parts that are not YAML 1.1 compliant. Essentially removes the "!u!" tag and the
                        class ID from each document header, but keeps the "&" file ID (anchor).
                        PyYAML seems to handle the rest just fine after that.

    Returns:            String (YAML stream as string)
    """
    result = []

    with open(filepath, 'r', encoding='utf-8') as sourceFile:
        for line in sourceFile:
            if line.startswith('--- !u!'):
                # "--- !u!1 &100000" -> "--- &100000": drop the tag and class ID, keep the file ID.
                # split() also drops the trailing newline (and anything after the file ID),
                # so the newline is added back exactly once.
                result.append('--- ' + line.split()[2] + '\n')
            else:
                # Just copy the contents (the unused %TAG line is harmless).
                result.append(line)

    return ''.join(result)

To create a PyYAML object from a Unity textual scene file, call your pre-parser function on the file:

import yaml

fileToLoad = '/Users/vlad.dumitrascu/<SOME_PROJECT>/Client/Assets/Gear/MeleeWeapons/SomeAsset_test.prefab'

# This works around Unity's YAML %TAG issue.
UnityStreamNoTags = removeUnityTagAlias(fileToLoad)

# safe_load_all yields one Python dict per YAML document (one per Unity object).
ListOfNodes = list(yaml.safe_load_all(UnityStreamNoTags))

# Example, print each object's name and type
for node in ListOfNodes:
    nodeType = next(iter(node))    # the single root key, e.g. 'GameObject'
    properties = node[nodeType]
    if 'm_Name' in properties:
        print('Name: ' + str(properties['m_Name']) + ' NodeType: ' + nodeType)
    else:
        print('Name: No Name Attribute NodeType: ' + nodeType)

Hope that helps!

-Vlad

PS. To answer the next issue in making this usable:

You also need to walk the entire project directory and parse all ".meta" files for the "GUID", which is Unity's inter-file reference. So, when you see a reference in a Unity YAML file for something like:

m_Materials:
  - {fileID: 2100000, guid: 4b191c3a6f88640689fc5ea3ec5bf3a3, type: 2}

That guid points to an asset in another file, and you can recursively open that one to find any dependencies.
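Finding those references is itself a recursive walk over the dicts and lists that `yaml.safe_load_all` returns. A minimal sketch, where `find_guid_references` is a hypothetical helper and the MeshRenderer wrapper around the m_Materials snippet above is made up for the example:

```python
def find_guid_references(value):
    """Recursively yield every mapping that has a 'guid' key, e.g. {fileID, guid, type}."""
    if isinstance(value, dict):
        if 'guid' in value:
            yield value
        for child in value.values():
            yield from find_guid_references(child)
    elif isinstance(value, list):
        for child in value:
            yield from find_guid_references(child)


# A loaded Unity object shaped like the m_Materials snippet above.
node = {'MeshRenderer': {'m_Materials': [
    {'fileID': 2100000, 'guid': '4b191c3a6f88640689fc5ea3ec5bf3a3', 'type': 2},
]}}
print([ref['guid'] for ref in find_guid_references(node)])
# ['4b191c3a6f88640689fc5ea3ec5bf3a3']
```

Each guid found this way can then be looked up in a GUID-to-path dictionary, and the referenced file loaded and walked the same way.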

I just ripped through the game project and saved a dictionary of GUID:Filepath key/value pairs, which I can match against.
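A minimal sketch of building that dictionary, assuming each .meta file has a top-level line such as "guid: 4b191c3a6f88640689fc5ea3ec5bf3a3" and sits next to the asset it describes ("Sword.mat.meta" describes "Sword.mat"). `build_guid_index` is a hypothetical name, and the scan reads lines directly rather than going through PyYAML.

```python
import os
import re

# Assumption: the GUID appears as an unindented "guid: <hex>" line in each .meta file.
GUID_LINE = re.compile(r'^guid:\s*([0-9a-fA-F]+)\s*$')


def build_guid_index(project_root):
    """Walk project_root and map every GUID found in a .meta file to its asset path."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(project_root):
        for name in filenames:
            if not name.endswith('.meta'):
                continue
            meta_path = os.path.join(dirpath, name)
            with open(meta_path, 'r', encoding='utf-8') as meta_file:
                for line in meta_file:
                    match = GUID_LINE.match(line)
                    if match:
                        # Strip the ".meta" suffix to get the asset's own path.
                        index[match.group(1)] = meta_path[:-len('.meta')]
                        break
    return index
```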

answered Oct 16 '22 at 16:10 by Vlad Dumitrascu
