Optimizing FBX Character Playback in Fabric Engine

May 11, 2017

Need faster playback of moving characters when using Fabric inside other applications such as Maya or modo? This technical article covers several ways to boost playback speed by up to 4X.

A customer recently asked us why their Fabric graph was not performing as fast as they expected inside of Autodesk Maya. The graph used an animated joint hierarchy from a Maya scene to drive an FBX character (with the same joint structure) and then drew the result into the viewport. Additionally, there was an animated cache of the face geometry stored in a separate Alembic file that needed to be projected onto the skinned face geometry, masked out by an image file. The initial speed of the graph was 24fps with one character – not quick enough when drawing multiple characters on screen. Let’s see what we did to boost performance up to around 100fps on the same asset.

Reading the Character

The character was contained in an FBX file that held the geometry, joint hierarchy, and skinning data. It was easily accessed via an FBX extension to Fabric that provides high-level API access, along with an ‘FBXHelpers’ extension that exposes FBX Canvas presets for visual programming. FBXHelpers includes an FBXCharacter object that combines skeletons, poses, and geometry sets with methods to get and set data on the character. In this case, we needed to find all the bones in the FBX file that matched the ones in the Maya bone hierarchy and map the transforms. The names were fed in as a String array and the matrices came in as a Mat44 array, which was then set onto the character’s bones for posing. Once this was done, the skinned geometry could be queried and drawn directly into the viewport with InlineDrawing.

The (static) face mesh for the character was in the same FBX file. Fabric wraps the Alembic library, so the graph brought in an Alembic file containing animation for the face (based on identical topology and point count). Calculating the delta between the positions of the static mesh and animated mesh provided the Vec3 values to project the animation onto the skinned face mesh.  We then needed to mask out the deformation using a grayscale image read from disk. The static mesh contained a set of UVs that we used to sample the image and to generate an array of values (mapped to each point) to drive the deformation. The Image extension provided the methods and presets for doing this.
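The projection described above comes down to per-point arithmetic. The following is a minimal sketch in plain Python, not the Canvas graph itself. It assumes the three meshes share the same point order and that the mask image has already been sampled to one weight per point (0 to 1). It also adds the delta directly in object space, which is a simplification. The function and parameter names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def project_masked_deltas(
    static_pts: List[Vec3],
    animated_pts: List[Vec3],
    skinned_pts: List[Vec3],
    mask: List[float],
) -> List[Vec3]:
    """Add the masked Alembic delta to each skinned point.

    delta  = animated - static
    result = skinned + mask * delta
    """
    if not (len(static_pts) == len(animated_pts) == len(skinned_pts) == len(mask)):
        raise ValueError("all inputs must have the same point count")

    result = []
    for s, a, k, w in zip(static_pts, animated_pts, skinned_pts, mask):
        # Per-component: offset the skinned point by the weighted delta.
        result.append(tuple(k[i] + w * (a[i] - s[i]) for i in range(3)))
    return result
```

A mask weight of 1.0 applies the full facial animation to a point, 0.0 leaves the skinned position untouched, and values in between blend the two.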

Debugging Tools

We wanted some debug tools to let us visually assess performance gains. The InlineDrawing extension in Canvas contains presets that let you draw attributes and values on screen as geometric representations such as points, lines, axes, and arrows. It also includes DrawText, which lets you put any string into the viewport. Super handy!

Profiling Performance with FabricStatistics

To get the performance data, we used FabricStatistics, which wraps method calls (or blocks of code) and prints out their evaluation time. Let’s walk through how to use it.

The easiest way to start working with this tool is to create an Execute Merge node (4 port or 8 port as needed) and plug it into the `exec` port of the main graph. Then, create a new function node and name it `Start Profiling`. Add `FabricStatistics` to the required extensions of this new function node and place the following code in the text field:

 

require FabricStatistics;

dfgEntry {
  // Start recording profiling events for this evaluation.
  StartFabricProfiling();
}

 

Next, you need another function node called `End Profiling`. As with the previous node, add `FabricStatistics` as a required extension. Since displaying the profiling info costs performance, add a boolean input port with the exposed name `mute` so you can toggle the printing on and off. Here’s the code for the body of that function node:

 

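require FabricStatistics;

dfgEntry {
  // Stop recording before the report is built.
  StopFabricProfiling();

  // `mute` is the boolean input port; skip printing when it is set.
  if(!mute)
    report( GetProfilingReport() );
}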

 

Next, plug the Start Profiling node into the first port of the Merge node and connect the End Profiling node to the last port. Once this is set up and the `mute` port on the End Profiling node is set to false, an empty report from Fabric is logged in the Maya script editor. Now you can add calls to the `AutoProfilingEvent profile()` method within the function nodes of interest in the graph. Make each call the first line inside the `dfgEntry` block, and add `FabricStatistics` as a required extension on that node.

TIP: The `profile()` call accepts a string argument to label the output so you can understand what the value represents. We recommend using a label to help identify each reported parameter.

Profiling Results

After reviewing the profiling information on the customer’s scene, we found a way to optimize the InlineDrawing extension that improves performance for all of our users. The optimization (included in Fabric 2.5) checks whether the geometry topology has changed. This check boosted playback by about 5fps.

 

# Results of pre-optimization using Fabric Character Asset ~75fps

DrawPolygonMeshArray: duration=10.100098 ms, startTime=+2.102211e-4', threadIndex:0
  FBXChar>ToCharacter: duration=0.0006 ms, startTime=+2.123294e-4', threadIndex:0
  Character>SetClipPoseTime: duration=0.0072 ms, startTime=+2.135341e-4', threadIndex:0
  Character>SetPose: duration=0.0271 ms, startTime=+2.210635e-4', threadIndex:0
    Character>SetBoneXfos: duration=0.0255 ms, startTime=+2.21967e-4', threadIndex:0
  Character>GetDeformedMeshes: duration=1.11311 ms, startTime=+2.487717e-4', threadIndex:0
  MeshArray>Set: duration=8.86869 ms, startTime=+1.382399e-3', threadIndex:0
    DeformBlock>For Loop>DeformInPlace: duration=7.79244 ms, startTime=+1.383905e-3', threadIndex:0
      Alembic>ReadSample: duration=4.45856 ms, startTime=+2.151002e-3', threadIndex:0
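
When a report gets longer than this one, it can help to rank the entries by how much of the total they account for. The sketch below is a small Python helper, not part of Fabric. It assumes the report lines follow the `label: duration=N ms, ...` layout shown in this article, and that an indented entry's time is included in its parent's time. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as "  Character>SetPose: duration=0.0271 ms, startTime=..."
LINE_RE = re.compile(r"^(?P<indent>\s*)(?P<label>[^:]+): duration=(?P<ms>[0-9.]+) ms")


def parse_report(text: str) -> List[Tuple[int, str, float]]:
    """Return (indent, label, duration_ms) for every timing line in the report."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((len(m.group("indent")), m.group("label"), float(m.group("ms"))))
    return rows


def share_of_root(rows: List[Tuple[int, str, float]]) -> List[Tuple[str, float, float]]:
    """Express each duration as a fraction of the first (outermost) entry."""
    root_ms = rows[0][2]
    return [(label, ms, ms / root_ms) for _, label, ms in rows]


# Four lines copied from the pre-optimization report above.
REPORT = """\
DrawPolygonMeshArray: duration=10.100098 ms, startTime=+2.102211e-4', threadIndex:0
  MeshArray>Set: duration=8.86869 ms, startTime=+1.382399e-3', threadIndex:0
    DeformBlock>For Loop>DeformInPlace: duration=7.79244 ms, startTime=+1.383905e-3', threadIndex:0
      Alembic>ReadSample: duration=4.45856 ms, startTime=+2.151002e-3', threadIndex:0
"""

if __name__ == "__main__":
    for label, ms, frac in share_of_root(parse_report(REPORT)):
        print(f"{label:40s} {ms:9.4f} ms  {frac:6.1%}")
```

Read this way, the deformation loop and the Alembic read inside it account for most of the draw call, which is where the remaining optimizations concentrate.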

 

We continued the hunt. When reading the Alembic mesh, a PolygonMesh node was being plugged directly into the ReadSample node. We created a variable node and inserted it between the PolygonMesh node and the ReadSample node for another ~20fps bump in performance.

On the ReadSample node we set “FreezeAttributeIndices” to true for another ~5fps gain. This is only recommended if you are sure the geometry topology is not changing over time.

Next we realized that the texture file used to mask the facial deformation was being read and sampled via the uvs0 attribute at every evaluation. Changing the graph to calculate the values just once and cache them gave another ~5fps gain. We could really see the speed-ups by now.
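
The "calculate once, then reuse" idea can be sketched outside of Canvas as follows. This is a plain Python illustration, not the change we made to the graph. The `MaskCache` class and the `sample` callback are hypothetical stand-ins for the image read and UV lookup.

```python
from typing import Callable, List, Optional, Sequence, Tuple

UV = Tuple[float, float]


class MaskCache:
    """Sample the mask once and hand back the stored weights on later frames."""

    def __init__(self, sample: Callable[[UV], float]) -> None:
        self._sample = sample                      # expensive: image lookup per UV
        self._weights: Optional[List[float]] = None

    def weights(self, uvs: Sequence[UV]) -> List[float]:
        # Only the first call (or the first call after invalidate) does the sampling.
        if self._weights is None:
            self._weights = [self._sample(uv) for uv in uvs]
        return self._weights

    def invalidate(self) -> None:
        """Force a re-sample, e.g. if the image file or the UVs change."""
        self._weights = None
```

The trade-off is the usual one for caches: the stored weights are only valid while the image and the UVs stay the same, so something has to call `invalidate()` when either changes.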

 

The most significant performance gain came from a different approach to the Deform Block node in the customer’s graph. Block nodes in Canvas are not executed in parallel. Support for PEX (KL’s parallel execution) in blocks is on our roadmap, so to optimize the deformation for now we created a custom KL Function node with a PEX call. This gave a performance gain of ~30fps!

 

TIP: We recommend doing performance-critical geometry deformations with PEX function nodes until PEX for block nodes is available. Block nodes remain a great way to prototype and test deformers quickly before converting them to a PEX KL function node.

A lesser-known optimization that gave another ~2-5fps was to disable the evalContext on the nodes. Since this graph doesn’t use the evalContext, it’s a small optimization, but every small gain you squeeze out of a graph is worth it for production. Disabling the evalContext is only possible via scripting, with the following PyMel command:

pm.setAttr('canvasNode1.enableEvalContext', False)

Additionally, Canvas DG nodes are not executed in parallel by default with Maya’s parallel evaluation capabilities. To enable / disable the parallel evaluation as needed you can use the command:

pm.FabricCanvasSetExecuteShared(mayaNode="canvasNode1", enable=True)

Finally, an often overlooked optimization is to run Fabric in unguarded mode, which skips the checks for out-of-bounds array accesses and NULL object references. This makes the code faster, but programmer errors in KL code can then crash the application, so only enable this mode once your graph is error free and thoroughly tested. You can enable unguarded mode by setting the “FABRIC_SPLICE_UNGUARDED” environment variable to 1. In this case, it gave a ~10fps performance gain.
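
One way to set the variable without changing your global shell configuration is to launch the host application from a small script. This is a minimal sketch; it assumes the variable has to be in the environment before the application (and the Fabric plug-in) starts, and the `maya` executable name in the comment is a placeholder for your own install. The helper name is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def unguarded_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Fabric's unguarded mode switched on."""
    env = dict(os.environ if base is None else base)
    env["FABRIC_SPLICE_UNGUARDED"] = "1"
    return env


# Example launch (assumption: "maya" is on your PATH):
#   import subprocess
#   subprocess.Popen(["maya"], env=unguarded_env())
```

Because the helper returns a copy, guarded mode stays the default for every session that is not started through the script.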

# Results post-optimization using Fabric Character Asset ~200fps

DrawPolygonMeshArray: duration=3.29994 ms, startTime=+5.601881e-5', threadIndex:0
  FBXChar>ToCharacter: duration=0.0021 ms, startTime=+5.933175e-5', threadIndex:0
  Character>SetClipPoseTime: duration=0.0060 ms, startTime=+0.617411e-4', threadIndex:0
  Character>SetPose: duration=0.0295 ms, startTime=+0.680658e-4', threadIndex:0
    Character>SetBoneXfos: duration=0.0280 ms, startTime=+0.689694e-4', threadIndex:0
  Character>GetDeformedMeshes: duration=0.9658 ms, startTime=+0.978823e-4', threadIndex:0
  MeshArray>Set: duration=1.18000 ms, startTime=+1.064357e-3', threadIndex:0
    PEX Deformer: duration=1.17967 ms, startTime=+1.067068e-3', threadIndex:0
      Alembic>ReadSample: duration=0.5276 ms, startTime=+2.055228e-3', threadIndex:0

Final Results

Since the customer’s assets couldn’t be shared here, we reproduced the case with assets we own. The playback speed improvement for our single character on screen wasn’t as dramatic as in the customer case, but it is still quite good. The frame rate of the customer graph/scene went from ~24fps to ~95-100fps. The frame rate using our own assets improved from ~75fps to ~200fps.
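
Frame rates are easier to compare once they are converted to time per frame, because a gain quoted in fps is worth a different number of milliseconds depending on where you start. The helper below is a small Python sketch of that arithmetic. The function names are hypothetical, and it uses 100fps as the upper end of the customer result.

```python
def frame_ms(fps: float) -> float:
    """Milliseconds spent per frame at a given playback rate."""
    return 1000.0 / fps


def speedup(before_fps: float, after_fps: float) -> float:
    """How many times faster playback is after optimization."""
    return after_fps / before_fps


def ms_saved(before_fps: float, after_fps: float) -> float:
    """Frame time recovered by going from before_fps to after_fps."""
    return frame_ms(before_fps) - frame_ms(after_fps)


if __name__ == "__main__":
    for label, before, after in [("customer scene", 24, 100), ("our assets", 75, 200)]:
        print(f"{label}: {frame_ms(before):.2f} ms -> {frame_ms(after):.2f} ms "
              f"({speedup(before, after):.2f}x)")
```

For example, a 5fps gain starting from 24fps recovers roughly 7 ms per frame, while the same 5fps gain starting from 95fps recovers only about half a millisecond.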

posted by Fabric Engine