Using node slicing to manage large scenes
Large scenes often need to be visualized and edited in real time in production. These scenes frequently cause significant performance problems, as application architectures struggle to manage the large graphs built when loading them.
Scene Graph Representations
Each geometry drawn on-screen is represented by a small graph. The Geometry, Material, and Transform values are usually computed in separate nodes and accessed from the Instance node, which manages drawing to the screen.
When a scene containing multiple objects is loaded, these four nodes are constructed for each object on screen.
Most DCC apps enable geometry to be shared between multiple instances, each with a unique transform. This concept is called 'instancing' and enables many geometries to be drawn without requiring multiple copies of the geometry data. Instancing is also supported in hardware via DirectX and OpenGL: a single draw call is issued, which results in the same geometry being drawn efficiently.
This optimization reduces the complexity of the graph if the geometries are all identical.
The description above oversimplifies the scene graphs of most DCCs. In reality, the Transform, Material, and Geometry nodes are themselves complex graphs.
The transform node is a good example: it is itself a graph that is evaluated to generate the transform value. In Softimage, the transform node has many animatable parameters that are evaluated to generate the final transform, and it features operator stacks for global or local evaluation, pivot transform, and pivot animation.
The number of nodes in the underlying scene graph is much larger than the actual number of geometries drawn on screen. In reality, the DCC applications generate enormous graphs often providing features that are not necessary for all purposes, such as scene layout. When loading very large scenes, the size of the required graph becomes large and unwieldy.
For a scene containing 1000 drawn geometries, the underlying scene graph may exceed 50000 nodes.
Multithreaded Evaluation of Large Graphs
Traditional DCCs feature single-threaded traversals, and scene evaluation is usually heavily CPU limited. One solution to this problem is to build more efficient graphs that support multithreaded traversal, so that on a multi-core CPU different parts of the graph can be updated in parallel by independent cores.
This is a good solution and reduces the problem, but it doesn't address the underlying issue: the graph itself is too large.
Another solution is to optimize the graph by dynamically generating a simpler one. This is an advanced technique that few architectures utilize; it can reduce the size of the resulting evaluation graph after the initial graph is constructed.
The Fabric Engine graph traversal is fully multithreaded, so the graph state is updated efficiently on multi-core CPUs. However, graphs are built using Python, which makes instantiating large graphs a slow process. Our solution is to build smaller graphs that can represent a lot of data.
Node slicing is a concept that enables a single node to store multiple unique data sets that share a common structure.
Fabric Engine enables users to build graphs by constructing nodes, and connecting them together using dependencies. Nodes are data containers, and are populated by adding members.
Once a node has been configured, it represents a single instance of data. The slice count of the node defaults to 1, indicating that only 1 set of the described data is allocated.
node.addMember('foo', 'Vec3', Vec3(1,2,3))
node.addMember('bar', 'Vec3', Vec3(1,2,3))
node.addMember('transform', 'Xfo')
node.setCount(3)
By adding members to a node, we are defining the structure of the data stored in the node, and by setting the slice count, we are defining the number of instances stored in the node.
An analogy for node slicing is a table in a database: the members define the columns, and the slice count defines the number of rows.
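To make the analogy concrete, here is an illustrative sketch in plain Python (this is not the Fabric Engine API; `SlicedNode` and its methods are hypothetical stand-ins) showing a node whose members define a common structure shared by every slice:

```python
# Illustrative sketch only -- a hypothetical SlicedNode, not the Fabric Engine API.
# Members define the "columns"; the slice count defines the number of "rows",
# each row being an independent data set with the same structure.

class SlicedNode:
    def __init__(self):
        self.members = {}   # member name -> default value
        self.slices = []    # one dict per slice (one "row" per data set)

    def add_member(self, name, default=None):
        self.members[name] = default
        for row in self.slices:
            row[name] = default

    def set_count(self, count):
        # grow with default-initialized slices, or shrink
        while len(self.slices) < count:
            self.slices.append(dict(self.members))
        del self.slices[count:]

node = SlicedNode()
node.add_member('foo', (1, 2, 3))
node.add_member('transform', None)
node.set_count(3)
# the node now holds 3 independent data sets sharing one structure
```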
Using Node Slicing to Manage Large Volumes of Data
Often when dealing with large volumes of data, there are patterns which can be exploited.
By using the node slice count feature in the Fabric Engine core, we can store many sets of data in any given node.
Polygon mesh data is stored in a pair of data types called 'GeometryAttributes' and 'PolygonMesh'. Together these types encapsulate the data for a piece of geometry, so the geometry node contains these two members to store a geometry on a single slice of a dgnode.
If a geometry dgnode with a single slice contains a single geometry, then by giving the dgnode a slice count of 1000 we can store 1000 geometries in the same dgnode. Each slice of the node can store a completely unique geometry and have operators operate on its data.
The transform node can also be sliced, enabling many transforms to be stored in a single node.
The Creation Platform scene graph has been designed around the concept of slicing, and the operators are designed to operate on sliced nodes.
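As a rough illustration in plain Python (not the Creation Platform operator API; the operator and data here are toy stand-ins), an operator bound to a sliced node conceptually runs once per slice, and because slices are independent data sets the evaluations can be spread across cores:

```python
# Illustrative sketch only -- plain Python, not the Creation Platform API.
# An operator on a sliced node runs once per slice; slices are independent,
# so they can be evaluated in parallel on a multi-core CPU.
from concurrent.futures import ThreadPoolExecutor

def compute_bounding_radius(geometry):
    # toy per-slice operator: distance of the farthest point from the origin
    return max((x * x + y * y + z * z) ** 0.5 for (x, y, z) in geometry)

# each slice holds one unique geometry (here, just lists of points)
slices = [
    [(1, 0, 0), (0, 2, 0)],
    [(0, 0, 3)],
    [(1, 1, 1), (2, 2, 2)],
]

with ThreadPoolExecutor() as pool:
    radii = list(pool.map(compute_bounding_radius, slices))
```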
Loading a Large Scene using Creation Platform
Given a very large static scene where all geometries are rendered using the same shader, the entire scene can be stored in a few sliced nodes.
The Geometry node is sliced according to the number of unique geometries in the scene, the Material node according to the number of unique materials, the Transform node according to the number of unique transforms, and the Instance node according to the number of drawn geometries.
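A minimal sketch of how those slice counts fall out of the scene contents, using a hypothetical flat list of instance records (plain Python, not the Creation Platform loader):

```python
# Illustrative sketch only -- hypothetical scene records, not a real API.
# Slice counts are derived directly from the scene: unique geometries,
# unique materials, unique transforms, and one instance per drawn object.

instances = [
    # (geometry, material, transform) per drawn object
    ('tree', 'bark',  'xfo_0'),
    ('tree', 'bark',  'xfo_1'),   # shares geometry and material with above
    ('rock', 'stone', 'xfo_2'),
]

geometry_count  = len({g for g, m, t in instances})  # unique geometries
material_count  = len({m for g, m, t in instances})  # unique materials
transform_count = len({t for g, m, t in instances})  # unique transforms
instance_count  = len(instances)                     # drawn geometries
```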
Constructing this graph takes a few milliseconds, as so few nodes are required. We use thread-safe operators to load the data from the persisted file (FBX or Alembic), so all the various geometries can be parsed in parallel as soon as the graph is constructed.
Loading a very large scene is then reduced to the time it takes to construct the graph, plus the time it takes to parse the geometries in parallel.
Managing the clean/dirty state of the individual slices is the responsibility of the programmer, and we provide tools to make this job easy.
Where to Break Up Sliced Nodes
Slicing assumes that all the data sets in the node are computed using the same set of operators. This means that if all the geometries are drawn with a normal mapping shader that requires ‘Tangents’, then the node can have the ‘ComputeTangents’ component applied that will compute the tangents for each geometry in parallel.
However, if some geometries require tangent computation, and some do not, the geometries need to be divided into sets that share common operators.
The geometries could be split into two nodes: one that requires the 'ComputeTangents' component, and one that doesn't.
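One way to sketch this partitioning in plain Python (the geometry records and operator names here are hypothetical illustrations, not the Creation Platform API): group geometries by the exact set of operators they require, and give each group its own sliced node:

```python
# Illustrative sketch only -- hypothetical records, not the Creation Platform API.
# Geometries that share the same operator set can share one sliced node;
# each distinct operator set gets its own node.
from collections import defaultdict

geometries = [
    ('wall',  ('ComputeTangents',)),
    ('grass', ('ComputeTangents', 'WindDeform')),
    ('floor', ('ComputeTangents',)),
    ('wire',  ()),                   # needs no extra operators
]

nodes = defaultdict(list)
for name, operators in geometries:
    nodes[operators].append(name)
# each key is one sliced node's operator stack; each value lists its slices
```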
Some geometries may be animated, such as grass and bushes with a procedural wind animation modifying the geometry. These geometries will be stored in a modified geometry node, or have a custom component applied that computes the animation. They should be separated from the static geometries, as they cannot all be computed in the same geometry container node.
Characters typically require many custom components and solvers applied to generate the correct animation, and are also drawn using a custom instance node called the ‘CharacterInstance’. Characters would not be stored in the same node as the static geometries.
Data sets that share a common structure and have the same operators applied to them can be stored in the same sliced node.
Even in a complex scene containing many thousands of geometries, including animated grass, trees, and characters, the size of the resulting graph is not large, and it can be constructed in a few milliseconds. The loading and parsing of the data may still take some time, but this work can usually be distributed across multiple cores.
The structure of the problem has been transformed from a large graph featuring many small independent tasks, to a small graph featuring few large SIMD tasks.
An area of active research at Fabric Engine is utilizing the GPU to evaluate the graph and compute data. Operators in the graph can be flagged to evaluate on the GPU, whereby the memory stored in the bound nodes is uploaded to the GPU where the operator is executed.
GPU computation is ideally suited to computing large SIMD workloads.
In the near future we expect that large scenes of deforming geometries can be computed mostly on the GPU, where the data is then ready for rendering directly to screen.
The Creation Platform scene graph has been designed to elegantly handle loading and rendering of large numbers of geometries by simplifying the underlying graph representations. Node slicing is a concept that can be applied to many problems where large sets of similarly structured data exist.