CGNS SIDS-to-HDF File Mapping Manual - General CGNS File Mapping Concepts

(CGNS Documentation Home Page) (Steering Committee Charter) (Overview and Entry-Level Document) (A User's Guide to CGNS) (Mid-Level Library) (Standard Interface Data Structures) (SIDS File Mapping Manual) (CGIO User's Gudie) (Parallel CGNS User's Guide) (ADF Implementation) (HDF5 Implementation) (Python Implementation) (CGNS Tools and Utilities)

(Implementation Summary) (General HDF5 Mapping Concepts)

This section describes the general philosophy underlying the use of the HDF5 tree structure by CGNS. The section Detailed CGNS Node Descriptions describes the exact conventions for each type of data.

We first describe the roles of the various HDF5 elements (i.e. groups or attributes) as they are specifically applied within CGNS, followed by a description of the overall layout of the tree structure itself.

Use of HDF5 Elements in CGNS

We previously described the general role of each of the HDF5 elements without reference to CFD. Here we note any additional information regarding their use within CGNS.

We first describe attributes recognized by both HDF5 and CGNS. We then describe certain elements which are derived from context, i.e., which the node possesses by virtue of its location within a CGNS database. These notions, namely, Cardinality, Parameters, and Functions, are unique to CGNS.

The Node ID

The Node ID is completely controlled by HDF5, and thus its role is exactly the same for CGNS as it is for HDF5. CGNS software accesses the Node ID only through calls to HDF5. HDF5 itself guarantees that Node IDs are unique and constant within any HDF5 file (or collection of files) while the file(s) are open.

The Node Name

In CGNS, the Name may be left to the choice of the user, or it may be specified by the SIDS. At the levels of the tree nearest the root, the (end-)user is free to set the Name to distinguish among like objects in the case at hand. For example, in a multizone problem, nodes associated with different zones might be named "UnderLeftWing" or "AboveForwardFuselage". At this level, it is generally not possible to identify a collection of names which are likely to recur. This means that the naming of high level objects does not require standardization, and the SIDS are silent regarding the naming convention.

Because every HDF5 node must be given a name when it is opened, default names, based on the node Label, are provided by convention. The CGNS Midlevel Library will record the default names if none is provided by the user. The precise formula is given in the Label section below.

At levels of the tree farther from the root, the SIDS often specify the name. There is, for example, a commonly encountered collection of flow variables whose general meaning is widely understood. In this case, standardizing the name conveys precise information. Thus the SIDS specify, for instance, that a node containing static internal energy per unit mass should have the Name "EnergyInternal". Adherence to these naming conventions guarantees uniform meaning of the data from site to site and user to user. Of course, there may be a desire to store quantities for which no naming convention is specified. In this case any suitable name can be used, but there is no guarantee of proper interpretation by anyone unaware of the choice.

By extension, a node name is a series of names separated by a slash "/" (like the POSIX file system names), moreover "/" is the root name of the CGNS tree.

A CGNS Name can contain any printable ASCII character except the slash "/" and the dot "." when this dot is the first character of the name.

The Label

Within CGNS, nearly all labels reflect C-style type definitions ("typedefs") specified by the SIDS, and end in the characters "_t". Some "Leaf" nodes (i.e. nodes that have no children) do not represent higher level CGNS structures and therefore have labels that do not follow the "_t" convention. At this writing, all such nodes have the type int[], i.e., integer array, a type already recognized in C, for which a separate type definition would be artificial. Such nodes are generally located by the software through their names, which are specified by the SIDS, rather than through their labels.

The Label generally indicates the role of the data at and below the node in the context of CFD. Nodes which are entry points to data for a particular zone, for example, have the Label "Zone_t".

Parent nodes often have a number of children each containing data for different instances of the same structure. Multiple children of the same parent therefore often have the same Label. It is customary for software to conduct searches which depend on the Label, e.g., to determine the number of zones in a problem. The software will fail if the conventions regarding Labels are not observed.

Labels are also used to build default node Names. These are derived from the Label by dropping the characters "_t" and substituting the smallest positive integer resulting in a unique name among children of the same parent. For example, the first default Name for a node of type Zone_t will be "Zone1"; the second will be "Zone2"; and so on.

The Data Type

Data Types are completely specified by the file mapping. Although HDF5 provides a number of types, in CGNS the only types used are MT (No Data), I4 (Integer), R4 and R8 (Real), C1 (Character), and LK (Link).

The specification of data types within the File Mapping allows for the probability that files written under different circumstances may differ in precision. The issue is complicated by the ability of HDF5 to sense the capabilities of the platform on which it is running and interpret or record data accordingly. The general rule is that, although the user of HDF5 can specify the precision in which it is desired to read or write the data, HDF5 knows both the precision available at the source and the precision acceptable to the destination and will mitigate accordingly. Thus to specify the precison of real data as R4, for example, has no meaning unless both R4 and R8 are available. Therefore, the generic specification "DataType" is used to allow for all possibilities.

For all integer data specified by the SIDS, I4 provides sufficient precision.

The Number of Dimensions

Whenever data is stored at a node, it is in the form of a single array of elements of a single date type. Insofar as possible, the dimension specified by CGNS is the natural underlying dimension; for example, a rectangular array of pressures is stored with dimension equal to the physical dimension of the problem.

There are situations in which this representation is not feasible. For instance, a list of points which do not form a rectangular array in physical space may be compacted into a one-dimensional array in HDF5.

Frequently the data is of type C1 (character data). In some cases, the data holds additional information in the form of a name specified by the SIDS, and in some cases holds user comment. All such data is generally represented as a one-dimensional array (or list) of characters.

The Dimension Values

These are used exactly as specified by HDF5. In the case of rectangular arrays of physical data, the dimension values are set to the actual sizes in physical space. Note that these sizes often depend on whether the values are associated with grid nodes, cell centers or other physical locations with respect to the grid. In any event, they refer to the amount of data actually stored, not to any larger array from which it may have been extracted.

In the case of list data, the dimension value is the length of the list. Lists of characters may contain termination bytes such as "\n"; by this means an entire document can be stored in the data field.

The Data

CGNS imposes no conventions on the data itself beyond those specified by HDF5. Note that it is a responsibility of the CGNS software to ensure that the amount and type of stored data agrees with the specification of the data type, number of dimensions, and dimension values.

The Child Table

The Child Table is completely controlled by HDF5, and thus its role is exactly the same for CGNS as it is for HDF5. CGNS software accesses and modifies the child table only through calls to HDF5.

In addition to the meaning of attributes of individual HDF5 nodes, the File Mapping specifies the relations between nodes in a CGNS database. Consequently, the File Mapping determines what kinds of nodes will lie in the child table.

It is important to reemphasize that HDF5 provides no notion of order among children. This means children can be identified only by their names, labels and system-provided node IDs. In particular, the order of a list of children returned by HDF5 has nothing to do with the order in which they were inserted in the file. However, the order returned is consistent from call to call provided the file has not been closed and the node structure has not been modifed.

Cardinality

The cardinality of a CGNS node is the number of nodes of the same label permitted at one point in the tree, i.e., as children of the same parent. It consists of both lower and upper limits.

Since the notion of a CGNS database allows for work in progress, the lower limit is generally zero (although the database may be of little use until certain nodes are filled). The upper limit is usually either one or many (N).

Parameters

CGNS relies on the fact that SLL nodes cannot be found except by following the pointers from their parents. This means that software accessing a node has had an opportunity to note all the data above that node in the tree. Therefore, nodes do not repeat within themselves information which is necessary for their interpretation but which is available at a higher level.

A datum which is necessary for the proper interpretation of a node but which is derived from its ancestors is referred to as a structure parameter.

Functions

Occasionally the proper interpretation of a node depends on an implicitly understood function of its structure parameters. Usually these relate to the actual amount of data stored. Several of these functions are defined in the SIDS and referenced in this document.

CGNS Databases

Definition of a CGNS Database

By definition, a CGNS database is created when, within an HDF5 file, a node is created which conforms to the specifications given below for a node of type "CGNSBase_t". This node is conceptually the root of the CGNS database. Because it is created and controlled by the user, it cannot be the root of the HDF5 file. Current CGNS conventions require that it be located directly below the HDF5 root node.

Further, by the mechanism of links, a CGNS database may span multiple files. Thus there is no notion of a CGNS file, only of a CGNS database implemented within one or more HDF5 files.

By virtue of its intended use, a CGNS database is dynamic in that its content at any time reflects the current state of a CFD problem of interest. For example, after the completion of a grid generation procedure, a CGNS file may contain a grid but no boundary conditions. Therefore, beyond the occurrence of a CGNSBase_t node, there is no minimum content required in a CGNS database.

Conversely, there is no proscription against the inclusion, anywhere within an HDF5 file containing a CGNS database, of nodes of any form whatsoever, provided only that their naming and labelling does not mimic CGNS conventions. Such "non-CGNS" nodes, and those below them in the HDF5 tree, are not regarded as part of the CGNS database. CGNS software will not detect the existence of non-CGNS nodes.

We may therefore take the following as a definition of a CGNS database:

A CGNS database is a subtree of an HDF5 file or files which is rooted at a node with label "CGNSBase_t" and which conforms to the SIDS data model as implemented by the SIDS File Mapping.

Location of CGNS Databases within HDF5 Files

An HDF5 file may contain more than one CGNSBase_t node; i.e., there may be more than one CGNS database rooted within the same HDF5 file. CGNS software accepts the name of the desired database as an argument, and will locate the correct CGNSBase_t node within the specified HDF5 file. Obviously, each CGNSBase_t node in a single HDF5 file must have a unique name.

A CGNS database may link to CGNS nodes in the same or other HDF5 files. Thus, for example, a CGNS database may reference the grid from another CGNS database without physically copying the the information. In this case, the structure of the HDF5 file into which the link is made is invisible except below the node to which the link is made.

File Management

Beyond Open and Close neither HDF5 nor CGNS provides any file management facilities. The user is responsible for ensuring that:

The HDF5 file containing the root of the required database is available and its permissions are properly set at runtime.
If links are made to other HDF5 files, including any not under the user's direct control, these are also available at runtime.
No file is opened for writing by more than one program at a time.

It is possible, within CGNS, to protect files from inadvertent writing by opening them as "read only".

Internal Organization of a CGNS Database

The `CGNSBase_t` Node

At the highest level of the tree defining a CGNS database there is always a node labeled "CGNSBase_t". The name of this node is user defined, and serves essentially as the name of the database itself. This name is used by the CGNS software to open the database.

The `CGNSLibraryVersion_t` Node

An HDF5 file may also contain other nodes below the root node beside CGNSBase_t, but these are not officially part of the CGNS database and will not be recognized by most CGNS software. One exception to this is a node called CGNSLibraryVersion_t, which is a child of the HDF5 root node. This node stores the version number of the CGNS standard with which the file is consistent, and is created automatically when the file is created or modified using the CGNS Mid-Level Library. Officially, the CGNS version number is not a part of the CGNS database (because it is not located below CGNSBase_t). But because the Mid-Level Library software makes use of it, the node is included in this document.

Topological Basis of CGNS Database Organization

Below the root, the organization of a CGNS database reflects the problem topology. Omitting detail, the overall structure of the HDF5 file is shown in the first of the CGNS File Mapping Figures. Below the HDF5 root node is the CGNSLibraryVersion_t node, and one or more CGNSBase_t nodes. Each CGNSBase_t node is the root of a CGNS database.

At the next level below a CGNSBase_t node are general specifications which apply to the problem globally, such as reference states, units, and so on. At this level we also find a collection of nodes labeled "Zone_t". The tree below each of these holds all the data local to one of the various zones or subdomains which constitute the problem.

Beneath each Zone_t node there are nodes whose subtrees store: the grid (labeled GridCoordinates_t); flowfields (FlowSolution_t); boundary conditions (ZoneBC_t); information about the geometrical connection to other zones (GridConnectivity_t); and information defining time-dependent data. Below these there may be additional nodes containing yet more geometrically local information. For example, under the ZoneBC_t node there are nodes defining individual boundary conditions on portions of faces of the zone (BC_t).

Certain types of nodes originally specified at a high level are optionally repeated below. For example, immediately below a Zone_t node we may find another ReferenceState_t node. The CGNS convention is that such a node overrides (for the associated portion of the topology only) any data found at a higher level.

Topics Not Currently Covered

No specification of the kind represented by this file mapping can ever be complete. However, it is worth noting that there are certain entities common in CFD which are not currently specified by the file mapping.

Within nodes of type FlowSolution_t, the current file mapping permits the storage of fields of any number of dependent variables. In addition to those whose names are specified in the SIDS the user may add any desired quantities, naming them appropriately. Names that are not currently codified in the SIDS will not be common between practitioners without separate communication.

Obviously any sort of physical field could be stored in a FlowSolution_t node. The problem with using CGNS for such applications lies in the probable need to specify additional physical information. Standardizing this information is tantamount to extending the SIDS and File Mapping to the disciplines in question.

Similarly, if a reacting flow problem requires the specification of rate tables or catalytic wall boundary conditions, extensions to the SIDS and File mapping will be needed.