This document describes the general concepts behind and structure of a CGNS Database. The implementation details of reading and writing to a storage medium are left to an underlying database manager. The CGIO User's Guide provides the core routines to access the database manager at that level.
Since CGNS was originally developed based on the ADF (Advanced Data Format) as the database manager, much of this documentation parallels those concepts. With the introduction of the HDF5 (Hierarchical Data Format) database manager, the concepts in this document have been "mapped" to HDF5.
We first describe the structure of the database, and the nodes from which it is constructed, followed by a description of the overall layout of the tree structure itself.
The section Detailed CGNS Node Descriptions describes the exact conventions for each type of data.
Although the physical structure of a CGNS file in storage is (or should be) of little concern to users of CGNS, an understanding of its logical or conceptual structure is essential. This structure determines its suitability for the type of data at hand and is reflected in all of the low-level database access routines.
A CGNS file consists entirely of a collection of elements called nodes. These nodes are arranged in a tree structure which is logically similar to a UNIX file system. The nodes are said to be connected in a "child-parent" relationship according to the following simple rules:
Every node in a CGNS file has exactly the same internal structure. Each node contains identifying information, pointers to any children, and, optionally, data.
When a CGNS file is opened (by the appropriate database manager), information is returned to the calling program which is sufficient to access the root node. It is then the responsibility of the program to search the tree for whatever information is required, or to add to the tree any information it wishes to store.
There is a special kind of node called a link, which serves as a pointer to a node in another CGNS file, or in another part of the same CGNS file. The tree structure at and below the node to which the link points is available as if that node were present instead of the link. This allows a CGNS file to span multiple physical files, and also allows a portion of one CGNS file to be referenced by several other CGNS files.
The Name may be left to the choice of the user, or it may be specified by the SIDS. At the levels of the tree nearest the root, the (end-)user is free to set the Name to distinguish among like objects in the case at hand. For example, in a multizone problem, nodes associated with different zones might be named "UnderLeftWing" or "AboveForwardFuselage". At this level, it is generally not possible to identify a collection of names which are likely to recur. This means that the naming of high level objects does not require standardization, and the SIDS are silent regarding the naming convention.
Because every node must be given a name when it is opened, default names, based on the node Label, are provided by convention. The CGNS Midlevel Library will record the default names if none is provided by the user. The precise formula is given in the Label section below.
At levels of the tree farther from the root, the SIDS often specify the name. There is, for example, a commonly encountered collection of flow variables whose general meaning is widely understood. In this case, standardizing the name conveys precise information. Thus the SIDS specify, for instance, that a node containing static internal energy per unit mass should have the Name "EnergyInternal". Adherence to these naming conventions guarantees uniform meaning of the data from site to site and user to user. Of course, there may be a desire to store quantities for which no naming convention is specified. In this case any suitable name can be used, but there is no guarantee of proper interpretation by anyone unaware of the choice.
Within CGNS, nearly all labels reflect C-style type definitions ("typedefs") specified by the SIDS, and end in the characters "_t". Some "Leaf" nodes (i.e. nodes that have no children) do not represent higher level CGNS structures and therefore have labels that do not follow the "_t" convention. At this writing, all such nodes have the type int[], i.e., integer array, a type already recognized in C, for which a separate type definition would be artificial. Such nodes are generally located by the software through their names, which are specified by the SIDS, rather than through their labels.
The Label generally indicates the role of the data at and below the node in the context of CFD. Nodes which are entry points to data for a particular zone, for example, have the Label "Zone_t".
Parent nodes often have a number of children each containing data for different instances of the same structure. Multiple children of the same parent therefore often have the same Label. It is customary for software to conduct searches which depend on the Label, e.g., to determine the number of zones in a problem. The software will fail if the conventions regarding Labels are not observed.
Labels are also used to build default node Names. These are derived from the Label by dropping the characters "_t" and substituting the smallest positive integer resulting in a unique name among children of the same parent. For example, the first default Name for a node of type Zone_t will be "Zone1"; the second will be "Zone2"; and so on.
The Data Type is a 2-byte character field which specifies the type and
precision of any data which is stored in the data field. The data types
allowed by CGNS are:
Data Type | Notation | ||
---|---|---|---|
No Data | MT | ||
Link | LK | ||
Integer 32 | I4 | ||
Integer 64 | I8 | ||
Real 32 | R4 | ||
Real 64 | R8 | ||
Character | C1 |
The LK node is a link to a node within within the current or an external database. The actual data stored with the node is a file name and node path name to the link (as C1 data). This information is available only through the core link access routines. A normal read of a LK node will will return the node properties (with the exception of the Name) from the linked-to node.
The specification of data types within the File Mapping allows for the probability that files written under different circumstances, or on different platforms, may differ in precision or format. It is the responsibility of the database manager to provide any data conversions required, and to ensure precision of the data.
Whenever data is stored at a node, it is in the form of a single array of elements of a single date type. Note that Fortran-indexing conventions are used within CGNS. For multi-dimensional data, this means that the first index of the array varies the fastest, and indexing starts at 1. This is important to understand when reading the data into multi-dimensional "C" arrays, since the indexing order is opposite to that as used by CGNS.
Another thing to note, is that character data (C1) is not stored with a "C" terminating '0' character, but rather is padded out with blanks (space character) as in Fortran character data. The MLL, however, will add the terminating-'0' to the strings when returning this data to a "C" application.
In the case of rectangular arrays of physical data, the dimension values are set to the actual sizes in physical space. Note that these sizes often depend on whether the values are associated with grid nodes, cell centers or other physical locations with respect to the grid. In any event, they refer to the amount of data actually stored, not to any larger array from which it may have been extracted.
In the case of list data, the dimension value is the length of the list. Lists of characters may contain termination bytes such as "\n"; this means an entire document can be stored in the data field.
CGNS imposes no conventions on the data itself. Note that it is a responsibility of the CGNS software to ensure that the amount and type of stored data agrees with the specification of the data type, number of dimensions, and dimension values.
Children may be identified by their names and labels, and, thence, by their node IDs once these have been determined. There is no provision for the notion of order among children. In particular, the order of a list of children returned by the database manager may or may not have anything to do with the order in which they were inserted in the file. However, the order returned is consistent from call to call provided the file has not been closed and the node structure has not been modifed.
Note that there is no parent table; that is, a node has no direct knowledge of its parent. Since calling software must open the file from the root, it presumably cannot access a child without having first accessed the parent. It is the responsibility of the calling software to record the node ID of the parent if this information will be required.
The Child Table is completely controlled by the database manager, and it's role is exactly the same for CGNS. CGNS software accesses and modifies the child table only through calls to the database manager.
In addition to the meaning of attributes of individual nodes, the File Mapping specifies the relations between nodes in a CGNS database. Consequently, the File Mapping determines what kinds of nodes will lie in the child table.
Since the notion of a CGNS database allows for work in progress, the lower limit is generally zero (although the database may be of little use until certain nodes are filled). The upper limit is usually either one or many (N).
A datum which is necessary for the proper interpretation of a node but which is derived from its ancestors is referred to as a structure parameter.
A CGNS database is defined by the existence of a node or nodes which conforms to the specifications given below for a node of type "CGNSBase_t". This node is conceptually the root of the CGNS database. Because it is created and controlled by the user, it is not the actual root of the database file. However, current CGNS conventions require that it be located directly below the database root node which is identified by the name "/".
Further, by the mechanism of links, a CGNS database may span multiple files. Thus there is no notion of a CGNS file, only of a CGNS database implemented within one or more files.
By virtue of its intended use, a CGNS database is dynamic in that its content at any time reflects the current state of a CFD problem of interest. For example, after the completion of a grid generation procedure, a CGNS file may contain a grid but no boundary conditions. Therefore, beyond the occurrence of a CGNSBase_t node, there is no minimum content required in a CGNS database.
Conversely, there is no proscription against the inclusion, anywhere within a CGNS database, of nodes of any form whatsoever, provided only that their naming and labeling does not mimic CGNS conventions. Such "non-CGNS" nodes, and those below them in the tree, are not regarded as part of the CGNS database. CGNS software will not detect the existence of non-CGNS nodes.
We may therefore take the following as a definition of a CGNS database:
A CGNS database is a subtree of a file or files which is rooted at a node with label "CGNSBase_t" and which conforms to the SIDS data model as implemented by the SIDS File Mapping.
A file may contain more than one CGNSBase_t node; i.e., there may be more than one CGNS database rooted within the same file. CGNS software accepts the name of the desired database as an argument, and will locate the correct CGNSBase_t node within the specified file. Obviously, each CGNSBase_t node in a single file must have a unique name.
A CGNS database may link to CGNS nodes in the same or other files. Thus, for example, a CGNS database may reference the grid from another CGNS database without physically copying the the information. In this case, the structure of the file into which the link is made is invisible except below the node to which the link is made.
Beyond Open and Close neither CGNS or the database manager provides any file management facilities. The user is responsible for ensuring that:
It is possible, within CGNS, to protect files from inadvertent writing by opening them as "read only".
At the highest level of the tree defining a CGNS database there is always a node labeled "CGNSBase_t". The name of this node is user defined, and serves essentially as the name of the database itself. This name is used by the CGNS software to open the database.
A file may also contain other nodes below the root node beside CGNSBase_t, but these are not officially part of the CGNS database and will not be recognized by most CGNS software. One exception to this is a node called CGNSLibraryVersion_t, which is a child of the root node. This node stores the version number of the CGNS standard with which the file is consistent, and is created automatically when the file is created or modified using the CGNS Mid-Level Library. Officially, the CGNS version number is not a part of the CGNS database (because it is not located below CGNSBase_t). But because the Mid-Level Library software makes use of it, the node is included in this document.
Below the root, the organization of a CGNS database reflects the problem topology. Omitting detail, the overall structure of the CGNS database file is shown in the first of the CGNS File Mapping Figures. Below the root node is the CGNSLibraryVersion_t node, and one or more CGNSBase_t nodes. Each CGNSBase_t node is the root of a CGNS database.
At the next level below a CGNSBase_t node are general specifications which apply to the problem globally, such as reference states, units, and so on. At this level we also find a collection of nodes labeled "Zone_t". The tree below each of these holds all the data local to one of the various zones or subdomains which constitute the problem.
Beneath each Zone_t node there are nodes whose subtrees store: the grid (labeled GridCoordinates_t); flowfields (FlowSolution_t); boundary conditions (ZoneBC_t); information about the geometrical connection to other zones (GridConnectivity_t); and information defining time-dependent data. Below these there may be additional nodes containing yet more geometrically local information. For example, under the ZoneBC_t node there are nodes defining individual boundary conditions on portions of faces of the zone (BC_t).
Certain types of nodes originally specified at a high level are optionally repeated below. For example, immediately below a Zone_t node we may find another ReferenceState_t node. The CGNS convention is that such a node overrides (for the associated portion of the topology only) any data found at a higher level.
No specification of the kind represented by this file mapping can ever be complete. However, it is worth noting that there are certain entities common in CFD which are not currently specified by the file mapping.
Within nodes of type FlowSolution_t, the current file mapping permits the storage of fields of any number of dependent variables. In addition to those whose names are specified in the SIDS the user may add any desired quantities, naming them appropriately. Names that are not currently codified in the SIDS will not be common between practitioners without separate communication.
Obviously any sort of physical field could be stored in a FlowSolution_t node. The problem with using CGNS for such applications lies in the probable need to specify additional physical information. Standardizing this information is tantamount to extending the SIDS and File Mapping to the disciplines in question.
Similarly, if a reacting flow problem requires the specification of rate tables or catalytic wall boundary conditions, extensions to the SIDS and File mapping will be needed.