DnAML (Data and Analytics Markup Language): A Practical Guide for Analytixus

Data and analytics teams juggle multiple concerns: raw source onboarding, semantic enrichment, lineage tracking, dependency management, and consistent code generation across pipelines and layers. DnAML (Data and Analytics Markup Language) is a domain-specific, JSON-like metadata format designed to unify these concerns in a single, readable source of truth. This post explains what DnAML is, what you can do with it, why you might need it, and how to use it—plus small, copy-ready examples you can adapt for your Analytixus projects.


What is DnAML?

DnAML is a compact, human-readable language for semantically describing and orchestrating data sources, structures, and transformations in analytics solutions. It lets you centrally define and control data models along the processing chain—from raw ingestion through semantic modeling to analytical delivery. Conceptually, unsupervised metadata extracted from source systems is transformed into a curated DnAML model, forming the second stage in the metadata life cycle.

DnAML borrows a familiar, block-oriented structure reminiscent of JSON but adds its own semantics for:

  • Declaring tables and views
  • Modeling column types, nullability, defaults, keys, and free-form options
  • Defining lineage via ORIGINS with UNION/JOIN behavior
  • Expressing cross-object relationships via REFERENCES
  • Organizing models with SOURCES and FOLDER constructs

These core constructs are part of the language grammar and enable consistent, machine- and human-friendly modeling of data assets.


What can you do with DnAML?

  • Define data objects (tables, views) and their attributes with clear types, constraints, and options.
  • Document lineage with ORIGINS, composing objects from upstream OBJECTs using UNION or JOIN and explicit field mappings.
  • Specify relationships and matching rules through REFERENCES, including comparisons, ranges, constants, and optional expressions to annotate logic.
  • Organize models by SOURCE (systems) and FOLDER (domains or layers like Bronze, Silver, Gold) to reflect your architecture.
  • Capture operational metadata via free-form options in square brackets—for example, schema names, owners, synchronization hints, and more.

Why DnAML?

Teams need a consistent way to represent:

  • What an object contains (columns, types, constraints),
  • Where it comes from (lineage and mapping),
  • How it relates to other objects (references),
  • How the model is organized (sources, folders, layers).

By keeping all of this in one language, you gain:

  • Clarity and maintainability across pipeline stages
  • Versionability of models as code
  • A foundation for automation (generating DDL, ETL/ELT jobs, notebooks, or validation rules)
  • End-to-end traceability of attribute origins across processing steps

The DnAML grammar natively supports all of these aspects, aligning modeling, lineage, and relationships in a single definition.


How DnAML is used in practice

DnAML uses statements and blocks with consistent syntax:

  • Top-level statements include TABLE, VIEW, REFERENCES, SOURCES, and FOLDER.
  • Blocks use braces { ... }, entries are comma-separated, and semicolons are optional.
  • Identifiers can be dot-qualified (e.g., Application.Cities) or quoted when they include spaces or special characters (e.g., "Order Item").
  • Free-form options appear in square brackets [ ... ] as either bare terms or key: value pairs, where values can be identifiers, strings, or numbers.
  • Comments use /* ... */ and are ignored by parsers.
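
The fragment below (with hypothetical object and option names) shows these conventions together—a comment, a quoted identifier, dot-qualification, and free-form options:

/* Quoted identifiers and free-form options */
Table Sales."Order Item" [SchemaName:Sales, owner:'data-team'] {
    Columns {
        OrderItemId INT NOT NULL PRIMARY KEY [Ordinal:1],
        "Unit Price" DECIMAL(18,2) NOT NULL [Ordinal:2]
    }
}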

Workflow: Integrating XML and DnAML

In a typical metadata management workflow, XML and DnAML complement each other:

  • Unsupervised metadata: Automatically extracted metadata from source systems or code is initially stored in XML. This stage captures raw, uncurated information.
  • DnAML modeling: Unsupervised metadata is converted into a DnAML model, where it is structured, semantically enriched, and maintained. Here, you also define mappings between source and target structures.
  • Supervised metadata: Once curated and validated in DnAML, the model is persisted back to XML—but now as supervised metadata. This XML becomes the basis for automated processing, such as generating data structures, pipelines, or validation artifacts.
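The Analytixus tooling itself is not shown in this post; the Python sketch below illustrates the shape of the round trip with hypothetical XML element names and a deliberately simplified DnAML emitter, using only the standard library:

```python
# Sketch of the XML -> DnAML -> XML workflow.
# Element names ("metadata", "table", "column") are assumptions for illustration.
import xml.etree.ElementTree as ET

UNSUPERVISED = """
<metadata>
  <table name="Cities" schema="Application">
    <column name="CityID" type="INT" nullable="false"/>
    <column name="CityName" type="NVARCHAR(50)" nullable="false"/>
  </table>
</metadata>
"""

def to_dnaml(xml_text: str) -> str:
    """Convert raw (unsupervised) XML metadata into a DnAML-style table block."""
    root = ET.fromstring(xml_text)
    lines = []
    for table in root.iter("table"):
        lines.append(f'Table {table.get("schema")}.{table.get("name")} '
                     f'[SchemaName:{table.get("schema")}] {{')
        lines.append("    Columns {")
        cols = []
        for col in table.iter("column"):
            null = "NOT NULL" if col.get("nullable") == "false" else "NULL"
            cols.append(f'        {col.get("name")} {col.get("type")} {null}')
        lines.append(",\n".join(cols))
        lines.append("    }")
        lines.append("}")
    return "\n".join(lines)

def to_supervised_xml(xml_text: str) -> str:
    """Persist curated metadata back to XML, now marked as supervised."""
    root = ET.fromstring(xml_text)
    root.set("stage", "supervised")  # hypothetical marker attribute
    return ET.tostring(root, encoding="unicode")

print(to_dnaml(UNSUPERVISED))
print(to_supervised_xml(UNSUPERVISED))
```

In a real setup, the curation step in the middle (editing the DnAML model by hand) is where semantic enrichment and mapping definitions happen before the supervised XML is written out.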

Examples

Below are small examples you can copy into your projects. They illustrate sources, tables, views, lineage, and options.

1) Declaring a source and a table with typed columns and options

Sources {
    Source WWI  {
        Table Application.Cities [SchemaName:Application,SourceName:wwierp] {
            Columns {
                CityID INT NOT NULL PRIMARY KEY [BasisDataType:INTEGER,Ordinal:1,IsKey:1,IsTechnical:0],
                CityName NVARCHAR(50) NOT NULL [BasisDataType:ALPHANUMERIC,Ordinal:2,IsKey:0,IsTechnical:0],
                ...
            }
        }
    }
}
  • Uses SOURCES and Source to group objects; Table includes a Columns block with types, nullability, primary key, and options.

2) Bronze-layer view with lineage (Origins) and semantic options

Folder Bronze { 
    View Bronze_Access.Application_Cities [SchemaName:'Bronze_Access'] {
        Columns {
            DWH_SOURCE_ID INT NOT NULL [BasisDataType:INTEGER, IsTechnical:1], 
            CityID INT NULL UNIQUE [BasisDataType:INTEGER, IsKey:1, OriginColumnName:'CityID', OriginId:'Bronze.Bronze_Store.Application_Cities.CityID'], 
            ... 
        }
        Origins { 
            Object "Bronze.Bronze_Store.Application_Cities" AS Stage [AliasName:Stage, SchemaName:'Bronze_Store'] 
        } 
    } 
}
  • Uses FOLDER to organize layer-specific models; View includes Columns and an Origins block with an OBJECT, aliased via AS, and free-form options.

3) Columns, defaults, identity, and constraints

Table Sales.Order [SchemaName:dbo] {
  Columns {
    OrderId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CustomerId INT NOT NULL,
    Comment NVARCHAR(200),
    Total DECIMAL(18,2) NOT NULL
  }
}
  • Shows supported types, IDENTITY with seed and increment, NULL/NOT NULL, defaults, and constraints (PRIMARY KEY, UNIQUE).

4) Lineage with UNION/JOIN and field mappings

View rpt.ActiveOrders [SchemaName:dbo] {
  Columns { OrderId INT, CustomerId INT, Status VARCHAR(20) },
  Origins JOIN [on:OrderId] {
    Object erp.Orders AS o [system:'ERP'] {
      o.OrderId = rpt.ActiveOrders.OrderId
    },
    Object crm.OrderExtras AS e {
      e.OrderId = rpt.ActiveOrders.OrderId
    }
  }
}
  • ORIGINS declares composition via JOIN; OBJECT entries can have aliases and free-form options; mappings use SourceId = TargetId.

5) References (relationships) with rules

References [strict:true] {
  Reference FK_Order_Customer FOREIGNOBJECT:Sales.Order PRIMARYOBJECT:dbo.Customer [cascade:false] {
    CustomerId = Id,
    OrderDate BETWEEN MinDate AND MaxDate,
    Status = 'Active' [weight:10],
    'VIP' = Tag,
    CustomerId = Id EXPRESSION: 'Customer.Id equals Order.CustomerId'
  }
}
  • REFERENCES groups REFERENCE entries; supports comparisons between identifiers and constants, BETWEEN ranges, and EXPRESSION annotations to document logic.

Getting started

  • Model sources and organization: Use SOURCES to declare upstream systems and FOLDERs to group objects by domain or layer (e.g., Bronze, Silver, Gold).
  • Define objects: Create TABLE or VIEW blocks with COLUMNS, including types, nullability, identity, defaults, and constraints.
  • Add lineage: Use ORIGINS with UNION/JOIN; add OBJECT entries with optional aliases and field mappings.
  • Describe relationships: Use REFERENCES with REFERENCE rules for FK-to-PK matching, constants, ranges, and annotations.
  • Use options for metadata: Add square-bracketed [ ... ] options anywhere to capture schema, ownership, precision, hints, etc.
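
Putting these steps together, a minimal model (with hypothetical system and object names) might look like this:

Sources {
    Source CRM {
        Table dbo.Customer [SchemaName:dbo] {
            Columns { Id INT NOT NULL PRIMARY KEY, Name NVARCHAR(100) NOT NULL }
        }
    }
}

Folder Silver {
    View Silver.Customer [SchemaName:'Silver'] {
        Columns { Id INT NOT NULL, Name NVARCHAR(100) NOT NULL }
        Origins {
            Object dbo.Customer AS c {
                c.Id = Silver.Customer.Id,
                c.Name = Silver.Customer.Name
            }
        }
    }
}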

Tips and best practices

  • Quote identifiers that contain spaces or special characters (e.g., "Order Item") to avoid parsing ambiguity.
  • Keep operational and semantic metadata in [ ... ] options; this separates core schema from auxiliary attributes without cluttering structure.
  • Be explicit about lineage behavior: choose UNION vs. JOIN in ORIGINS to document composition semantics clearly.
  • Use REFERENCES consistently: FOREIGNOBJECT represents the foreign-key side; PRIMARYOBJECT the primary-key side.

Advanced: Expressions and operator precedence

DnAML supports expressions for defaults and rules. You can use identifiers, strings, numbers, or tuples; unary operators include NOT, plus/minus, bitwise NOT; binary operators include arithmetic, bitwise, comparison, AND/OR, LIKE/IN, and their NOT variants. Operator precedence ensures predictable evaluation of complex expressions.

DEFAULT (Amount + Tax) / 2
DEFAULT NOT (Status = 'Inactive')
  • Parentheses control evaluation order; compound tokens like NOT LIKE and NOT IN are treated as single operators in parsing.

Where DnAML fits in Analytixus

  • Bronze/Silver/Gold modeling: Use FOLDERs to organize layers and Views/Tables to declare layer-specific schemas, lineage, and rules.
  • Automation: Curated DnAML models can drive generation of DDL scripts, pipeline tasks, notebooks, or validation configurations.
  • Governance: Semantic enrichment in DnAML plus supervised XML persistence enables transparent lineage and audit-ready documentation.
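
To make the automation idea concrete, the Python sketch below shows one way a generator might walk an already-parsed model and emit DDL. The dataclass shapes are an assumption for illustration, not the actual Analytixus representation:

```python
# Hypothetical sketch: emit a CREATE TABLE statement from a parsed DnAML table.
# The Column/Table shapes below are assumptions, not real Analytixus types.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Column:
    name: str
    type: str
    nullable: bool = True
    primary_key: bool = False

@dataclass
class Table:
    schema: str
    name: str
    columns: List[Column] = field(default_factory=list)

def to_ddl(table: Table) -> str:
    """Render one CREATE TABLE statement from the in-memory model."""
    parts = []
    for c in table.columns:
        null = "NULL" if c.nullable else "NOT NULL"
        pk = " PRIMARY KEY" if c.primary_key else ""
        parts.append(f"    {c.name} {c.type} {null}{pk}")
    body = ",\n".join(parts)
    return f"CREATE TABLE {table.schema}.{table.name} (\n{body}\n);"

cities = Table("Application", "Cities", [
    Column("CityID", "INT", nullable=False, primary_key=True),
    Column("CityName", "NVARCHAR(50)", nullable=False),
])
print(to_ddl(cities))
```

The same walk over the model could just as easily emit pipeline tasks or validation rules instead of DDL; the point is that one curated model feeds every generator.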

Summary

DnAML provides a unified, domain-specific language to describe your analytics models: sources, objects, attributes, lineage, and relationships. It is readable enough for collaborative design and precise enough for reliable automation. If you’re building or maintaining Data Lakehouse or analytic platforms, DnAML can serve as the backbone of your metadata-driven workflows—bridging raw extraction and curated, supervised models that power repeatable, scalable data engineering.
