Code Generation in Analytixus with Python and Jinja2: A Practical, Scalable Pattern

Audience: Analytixus users and Data Engineers

This article demonstrates how to generate SQL code from metadata in Analytixus using Python and Jinja2. We use a simple, understandable example (a CREATE TABLE statement) to show the pattern clearly. In real projects, you will modularize and extend this approach for more artifacts (views, DDL migrations, MERGE statements, CDC logic, and entire load pipelines). Treat everything below as an example to illustrate the principles; future modularization is both necessary and recommended.

Further reading:

  • Jinja documentation
  • Jinja2 on PyPI


Why Python + Jinja2 in Analytixus?

  • Scalability: One metadata object → one code file; hundreds of objects → hundreds of files, automatically.
  • Consistency: A single source‑of‑truth template enforces conventions across all generated artifacts.
  • Speed of change: Update the template or type mapping once; regenerate all outputs.
  • Separation of concerns: Business metadata lives in Analytixus; rendering logic lives in templates and small utility functions.

The Stable Interface Contract in Analytixus

Analytixus calls your generator script in a standard way:

  • Method signature: generate(md: dict) -> str
  • Input: A single parameter md that contains the parsed JSON metadata.
  • Output: A single string that contains the generated code, which Analytixus writes to a file.
  • Constraint: Do not change the method name generate, the parameter contract, or the fact that a string is returned. Internals are up to you, as long as the method returns the expected string.
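A minimal skeleton that satisfies this contract looks as follows. The body is a placeholder of our own; only the signature and the string return value are fixed by Analytixus:

```python
def generate(md: dict) -> str:
    """Receives parsed JSON metadata and returns the generated code as a string."""
    # Illustrative body: the real generator renders a template from md.
    return f"-- generated for {md['Id']}\n"
```

Analytixus calls this function once per selected metadata object and writes each returned string to a file.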

Project Structure (DBX-Demo-WWI)

We’ll use the following layout:

DBX-Demo-WWI (project-root)/
  Bronze/
      02_BronzeStoreWithPy.pyx      # contains the generate(md) function
  Jinja_templates/
      bronze_store_table.sql.jinja2
  Silver/
      ...

  • The generator lives in Bronze/02_BronzeStoreWithPy.pyx.
  • Templates live in Jinja_templates/ at the project root.
  • Silver (and other layers) will follow similar generation patterns later.

Example Metadata (from Analytixus)

We’ll generate a Databricks SQL CREATE TABLE for the WideWorldImporters source table Application.Cities using this metadata sample:

{
  "StartIndex": "53",
  "BraceOpenIndex": "188",
  "BraceCloseIndex": "963",
  "Id": "Source.WWI.Application.Cities",
  "SchemaName": "Application",
  "ObjectName": "Cities",
  "TableName": "Application.Cities",
  "ObjectType": "Table",
  "CatalogName": "bsh-sqldb-WideWorldImporters",
  "SourceName": "WWI",
  "SourceType": "SqlDatabase",
  "Column": [
    {
      "StartIndex": "209",
      "EndIndex": "297",
      "Id": "Source.WWI.Application.Cities.CityID",
      "ColumnName": "CityID",
      "Ordinal": "1",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "1",
      "IsBK": "0",
      "IsKey": "1",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "304",
      "EndIndex": "397",
      "Id": "Source.WWI.Application.Cities.CityName",
      "ColumnName": "CityName",
      "Ordinal": "2",
      "DataType": "NVARCHAR",
      "Length": "50",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "ALPHANUMERIC",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "404",
      "EndIndex": "490",
      "Id": "Source.WWI.Application.Cities.StateProvinceID",
      "ColumnName": "StateProvinceID",
      "Ordinal": "3",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "497",
      "EndIndex": "580",
      "Id": "Source.WWI.Application.Cities.Location",
      "ColumnName": "Location",
      "Ordinal": "4",
      "DataType": "GEOGRAPHY",
      "IsNullable": "1",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "GEOGRAPHY",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "587",
      "EndIndex": "681",
      "Id": "Source.WWI.Application.Cities.LatestRecordedPopulation",
      "ColumnName": "LatestRecordedPopulation",
      "Ordinal": "5",
      "DataType": "BIGINT",
      "IsNullable": "1",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "688",
      "EndIndex": "771",
      "Id": "Source.WWI.Application.Cities.LastEditedBy",
      "ColumnName": "LastEditedBy",
      "Ordinal": "6",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "778",
      "EndIndex": "863",
      "Id": "Source.WWI.Application.Cities.ValidFrom",
      "ColumnName": "ValidFrom",
      "Ordinal": "7",
      "DataType": "DATETIME",
      "Length": "7",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "TIME",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "870",
      "EndIndex": "952",
      "Id": "Source.WWI.Application.Cities.ValidTo",
      "ColumnName": "ValidTo",
      "Ordinal": "8",
      "DataType": "DATETIME",
      "Length": "7",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "TIME",
      "IsTechnical": "0"
    }
  ]
}

Notes:

  • Many numeric-like fields are strings (e.g., "Ordinal": "1"). Normalize them in code.
  • Some types have auxiliary fields like Length; compose NVARCHAR(50) before mapping to a Databricks-compatible type.
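Both normalizations can be sketched with two small helpers (the helper names are our own, not part of Analytixus). Note that a naive `bool("0")` evaluates to True in Python, so the flag strings must go through `int` first:

```python
def as_bool(value: str) -> bool:
    # Flags arrive as "0"/"1" strings; bool("0") would be True, so convert via int.
    return bool(int(value))

def compose_raw_type(col: dict) -> str:
    # Attach the optional Length so NVARCHAR plus Length "50" becomes NVARCHAR(50).
    data_type = col["DataType"]
    if "Length" in col:
        return f'{data_type}({col["Length"]})'
    return data_type
```

The composed raw type (e.g., NVARCHAR(50)) is what you then feed into the Databricks type mapping.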

The Jinja2 Template

We render a Databricks SQL CREATE TABLE for the bronze “Store” layer. Technical columns appear first, then business columns from metadata, followed by a PK constraint.

CREATE TABLE IF NOT EXISTS `{{ schema }}`.`Store_{{ table_name }}`
(
    -- Technical Fields
{%- for col in technical_columns %}
    {% if not loop.first %},{% endif %}`{{ col.name }}` {{ col.data_type }}{% if not col.nullable %} NOT NULL{% endif %}{% if col.comment %} /*{{ col.comment }}*/{% endif %}
{%- endfor %}
    -- Custom Fields
{%- for col in business_columns %}
    , `{{ col.name }}` {{ col.data_type }}{% if not col.nullable %} NOT NULL{% endif %}{% if col.is_business_key %} /*IsBK*/{% elif col.comment %} /*{{ col.comment }}*/{% endif %}
{%- endfor %}
    -- PK-CONSTRAINTs
    ,CONSTRAINT `PK_{{ catalog }}_{{ schema }}_Store_{{ table_name }}` PRIMARY KEY ({% for pk in primary_key_columns %}{% if not loop.first %}, {% endif %}`{{ pk }}`{% endfor %})
);

Template tips:

  • Use loop.first to suppress the comma before the first column (this leading-comma style avoids trailing-comma errors).
  • Keep comments to aid auditing (/*IsBK*/ etc.).
  • Adopt consistent naming conventions (e.g., Store_{{ table_name }}).
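When iterating on a template, it helps to render a trimmed-down version from a string before wiring up the file loader. The snippet below is a sketch using a miniature template, not the real bronze_store_table.sql.jinja2:

```python
from jinja2 import Environment

env = Environment()
snippet = env.from_string(
    "CREATE TABLE `{{ schema }}`.`Store_{{ table_name }}` ("
    "{%- for col in cols %}\n"
    "    {% if not loop.first %},{% endif %}`{{ col.name }}` {{ col.data_type }}"
    "{%- endfor %}\n);"
)
out = snippet.render(
    schema="Application", table_name="Cities",
    cols=[{"name": "CityID", "data_type": "INT"},
          {"name": "CityName", "data_type": "STRING"}],
)
print(out)
```

This makes the loop.first comma handling immediately visible: the first column has no leading comma, every subsequent one does.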

The generate(md) Function in Bronze/02_BronzeStoreWithPy.pyx

Below is a compact, end-to-end example that:

  • Normalizes incoming column metadata,
  • Maps SQL Server/generic types to Databricks-compatible types,
  • Loads the Jinja2 template from ../Jinja_templates,
  • Returns the SQL string.

This is an illustrative example. In real projects, modularize and extract reusable helpers.

from pathlib import Path

from jinja2 import Environment, FileSystemLoader

def generate(md: dict) -> str:
    """Receives metadata, generates code and returns it as a string."""
    # Prepare technical columns
    technical_columns = [
        {"name": "DWH_DEF_DATE", "data_type": "TIMESTAMP", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_LOAD_ID", "data_type": "INT", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_RECORD_ID", "data_type": "BIGINT GENERATED ALWAYS AS IDENTITY", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_SOURCE_ID", "data_type": "INT", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_SOURCETABLE", "data_type": "STRING", "nullable": True, "comment": "IsTechnical"}
    ]

    # Type mapping (illustrative subset -- extend for your sources)
    DATABRICKS_TYPE_MAPPING = {
        "INT": "INT",
        "BIGINT": "BIGINT",
        "NVARCHAR": "STRING",
        "VARCHAR": "STRING",
        "DATETIME": "TIMESTAMP",
        "GEOGRAPHY": "STRING",  # complex types default to STRING in bronze
        "XML": "STRING",
    }

    def normalize_type_for_databricks(raw_type: str) -> str:
        """Map a composed source type (e.g. NVARCHAR(50)) to a Databricks type."""
        base = raw_type.split("(")[0].strip().upper()
        if base == "DECIMAL":
            return raw_type.upper()  # preserve precision/scale, e.g. DECIMAL(38,18)
        return DATABRICKS_TYPE_MAPPING.get(base, "STRING")  # lenient bronze default

    def _to_column(input_col: dict) -> dict:
        """Convert a metadata column dictionary to a column definition dictionary."""
        # Compose the raw type first: NVARCHAR + Length "50" -> NVARCHAR(50)
        raw_type = input_col["DataType"]
        if "Length" in input_col:
            raw_type = f'{raw_type}({input_col["Length"]})'
        return {
            "name": input_col["ColumnName"],
            "data_type": normalize_type_for_databricks(raw_type),
            # The flags arrive as "0"/"1" strings; bool("0") is True, so go via int.
            "nullable": bool(int(input_col["IsNullable"])),
            "is_business_key": bool(int(input_col["IsBK"])),
            "Ordinal": int(input_col["Ordinal"]),
        }

    business_columns = sorted([_to_column(in_col) for in_col in md["Column"]], key=lambda x: x["Ordinal"])

    # NOW GENERATE! Resolve the template folder relative to this script for portability.
    # If __file__ is unavailable in your runtime, fall back to a configured base path.
    template_dir = Path(__file__).resolve().parent.parent / "Jinja_templates"
    env = Environment(loader=FileSystemLoader(str(template_dir)))
    template = env.get_template("bronze_store_table.sql.jinja2")

    return template.render(
        catalog=md["CatalogName"],
        schema=md["SchemaName"],
        table_name=md["ObjectName"],
        technical_columns=technical_columns,
        business_columns=business_columns,
        primary_key_columns=["DWH_RECORD_ID"]
    )

Key points:

  • Uses relative path resolution to keep the build portable across environments.
  • Composes raw types (e.g., NVARCHAR(50)) before mapping to Databricks-compatible types.
  • Defaults complex types like GEOGRAPHY and XML to STRING for bronze ingestion; adjust for curated layers.

Example Output (from the provided metadata)

With the metadata above, the template produces the following Databricks SQL:

CREATE TABLE IF NOT EXISTS `Application`.`Store_Cities`
(
	-- Technical Fields
	`DWH_DEF_DATE` TIMESTAMP NOT NULL /*IsTechnical*/
	,`DWH_LOAD_ID` INT NOT NULL /*IsTechnical*/
	,`DWH_RECORD_ID` BIGINT GENERATED ALWAYS AS IDENTITY NOT NULL /*IsTechnical*/
	,`DWH_SOURCE_ID` INT NOT NULL /*IsTechnical*/
	,`DWH_SOURCETABLE` STRING /*IsTechnical*/
	-- Custom Fields
	, `CityID` INT NOT NULL
	, `CityName` STRING NOT NULL
	, `StateProvinceID` INT NOT NULL
	, `Location` STRING
	, `LatestRecordedPopulation` BIGINT
	, `LastEditedBy` INT NOT NULL
	, `ValidFrom` TIMESTAMP NOT NULL
	, `ValidTo` TIMESTAMP NOT NULL
	-- PK-CONSTRAINTs
	,CONSTRAINT `PK_bsh-sqldb-WideWorldImporters_Application_Store_Cities` PRIMARY KEY (`DWH_RECORD_ID`)
);

Operational Flow in Analytixus

  • The build process selects metadata objects and calls generate(md) for each.
  • Your function returns a string; Analytixus writes each result to its destination file (e.g., based on object name and schema).
  • Determinism matters for reviews and CI: sort columns by Ordinal, not by name.
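Sorting by Ordinal has a subtle trap: Ordinal arrives as a string, and a lexicographic sort orders "10" before "2". A short sketch of the difference:

```python
cols = [
    {"ColumnName": "C", "Ordinal": "10"},
    {"ColumnName": "A", "Ordinal": "2"},
    {"ColumnName": "B", "Ordinal": "1"},
]

# String sort is lexicographic: "1" < "10" < "2" -- wrong for ordinals.
by_string = [c["ColumnName"] for c in sorted(cols, key=lambda c: c["Ordinal"])]

# Converting to int restores the intended column order.
by_int = [c["ColumnName"] for c in sorted(cols, key=lambda c: int(c["Ordinal"]))]

print(by_string)  # ['B', 'C', 'A']
print(by_int)     # ['B', 'A', 'C']
```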

Testing and Validation

  • Add unit tests with representative metadata samples to verify:
    • Type normalization (e.g., NVARCHAR(50) → STRING).
    • Precision/scale preservation for DECIMAL(38,18) (when present).
    • Defaults for complex types (GEOGRAPHY, XML) — choose strict vs. lenient behavior per layer.
  • Validate required fields early (e.g., SchemaName, ObjectName, Column presence). Fail fast on missing essentials to avoid silent mis-generation.
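A minimal pytest-style test for these points might look like this. The inline normalize_type_for_databricks is an illustrative stand-in for your real helper, not the project's actual implementation:

```python
# Illustrative stand-in for the project's real type normalizer.
def normalize_type_for_databricks(raw: str) -> str:
    base = raw.split("(")[0].strip().upper()
    if base == "DECIMAL":
        return raw.upper()  # preserve precision/scale, e.g. DECIMAL(38,18)
    mapping = {"NVARCHAR": "STRING", "VARCHAR": "STRING", "DATETIME": "TIMESTAMP",
               "GEOGRAPHY": "STRING", "XML": "STRING", "INT": "INT", "BIGINT": "BIGINT"}
    return mapping.get(base, "STRING")  # lenient bronze default; be stricter in curated layers

def test_type_normalization():
    assert normalize_type_for_databricks("NVARCHAR(50)") == "STRING"
    assert normalize_type_for_databricks("DECIMAL(38,18)") == "DECIMAL(38,18)"
    assert normalize_type_for_databricks("GEOGRAPHY") == "STRING"

test_type_normalization()
```

For the strict variant, replace the `mapping.get(base, "STRING")` default with a raised ValueError so unknown types fail the build instead of silently becoming strings.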

Extending Beyond CREATE TABLE

This pattern generalizes easily:

  • Generate views (silver/gold), MERGE statements, CDC logic, orchestration/job configs, and more.
  • Emit multiple files per object (e.g., DDL + DML + pipeline configuration).
  • The CREATE TABLE example is intentionally simple to demonstrate the approach. In real projects, modularize helpers, factor configurations, and introduce environment-aware conventions.
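One way to emit several artifacts per object is to render a mapping of output file to template. The sketch below uses hypothetical template strings and a hypothetical generate_all helper; within Analytixus itself, each file still goes through the fixed generate(md) -> str contract:

```python
from jinja2 import Environment

# Hypothetical per-object artifacts: one template per generated file.
ARTIFACTS = {
    "store_ddl.sql": "CREATE TABLE `{{ schema }}`.`Store_{{ name }}` (...);",
    "store_merge.sql": "MERGE INTO `{{ schema }}`.`Store_{{ name }}` ...;",
}

def generate_all(md: dict) -> dict:
    """Render every artifact for one metadata object; returns filename -> code."""
    env = Environment()
    ctx = {"schema": md["SchemaName"], "name": md["ObjectName"]}
    return {fname: env.from_string(tpl).render(**ctx)
            for fname, tpl in ARTIFACTS.items()}
```

The same context dictionary feeds every template, so DDL, DML, and pipeline configuration stay consistent by construction.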

Best Practices and Pitfalls

  • Prefer relative template paths to avoid hard-coded strings and ensure portability.
  • Keep comments and naming conventions consistent and documented (e.g., Store_ prefix).
  • Separate technical vs. business columns clearly; later, make technical columns configurable when modularizing.
  • Make unknown types a conscious decision: warn or fail. Bronze layers often accept strings; curated layers should be stricter.
  • Preserve precision/scale when mapping numeric types.
  • Ensure deterministic output for clean diffs and reliable CI/CD.

Conclusion

Python + Jinja2 provides a clean, scalable way to transform Analytixus metadata into consistent, reviewable code. With a stable generate(md) -> str contract, robust type normalization, and deterministic templates, you can evolve from simple DDL generation to complete, metadata-driven pipelines across bronze, silver, and gold layers. This article’s code is a minimal example to illustrate the pattern; in production, modularize utilities, configurations, and tests to keep the system maintainable and flexible.

Lean Data Vault: A Pragmatic Approach to Data Modeling in the Lakehouse

Every data and analytics (data lakehouse) project needs a framework of best practices. This one is definitely worth reading if you’re looking for a data modeling approach that doesn’t overcomplicate things, focuses on data quality, and supports massive parallel processing.

Pingback: https://andyloewen.de/2025/09/17/lean-data-vault-a-pragmatic-approach-to-data-modeling-in-the-lakehouse/