Enriching Metadata from Code Comments in Analytixus
In modern Data & Analytics projects, metadata is the backbone of automation and governance. Most of the time, we can derive structural metadata directly from the code — for example, by parsing SQL CREATE TABLE statements into XML-based models.
But sometimes, SQL alone isn’t enough. There are attributes we want to capture — like IsBusinessKey, IsTechnical, or History Flags — that cannot be expressed syntactically in the DDL.
This is where code comments come into play. In Analytixus, we use comments as metadata carriers. During the build process, they are extracted, transformed, and converted into actionable metadata.
From Code to Metadata
Here’s an example generated with an XSLT template:
CREATE OR REPLACE TABLE `WWI`.`Store_Countries`
(
-- Technical Fields
`DWH_DEF_DATE` TIMESTAMP NOT NULL /*IsTechnical*/
,`DWH_LOAD_ID` INT NOT NULL /*IsTechnical*/
,`DWH_RECORD_ID` BIGINT GENERATED ALWAYS AS IDENTITY NOT NULL /*IsTechnical*/
,`DWH_SOURCE_ID` INT NOT NULL /*IsTechnical*/
,`DWH_SOURCETABLE` STRING /*IsTechnical*/
-- Custom Fields
, `CountryID` INT NOT NULL
, `CountryName` STRING NOT NULL
, `FormalName` STRING NOT NULL
);

Notice the inline comment /*IsTechnical*/.
In the resulting XML metadata, this becomes:
<Column ColumnName="DWH_RECORD_ID"
DataType="BIGINT"
IsNullable="0"
IsTechnical="1"
IsPK="1"
IsKey="1" />

Why Comments Matter
Some attributes simply cannot exist in pure SQL syntax:
- Business Keys (
IsBK) - Technical Fields (
IsTechnical) - History Flags (
IsHist) - Origin References (linking to Bronze/Silver layers)
Embedding these semantics as inline comments makes them machine-readable.
Example:
CREATE TABLE [Silver_Store].[Application_Cities_Key]
/*IsHist:0; Name:Application_Cities; OriginId:Bronze.Bronze_Access.Application_Cities; HistObjectId:Silver.Silver_Store.Application_Cities_Hist1*/
(
[DWH_Application_CitiesId] INT IDENTITY(2, 1) NOT NULL /*IsTechnical; IsPK*/
,[CityID] INT /*IsPK:0; IsBK:1*/
)
From Unsupervised to Supervised Metadata
During the build process, Analytixus first generates unsupervised metadata:
<Object Id="Silver.Silver_Store.Application_Cities_Key"
ObjectType="Table"
IsHist="0"
Name="Application_Cities"
OriginId="Bronze.Bronze_Access.Application_Cities"
HistObjectId="Silver.Silver_Store.Application_Cities_Hist1">
<Column ColumnName="CityID" DataType="INT" IsBK="1" />
...
<Origins>
<Object Id="Bronze.Bronze_Access.Application_Cities" SchemaName="Bronze_Access" ObjectName="Application_Cities" ViewName="Bronze_Access.Application_Cities" ObjectType="View">
<Column Id="Bronze.Bronze_Access.Application_Cities.DWH_SOURCE_ID" ColumnName="DWH_SOURCE_ID" Ordinal="1" DataType="INT" IsNullable="0" IsPK="0" IsBK="0" IsKey="0" BasisDataType="INTEGER" OriginAliasName="Stage" OriginColumnName="DWH_SOURCE_ID" OriginId="Bronze.Bronze_Store.Application_Cities.DWH_SOURCE_ID" IsTechnical="1" />
...
</Object>
</Origins>
<Hist>
<Object Id="Silver.Silver_Store.Application_Cities_Hist1" SchemaName="Silver_Store" ObjectName="Application_Cities_Hist1" TableName="Silver_Store.Application_Cities_Hist1" ObjectType="Table" IsHist="1" Name="Application_Cities" OriginId="Bronze.Bronze_Access.Application_Cities" KeyObjectId="Silver.Silver_Store.Application_Cities_Key">
<Column Id="Silver.Silver_Store.Application_Cities_Hist1.DWH_Application_CitiesId" ColumnName="DWH_Application_CitiesId" Ordinal="1" DataType="INT" IsNullable="1" IsPK="0" IsBK="0" IsKey="0" BasisDataType="INTEGER" IsTechnical="1" />
...
</Object>
</Hist>
</Object>
Then, attributes like HistObjectId and OriginId are resolved, and the metadata is enriched into supervised metadata:
- History objects are linked.
- Origins are connected to their Bronze counterparts.
- Relationships and constraints are automatically added.
This metadata can now be consumed by further XSLT templates and DNA-ML models.
Benefits of Using Comments as Metadata
- ✅ Flexibility: Developers influence metadata directly in the code.
- ✅ Consistency: Attributes are extracted systematically.
- ✅ Traceability: Origins and history objects provide lineage across layers.
- ✅ Automation: Downstream templates can act on enriched metadata.
Conclusion
By using code comments as metadata carriers, we bridge the gap between SQL definitions and rich semantic models.
In Analytixus, comments are not just documentation — they are drivers of automation. They ensure that attributes like IsBK, IsTechnical, and IsHist become first-class citizens in the metadata layer.
This approach transforms unsupervised metadata into supervised metadata, enabling automation, lineage, and governance without slowing down development.
