Data processing in enterprises is complex, and continues to grow in complexity. If you need to manage data scientist or engineer productivity, timeliness of data delivery, data discovery, information correctness, or critical path reliability, CompilerWorks Lineage will solve these fundamental data processing challenges.

CompilerWorks Lineage analyzes code – it never touches data. It builds a lineage fabric (a unified model) at the database column level, across multiple data repositories, spanning the entire enterprise. The lineage fabric reveals insights into data infrastructure that increase data processing, data engineering, and data science productivity.

See it in action

DEMO VIDEO – Lineage – GUI

DATA ENGINEERING

Timeliness of Delivery

Meeting and beating SLAs

Data engineering departments are inevitably judged on their ability to meet (or beat) SLAs, yet they lack control over the load users put on the data infrastructure. CompilerWorks Lineage reveals the issues that impact SLAs and highlights the control points for meeting (and beating) them.

There are several approaches to optimizing SLAs:

  • discover and optimize the critical paths to high-value datasets;
  • ensure source datasets are always up to date;
  • remove redundant and unnecessary processing from the critical path;
  • discover duplication of effort and opportunities to share costs.

The lineage fabric directly addresses these issues, offering performance optimizations beyond the capabilities of the database. The database does an excellent job of making individual queries execute efficiently; the lineage fabric can be used to optimize a series of queries (or even entire pipelines) to increase processing speed and reduce data weight.
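
To make the first of these approaches concrete, a critical path can be computed directly from the lineage graph's job dependencies and runtimes. The sketch below is a minimal Python model of that idea; the dataset names, runtimes, and the upstream map are hypothetical illustrations, not CompilerWorks APIs.

    # Minimal sketch: the lineage fabric as a DAG of jobs feeding a
    # high-value dataset. Names and runtimes are hypothetical.
    from functools import lru_cache

    upstream = {  # dataset -> (producing-job runtime in minutes, input datasets)
        "revenue_daily": (25, ["orders_clean", "fx_rates"]),
        "orders_clean":  (40, ["orders_raw"]),
        "fx_rates":      (5,  []),
        "orders_raw":    (0,  []),
    }

    @lru_cache(maxsize=None)
    def critical_path(dataset):
        """Return (total minutes, chain) for the slowest path feeding `dataset`."""
        runtime, inputs = upstream[dataset]
        if not inputs:
            return runtime, [dataset]
        cost, chain = max(critical_path(d) for d in inputs)
        return runtime + cost, chain + [dataset]

    minutes, chain = critical_path("revenue_daily")
    print(f"SLA-critical chain ({minutes} min): {' -> '.join(chain)}")

Optimization effort then concentrates on the jobs in that chain; speeding up anything off the critical path cannot improve the SLA.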

DATA SCIENCE

Data and Product Discovery

Deriving increased value from existing data inventory

Where does an analyst or data scientist start when constructing a new analysis or pipeline? If data engineers documented their code and table metadata, a simple search would suffice. Unfortunately, maintaining correct documentation is time consuming and expensive; most data engineering organizations fail to meet this ideal.

CompilerWorks Lineage tracks data from its source through all downstream processes (and across multiple data repositories). The graphs and reports it produces direct data scientists to source data and pipeline processes that already exist. Data analysts and engineers can quickly find source data, identify existing data processing pipelines, and leverage existing processing rather than rebuilding datasets organically.

CompilerWorks Lineage enables analyst teams to create new business value, and provides an immediate, continuously-updated inventory of data assets and consumption.
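
As a rough illustration of this discovery process, the sketch below walks a column-level lineage map upstream to find the root sources feeding a derived column. The column names and the lineage map are hypothetical stand-ins for what the lineage fabric exposes.

    # Hypothetical column-level lineage: derived column -> parent columns.
    lineage = {
        "dashboard.arpu":         ["finance.revenue.amount", "crm.users.id"],
        "finance.revenue.amount": ["billing.invoices.total"],
    }

    def sources(column, seen=None):
        """Return the root source columns that feed `column`."""
        seen = seen if seen is not None else set()
        parents = lineage.get(column, [])
        if not parents:
            return {column}          # no parents: this column is a source
        roots = set()
        for p in parents:
            if p not in seen:
                seen.add(p)
                roots |= sources(p, seen)
        return roots

    print(sources("dashboard.arpu"))
    # -> {'billing.invoices.total', 'crm.users.id'}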

Why automate?

Automation keeps the lineage fabric, and the insights it delivers, up to date as the data infrastructure evolves. It is more accurate and much faster than a human.

Enterprise Wide

Builds column-level lineage across multiple data systems.

Error Tolerance

Tolerates noisy, incomplete, and incorrect data, and makes intelligent deductions.

Absolute Semantic Accuracy

Column-level accuracy down to the individual opcode, expression, and join.

Information Correctness

Measuring and building trust in datasets

An enterprise’s data warehouse is the foundation of its planning and decision-making, and data errors can be very costly. CompilerWorks Lineage identifies all dataset and process dependencies. Imagine being able to immediately notify every user dependent on a particular dataset when there is an error landing a particular table or column.

CompilerWorks Lineage exposes all upstream and downstream dependencies. Once a data issue is corrected, users can automatically propagate the information to ALL downstream pipelines, datasets, and business users that depend on that information.
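
A minimal sketch of that propagation, assuming a hypothetical downstream map and owner registry standing in for what the lineage fabric provides:

    # Hypothetical downstream dependencies: dataset -> datasets built from it.
    downstream = {
        "orders_raw":    ["orders_clean"],
        "orders_clean":  ["revenue_daily", "churn_features"],
        "revenue_daily": ["exec_dashboard"],
    }
    owners = {"orders_clean": "de@corp", "revenue_daily": "finance@corp",
              "churn_features": "ds@corp", "exec_dashboard": "bi@corp"}

    def notify_downstream(dataset):
        """Yield (dataset, owner) for everything affected by a bad landing."""
        stack, seen = list(downstream.get(dataset, [])), set()
        while stack:
            d = stack.pop()
            if d in seen:
                continue
            seen.add(d)
            yield d, owners.get(d, "unowned")
            stack.extend(downstream.get(d, []))

    for d, owner in notify_downstream("orders_raw"):
        print(f"alerting {owner}: {d} may contain bad data")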

GDPR, PII, and Infosec Compliance

Mark an individual column as “secure data,” and the lineage fabric will propagate that tag throughout the entire data infrastructure. It takes into account whether the data is an exact copy or a partial copy, whether only a portion of the data ‘leaks’ (e.g. the MAX function leaks a single data point), and whether the secure data influences a final data point. The propagation crosses the boundaries between data repositories and also accounts for data exports.
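
A minimal sketch of this tag propagation, under the assumption that each lineage edge carries a leak kind; every name here is illustrative:

    # Hypothetical edges: (source column, target column, leak kind).
    EDGES = [
        ("users.email",         "mart.contacts.email", "copy"),
        ("users.email",         "mart.stats.max_id",   "aggregate"),
        ("mart.contacts.email", "export.csv.email",    "copy"),
    ]

    def propagate_secure(tagged):
        """Spread the 'secure' tag to a fixed point. Exact and partial
        copies stay fully secure; aggregates (e.g. MAX) still leak one
        data point, so they get the weaker 'leaks' flag."""
        flags = {c: "secure" for c in tagged}
        changed = True
        while changed:
            changed = False
            for src, dst, kind in EDGES:
                if src in flags and dst not in flags:
                    flags[dst] = "secure" if kind == "copy" else "leaks"
                    changed = True
        return flags

    print(propagate_secure({"users.email"}))
    # -> {'users.email': 'secure', 'mart.contacts.email': 'secure',
    #     'mart.stats.max_id': 'leaks', 'export.csv.email': 'secure'}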

The result is that a robust audit of GDPR requirements is instantly available, and any individual instance of secure data can be traced and cleansed from the organization as required to satisfy legal and operational criteria.

Warehouse Cost Control

Coping with organic growth and eliminating resource waste

CompilerWorks Lineage tracks the usage and consumption of all datasets and propagates this information to generate “unused” and “eventually-unused” annotations at the column level, keeping the data warehouse free of useless data and processing.
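
The sketch below illustrates the “eventually-unused” idea with a hypothetical consumer map: a column is eventually unused when no chain of reads from it ever reaches a live consumer.

    # Hypothetical consumer map: column -> columns/reports that read it.
    # "LIVE" marks something read by real users.
    consumers = {
        "staging.legacy_flag": ["mart.legacy_flag"],
        "mart.legacy_flag":    [],                  # unused: nothing reads it
        "mart.revenue":        ["exec_dashboard"],
        "exec_dashboard":      ["LIVE"],
    }

    def eventually_unused(col, seen=frozenset()):
        """True if nothing downstream of `col` reaches a live consumer."""
        reads = consumers.get(col, [])
        if "LIVE" in reads:
            return False
        return all(eventually_unused(c, seen | {col})
                   for c in reads if c not in seen)

    for col in ("staging.legacy_flag", "mart.revenue"):
        print(col, "-> drop" if eventually_unused(col) else "-> keep")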

The resulting reduction in data weight lowers flexible cloud data warehouse costs, or defers the need for additional technology capex and opex. It also helps avoid development freezes when warehouse capacity is reached, and postpones migration to more expensive resources.

Critical Path Reliability

Monitoring the reliability of data delivery infrastructure

If a pipeline on a critical path fails, or worse, is executed ad hoc or unreliably, all datasets downstream of the failure (or potential failure) may be affected. CompilerWorks computes all effects at column granularity so that issues can be reported to customers in a timely manner. CompilerWorks Lineage identifies:

  • whether all processes on the path to a critical dataset are reliable;
  • whether all pipelines are under proper organizational management;
  • whether a dataset may be untrustworthy because a preceding process failed.
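
A sketch of these checks, using hypothetical job metadata in place of what CompilerWorks derives from the lineage fabric:

    # Hypothetical metadata: dataset -> (job, scheduled?, last run ok?, inputs).
    jobs = {
        "exec_dashboard": ("build_dash", True,  True,  ["revenue_daily"]),
        "revenue_daily":  ("daily_rev",  True,  False, ["orders_clean"]),
        "orders_clean":   ("adhoc_fix",  False, True,  []),
    }

    def audit(dataset):
        """Walk upstream of a critical dataset and flag weak links."""
        job, scheduled, ok, inputs = jobs[dataset]
        if not scheduled:
            print(f"{dataset}: produced by ad hoc job '{job}'")
        if not ok:
            print(f"{dataset}: last run of '{job}' failed; downstream suspect")
        for d in inputs:
            audit(d)

    audit("exec_dashboard")
    # revenue_daily: last run of 'daily_rev' failed; downstream suspect
    # orders_clean: produced by ad hoc job 'adhoc_fix'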