Our technology has two core applications — Transpiler and Lineage

The result is lower cost, increased performance, and streamlined management for all data processing.

CompilerWorks core technology performs two easy-to-state steps:

01


Compile SQL code

Compile SQL code (and associated procedural language) into a computer algebra representation (mathematical model). CompilerWorks does not touch actual data.

02


Emit the algebraic representation into either:

  • A lineage fabric: graph of relationships between all elements in the model.
  • SQL code (and a procedural language when required).

Core Technology

CompilerWorks’ core technology comprises a suite of bespoke compilers, a common algebraic representation, and emitters for both a lineage fabric and specific SQL dialects.

SQL code is the fundamental input in the process. The compilers convert the SQL code to an algebraic representation (AR) which preserves a great deal of metadata; for each individual instruction, the AR captures origin, location, pipeline, responsible users and additional user-specified metadata. This metadata is available to the emitters for both the lineage fabric and the emitted SQL code.
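As a rough illustration of the per-instruction metadata the AR carries, a record might look like the following sketch; the class and field names are hypothetical, not CompilerWorks’ actual data model:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-instruction metadata in an algebraic
# representation; names are illustrative, not CompilerWorks' actual model.
@dataclass
class ArNode:
    op: str                    # the algebraic operation, e.g. "project", "join"
    origin: str                # source file the instruction came from
    location: tuple            # (line, column) within that file
    pipeline: str              # owning pipeline or job name
    users: list = field(default_factory=list)  # responsible users
    extra: dict = field(default_factory=dict)  # user-specified metadata

node = ArNode(op="join", origin="etl/daily_sales.sql",
              location=(42, 7), pipeline="daily_sales",
              users=["analyst@example.com"])
```

Because every node carries this record, both emitters can surface it: the lineage fabric as graph annotations, the SQL emitter as comments or provenance tags in the generated code.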

There are two classes of emitters: one produces the lineage fabric, the foundation of the lineage solution; the other is a set of emitters, one for each supported SQL dialect.


CompilerWorks Builds Real Compilers

Parsing isn’t enough. Semantics are required.

There are many technology implementations that can parse and emit simple SQL. Only CompilerWorks can:

  • Typecheck it, identifying whether a particular / sign means integer division, floating-point division, or division of an interval by a numeric.
  • Compile this information into a full computer algebra, and solve for the most efficient (and human-friendly) way to express the same instruction in a target dialect that has different operators, functions and semantics.
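The division example can be sketched as type-directed operator selection. Assuming a Teradata-to-BigQuery translation, where Teradata truncates integer division while BigQuery’s / always returns FLOAT64, a hypothetical emitter might choose:

```python
def emit_division(lhs: str, rhs: str, lhs_type: str, rhs_type: str) -> str:
    """Toy sketch of type-directed operator selection; illustrative only,
    not CompilerWorks' actual emitter."""
    if lhs_type == "INTEGER" and rhs_type == "INTEGER":
        # Teradata integer / integer truncates; BigQuery's '/' returns
        # FLOAT64, so integer semantics must be preserved with DIV().
        return f"DIV({lhs}, {rhs})"
    # Floating-point division maps directly onto BigQuery's '/'.
    return f"{lhs} / {rhs}"

emit_division("a", "b", "INTEGER", "INTEGER")  # "DIV(a, b)"
emit_division("a", "b", "FLOAT", "INTEGER")    # "a / b"
```

Without the type information a parser-only tool would emit `a / b` in both cases and silently change the answer for the integer case.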

The bottom line: CompilerWorks is aware of, and cares about, semantics, not just syntax.

But I’ve heard so much about parsing!

Parsing is easy to talk about – it’s largely a solved problem. Most importantly, it’s a tiny part of the job, and not sufficient to solve the problem of translating code. That is why CompilerWorks parses, typechecks, performs operator and function selection, and compiles all languages into the core computer algebra representation.

Accuracy and Correctness

How accurate can CompilerWorks be?

Traditionally, a set of rules is followed to rewrite a query, and then an execution test is performed to ensure the query will run on the target platform. This approach is prone to error. The CompilerWorks transpiler, by contrast, is designed to produce the same answer on both the source and target systems.

The alternative is human-driven conversion, which can yield unpredictable results. The default mode in CompilerWorks is itself built from a rather interesting set of choices: under what circumstances should CompilerWorks preserve ultimate accuracy, and under what circumstances should it simplify? The computer-algebra engine can compute this, along with many other decisions designed to give customers the code they need: clean, maintainable, understandable code which gives the right answer for the intended use case.


Computer Algebra with Serious Power

What Questions Need to be Answered?

The computer algebra engine at the core of the CompilerWorks product suite is incredibly powerful and drives both the transpiler and the analytics products.

The computer algebra engine is central to the ability to answer complex questions about the enterprise’s data environment, perform analyses, or emulate features unsupported in a given target dialect. It is powerful enough to apply business logic rules and structural transformations, and it supports custom plugins for special-purpose tasks such as generating sampling queries.

This fine-grained power enables CompilerWorks to answer almost limitless questions about the enterprise’s data infrastructure. Users can identify why two departments’ sales figures differ by comparing the entire path from source to result as composed functions, or find opportunities to reduce workload by sharing identical computations. A query can be posed to the engine, and the core technology will expose the results (either through an API or in a rich user interface).
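The shared-computation idea can be sketched in miniature: if each result is represented as a chain of composed operations on a source column, an identical leading sub-chain is work that two pipelines duplicate. The chain encoding below is hypothetical and far simpler than a real algebraic representation:

```python
# Toy sketch: two pipelines as composition chains over a source column.
# Identical leading sub-chains are computations both pipelines repeat.
pipeline_a = ("orders.amount", "filter:region='EU'", "sum")
pipeline_b = ("orders.amount", "filter:region='EU'", "sum", "round:2")

def shared_prefix(p, q):
    """Longest common leading sub-chain of two composition chains."""
    n = 0
    for a, b in zip(p, q):
        if a != b:
            break
        n += 1
    return p[:n]

shared_prefix(pipeline_a, pipeline_b)  # both recompute the EU sum
```

The same comparison, run in the other direction, shows exactly where two pipelines diverge, which is where differing sales figures come from.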


Language Cross-Compatibility

Migrate to a platform which doesn’t support PL

CompilerWorks’ transpiler has no pair-wise dependencies; support for a particular language enables transpilation from that language to any other supported language.

CompilerWorks’ computer algebra represents the superset, not the common subset, of all possible semantics. When CompilerWorks emits to a target dialect, the completeness of the result is constrained only by the expressiveness of the target language. If there is a way to express the required semantics within the target dialect, then our algebra engine will find it.

A CompilerWorks target environment is not a single language, it is a stack of languages. For instance, a Teradata source environment may be shell scripts which invoke BTEQ, which invokes Teradata SQL and SPL. A suitable target environment might be a Python control script which executes BigQuery, with JavaScript for custom UDFs. The computer algebra engine takes no account of which language in the source-language stack generated a particular algebraic term; expressing the desired program in the target language stack is a separate, standalone challenge. A dialect which does not support WITH RECURSIVE may use a Python or JavaScript loop outside the SQL to express the required semantics, while the transpiler may push BTEQ control structures down into JavaScript.
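As a concrete (and much simplified) sketch of the WITH RECURSIVE case, a driver loop outside the SQL can iterate a non-recursive step until a fixpoint is reached. Here the “table” is an in-memory set of edges and the function name is illustrative, not a CompilerWorks API; in a real target stack each round would execute a plain, non-recursive SQL statement:

```python
# Emulating a recursive CTE (transitive closure over an edge table)
# with a driver loop outside the SQL. Illustrative sketch only.
edges = {("a", "b"), ("b", "c"), ("c", "d")}

def transitive_closure(edges):
    reachable = set(edges)  # base case of the recursive CTE
    while True:             # recursive step, repeated to a fixpoint
        new = {(x, w)
               for (x, y) in reachable
               for (z, w) in edges if y == z}
        if new <= reachable:        # nothing added: fixpoint reached
            return reachable
        reachable |= new

closure = transitive_closure(edges)  # includes ("a", "d")
```

The loop terminates because each round can only add pairs drawn from a finite set, which is the same fixpoint argument that makes WITH RECURSIVE well-defined.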

The computer algebra engine automatically configures itself with plugins based on the language stacks given, and will perform these transformations with no user intervention. All users have to do is state the objective: “Move this code from X+Y+Z to A+B+C,” and the transpiler will move it, or explain what it has achieved and what cannot be done (and why).

CompilerWorks transpiles even when functionality is unavailable in the target SQL dialect:

How does CompilerWorks handle EXEC IMMEDIATE from PL/SQL? Contact Us if you really want to know.


Usability, Performance, and Deliverability

CompilerWorks products are fast. What does that mean?

CompilerWorks’ core technology reduces the time from decision to result. Customers pose a challenge, a question or make a decision: CompilerWorks will deliver the analysis, the answers, the cost reduction, the proof of compliance, or the new code fully up and running on the target platform FAST.

CompilerWorks’ core technology changes the game: CompilerWorks understands the entire code base, and will guide users through the process of analysis or migration, requiring a minimum of human or institutional knowledge and effort.

CompilerWorks will run on the enterprise’s code and log files with no minimum requirement for completeness or correctness. The generated output report will contain line, column, chapter and verse on what the compiler found and how best to act upon it. The product guides users through completing inputs and through validating data, translations, and test analyses, so the process of migrating an entire code base between platforms can be handled (almost) mechanically and requires no specialist skills or understanding of the code base.

All CompilerWorks products are error tolerant, and will ingest and make sense of incomplete, outdated, erroneous and broken code; if a human can make sense of the code, then our compilers will make sense of it, inferring the consequences, omissions, deductions, intuitions and corrections to be included in the automatically generated transpile report. The majority of manual labor is completed by our compilers. A human is directed where to intervene only when necessary.

CompilerWorks’ technical performance is excellent: it starts with a custom LR(k) parser front-end, continues through a low-allocation middle-end, and ends with a high-performance, custom computer algebra core. CompilerWorks transpilers will fit on a laptop or container with only 4 GB of RAM, but scale efficiently to saturate all cores on a 24-core server.

  • A laptop will complete a multi-thousand-script code migration in under a minute.
  • A server-grade system can analyze and process code at over 250,000 statements a second, delivering the ability to maintain an updated lineage model for an entire internet-scale organization.

Why hunt for five or ten complex statements for a Proof-of-Concept when CompilerWorks will translate an entire code base? CompilerWorks will automatically identify the most complex or incompatible statements, and translate them as accurately as possible, all in under a minute! CompilerWorks changes the migration process: it dramatically reduces the risk and time (and increases the predictability) of migration projects.

CompilerWorks’ Supporting Infrastructure

The complexity of data processing in the enterprise means that an enterprise-ready solution requires more than revolutionary technology.

Supporting infrastructure required for a robust enterprise solution built around CompilerWorks’ core technology includes:

01


Handling SQL from multiple sources, including SQL generated by other systems.

02


Standardized capture of data repository metadata.

03


Flexible configuration of the core technology.

04


All styles of user interface to the core technology.

Infrastructure Schematic

Turning technology into solutions

SQL code does not live in isolation in the enterprise. It is inevitably wrapped in another language, whether that be simple scripting, business intelligence tools, or ETL/data integration tools.

CompilerWorks’ infrastructure is incredibly flexible and is configurable to support:

  • ingesting SQL from whatever ecosystem it is stored within;
  • [optional] applying transformations within the algebraic engine;
  • producing wrapped SQL for use with the target system;
  • exposing a full complement of user interfaces to both the lineage and transpiler capabilities.

Ecosystem Integration

What about MicroStrategy, Tableau, Business Objects, and so forth?

Ingest is the process of extracting SQL from the encapsulating code or file format, be it Python, Java, shell, XML, JSON, YAML, query log files, or vendor-specific tool formats. In some cases this requires customer-specific integration work, as there is no robust standard for wrapping SQL for execution in large enterprises. CompilerWorks has a set of standard engines, parsers, and heuristics which handle most ingests without issue.

For example, for custom SQL embedded in business intelligence (BI) tools, CompilerWorks will ingest the custom SQL from a wide range of BI tool file formats, transpile it, and re-insert the converted code into the source BI tool’s file format.

If the SQL is embedded in Python scripts that use macros to build the executable SQL, then in most cases the transpiler can be configured to extract, transpile, and replace the SQL in place, in the Python code.
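A minimal sketch of that extract-transpile-replace cycle, assuming the SQL lives in triple-quoted strings and using a toy transpile() that only rewrites TOP n as LIMIT n (both the marker convention and the helper are hypothetical):

```python
import re

# Toy sketch of in-place SQL replacement inside a Python script.
# The triple-quote convention and transpile() stand-in are hypothetical.
script = 'cur.execute("""SELECT TOP 10 name FROM users""")\n'

def transpile(sql: str) -> str:
    # Stand-in for the real transpiler: rewrite "TOP n" as "LIMIT n".
    m = re.match(r"SELECT TOP (\d+) (.*)", sql)
    return f"SELECT {m.group(2)} LIMIT {m.group(1)}" if m else sql

def rewrite_embedded_sql(source: str) -> str:
    # Replace each triple-quoted SQL literal with its transpiled form,
    # leaving the surrounding Python untouched.
    return re.sub(r'"""(.*?)"""',
                  lambda m: '"""' + transpile(m.group(1)) + '"""',
                  source, flags=re.S)

rewrite_embedded_sql(script)
# 'cur.execute("""SELECT name FROM users LIMIT 10""")\n'
```

A production ingest would parse the host language properly rather than pattern-match, but the shape of the job is the same: locate, transpile, splice back.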


User Interface

How can a user extract the value in the core technology?

CompilerWorks exposes the power and flexibility of its core technology in every conceivable manner. The underlying philosophy is to enable data engineers and data analysts to create value in whatever way suits them.

GUI

Used by analysts and for “quick hit” data engineering tasks, e.g.:

  • Data discovery and exploration activities.
  • Identification of end-users.
  • Activity audits (and source code) for specific tables.
  • Transpilation of a limited number of SQL statements.

Command Line Interface (CLI)

CompilerWorks has worked with many data engineering departments where the preferred mode of interaction is through a CLI.

Beyond this preference, it is clear that some tasks, like bulk conversion of an entire SQL code base, are amenable to a CLI.

Application Programming Interface (API)

  • Use GraphQL to integrate the data fabric with in‑house data warehouse management processes.
  • Automatically re-write queries to work with sampled data (increasing productivity during development).
  • Restructure a data warehouse schema for domain-driven design.

CompilerWorks’ customers continue to create new use cases for the API.


“CompilerWorks lineage fabric helps control our data infrastructure and improve the productivity of the 3,000+ data engineers/analysts who use it.”

Director of Data Engineering | MONOPOLY COMPANY


Automatically analyze, convert, and optimize enterprise data processing code

Learn more about how our two core applications can change how you migrate and maintain enterprise-wide data processing.