• Transpiling


    Lift and Shift

    Between Data Repositories

    No code rewrites required to use your existing business rules and analytics with low-cost infrastructure.

    Automatically convert code from high-cost database and data warehouse infrastructure to low-cost cloud and open source solutions.

  • Data Lineage


    Scan and Plan


    Enterprise Wide Data Processing

    Optimize data processing globally - not simply one query at a time.

    Automatically enhance processing at a global level and remove redundant and unnecessary processing.

  • Query Linting


    Screen and Clean


    Self-Service User Activity

    Intercept self-service user queries before they impact SLAs.

    Automatically optimize all query activity to improve performance and reduce cost.

Solutions


Lift and Shift (Code Transpiling)

Lift and shift data processing code from your current platforms to a new, more cost-effective open source or cloud data processing platform.

Automatic Code Transpiling
(Offline)

CompilerWorks compiles from one dialect of SQL (and control language) and emits equivalent code in another.
  • Equivalent code produced for new platform
  • Automatically transform data structures for optimal execution
  • Clean, human readable code emitted
  • Automatically adheres to coding standards
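
CompilerWorks' compiler itself is proprietary, but the flavor of dialect-to-dialect translation is easy to sketch. The toy Python fragment below is illustrative only (the function and rewrites are our own, not CompilerWorks' API): it maps a few Teradata spellings to their Presto equivalents. A real transpiler works from a full algebraic representation rather than textual patterns, which is also what lets it restructure data and adhere to coding standards.

# Illustrative only: a toy dialect mapper, not CompilerWorks' compiler.
# A real transpiler parses to an algebraic representation first; textual
# pattern-matching like this breaks down quickly on real code.
import re

def transpile_toy(sql: str) -> str:
    """Rewrite a handful of Teradata spellings into Presto SQL."""
    out = re.sub(r"^\s*SEL\b", "SELECT", sql, flags=re.IGNORECASE)  # SEL -> SELECT
    out = re.sub(r"\bZEROIFNULL\s*\(([^)]*)\)", r"COALESCE(\1, 0)",
                 out, flags=re.IGNORECASE)                          # ZEROIFNULL(x) -> COALESCE(x, 0)
    out = re.sub(r"\bNVL\s*\(", "COALESCE(", out, flags=re.IGNORECASE)  # NVL -> ANSI COALESCE
    return out

print(transpile_toy("SEL NVL(revenue, 0), ZEROIFNULL(units) FROM sales"))
# -> SELECT COALESCE(revenue, 0), COALESCE(units, 0) FROM sales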

Source Language (SQL and Procedural) + Database Schema
  → Algebraic Representation
  → Target Language (open platform)

Compiler: transpilation from the source (proprietary) platform to equivalent code on the target platform (open), including requested transformations.

Platform Emulation
(Online)

Use CompilerWorks in front of a data processing repository so that it appears to be the proprietary system it is emulating.
  • Looks like a proprietary platform to end users
  • Compiles code on the fly to execute on the new platform
  • Automatically persists compiled code to deliver performance
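
As a rough sketch of how such an online emulation layer can be structured (assuming hypothetical compile and execute callables; none of this is CompilerWorks' actual interface), consider:

# Illustrative sketch of an emulation front end, with stand-in functions
# for dialect compilation and for the target engine client.
import hashlib

class EmulatingProxy:
    """Accepts source-dialect SQL, executes on the target platform."""

    def __init__(self, compile_fn, execute_fn):
        self._compile = compile_fn      # source SQL -> target SQL
        self._execute = execute_fn      # target SQL -> result rows
        self._cache = {}                # persisted compilations, keyed by hash

    def query(self, source_sql: str):
        key = hashlib.sha256(source_sql.encode()).hexdigest()
        target_sql = self._cache.get(key)
        if target_sql is None:          # compile on the fly, then persist
            target_sql = self._compile(source_sql)
            self._cache[key] = target_sql
        return self._execute(target_sql)

# Usage with stand-in callables:
proxy = EmulatingProxy(
    compile_fn=lambda s: s.replace("NVL(", "COALESCE("),
    execute_fn=lambda s: [("compiled:", s)],
)
print(proxy.query("SELECT NVL(a, 0) FROM t"))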

Source Language (SQL and Procedural)
  → Platform Emulator
  → Target Platform

Compiler and Executor: transpilation from the source (proprietary) platform, plus execution (and control of execution) on the target platform (open).


Scan and Plan (Global Lineage at column/attribute level)

Scan and plan data processing lineage across multiple data processing platforms, throughout the entire enterprise.

CompilerWorks generates a unified view of data lineage from actual code executed on OLTP and OLAP systems. This unified view crosses pipeline and data repository boundaries and is not reliant on metadata.

At the enterprise level this means a clear understanding of how any data source affects a final report, even after many stages of processing and data transfer (e.g. how placing a web advertisement leads to fulfilling an online order).

This unified view of data processing directly addresses several application areas:
  • Maps data usage across the enterprise
  • Identifies and removes unused processing activity
  • Optimizes processing, making improvements not available to human analysts
  • Eliminates redundancy
  • Restructures analyses across multiple pipelines
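
To make column-level lineage concrete, here is a minimal sketch of the underlying idea. The edge map and its format are invented for illustration (the real model is harvested from executed code): each report column is walked back, transitively, to the columns that feed it.

# Toy column-level lineage walk (edge map and format are illustrative,
# not CompilerWorks' model). Each entry reads: "this column is computed
# from these source columns", as harvested from executed statements.
EDGES = {
    "report.weekly_revenue": {"orders_daily.revenue"},
    "orders_daily.revenue":  {"orders.price", "orders.qty"},
    "orders.price":          set(),   # base fact-table column
    "orders.qty":            set(),
}

def upstream(column: str, edges=EDGES) -> set:
    """All columns that (transitively) feed `column`."""
    seen, stack = set(), [column]
    while stack:
        for src in edges.get(stack.pop(), ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

print(upstream("report.weekly_revenue"))
# -> {'orders_daily.revenue', 'orders.price', 'orders.qty'}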
Typically, an analyst or developer can construct an individual query efficiently, and the executing data repository itself should optimize the execution of that query.

An experienced analyst/developer will consider the interactions of a sequence of queries in a data processing pipeline and structure the component queries to interact with each other efficiently. The database itself has no ‘vision’ to perform optimizations across a series of queries.

Without CompilerWorks there is little, if any, ability to view and optimize data processing at the pipeline level, let alone across the enterprise.

With CompilerWorks analysts naturally build efficient analyses that integrate into the enterprise-wide data processing landscape.
Enterprise-wide lineage

Screen and Clean (Query Linting)

Screen and clean queries from self-service or ad hoc users before they negatively impact SLAs.

Data engineering is responsible for data processing SLAs. Data analysts can easily impact SLAs by submitting ill-formed queries.

The database engine itself is great at optimizing individual queries, but it is not good at identifying queries that should not be executed: cross joins on large tables, joins on the wrong reference keys, poorly selected join orders, aggregation mistakes, etc.

CompilerWorks' linting solution analyzes queries before execution to protect your infrastructure from rogue queries:
  • Compiles the query into an algebraic representation
  • Applies optimizing transformations
  • (Optionally) restructures the query based on data already identified in the lineage model
  • Emits the transformed query, a quality score, and notes on how to improve the query

Queries are submitted for linting prior to execution. Two basic approaches are available:
  • In-line linting before all queries are executed
  • Off-line process that qualifies queries before execution
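
A deliberately crude sketch of the screening idea follows (pure text pattern-matching, purely for illustration; CompilerWorks lints the compiled algebraic form, which is far more reliable):

# Toy pre-execution screen (illustrative only). It pattern-matches one
# classic mistake: a comma join with no join predicate, i.e. a likely
# cross join on large tables.
import re

def lint(sql: str) -> list:
    notes = []
    has_comma_join = re.search(r"\bFROM\s+\w+\s*,\s*\w+", sql, re.IGNORECASE)
    has_predicate = re.search(r"\bWHERE\b|\bON\b", sql, re.IGNORECASE)
    if has_comma_join and not has_predicate:
        notes.append("possible cross join: comma-separated FROM with no WHERE/ON")
    return notes

print(lint("SELECT * FROM big_facts, big_dims"))       # flagged
print(lint("SELECT * FROM f JOIN d ON f.id = d.id"))   # clean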

Without CompilerWorks ad hoc queries can easily impact SLAs for all users.

With CompilerWorks data engineering can control the impact of ad hoc users and scientifically protect SLAs.

Technology

CompilerWorks has developed a robust infrastructure that:
  • Ingests data analysis/processing code, extracting the core SQL.
  • Compiles that dialect of SQL (and associated procedural language) into an Algebraic Representation (AR) stored in a proprietary virtual machine code.
  • Applies transformations to the virtual machine code in the AR.
  • Emits either
    • lineage model for the entire code base, or
    • equivalent code in a different dialect of SQL (plus a procedural language, if required).
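
The four steps above can be pictured as a tiny end-to-end toy. Everything here is invented for illustration: the real AR is a proprietary virtual machine code, stood in for below by a small dict.

# End-to-end toy of the ingest/compile/transform/emit pipeline.
import re

def ingest(artifact: str) -> list:
    """Pull SQL statements out of a host artifact (here: Python source)."""
    return re.findall(r'"(SELECT[^"]*)"', artifact, re.IGNORECASE)

def compile_to_ar(sql: str) -> dict:
    """'Compile' into a (very) small algebraic representation."""
    cols, table = re.match(r"SELECT (.+) FROM (\w+)", sql, re.IGNORECASE).groups()
    return {"project": [c.strip() for c in cols.split(",")], "scan": table}

def emit(ar: dict, dialect: str) -> str:
    """Emit equivalent code for the target dialect (quoting differs)."""
    quote = '"' if dialect == "presto" else "`"
    return f'SELECT {", ".join(ar["project"])} FROM {quote}{ar["scan"]}{quote}'

script = 'rows = run("SELECT id, total FROM orders")'
for sql in ingest(script):
    print(emit(compile_to_ar(sql), dialect="presto"))
# -> SELECT id, total FROM "orders"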

Our compilers are robust: they make multiple passes to synthesize an execution environment from the implications of the DDL/DML, reconcile it with the available metadata, and carry the results throughout the compilation flow.
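
As a hedged illustration of that synthesis step, here is a toy, single-pattern version of what the multiple passes accomplish (table and column names are invented):

# Illustrative sketch: synthesize a schema from DML implications when
# explicit DDL is missing; each query observed adds to what we know.
import re
from collections import defaultdict

def infer_schema(statements: list) -> dict:
    tables = defaultdict(set)
    for sql in statements:
        m = re.match(r"SELECT (.+) FROM (\w+)", sql, re.IGNORECASE)
        if m:
            cols, table = m.groups()
            tables[table].update(c.strip() for c in cols.split(","))
    return dict(tables)

print(infer_schema([
    "SELECT id, price FROM orders",
    "SELECT id, ship_date FROM orders",   # a later pass adds a column
]))
# -> {'orders': {'id', 'price', 'ship_date'}}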

Once the code is in the AR, transformations and optimizations are systematically applied to the entire code base. Rather than being limited to the scope of code a developer or architect can review, our compilers handle code enterprise wide, across multiple data repositories.

Source Code

SQL encapsulated in many formats:

  • Python, Java, shell, XML, JSON, YAML, query log files, or vendor-specific tool formats (e.g. Tableau workbooks)
Ingest

Queries in a growing number of languages and dialects, including associated procedural and execution languages.

Compile

Algebraic Representation

Transformations


  • Correct feature translation between databases
  • User-specified business rules, e.g. partitioning, normalization, denormalization, renaming tables, etc.

Optimizations


  • General optimizations, e.g. introduction of semijoins or hoisting GROUP BY above JOIN
  • Dialect-specific optimizations, e.g. rearrangement of CAST, CASE, or CONCAT based on target dialect
Emit

Compiler Output

Code (or executable)


Target platforms include

  • Cloud Platforms: Snowflake, AWS Redshift and Aurora, Google BigQuery
  • Open Platforms: Presto, Hive, Spark, Postgres, MySQL, Splice Machine, etc.
  • Proprietary Platforms: Oracle, Teradata, Netezza, Vertica, etc.
  • Procedural execution languages: Python, Java, BTEQ, etc.

Lineage Model


Incorporates all data provenance and lineage, enabling a broad range of static code analyses.

  • Unused tables/attributes
  • Replicated processing/data
  • Inefficient processing pipelines
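
One of the simplest analyses the lineage model enables, sketched as a toy (the job names, tables, and model shape are invented; the real model is far richer): find tables the pipelines write but nothing ever reads.

# Toy static analysis over a lineage model: tables written by some job
# but never read by any job are candidates for removal, along with the
# processing that fills them.
def unused_tables(reads: dict, writes: dict) -> set:
    """reads/writes map job name -> set of table names."""
    all_written = set().union(*writes.values())
    all_read = set().union(*reads.values())
    return all_written - all_read

reads  = {"job_a": {"orders"}, "job_b": {"orders_daily"}, "report": {"weekly_rollup"}}
writes = {"job_a": {"orders_daily", "debug_dump"}, "job_b": {"weekly_rollup"}}
print(unused_tables(reads, writes))
# -> {'debug_dump'}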

What clients say about us

  • “CompilerWorks reduces the time to migrate CODE by an order of magnitude from legacy data systems to our cloud data warehouse.”
    VP Customer and Product Strategy, Cloud Data Warehouse Provider
  • “With CompilerWorks we have full visibility and can take control of how data flows and is processed through our infrastructure.”
    Data Director, Transportation Platform Company
  • “Migrating from one data warehouse to another was a science with CompilerWorks. We actually ran the two data warehouses alongside each other in real time to verify the conversion.”
    Director of Data Engineering, Consumer Delivery Company
  • “We want Oracle users to be able to execute their queries on our highly scalable, ACID-compliant infrastructure. CompilerWorks has the knowledge and technology to make that a reality.”
    VP Products and Operations, Cloud Data Repository Provider

Demonstration

Lineage GUI (Scan and Plan)


Transpiling (Lift and Shift)

CompilerWorks' output code is easily maintainable by humans, even to the point of being able to selectively discard source-language semantics in order to generate 'cleaner', human-maintainable code in the target language.

Below is an example of Vertica SQL converted to Presto (executing on Hadoop).

Note: when CompilerWorks was used to optimize the query on Vertica itself, execution time was reduced by a factor of 3 and the supported data volume increased by a factor of 8.

Vertica SQL

In this particular case (edited for anonymization), several senior developers at the client had spent several man-months trying to modify the Vertica query to execute on Presto - they failed!

Presto

CompilerWorks' compiler not only rose to the challenge, it handled more data, faster: on Presto, performance improved 4x (with practically no data volume limit), and on Vertica the optimized queries could handle 4x more data.
SELECT max(f.customer_id) customer_id
      ,f.ds date
      ,CAST(date_trunc('WEEK', CAST(f.ds AS DATE)) AS VARCHAR) week
      ,CAST(date_trunc('MONTH', CAST(f.ds AS DATE)) AS VARCHAR) month
      ,table5.field1 field1
      ,table5.field2 field2
      ,f.field3 field3
      ,table2.field4 field4
      ,table3.field5 field5
      ,table4.field6 field6
      ,f.field7 field7
      ,f.field8 field8
      ,table7.field8_description field8_description
      ,f.field9 field9
      ,table6.field10 field10
      ,table6.field11 field11
      ,table9.field12 field12
      ,table9.field13 field13
      ,table9.field14 field14
      ,table9.field15 field15
      ,table1.field16 field16
      ... (same for table1.field17 through table1.field21) ...
      ,table1.field22 field22
      ,f.field23 field23
      ,f.field24 field24
      ,table11.field24_desc field24_desc
      ,f.field25 field25
      ,CAST(coalesce(f.field26, 0) AS VARCHAR) || CAST(coalesce(f.field27, 0) AS VARCHAR) || CAST(coalesce(f.field28, 0) AS VARCHAR)
      ,f.field29 field29
      ,table8.field30 field30
      ,table12.field31 field31
      ,table12.field32 field32
      ,f.field33 etl_field33
      ,CAST(date_trunc('QUARTER', CAST(f.ds AS DATE)) AS VARCHAR) quarter
      ,sum(f.field34) field34
      ,sum(f.field35) field35
      ,sum(f.field36) / 100 field36a
      ,sum(f.field37) / 100 field37a
      ,sum(f.field38) field38
      ... (same for f.field39 through f.field85) ...
      ,sum(f.field86) field86
      ,sum(coalesce(f.field36, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 field89
      ,sum(coalesce(f.field37, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 field90
FROM main_table f
LEFT OUTER JOIN table1_<DATEID_NODASH> table1
  ON f.field33 = table1.field33
 AND CAST(f.ds AS DATE) BETWEEN CAST(table1.effective_start_date AS DATE) AND CAST(table1.effective_end_date AS DATE)
LEFT OUTER JOIN table2 table2 ON f.id_table2 = table2.id_table2 AND table2.ds = '<DATEID>'
LEFT OUTER JOIN table3 table3 ON f.id_table3 = table3.id_table3 AND table3.ds = '<DATEID>'
LEFT OUTER JOIN table4 table4 ON f.id_table4 = table4.id_table4 AND table4.ds = '<DATEID>'
LEFT OUTER JOIN table5 table5 ON f.table5_id = table5.key_id AND table5.ds = '<DATEID>'
LEFT OUTER JOIN table6 table6 ON f.field9 = table6.field10_abbr AND table6.ds = '<DATEID>'
LEFT OUTER JOIN table7 table7 ON f.id_table7 = table7.id_table7 AND table7.ds = '<DATEID>'
LEFT OUTER JOIN table8_<DATEID_NODASH> table8 ON f.field29 = table8.id
LEFT OUTER JOIN table9 table9 ON f.id_table9 = table9.id_table9 AND table9.ds = '<DATEID>'
LEFT OUTER JOIN table10_<DATEID_NODASH> table10 ON f.field91 = table10.field92 AND CAST(table10.quarter_id AS DATE) = date_trunc('QUARTER', current_date)
LEFT OUTER JOIN table11 table11 ON f.field24 = table11.field24 AND table11.ds = '<DATEID>'
LEFT OUTER JOIN table12_<DATEID_NODASH> table12 ON f.field31 = table12.field31
WHERE CAST(f.ds AS DATE) BETWEEN date_add('DAY', ceiling(273 / 9) * 1 - 1, CAST('<START_DATE>' AS DATE))
                             AND least(date_add('DAY', ceiling(273 / 9) * 1, CAST('<START_DATE>' AS DATE)) - interval '1' day, CAST('<DATEID>' AS DATE))
  AND f.ds = '<DATEID>'
GROUP BY 2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37
  • Returns 91 fields (53 aggregates).
  • Derived from 13 joined tables.
  • Grouped by 35 fields.
SELECT i.customer_id customer_id
      ,i.i0_ds date
      ,CAST(date_trunc('WEEK', CAST(i.i0_ds AS DATE)) AS VARCHAR) week
      ,CAST(date_trunc('MONTH', CAST(i.i0_ds AS DATE)) AS VARCHAR) month
      ,i.i1_field1 field1
      ... (same for field1 through field7) ...
      ,i.i8_field8 field8
      ,i.i9_field8_description field8_description
      ,i.i10_field9 field9
      ,table6.field10 field10
      ,table6.field11 field11
      ,i.i11_field12 field12
      ,i.i12_field13 field13
      ,i.i13_field14 field14
      ,i.i14_field15 field15
      ,table1.field16 field16
      ,table1.field17 field17
      ,table1.field18 field18
      ,table1.field19 field19
      ,table1.field20 field20
      ,table1.field21 field21
      ,table1.field22 field22
      ,i.i15_field23 field23
      ,i.i16_field24 field24
      ,table11.field24_desc field24_desc
      ,i.i17_field25 field25
      ,i.i18_field28 field28
      ,i.i19_field29 field29
      ,table8.field30 field30
      ,i.i20_field31 field31
      ,i.i21_field32 field32
      ,i.i22_field33 etl_field33
      ,CAST(date_trunc('QUARTER', CAST(i.i0_ds AS DATE)) AS VARCHAR) quarter
      ,i.i24_field34 field34
      ,i.i25_field35 field35
      ,i.i26_field36a field36a
      ,i.i27_field37a field37a
      ... (same for field38 through field187) ...
      ,i.i78_field90 field90
FROM (SELECT f.ds i0_ds
            ,table5.field1 i1_field1
            ,table5.field2 i2_field2
            ,f.field3 i3_field3
            ,table2.field4 i4_field4
            ,table3.field5 i5_field5
            ,table4.field6 i6_field6
            ,f.field7 i7_field7
            ,f.field8 i8_field8
            ,table7.field8_description i9_field8_description
            ,f.field9 i10_field9
            ,table9.field12 i11_field12
            ,table9.field13 i12_field13
            ,table9.field14 i13_field14
            ,table9.field15 i14_field15
            ,f.field23 i15_field23
            ,f.field24 i16_field24
            ,f.field25 i17_field25
            ,concat(coalesce(CAST(f.field26 AS VARCHAR), '0'), coalesce(CAST(f.field27 AS VARCHAR), '0'), coalesce(CAST(f.field28 AS VARCHAR), '0')) i18_field28
            ,f.field29 i19_field29
            ,table12.field31 i20_field31
            ,table12.field32 i21_field32
            ,f.field33 i22_field33
            ,max(f.customer_id) customer_id
            ,sum(f.field34) i24_field34
            ,sum(f.field35) i25_field35
            ,sum(f.field36) / 100 i26_field36a
            ,sum(f.field37) / 100 i27_field37a
            ,sum(f.field38) i28_field38
            ... (same for f.field39 through f.field85) ...
            ,sum(f.field86) i76_field86
            ,sum(coalesce(f.field36, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 i77_field89
            ,sum(coalesce(f.field37, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 i78_field90
      FROM main_table f
      LEFT OUTER JOIN table2 table2 ON f.id_table2 = table2.id_table2 AND table2.ds = '<DATEID>'
      LEFT OUTER JOIN table3 table3 ON f.id_table3 = table3.id_table3 AND table3.ds = '<DATEID>'
      LEFT OUTER JOIN table4 table4 ON f.id_table4 = table4.id_table4 AND table4.ds = '<DATEID>'
      LEFT OUTER JOIN table5 table5 ON f.table5_id = table5.key_id AND table5.ds = '<DATEID>'
      LEFT OUTER JOIN table7 table7 ON f.id_table7 = table7.id_table7 AND table7.ds = '<DATEID>'
      LEFT OUTER JOIN table9 table9 ON f.id_table9 = table9.id_table9 AND table9.ds = '<DATEID>'
      LEFT OUTER JOIN table10_<DATEID_NODASH> table10 ON f.field91 = table10.field92 AND CAST(table10.quarter_id AS DATE) = date_trunc('QUARTER', current_date)
      LEFT OUTER JOIN table12_<DATEID_NODASH> table12 ON f.field31 = table12.field31
      WHERE CAST(f.ds AS DATE) BETWEEN date_add('DAY', ceiling(273 / 9) * 1 - 1, DATE '<START_DATE>')
                                   AND least(date_add('DAY', ceiling(273 / 9) * 1, DATE '<START_DATE>') - interval '1' day, DATE '<DATEID>')
      GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
     ) AS i
LEFT OUTER JOIN table11 table11 ON i.i16_field24 = table11.field24 AND table11.ds = '<DATEID>'
LEFT OUTER JOIN table8_<DATEID_NODASH> table8 ON i.i19_field29 = table8.id
LEFT OUTER JOIN table6 table6 ON i.i10_field9 = table6.field10_abbr AND table6.ds = '<DATEID>'
LEFT OUTER JOIN table1_<DATEID_NODASH> table1 ON i.i22_field33 = table1.field33 AND CAST(i.i0_ds AS DATE) BETWEEN CAST(table1.effective_start_date AS DATE) AND CAST(table1.effective_end_date AS DATE)
  • Compilation automatically optimizes the query into a "SELECT FROM (SELECT)" structure (sketched below).
  • The interim SELECT reads from 9 tables, groups by 23 fields, and calculates all of the aggregates.
  • The main SELECT reads from the interim result and joins the 4 remaining tables.
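
The essence of that rewrite, reduced to a minimal hand-written example (our own simplification, not compiler output; it assumes dims.id is a unique key, the usual dimension-table case): aggregate the large fact table first, then join the much smaller interim result to the dimension tables.

# Hand-written miniature of the "SELECT FROM (SELECT)" rewrite above
# (illustrative; assumes dims.id is unique, so the join cannot fan out
# and multiply the aggregated sums).
BEFORE = """
SELECT f.d_id, d.name, sum(f.amount) total
FROM facts f
LEFT OUTER JOIN dims d ON f.d_id = d.id
GROUP BY f.d_id, d.name
"""

AFTER = """
SELECT i.d_id, d.name, i.total
FROM (SELECT d_id, sum(amount) total
      FROM facts
      GROUP BY d_id) i
LEFT OUTER JOIN dims d ON i.d_id = d.id
"""
# BEFORE joins every fact row to dims and then aggregates;
# AFTER aggregates facts down to one row per d_id and joins only those.
print(BEFORE, AFTER)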

Our Team

Shevek
CTO

Aditya Alurkar
VP of Many Things

Gerald Wluka
CEO

Future Team Members

Our team spans:
  • 3 continents, 7 countries
  • 10 nationalities