  • Automatic Processing Optimization

    Compile your current queries and business logic to execute faster on your current (or new) data processing platform.

    Automatically improve performance by taking enterprise-wide data processing pipelines into account.

  • Portability Across Repositories

    "Lift and Shift" from high cost database and data warehouse infrastructure to low cost cloud and open source solutions.

    No code rewrites required to use your existing business rules and analytics with low cost infrastructure.

  • Corral Enterprise-Wide Data Processing

    Map data processing within (and across) data repositories.

    Analyse analyst and developer processing pipelines globally, at the enterprise level.

Solutions

Enterprise-wide Data Processing Code Analysis and Optimization (Lineage Model)

CompilerWorks takes data processing code from multiple platforms and compiles it into a unified view of data processing across the enterprise.


Typically an analyst or developer can construct an individual query efficiently. The executing data repository itself should optimize execution of the query.

A strong or experienced analyst/developer will consider the interactions of a sequence of queries in a data processing pipeline and structure the component queries to interact with each other efficiently. The database itself has no ‘vision’ to perform optimizations across a series of queries.

At the enterprise level there is little, if any, ability to view or optimize data processing at the pipeline level.

CompilerWorks systematically delivers a unified view of how one query interacts with all others. This unified view crosses pipeline (and data repository) boundaries, even when data processing is maintained by different analysts or separate teams.
At the enterprise level, CompilerWorks provides a clear understanding of how any data source affects a final report, even after many stages of processing and out‑of‑band data transfer (e.g. how placing a web advertisement leads to fulfilling an online order).
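
As a minimal illustration (with hypothetical table and column names, not CompilerWorks' own output format), consider two queries maintained by different teams. Attribute-level lineage ties the final report back to the raw source even though no single query mentions both ends of the chain:

-- Pipeline A (marketing team): derive daily ad clicks from raw web logs
CREATE TABLE stg.ad_clicks AS
SELECT log_date, ad_id, COUNT(*) AS clicks
FROM raw.web_events
WHERE event_type = 'ad_click'
GROUP BY log_date, ad_id;

-- Pipeline B (sales team): join clicks to orders for the revenue report
CREATE TABLE rpt.ad_revenue AS
SELECT c.log_date, c.ad_id, SUM(o.amount) AS revenue
FROM stg.ad_clicks c
JOIN sales.orders o ON o.ad_id = c.ad_id AND o.order_date = c.log_date
GROUP BY c.log_date, c.ad_id;

-- Attribute-level lineage inferred across both pipelines:
--   rpt.ad_revenue.revenue  <- sales.orders.amount
--   rpt.ad_revenue.ad_id    <- stg.ad_clicks.ad_id   <- raw.web_events.ad_id
--   rpt.ad_revenue.log_date <- stg.ad_clicks.log_date <- raw.web_events.log_date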

This unified view of data processing directly addresses several application areas:
  • Map data usage across the enterprise
  • Identify and remove unused data processing activity
  • Optimize processing, making improvements not easily available to human analysts
  • Eliminate redundancy
  • Restructure analyses across multiple pipelines

Code Transpiling

Automatic Code Transpiling
(Offline)

CompilerWorks compiles from one dialect of SQL (and control language) and emits equivalent code in another.
  • Equivalent code produced for new platform
  • Automatically transform data structures for optimal execution
  • Clean, human readable code emitted
  • Automatically adheres to coding standards
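
For instance, a Vertica query using NVL and TO_CHAR can be emitted as equivalent Presto using coalesce and date_format. (This is a hedged sketch with hypothetical tables, not CompilerWorks output.)

-- Source (Vertica dialect)
SELECT NVL(c.region, 'unknown')        AS region,
       TO_CHAR(o.order_ts, 'YYYY-MM')  AS order_month,
       SUM(o.amount)                   AS total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
GROUP BY 1, 2;

-- Emitted target (Presto dialect)
SELECT coalesce(c.region, 'unknown')     AS region,
       date_format(o.order_ts, '%Y-%m')  AS order_month,
       SUM(o.amount)                     AS total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
GROUP BY 1, 2;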

Source Language (SQL and procedural) + Database Schema → Intermediate Representation → Target Language (open platform)

Compiler: transpilation from the source (proprietary) platform to equivalent code on the target platform (open), including requested transformations.

Platform Emulation
(Online)

Place CompilerWorks in front of an open source data processing repository and, to users, it will appear to be the proprietary system it is emulating.
  • Looks like a proprietary platform to end users
  • Compiles code on the fly to execute on the new platform
  • Automatically persists compiled code to deliver performance
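
For example, a user can keep submitting Teradata-style SQL containing QUALIFY, which Presto does not accept, and the emulator compiles it on the fly into a form Presto can execute. (This is a hedged sketch with hypothetical tables; the exact rewrite emitted may differ.)

-- Submitted by the user, unchanged (Teradata dialect)
SELECT customer_id, order_ts, amount
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) = 1;

-- Compiled on the fly and executed on Presto
SELECT customer_id, order_ts, amount
FROM (
  SELECT customer_id, order_ts, amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) AS rn
  FROM orders
) t
WHERE rn = 1;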

Source Language (SQL and procedural) → Platform Emulator → Target Platform

Compiler and Executor: transpilation from the source (proprietary) platform, plus execution and control of execution on the target platform (open).


Technology

CompilerWorks has developed a robust infrastructure that:
  • Ingests data analysis/processing code, extracting the core SQL
  • Compiles that dialect of SQL (and associated procedural language) into an Intermediate Representation (IR)
  • Applies transformations within the IR
  • Emits either a lineage model for the entire code base, or equivalent code in a different dialect of SQL (and associated procedural language).

Our compilers are robust. They make multiple passes over the code to synthesize an execution environment based on the implications of the DDL/DML, reconcile this with the available metadata, and carry the results throughout the compilation flow.
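
As a simplified illustration (hypothetical tables; the actual inference performed is richer), even when the schema is incomplete, the shape of a target table can be inferred from the DML that populates it and reconciled with whatever DDL is available:

-- Available DDL declares only part of the environment
CREATE TABLE staging.events (event_id BIGINT, event_ts TIMESTAMP, payload VARCHAR);

-- The DML implies the existence and shape of reporting.daily_events:
-- a DATE column derived from event_ts and a BIGINT count,
-- which the compiler can synthesize and carry through later passes.
INSERT INTO reporting.daily_events
SELECT CAST(event_ts AS DATE) AS event_date,
       COUNT(*)               AS event_count
FROM staging.events
GROUP BY CAST(event_ts AS DATE);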

Once represented in the IR, transformations and optimizations are systematically applied to the entire code base. Rather than being limited to the scope of code a developer or architect can review, our compilers can handle code enterprise-wide, across multiple data repositories.

Source Code

SQL encapsulated in many formats:

  • Python, Java, shell, XML, JSON, YAML, query log files, or vendor-specific tool formats (such as Tableau)
Ingest

Queries in a growing number of languages and dialects, including the associated procedural or execution languages.
Compile

Intermediate Representation

Transformations


  • Correct feature translation between databases
  • User-specified business rules, e.g. partitioning, normalization, denormalization, renaming tables, etc. (a small example follows this list)
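
As a small illustration (hypothetical names and a hypothetical mapping, not a real client rule), a user-specified rule can systematically rename tables as part of compilation:

-- Original query against the legacy schema
SELECT o.order_id, o.amount
FROM legacy_dw.orders o
WHERE o.order_date = DATE '2024-01-01';

-- Emitted query after applying the rule legacy_dw.orders -> warehouse.orders_v2
SELECT o.order_id, o.amount
FROM warehouse.orders_v2 o
WHERE o.order_date = DATE '2024-01-01';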

Optimizations


  • General optimizations, e.g. introduction of semijoins or hoisting GROUP BY above JOIN (see the sketch after this list)
  • Dialect-specific optimizations, e.g. rearrangement of CAST, CASE, or CONCAT based on the target dialect
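
As a minimal sketch of GROUP BY hoisting (hypothetical tables; the two forms are equivalent because SUM can be computed in two stages), partial aggregation is pushed below the join so far fewer rows reach it:

-- Before: every sales row is joined, then aggregated
SELECT d.region, SUM(f.amount) AS total
FROM sales f
JOIN dim_store d ON d.store_id = f.store_id
GROUP BY d.region;

-- After: aggregate per store first, join the much smaller result, then finish the aggregate
SELECT d.region, SUM(s.total) AS total
FROM (SELECT store_id, SUM(amount) AS total
      FROM sales
      GROUP BY store_id) s
JOIN dim_store d ON d.store_id = s.store_id
GROUP BY d.region;
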
Emit

Compiler Output

Lineage Model


The lineage model (to the attribute level) covers all data provenance and lineage, and incorporates a broad range of static code analyses.

  • Multiple paths to source
  • Data analysis cycles
  • Unused tables/attributes

Code (or executable)


A wide range of target platforms are supported.

  • Cloud platforms: AWS Redshift and Aurora, Google BigQuery, Snowflake
  • Open platforms: Presto, Hive, Spark, Postgres, MySQL, Splice Machine, etc.
  • Proprietary platforms: Oracle PL/SQL, Vertica SQL, Teradata SQL, Netezza SQL, etc.
  • Procedural execution languages: Java, Python, BTEQ, etc.

What clients say about us

  • “We estimated it would take 10,000 developer hours to move our library of queries from Vertica. We did it in a few hundred with the help of CompilerWorks.”
    — GM Data Engineering, Major Social Networking Co.
  • “CompilerWorks separates business interests from technical implementation, letting our clients put less effort into managing technology platforms and invest more time applying analytics to drive their business.”
    — CTO, Clarity Solution Group, North America’s largest data analytics professional services firm
  • “We wanted Oracle users to be able to port their query libraries to our highly scalable, ACID-compliant infrastructure. CompilerWorks has the knowledge and technology to make that a reality.”
    — SVP Marketing, Data Repository "Manufacturer"

Example (Transpiling)

CompilerWorks' output code is highly maintainable by humans; it can even selectively discard source-language semantics in order to generate cleaner, more maintainable code in the target language.

Below is an example of Vertica SQL converted to Presto (executing on Hadoop).

Note: when CompilerWorks was used to optimize the query on Vertica itself, execution time was reduced by a factor of 3 and the supported data volume increased by a factor of 8.

Vertica SQL

In this particular case (edited for anonymization), several senior developers at the client had spent several person-months trying to modify the Vertica query to execute on Presto, and failed.

Presto

CompilerWorks' compiler not only rose to the challenge but handled more data, faster: on Presto, performance improved 4x (with practically no data volume limit), and on Vertica the optimized queries could handle 4x more data.
SELECT max(f.customer_id) customer_id ,f.ds date ,CAST(date_trunc('WEEK', CAST(f.ds AS DATE)) AS VARCHAR) week ,CAST(date_trunc('MONTH', CAST(f.ds AS DATE)) AS VARCHAR) month ,table5.field1 field1 ,table5.field2 field2 ,f.field3 field3 ,table2.field4 field4 ,table3.field5 field5 ,table4.field6 field6 ,f.field7 field7 ,f.field8 field8 ,table7.field8_description field8_description ,f.field9 field9 ,table6.field10 field10 ,table6.field11 field11 ,table9.field12 field12 ,table9.field13 field13 ,table9.field14 field14 ,table9.field15 field15 ,table1.field16 field16 ... (same for table1.field17 through table1.field21) ... ,table1.field22 field22 ,f.field23 field23 ,f.field24 field24 ,table11.field24_desc field24_desc ,f.field25 field25 ,CAST(coalesce(f.field26, 0) AS VARCHAR) || CAST(coalesce(f.field27, 0) AS VARCHAR) || CAST(coalesce(f.field28, 0) AS VARCHAR) ,f.field29 field29 ,table8.field30 field30 ,table12.field31 field31 ,table12.field32 field32 ,f.field33 etl_field33 ,CAST(date_trunc('QUARTER', CAST(f.ds AS DATE)) AS VARCHAR) quarter ,sum(f.field34) field34 ,sum(f.field35) field35 ,sum(f.field36) / 100 field36a ,sum(f.field37) / 100 field37a ,sum(f.field38) field38 ... (same for f.field39 through f.field85) ... ,sum(f.field86) field86 ,sum(coalesce(f.field36, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 field89 ,sum(coalesce(f.field37, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 field90 FROM main_table f LEFT OUTER JOIN table1_<DATEID_NODASH> table1 ON f.field33 = table1.field33 AND CAST(f.ds AS DATE) BETWEEN CAST(table1.effective_start_date AS DATE) AND CAST(table1.effective_end_date AS DATE) LEFT OUTER JOIN table2 table2 ON f.id_table2 = table2.id_table2 AND table2.ds = '<DATEID>' LEFT OUTER JOIN table3 table3 ON f.id_table3 = table3.id_table3 AND table3.ds = '<DATEID>' LEFT OUTER JOIN table4 table4 ON f.id_table4 = table4.id_table4 AND table4.ds = '<DATEID>' LEFT OUTER JOIN table5 table5 ON f.table5_id = table5.key_id AND table5.ds = '<DATEID>' LEFT OUTER JOIN table6 table6 ON f.field9 = table6.field10_abbr AND table6.ds = '<DATEID>' LEFT OUTER JOIN table7 table7 ON f.id_table7 = table7.id_table7 AND table7.ds = '<DATEID>' LEFT OUTER JOIN table8_<DATEID_NODASH> table8 ON f.field29 = table8.id LEFT OUTER JOIN table9 table9 ON f.id_table9 = table9.id_table9 AND table9.ds = '<DATEID>' LEFT OUTER JOIN table10_<DATEID_NODASH> table10 ON f.field91 = table10.field92 AND CAST(table10.quarter_id AS DATE) = date_trunc('QUARTER', current_date) LEFT OUTER JOIN table11 table11 ON f.field24 = table11.field24 AND table11.ds = '<DATEID>' LEFT OUTER JOIN table12_<DATEID_NODASH> table12 ON f.field31 = table12.field31 WHERE CAST(f.ds AS DATE) BETWEEN date_add('DAY', ceiling(273 / 9) * 1 - 1, CAST('<START_DATE>' AS DATE)) AND least(date_add('DAY', ceiling(273 / 9) * 1, CAST('' AS DATE)) - interval '1' day, CAST(' <DATEID>' AS DATE)) AND f.ds = '<DATEID>' GROUP BY 2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37
  • Returns 91 fields (53 aggregates).
  • Derived from 13 joined tables.
  • GROUP BY on 35 fields.
SELECT i.customer_id customer_id ,i.i0_ds date ,CAST(date_trunc('WEEK', CAST(i.i0_ds AS DATE)) AS VARCHAR) week ,CAST(date_trunc('MONTH', CAST(i.i0_ds AS DATE)) AS VARCHAR) month ,i.i1_field1 field1 ... (same for field1 through field7) ... ,i.i8_field8 field8 ,i.i9_field8_description field8_description ,i.i10_field9 field9 ,table6.field10 field10 ,table6.field11 field11 ,i.i11_field12 field12 ,i.i12_field13 field13 ,i.i13_field14 field14 ,i.i14_field15 field15 ,table1.field16 field16 ,table1.field17 field17 ,table1.field18 field18 ,table1.field19 field19 ,table1.field20 field20 ,table1.field21 field21 ,table1.field22 field22 ,i.i15_field23 field23 ,i.i16_field24 field24 ,table11.field24_desc field24_desc ,i.i17_field25 field25 ,i.i18_field28 field28 ,i.i19_field29 field29 ,table8.field30 field30 ,i.i20_field31 field31 ,i.i21_field32 field32 ,i.i22_field33 etl_field33 ,CAST(date_trunc('QUARTER', CAST(i.i0_ds AS DATE)) AS VARCHAR) quarter ,i.i24_field34 field34 ,i.i25_field35 field35 ,i.i26_field36a field36a ,i.i27_field37a field37a ... (same for field38 through field187) ... ,i.i78_field90 field90 FROM (SELECT f.ds i0_ds ,table5.field1 i1_field1 ,table5.field2 i2_field2 ,f.field3 i3_field3 ,table2.field4 i4_field4 ,table3.field5 i5_field5 ,table4.field6 i6_field6 ,f.field7 i7_field7 ,f.field8 i8_field8 ,table7.field8_description i9_field8_description ,f.field9 i10_field9 ,table9.field12 i11_field12 ,table9.field13 i12_field13 ,table9.field14 i13_field14 ,table9.field15 i14_field15 ,f.field23 i15_field23 ,f.field24 i16_field24 ,f.field25 i17_field25 ,concat(coalesce(CAST(f.field26 AS VARCHAR), '0'), coalesce(CAST(f.field27 AS VARCHAR), '0') i18_field28 ,f.field29 i19_field29 ,table12.field31 i20_field31 ,table12.field32 i21_field32 ,f.field33 i22_field33 ,max(f.customer_id) customer_id ,sum(f.field34) i24_field34 ,sum(f.field35) i25_field35 ,sum(f.field36) / 100 i26_field36a ,sum(f.field37) / 100 i27_field37a ,sum(f.field38) i28_field38 ... (same for f.field39 through f.field85) ... 
,sum(f.field86) i76_field86 ,sum(coalesce(f.field36, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 i77_field89 ,sum(coalesce(f.field37, 0) * coalesce(f.field87, 1) * coalesce(table10.field88, 1)) / 100 i78_field90 FROM main_table f LEFT OUTER JOIN table2 table2 ON f.id_table2 = table2.id_table2 AND table2.ds = '<DATEID>' LEFT OUTER JOIN table3 table3 ON f.id_table3 = table3.id_table3 AND table3.ds = '<DATEID>' LEFT OUTER JOIN table4 table4 ON f.id_table4 = table4.id_table4 AND table4.ds = '<DATEID>' LEFT OUTER JOIN table5 table5 ON f.table5_id = table5.key_id AND table5.ds = '<DATEID>' LEFT OUTER JOIN table7 table7 ON f.id_table7 = table7.id_table7 AND table7.ds = '<DATEID>' LEFT OUTER JOIN table9 table9 ON f.id_table9 = table9.id_table9 AND table9.ds = '<DATEID>' LEFT OUTER JOIN table10_<DATEID_NODASH> table10 ON f.field91 = table10.field92 AND CAST(table10.quarter_id AS DATE) = date_trunc('QUARTER', current_date) LEFT OUTER JOIN table12_<DATEID_NODASH> table12 ON f.field31 = table12.field31 WHERE CAST(f.ds AS DATE) BETWEEN date_add('DAY', ceiling(273 / 9) * 1 - 1, DATE ''<START_DATE>') AND least(date_add('DAY', ceiling(273 / 9) * 1, DATE ''<START_DATE>') - interval '1' day, DATE '<DATEID>') GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 ) AS i LEFT OUTER JOIN table11 table11 ON i.i16_field24 = table11.field24 AND table11.ds = '<DATEID>' LEFT OUTER JOIN table8_<DATEID_NODASH> table8 ON i.i19_field29 = table8.id LEFT OUTER JOIN table6 table6 ON i.i10_field9 = table6.field10_abbr AND table6.ds = '<DATEID>' LEFT OUTER JOIN table1_<DATEID_NODASH> table1 ON i.i22_field33 = table1.field33 AND CAST(i.i0_ds AS DATE) BETWEEN CAST(table1.effective_start_date AS DATE) AND CAST(table1.effective_end_date AS DATE)
  • Compilation automatically optimizes into "SELECT FROM (SELECT)".
  • The interim SELECT reads from 9 tables, groups by 23 fields, and calculates all aggregates.
  • The main SELECT reads from the interim result and joins 4 more tables.

Our Team

Shevek
CTO

Gerald Wluka
CEO

Future Team Members
Join Our Team
TBD