PayPal’s Moment of Truth

Big Tech redeemed itself during the COVID-19 pandemic. E-commerce companies stepped up and became a de facto part of our critical infrastructure, handling the huge increase in online business without missing a beat.

The new normal was a boon for companies like Amazon, Instacart, and Uber Eats, but at PayPal it was reason for panic.

PayPal data analytics and data science teams are responsible for compliance, risk processing, and fraud protection, among other things. As a financial services company, these workloads are business-critical.

RegulatoBreached SLAs and Delayed Decision Making

Record-breaking daily payment activity had impacted data warehouse ETL processing causing SLAs to be breached and delaying analytics-dependent business decisions.

PayPal’s on-prem data warehouse infrastructure could not keep up. Data engineers decided the most scalable path forward involved migrating the Teradata data warehouse to the cloud.

Several PayPal workloads had already moved to Google Cloud Platform. After a short evaluation, data engineers opted to move the warehouse to Google BigQuery.

The first step in the migration project was to scope the workload.

CompilerWorks and BigQuery

Using CompilerWorks Lineage solution, PayPal’s team processed Jupyter Notebooks, Tableau dashboards, and UC4 logs to create a lineage graph showing all tables, schemas, scheduled jobs, notebooks, and dashboards.

Data warehouse users validated the Lineage output to confirm active workloads. Redundant and duplicate processes were then deprecated. This significantly reduced the migration workload.

Eliminating Tedious and Error-Prone Manual Processing

Data engineers then used CompilerWorks Transpiler to recreate Teradata DDLs, DMLs, and SQL code in BigQuery.

Transpiler automation was critical to eliminating error-prone manual intervention and easing the PayPal data warehouse users’ transition to BigQuery. In all, Transpiler converted over ten thousand SQL queries in users’ jobs, Tableau dashboards, and Jupyter Notebooks.

During testing and validation, Transpiler continued to poll the on-prem Teradata infrastructure for changes and synchronize these with BigQuery.

The data engineering team now has 15 petabytes stored in Google BigQuery and an additional 80 petabytes in GCP. With CompilerWorks help, PayPal data warehouse users have transitioned well to BigQuery and are enjoying the improvements in query performance and load times.

To find out more about PayPal’s transition to BigQuery, read Romit Mehta’s superb write-up on Medium.