Open to Senior & Principal Roles · Riyadh, Saudi Arabia · Global Remote
Principal Big Data Engineer

Basit Ali

Principal Big Data & Cloud Data Engineer | Data Architect
Databricks · Azure · AWS · GCP · Cloudera · Spark & Streaming · Kafka · Snowflake · AI/ML & GenAI · Microsoft Fabric · Azure Synapse

9+ years architecting and delivering large-scale data platforms across healthcare, telecom, government, retail, and IoT — on Azure, AWS, Cloudera, and Microsoft Fabric.

Background

Building Data Platforms That Deliver at Scale

I'm a Principal Big Data Engineer with deep expertise in designing secure, scalable data architectures and real-time processing systems that drive real business value. From raw ingestion to insight — I build it end-to-end.

Currently at Lean Business Services, Riyadh, where I'm architecting a national-scale Patient Profile Platform using Apache Spark, consolidating healthcare data from hospitals across the Kingdom into a unified enterprise Data Lake.

Previously at Ai-Elements (STC Group) and Systems Limited, I led multi-cloud data platform delivery across Azure, AWS, Databricks, and Azure Synapse — implementing medallion architectures on Microsoft Fabric and driving engineering excellence across teams.

Equally comfortable driving architecture strategy at the whiteboard and shipping production-grade Spark pipelines.

Healthcare · Telecom · Retail · Government · IoT · Financial Services
🏗️
Data Architecture
Medallion, Lakehouse, Data Mesh, Data Vault, Data Fabric
☁️
Multi-Cloud Platforms
Azure · AWS · Cloudera · Microsoft Fabric · GCP
Real-Time Streaming
Spark Streaming · Kafka · Apache NiFi · Kinesis
🤖
AI/ML & GenAI
RAG Applications · Agentic AI · Deep Learning · ML Pipelines
🛡️
Governance & Quality
Data privacy, compliance, metadata mgmt, RBAC, lineage

Technical Stack

Platform & Practice Depth

Big Data
Apache Spark · Spark Streaming · Apache Hudi · Hadoop / HDFS · MapReduce · Apache Hive · Apache Sqoop · Neo4j · Elasticsearch · Kibana · Apache Airflow · Apache NiFi
Azure Stack
Azure Databricks · Azure Synapse Analytics · Microsoft Fabric · Azure Data Factory · Azure Blob Storage · Azure Functions · Azure Queue · Redis (Azure) · Azure DevOps
Amazon Web Services
AWS EMR · EC2 · AWS Glue · Lambda · Kinesis · S3 · Redshift · Athena · Step Functions · CloudFormation · SNS · CloudTrail · CloudWatch · IAM · IoT
Cloudera Platform
Cloudera CDP · Hive · Impala · Flink · Kafka · Ranger · Cloudera Manager
Databases & Storage
Snowflake · DBT · MongoDB · PostgreSQL · Elasticsearch · Redis · SQL Server · Oracle · MySQL · BigQuery
Programming Languages
Python · PySpark · SQL · Node.js · Angular · Spark (Scala) · Django · Streamlit
DevOps, AI & Tools
Jenkins · GitHub / GitLab · Azure DevOps · CI/CD Pipelines · Jira · MS Visio · Generative AI · RAG Applications · Agentic AI · Deep Learning · ML Pipelines

Career

9+ Years. Multiple Industries. Consistent Delivery.

Senior Big Data Engineer
Feb 2025 — Present
Lean Business Services · Riyadh, Saudi Arabia
  • Leading MOH & MOD national healthcare data transformation — architecting scalable, event-driven pipelines for real-time ingestion across hospitals, clinics, and regional health systems aligned with Vision 2030.
  • Designed patient linkage algorithms to match and merge identities across disparate hospital systems, enabling a 360° longitudinal patient view.
  • Engineered a metadata-driven ingestion framework using Streamlit & Python to automate data onboarding from SQL, MySQL, Oracle, and NoSQL sources without manual coding.
  • Built real-time pipelines using Spark Streaming, Kafka, and Apache NiFi to process high-volume healthcare data streams and support low-latency clinical decision-making.
  • Implemented strong data governance and privacy safeguards compliant with national healthcare data security standards.
  • Reduced infrastructure costs by optimizing event-driven architectures and decommissioning inefficient legacy workflows.
Senior Big Data Engineer
Sep 2023 — Jan 2025
Ai-Elements · STC Group · Riyadh, Saudi Arabia
  • Designed and implemented robust end-to-end pipelines using Airflow, NiFi, and Spark, enabling real-time ETL from diverse systems.
  • Applied Clean Code practices and modernized legacy codebases — improving maintainability, streamlining onboarding, and raising engineering standards across the team.
  • Built a Cloudera-based PoC for AI-driven insights using Spark and Hadoop to process large datasets and train ML models.
  • Managed and optimized on-premises data workflows, ensuring security, reliability, and compliance.
Senior Big Data Engineer
Jul 2019 — Sep 2023
Systems Limited · Islamabad, Pakistan
  • Designed and delivered Microsoft Fabric Medallion Architecture and scalable pipelines using Apache Spark, Hadoop, and Big Data ecosystems.
  • Engineered AWS-based platforms using EMR, Glue, Lambda, Kinesis, Step Functions, Redshift, and S3 for batch and real-time processing.
  • Built multi-cloud pipelines integrating Azure (Databricks, Synapse, ADF) and AWS for near real-time ETL and cross-platform data movement.
  • Implemented CI/CD automation using Jenkins, integrating build pipelines with AWS to orchestrate Spark jobs and data workflows.
  • Championed Clean Code, SOLID, and TDD across the team, improving code quality, accelerating delivery, and modernizing legacy systems. Recognized with Team of the Year 2021 and Star Performer 2020 & 2021.
Big Data Developer
Feb 2018 — Jun 2019
Metis International · Islamabad, Pakistan + Dubai (Onsite)
  • Developed a web-based Big Data portal using Python and Django with Neo4j graph-based data visualization.
  • At Dubai Future Accelerators (onsite): built Python-based social media analysis tools for crime pattern detection, hate speech classification, and sentiment analysis.
  • Identified drug-related social media activity by location, supporting the Anti-Narcotics Department.
  • Built real-time alert systems, live vehicle tracking, geofencing, and analytical dashboards using Angular 2/5.
Independent Data Architect · Freelance
Selected Engagements · 2019 — 2023
Independent Consultant · U.S. & International Clients · Remote
  • Designed and delivered a Microsoft Fabric end-to-end analytics pipeline for a U.S. enterprise client — integrating structured and unstructured datasets into a unified, governed format with a multidimensional analytics cube.
  • Embedded predictive ML models into the pipeline enabling trend forecasting and near real-time reporting — directly contributing to new revenue opportunities for the client.
  • Optimized the platform for scalability and cloud-native efficiency using Microsoft Fabric infrastructure, significantly improving processing speed and data delivery latency.
  • Delivered solutions fully aligned to client data governance standards, from ETL architecture through production operationalization.

Key Engagements

Featured Work

★ Featured · Independent Consulting · U.S. Client · Microsoft Fabric
Enterprise Fabric Analytics Platform — U.S. Client

Independently designed and delivered a complex, end-to-end Microsoft Fabric data pipeline for a U.S.-based enterprise client. Integrated structured and unstructured datasets into a unified, governed format and multidimensional analytics cube — embedding predictive ML models for trend forecasting and near real-time reporting, directly contributing to new revenue opportunities for the client.

Microsoft Fabric · Predictive ML · ETL / ELT · Multidimensional Cube · Data Governance · Cloud Cost Optimization · Near Real-Time
National Patient Profile Platform

National-scale unified patient Data Lake consolidating clinical data from hospitals across KSA. Included patient identity linkage, real-time ingestion via Kafka & Spark Streaming, and a metadata-driven onboarding framework.

Apache Spark · Kafka · Apache NiFi · Streamlit · Python · Oracle
AI / Internal Practice
Cloudera AI-Driven Insights PoC

Built a Cloudera-based proof of concept for AI-driven analytics using Spark and Hadoop for large-scale dataset processing, ML model training, and performance optimization.

Cloudera CDP · Apache Spark · Hadoop · ML Models · Python
Multi-Cloud · Retail / Enterprise
Microsoft Fabric Medallion Platform

Designed and delivered Medallion Architecture on Microsoft Fabric with scalable Spark pipelines and BI layers, adopted as the delivery standard for enterprise data platform programs.

Microsoft Fabric · Apache Spark · Delta Lake · ADF · Databricks
AWS · Multi-Cloud
AWS Enterprise Data Platform

End-to-end AWS data platform built on EMR and Glue, with Kinesis Firehose for real-time ingestion, Apache Hudi for incremental processing, and Step Functions for automated orchestration at scale.

AWS EMR · Glue · Kinesis · Apache Hudi · Redshift · Lambda
Telecom · UAE
Social Media Intelligence Platform

Python-based social media analysis for crime pattern detection, hate speech classification, and drug-related activity geolocation — deployed in support of Dubai's Anti-Narcotics Department.

Python · NLP · Sentiment Analysis · Geolocation · Social Media APIs
IoT / Fleet
Real-Time Fleet Analytics Dashboard

Front-end application with real-time vehicle tracking, geofencing, fuel monitoring, speed analytics, and live alert systems built with Angular 2/5 for fleet management operations.

Angular 5 · Real-Time Alerts · Geofencing · Node.js · IoT

Credentials

Certifications & Training

Microsoft Certified · Associate
Azure Data Engineer Associate
Cloudera
Technical Expert (CTE) Accreditation
Cloudera
Technical Specialist (CTS) Accreditation
Cloudera
Technical Professional (CTP) Accreditation
Cloudera
Data-in-Motion Hands-on Experience
LUMS · REDC
Future Leadership Program
Systems Limited
🏆 Team of the Year 2021 · Star Performer 2020 & 2021

Contact

Let's Build Something at Scale

Whether you're architecting a new cloud data platform, migrating to Microsoft Fabric, or need a senior Big Data engineer to lead delivery — let's talk.

I'm based in Riyadh, Saudi Arabia, and open to senior/principal roles across the region and globally.