AI Scientist & Data Engineer

Building the Data Layer That Powers AI

I architect end-to-end data systems — from ETL pipelines and data warehouses to LLM training and BI dashboards — helping organizations turn raw data into intelligent decisions.

Available for remote roles
100% Job Success on Upwork
5+ Years of Data Engineering
85%+ Pipeline Cost Reduction
70% Query Time Reduction
Python · SQL · Java · BigQuery · PostgreSQL · DBT · Apache Airflow · Kafka · Terraform · GitHub Actions · Claude Code · Jira · LLMs / RLHF · Metabase
About Me

Data-Driven. Systems-Focused. Impact-Oriented.

I'm Chukwudi (Samuel) Ogbekile — an AI Scientist and Data Engineer based in Abuja, Nigeria. I design and build the full data infrastructure that powers intelligent systems: from batch and streaming pipelines to cloud warehouses, LLM training pipelines, and executive BI dashboards.

With a First-Class degree in Electrical & Electronics Engineering (4.66/5.0) from the University of Lagos and a distinction in MSc Financial Engineering (95/100) from WorldQuant University, I bring engineering precision to every layer of the data stack — from schema design and query optimization to ML model deployment on AWS SageMaker.

I currently train AI models at Turing across campaigns including function calling API, SWE Benchmark for GitHub issue resolution, and code evaluation. I also manage client data ecosystems as a Top Rated Plus freelancer on Upwork with a 100% Job Success Score.

Previously, I was Lead Data Engineer at BuyPower (YC W17), where I led a team of 5 engineers, reduced operational costs by 60%, and cut query execution times by 70%. I am passionate about energy data, AI model improvement, and mentoring the next generation of engineers.

MSc, Financial Engineering
WorldQuant University (New Orleans, USA)
Apr 2020 — Feb 2022 · Grade: 95/100 · Distinction
🏆 Final Project: GARCH & Markov-Switching Model on COVID-19 Impact
BSc, Electrical & Electronics Engineering
University of Lagos
Dec 2015 — Jan 2020 · GPA: 4.66/5.0 · First Class Honours
🏆 MTN Foundation Scholarship 2018
🏆 First Class Endowment Award — 2016, 2017, 2018, 2019
Best Graduating Student — Computer Engineering
Yaba College of Technology
2015 · Award
Areas of Expertise

What I Build

Data Engineering & ETL Pipelines
Architecting scalable batch and streaming pipelines replicating data from MongoDB, Firebase, and MySQL into cloud warehouses. Reduced GCP pipeline costs by 85%+ through CI/CD implementation.
Apache Airflow · AWS Glue · Airbyte · Kafka · Fivetran · Terraform · Docker
🧠
LLM Training & AI Development
Training and evaluating LLMs via RLHF data curation, function calling API campaigns, SWE Benchmark for GitHub issue resolution, and code evaluation to improve AI assistant performance.
RLHF · Function Calling API · SWE Benchmark · Model Evaluation · F1 / Precision / Recall
☁️
Cloud Data Warehousing
Designing and managing data warehouse environments on GCP BigQuery, Snowflake, and Amazon Redshift. Applying schema and query optimizations that cut query execution times by 70%.
BigQuery · Snowflake · Redshift · Cloud SQL · Supabase · AWS RDS
🔧
Analytics Engineering with DBT
Building observability and transformation layers with DBT. Implementing CI/CD for data pipelines using Cloud Build, Cloud Triggers, and GitHub Actions, which reduced GCP pipeline costs by over 85%.
DBT · GitHub Actions · Cloud Build · Jenkins · SQL · GitLab CI
📊
Data Visualization & BI
Building multi-team dashboards and observability systems. Saved 15+ hours of manual reporting weekly. Delivered a KPI sales report that helped a multinational client grow annual sales revenue by 40%.
Looker Studio · Metabase · Power BI · Tableau · Mode · R Shiny · Streamlit
🤖
Machine Learning & Financial Modelling
Deploying XGBoost, neural networks, and statistical models (GARCH, Black-Scholes, Markov-Switching) in production. Built ML models on BigQuery and AWS SageMaker for real business outcomes.
XGBoost · Scikit-learn · Keras · AWS SageMaker · Apache Spark · R
Experience

Work History

Sep 2023 — Present · Contract · Remote
AI Scientist / Trainer
Turing
  • Developing and training AI models by curating LLM and RLHF data, and adjusting parameters to optimize performance.
  • Collaborated on a function calling API campaign for GPT model building and on the SWE Benchmark campaign for GitHub code issue resolution.
  • Participated in code evaluation campaigns to improve AI assistant response quality across multiple programming languages.
  • Assessing model performance using precision, recall, and F1 score metrics, and implementing targeted improvements based on evaluation results.
Sep 2021 — Mar 2024 · Full-time · Abuja, Nigeria
Lead Data Engineer
BuyPower (YC W17) · buypower.ng
  • Designed batch and streaming pipelines replicating logs from MongoDB, Firebase, and MySQL (Amazon RDS) into BigQuery data warehouse.
  • Led a team of 5 data engineers, collaborating with engineering management to improve ETL architecture and reduce operational costs by 60%.
  • Built DBT-based observability ETL systems, reducing time-to-insight from days to seconds across multiple teams.
  • Implemented schema and query optimization techniques that reduced overall query execution time by 70%.
  • Automated reporting in Looker Studio saving 15+ hours of manual work per week; conducted financial analysis recovering tens of millions of naira.
Sep 2021 — Present · Freelance · Remote
Data Analytics Engineer (Top Rated Plus)
Upwork
  • Built end-to-end ETL pipeline migrating 60+ Amazon Seller CSV reports into BigQuery using GCP VM, Python, FastAPI, and Streamlit backfill UI. (Siturna)
  • Reduced GCP Cloud Composer 2 pipeline costs by over 85% via CI/CD tools, Cloud Build, Cloud Triggers, and Scheduler. (WebVerif)
  • Built ETL pipeline for ~80GB tabular datasets with AWS Glue and developed XGBoost ML model to predict NYC taxi prices on AWS SageMaker. (Confidential)
  • Delivered data-driven insights for multinational clients using Python, Metabase, DBT, BigQuery, and GitLab. (Équipe Totem Inc.)
Jun 2022 — Aug 2022 · Full-time · Remote
Data Scientist
Turing.com
  • Built data migration pipelines and ETL processes using Apache Airflow, GCP, and BigQuery, improving data accuracy and efficiency.
  • Collaborated on data QA to identify and resolve quality issues in product development workflows.
  • Tested and provided feedback for ML models, resulting in measurable improvements in efficiency and accuracy.
Apr 2020 — Aug 2021 · Full-time · Lagos, Nigeria
Data Analyst
Smarterise
  • Delivered summarized energy consumption and power quality insights that saved clients more than 30% of their annual operational budget.
  • Built a comprehensive KPI sales report for a multinational client, enabling data-driven decisions that boosted annual sales revenue by 40%.
  • Applied ML models to predict energy load profiles for research and recommendation purposes.
Aug 2021 — Feb 2022 · Part-time · Remote
Python Tutor / Mentor
Kodland UK
  • Taught Python and Scratch programming to learners aged 8–17 with no prior coding experience.
  • Mentored students on projects including chatbots, PyGame, Discord/Telegram Bots, Tkinter, and Figma product design.
GitHub Portfolio

Selected Projects

01
Data2Bot ETL Pipeline — PostgreSQL & S3
Structured, OOP-based ETL pipeline that extracts raw data from S3 buckets, stages it in PostgreSQL, and exports analytics results back to S3. Built with Python and SQL, using a directory structure inspired by production-grade data systems.
Python · PostgreSQL · AWS S3 · SQL · OOP
02
🔒 Private Client Project
Amazon Marketing Backfill UI
Built an end-to-end scalable ETL pipeline migrating 60+ Amazon Seller CSV report URLs into BigQuery using a GCP VM Instance and Python scripts. Developed a Streamlit + Flask UI hosted on Google App Engine for iterative historical data backfilling — eliminating manual data loads for the client entirely.
Python · Streamlit · Flask · BigQuery · GCP App Engine · FastAPI
03
GridDB Kafka Streaming Pipeline
Real-time data streaming pipeline integrating GridDB with Apache Kafka. Demonstrates expertise in event-driven architectures and high-throughput data ingestion patterns for modern data platforms.
Apache Kafka · GridDB · Streaming · Python
04
Web3 DEX Analytics Dashboard
Decentralized exchange analytics dashboard tracking on-chain trading activity, liquidity metrics, and volume trends. Applied data engineering and visualization skills to the blockchain data domain.
Web3 · Python · Analytics · Dashboarding
05
Data Engineering Zoomcamp Projects
End-to-end data engineering projects covering containerization, workflow orchestration, data warehousing, batch and stream processing — built through the DataTalks.Club DE Zoomcamp curriculum.
Docker · Terraform · BigQuery · Airflow · Spark
Technical Skills

Tools & Technologies

01
Programming Languages
Python · SQL · Java · R · C++ · HTML · CSS
02
Databases
PostgreSQL · BigQuery · MySQL · MongoDB · Snowflake · Supabase · Cloud SQL · Amazon RDS · Amazon Redshift · GridDB
03
Data Engineering & ETL
Apache Airflow · DBT · Apache Kafka · Apache Spark · AWS Glue · Airbyte · Fivetran · AWS Data Wrangler · PySpark · FastAPI
04
Cloud & Infrastructure
GCP · AWS (EC2, S3, Glue, SageMaker, Lambda) · Terraform · Docker · Linux · GCP VM Instance · Cloud Pub/Sub · Cloud Functions · Google App Engine
05
DevOps & CI/CD
GitHub Actions · Jenkins · GitLab CI · Cloud Build · Cloud Triggers · Git · Docker
06
AI & Machine Learning
Claude Code · LLM Training · RLHF · Scikit-learn · Keras · XGBoost · AWS SageMaker · Pandas · NumPy · Matplotlib · R Shiny
07
Visualization & BI
Metabase · Looker Studio · Power BI · Tableau · Mode Analytics · Streamlit · Sigma
08
Management & Collaboration
Jira · Slack · Confluence · ClickUp · Notion · Microsoft Office
Freelance Work

Upwork Track Record

⭐ Top Rated Plus · Verified Identity

On Upwork, I specialize in Data Engineering, ETL, ML, and Analytics projects for clients across industries. Every contract closed with a 5.0 rating — a track record built on clear communication, technical depth, and delivering results that go beyond scope.

100% Job Success
18 Total Jobs
830+ Total Hours
★★★★★
"Samuel is a fantastic developer. He always works professionally and is very solution-oriented. I highly recommend him and will definitely work with him again."
Data Engineer for Automated Analytics Pipeline · Jan 2026
★★★★★
"Samuel showcased skill and diligence while working on our DBT project with BigQuery. His hard work greatly contributed to the project's success. Recommended, and I would hire again."
BigQuery & DBT Expert for Data Analytics · Sep 2023
★★★★★
"Samuel did a great job on ETL and ML tasks using Amazon Web Services capabilities. We'd be happy to work with this freelancer again."
AWS ETL & ML Project · Upwork Enterprise Client · Jul 2023
★★★★★
"Great quality of work. 100% true Python and Pandas specialist. Would recommend anyone who is solving python web scraping and will hire again."
Data Extraction / Web Scraping · Aug 2021
★★★★★
"Mr. Samuel Ogbekile is very meticulous in his work. He delivered more than what we asked for!"
Crypto Trading Script · Jan 2022
★★★★★
"Thx for the work on our data warehouse, dbt and metabase. It was a pleasure working with you. Closing the contract because the project is over."
DBT & BI Reports Creation · Apr 2024
Community & Volunteering

Giving Back

🌍
NYSC — SDGs Advocate
Apr 2020 — Feb 2021
Educated and mentored students on the United Nations' 17 Sustainable Development Goals and moral values during National Youth Service Corps.
🎓
Tutor & Mentor — University of Lagos
Jan 2017 — Nov 2019
Provided tutorial sessions on difficult engineering courses to coursemates, and mentored freshers on course syllabuses and academic strategies.
Energy Data Advocacy
Ongoing
Advocating for data-driven approaches to Nigeria's power sector challenges — applying data science to energy data for smarter infrastructure decisions.

Let's Build Something Meaningful

Open to remote Data Engineering, AI Engineering, and Analytics Engineering roles. Also available for freelance collaborations on data infrastructure, LLM projects, and analytics systems.

ogbekilechukwudi@gmail.com

Available for remote collaborations worldwide · Typically responds within 4 hours