Data Engineer & ML Practitioner

Rain Liao

Master's student in Data Science at the University of Adelaide. Building scalable ETL pipelines, cloud infrastructure, and intelligent data systems across AWS and Azure.

Python · SQL · Airflow · dbt · Docker · AWS · Azure · PySpark · NLP · PyTorch · LangChain · Power BI · QGIS · DynamoDB · Freqtrade
Who I am

Curious by nature, rigorous by practice

Data Engineer with hands-on experience designing and deploying scalable ETL pipelines, cloud infrastructure, and data storage solutions across AWS and Azure. I enjoy building systems that make data usable — from ingestion and transformation through to structured storage and reliable delivery.

I am equally comfortable writing production Python, wrangling SQL, and spinning up containerised environments that teams can depend on. My background combines civil engineering foundations with modern data engineering, which shapes how I approach complex, real-world systems.

Outside of work, I stay grounded through church, my jogging club, and tennis — communities that keep my teamwork and communication skills sharp.

Data Engineering · NLP & LLMs · Cloud Infrastructure · Quant Finance · Full-Stack
Languages
Python · SQL · R · C++ · JavaScript
Data Engineering
ETL/ELT · Airflow · dbt · PySpark · Batch & Real-Time · Data Quality
Cloud & DevOps
AWS (EC2, Lambda, S3, DynamoDB, CloudWatch) · Azure (VM, Blob) · Docker · Linux
ML / AI
PyTorch · Scikit-learn · LangChain RAG · Transformers · BERTopic
Data & Viz
Power BI · QGIS · Superset · Pandas · Selenium
Web
React · Vite · Django REST · REST API Design
Career

Experience

2025
Univ. of Adelaide
Data Science Intern — SET Faculty
Designed and deployed an end-to-end ETL pipeline (Python, Selenium, NLTK, Azure Blob, Airflow) to ingest, clean, and structure 100,000+ unstructured social media posts into a query-ready format for downstream NLP analysis. Engineered a scalable data processing workflow supporting transformer-based modelling and a LangChain RAG system, reducing model uncertainty by 20% through statistical validation.
Python · Airflow · Azure Blob · Selenium · NLTK · LangChain · RAG
2024
AIML
Quantitative Trading Assistant — AIML
Built and maintained a high-volume data pipeline processing 100GB+ of Binance market data using Python and Freqtrade, integrating structured sentiment signals with technical indicators for automated strategy execution. Containerised the full research environment with Docker and deployed it to an Azure VM, cutting setup time by 80% and ensuring a reproducible, production-ready pipeline for live trade execution.
Python · Freqtrade · Docker · Azure VM · GPT-4o Sentiment
2023
NCREE Taiwan
Special Case Technician — NCREE
Scraped, processed, and structured large-scale weather and seismic datasets using Pandas and QGIS, with reliable storage in AWS DynamoDB and S3 for scalable downstream access. Architected and deployed a real-time data ingestion and alerting system on AWS (EC2, Lambda) integrating REST APIs and the Google Maps API, with full observability via CloudWatch and SNS on stable Linux infrastructure.
AWS EC2 · Lambda · DynamoDB · S3 · CloudWatch · QGIS · SNS
2025 –
Olive Adelaide
Front of House — Olive Fine Dining
Delivering excellent guest service through clear communication, teamwork, and quick problem-solving in a fast-paced fine dining environment — sharpening interpersonal and client-facing skills alongside a technical career.
Communication · Teamwork · Client Service
2023
NCSIS Competition
Team Leader — National Smart Innovation Competition
Led a multidisciplinary team to build a full-stack typhoon lethality prediction system using React, Python, and SQL — coordinating task allocation and technical decisions via Jira and Google Meet to deliver a working prototype on schedule.
React · Django · SQL · Jira · Team Lead
Credentials

Education & Achievements

Feb 2024 – Dec 2025
Master of Data Science
The University of Adelaide, Australia
Sep 2018 – Jun 2023
Bachelor of Science in Civil Engineering
Chung Yuan Christian University, Taiwan
Languages
English · IELTS 7.5
Mandarin · Native
Fabric Data Engineer Associate
Microsoft · DP-700
Azure Data Scientist Associate
Microsoft · DP-100
Power BI Data Analyst Associate
Microsoft · PL-300 · 2025
📄
First-Author Conference Publication
ICRTCIS 2025 · Springer Nature · Proceedings 2026
🏆
ALTA 2025 — ADR Detection
Ranked 5th / 150 participants · +20% accuracy over baseline
Portfolio

Selected Projects

01
Personal · 2026
Real-Time Weather Analytics Pipeline
Docker Compose · Airflow · dbt · Superset
End-to-end containerised weather data pipeline orchestrated by Airflow, covering API ingestion, dbt transformations, and Superset dashboards. Demonstrates production-grade data engineering with full-stack observability and reproducible environments.
02
Published · Springer Nature
Mental Health Topic Modelling
Python · MentalBERT · BERTopic · Network Analysis
Published in Springer Nature (ICRTCIS 2024). Applied MentalBERT-enhanced BERTopic to 100,000+ Beyond Blue forum posts, surfacing recurring themes around medication, therapy, and work stress. Built a visual subtopic network revealing how Australians express distress and engage in peer support.
03
Competition · 5th / 150
ALTA 2025 — ADR Detection
Python · NLP · Language Encoders · ML
Ranked 5th of 150 participants in the ALTA 2025 shared task. Developed and optimised ML and language encoder models, achieving a 20% accuracy improvement over the baseline for adverse drug reaction detection in clinical text.
04
Binance Delist Strategy
Python · Freqtrade · Algo Trading
Automated trading system that detects Binance delisting announcements in real time, tracks affected asset price movements, and executes trades with stop-loss and take-profit mechanisms, capitalising on rapid market reactions with disciplined risk management.
05
NLP Media Data Classifier
Python · NLP · Web Scraping · Ranking
Stack Overflow comment classifier to improve information retrieval quality. Implemented automated crawling, cleaning, and preprocessing pipelines, then built a relevance-based ranking model to surface the most useful answers to developer queries.
06
Web Crawler — Weather Data
Python · QGIS · GIS · Pandas
Automated scraping of rainfall and temperature data from Taiwan's Central Weather Bureau. Processed into GIS-compatible formats, integrated National Land Survey shapefiles, and generated distribution maps in QGIS — forming the foundation for an AI weather chatbot.
07
API Regex Searching Tool
Python · React (Vite) · Django REST · LLM
Full-stack web application enabling users to upload files, apply regular expressions, and retrieve processed results via a RESTful API. Supports pluggable LLM integration via API key for advanced text processing workflows.
08
HydroGuard — Typhoon Risk
Python · DNN · Django · React · SQL
Full-stack typhoon risk assessment tool evaluating location-specific danger via instantaneous maximum wind acceleration. A DNN trained on historical typhoon data challenges conventional severity metrics — providing clearer impact insights for travel, logistics, and emergency planning.