Bach Nguyen — AI Engineer

IT & Management student at Ming Chuan University building LLM and NLP systems, from fully local RAG pipelines and PhoBERT fine-tuning to ML models that ship as live Streamlit apps.

Tools & technologies

OpenAI · Anthropic · LangChain · MCP · Qdrant · Supabase

// Selected work

Local RAG System — private LLM knowledge assistant

A fully private Retrieval-Augmented Generation (RAG) system that runs 100% locally on Llama 3.1 and ChromaDB, with retrieval designed to reduce hallucinations and a ChatGPT-style chat interface.

Engineered a pipeline that combines semantic chunking, hybrid search, and reranking. The whole system runs offline through Ollama and supports multi-file processing and multi-turn conversation memory.

llama 3.1 · chromadb · langchain · chainlit · ollama
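The hybrid-search step described above can be illustrated with a small sketch. It assumes Reciprocal Rank Fusion (RRF) as the way keyword and vector rankings are merged; the project description does not say which fusion method the system actually uses, and the chunk IDs and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. k dampens the weight of the very top positions; 60 is
    a commonly used default, treated here as a tunable assumption.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists over the same chunk IDs: one from a keyword
# (BM25-style) search and one from a vector-similarity search.
keyword_hits = ["chunk_3", "chunk_1", "chunk_7"]
vector_hits = ["chunk_1", "chunk_5", "chunk_3"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
top_candidates = [doc_id for doc_id, _ in fused[:3]]
print(top_candidates)  # ['chunk_1', 'chunk_3', 'chunk_5']
```

RRF only needs ranks, not comparable scores, which is why it is a popular way to combine keyword and vector results. A reranker (for example a cross-encoder) would then rescore just these fused candidates before they are passed to the LLM as context.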

Student Feedback Sentiment Analysis — fine-tuning PhoBERT

An NLP sentiment classifier, fine-tuned from PhoBERT on the UIT-VSFC corpus, that labels 16,000+ pieces of Vietnamese student feedback as positive, neutral, or negative.

Fine-tuned vinai/phobert-base with PyTorch & Hugging Face Transformers using specialized Vietnamese tokenization, then deployed a live web app on Streamlit Cloud backed by the Hugging Face Hub.

pytorch · transformers · phobert · streamlit cloud

F1 Race Outcome Prediction — telemetry-driven ML

A machine learning project that predicts Formula 1 finishing positions from 2018–2025 telemetry, weather, and driver-performance data collected with the FastF1 API.

Engineered high-impact features (constructor reliability, driver consistency), tuned the models with Optuna to a mean absolute error of about 3.0 finishing positions, and served the predictions through a Streamlit analytics dashboard.

python · xgboost · optuna · streamlit
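To make the two named features and the error metric concrete, here is a minimal plain-Python sketch. The feature definitions are assumptions: the page does not say how constructor reliability or driver consistency is computed, so reliability is taken here as the share of entries that reached the finish and consistency as the spread of finishing positions. The race data is hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical race history: (driver, constructor, finishing position).
# None marks a DNF. Real inputs would come from FastF1 session results.
history = [
    ("driver_a", "team_x", 1), ("driver_a", "team_x", 3), ("driver_a", "team_x", 2),
    ("driver_b", "team_x", 5), ("driver_b", "team_x", None), ("driver_b", "team_x", 7),
    ("driver_c", "team_y", 4), ("driver_c", "team_y", 4), ("driver_c", "team_y", None),
]


def constructor_reliability(rows, constructor):
    """Share of a constructor's race entries that reached the finish (assumed definition)."""
    entries = [pos for _, team, pos in rows if team == constructor]
    return sum(pos is not None for pos in entries) / len(entries)


def driver_consistency(rows, driver):
    """Population std. dev. of a driver's finishing positions; lower means more consistent.

    DNFs are skipped, which is itself a modelling assumption.
    """
    positions = [pos for name, _, pos in rows if name == driver and pos is not None]
    return pstdev(positions)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual finishing positions."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


print(round(constructor_reliability(history, "team_x"), 3))  # 0.833
print(round(driver_consistency(history, "driver_a"), 3))  # 0.816
print(mean_absolute_error([3.0, 6.0, 9.0], [1, 7, 6]))  # 2.0
```

Features like these would form columns of the training table for the model (XGBoost appears in the project's stack), with Optuna searching its hyperparameters. An MAE of about 3.0 means predictions land, on average, roughly three places from the actual result.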

// Research interests

Where my curiosity goes

The problems I keep coming back to — building language-model systems that are reliable, private, and measurably better.

Retrieval-Augmented Generation

Hybrid search, reranking, and semantic chunking to feed LLMs the right context and cut hallucinations — with privacy-first, fully-local deployment.

rag

Model fine-tuning

Adapting transformer models like PhoBERT to domain- and language-specific tasks, with a focus on Vietnamese and low-resource NLP.

nlp

Data pipeline automation

Building reliable ingestion, cleaning, and feature pipelines so models are trained and served on data you can actually trust.

mlops

Feature engineering

Turning raw, abstract signals into high-impact, model-ready features — the quiet work that often moves metrics more than the model does.

ml

“The hard part of AI engineering isn’t the model — it’s the quiet machinery of evaluation that tells you whether anything actually got better.”

— research philosophy

// Education & languages

Background

Taoyuan, Taiwan

Information Technology & Management

Ming Chuan University

GPA 3.9 / 4.0 · Relevant coursework: Introduction to Artificial Intelligence, Cloud Computing, Data Structures.

Languages

English · 中文 (Chinese) · Tiếng Việt (Vietnamese)

English — Fluent (IELTS 7.0) · Chinese — Beginner (TOCFL A2) · Vietnamese — Native.

// Get in touch

Let's build something worth shipping.