Milestone 2: MLOps Infrastructure
Building Containers and ML Components
Key Date
- Due: 10/16
Overview
This milestone focuses on building the core infrastructure for your MLOps projects. You will set up working environments and containerize components for reproducibility.
- LLM Teams: Build a RAG workflow, including data collection, chunking, and integration with a vector database.
- Vision / Other Teams: Create a robust data setup, fine-tune models for the specific task, and run documented experiments on different model architectures.
By the end of this milestone, each team should have the required components in place and a mock-up of the final application (refining or extending your Milestone 1 work).
Template Repository
Objectives & Deliverables
1. Virtual Environment Setup
- Create virtual machines/environments to support containers.
- Provide clear setup documentation.
Deliverable:
- Screenshot of running instances (cloud or local).
2. End-to-End Containerized Pipeline
- Containerize each component (e.g., ingestion, preprocessing, RAG steps).
- Compose them into a single pipeline that runs start-to-finish.
- The pipeline must be runnable with one command (e.g., `docker compose up` or `make run`).
Deliverables:
- Dockerfiles + build instructions
- `pyproject.toml` (using `uv`) for each container
- Scripts or `docker-compose.yml` (when applicable)
- Documentation explaining the pipeline and exact run instructions
- Evidence it works end-to-end (logs + a small sample input → output artifact)
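As one way to picture the "logs + sample input → output artifact" evidence, here is a minimal sketch of a pipeline entry point that runs its stages in order, logs record counts per stage, and writes a JSON artifact. The stage functions (`ingest`, `preprocess`), the `run_pipeline` helper, and the output filename are hypothetical placeholders, not part of the template; in a real submission each stage would typically run in its own container.

```python
import json
import logging
from pathlib import Path
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")


# Hypothetical stages; each stands in for one containerized component.
def ingest(records: list[dict]) -> list[dict]:
    """Keep only records that actually contain text."""
    return [r for r in records if r.get("text")]


def preprocess(records: list[dict]) -> list[dict]:
    """Collapse runs of whitespace and lowercase the text."""
    return [{**r, "text": " ".join(r["text"].split()).lower()} for r in records]


STAGES: list[tuple[str, Callable[[list[dict]], list[dict]]]] = [
    ("ingest", ingest),
    ("preprocess", preprocess),
]


def run_pipeline(records: list[dict], out_path: Path) -> list[dict]:
    """Run every stage in order, log record counts, and write the output artifact."""
    for name, stage in STAGES:
        before = len(records)
        records = stage(records)
        log.info("stage=%s records_in=%d records_out=%d", name, before, len(records))
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    log.info("wrote %s", out_path)
    return records


if __name__ == "__main__":
    # Small sample input, assumed for illustration only.
    sample = [{"id": 1, "text": "  Hello   MLOps  "}, {"id": 2, "text": ""}]
    run_pipeline(sample, Path("sample_output.json"))
```

The log lines and the written JSON file together could serve as the requested evidence; the same script could be the command that a `make run` target or a compose service invokes.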
3. Teams Using LLMs
- Implement a RAG pipeline with data collection, chunking, and vector database integration.
Deliverables:
- Containerized RAG pipeline with scripts for chunking, vectorization, and DB integration
- Documentation of the pipeline design and usage
- Logs showing pipeline runs with sample data
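The chunking, vectorization, and database steps can be sketched roughly as follows. This is an illustration under loud assumptions: `embed` is a toy bag-of-words function standing in for a real embedding model, and `InMemoryVectorStore` is a hypothetical stand-in for whichever vector database your team chooses. Only the overall shape (chunk, embed, store, query) is meant to carry over.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with their neighbour."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real pipeline calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database client."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        """Embed and store each chunk."""
        self._rows.extend((c, embed(c)) for c in chunks)

    def query(self, question: str, k: int = 1) -> list[str]:
        """Return the k stored chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

A typical use would be `store.add(chunk_text(document))` followed by `store.query(user_question, k=3)`, with the retrieved chunks passed to the LLM as context. Chunk size and overlap are tuning choices worth documenting alongside your pipeline design.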
4. Teams Using Vision or Other Modalities
- Create a robust data ingestion and preprocessing pipeline.
- Fine-tune models for your specific task.
- Experiment with different model architectures and document results.
Deliverables:
- Containerized pipeline for data ingestion and preprocessing
- Model fine-tuning scripts with documentation of datasets, hyperparameters, and model versions
- Experiment logs showing results across models, architectures, or techniques
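One lightweight way to keep the experiment logs requested above is to append each run to a JSON Lines file. The sketch below assumes hypothetical names throughout (`ExperimentRecord`, `log_experiment`, `best_run`, and the field layout); a dedicated experiment tracker would serve the same purpose.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ExperimentRecord:
    """One fine-tuning run: what was trained, on what, with which settings."""
    model: str
    dataset: str
    hyperparameters: dict
    metrics: dict
    timestamp: float = field(default_factory=time.time)


def log_experiment(record: ExperimentRecord, path: Path) -> None:
    """Append one run as a JSON line so earlier runs are never overwritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def best_run(path: Path, metric: str) -> dict:
    """Return the logged run with the highest value for `metric`."""
    lines = path.read_text(encoding="utf-8").splitlines()
    runs = [json.loads(line) for line in lines if line.strip()]
    return max(runs, key=lambda r: r["metrics"][metric])
```

Calling `log_experiment` once per run, with the model name, dataset version, and hyperparameters filled in from your training script, yields a file you can submit directly and query with `best_run(path, "accuracy")` or any other metric key you record.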
5. Application Mock-up
- Submit a working prototype or mock-up.
- If you already submitted one in Milestone 1, refine or extend it based on feedback.
Deliverables:
- Application mock-up or wireframe showing UI and back-end interaction