Milestone 1: Project Proposals & Team Formation
Key Dates
- Due: 09/29
- Staff feedback: TBD
Overview
For the first milestone, your team will propose a project that aligns with your personal, professional, and academic interests and passions.
Allowing you to propose your own projects will enhance your engagement and lead to better learning outcomes. This approach will also foster independence, critical thinking, and creativity, preparing you for real-world scenarios where you may be required to initiate and lead your own projects. Call on your inner data scientist and take charge of your project experience.
Objectives
Complete the four steps below to submit a valid project proposal.
1. Create Teams (Groups of 3–4 Students)
- Platform for team formation: You may use the Ed platform to find teammates. Alternatively, you may form teams independently.
- Team registration: Once you have finalized your team, please enter your team name and the names of all team members in this shared spreadsheet.
2. Submit Statement of Work (Project Proposal)
- Title and Authors: An engaging, relevant, and informative title, plus names of all team members with their email addresses.
- Background and Motivation: Brief background on the topic you have chosen. Explain why you find it interesting or important, and mention any previous background, research interests, or readings that have influenced your choice.
- Problem Statement (short): Clearly outline the problem or question your project aims to solve in one or two sentences. You will expand on scope and objectives in Step 4.
3. Discuss Data Sources
Data is the backbone of any data science project, making it crucial to identify appropriate datasets for your work. Your Statement of Work must address:
- Source of data: Where the data comes from (e.g., public repository, generated by the team, etc.).
- Description of dataset: Brief overview of what the dataset contains (time-series, images, text, etc.).
- Key attributes: The variables or features most relevant to your problem.
- Relevance to the project: How the data is suited to solving the problem you’ve posed.
- Data quality concerns: Any potential challenges (missing data, inconsistencies, merging multiple datasets) and your preliminary plan to tackle them.
Statements of Work that do not include information on available and relevant data will not be accepted.
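As an illustration of the "data quality concerns" item above, a preliminary audit could be as simple as counting missing values and repeated identifiers before you commit to a dataset. The sketch below is only one possible approach, using the Python standard library. The function name `audit_rows` and the field names (`record_id`, `label`, `source`) are hypothetical placeholders, not part of any required format.

```python
from collections import Counter


def audit_rows(rows, key_field, required_fields):
    """Summarize basic data quality issues in a list of dict records.

    Returns a dict with:
      - "missing": number of empty or absent values per required field
      - "duplicate_keys": key values that appear more than once
    """
    missing = Counter()
    key_counts = Counter()
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            # Treat None and blank strings as missing.
            if value is None or str(value).strip() == "":
                missing[field] += 1
        key_counts[row.get(key_field)] += 1
    duplicates = sorted(
        key for key, count in key_counts.items()
        if count > 1 and key is not None
    )
    return {
        "missing": {field: missing[field] for field in required_fields},
        "duplicate_keys": duplicates,
    }


# Hypothetical records; the field names are illustrative only.
sample_rows = [
    {"record_id": "a1", "label": "cat", "source": "public"},
    {"record_id": "a2", "label": "", "source": "scraped"},
    {"record_id": "a2", "label": "dog", "source": None},
]
report = audit_rows(sample_rows, "record_id", ["label", "source"])
print(report)
```

A short summary like this, with the counts for your actual dataset, is usually enough to support the preliminary plan your Statement of Work describes.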
4. Define Scope and Preliminary Design
For a project to be considered comprehensive, it should include at least a few of the following components:
- Large or heterogeneous data: A sizable or diverse dataset that requires careful handling and processing.
- Scalability: Consider how your solution will scale to many concurrent users.
- Complex models: Explore models that are challenging to train, showcasing your understanding of MLOps challenges.
- Computationally expensive inference: If your project involves inference models, they should be computationally intensive.
Your Statement of Work must also include:
- Scope and objectives: Expand on your Problem Statement by clearly outlining the boundaries of your project and listing the primary goals or outcomes.
- Learning emphasis: Opt for models and methods that your team understands.
- Application mock design: Preliminary design or sketch for the application (wireframes or a more detailed prototype).
- Research and development: References to papers, blog posts, or other scholarly materials that support your project.
- Fun factor: Choose a topic or approach that makes the process engaging for your team.
- Limitations and risks: Anticipated challenges such as data quality issues or technical constraints.
- Milestones: Key milestones for both your project and application development, with tentative deadlines.
Deliverables
- Deliverable: A Statement of Work (SOW) proposal
- Length: 1–2 pages
- Format: PDF
- Submission: Upload via Canvas
- Team registration: Ensure your team name and members are listed in the shared spreadsheet
⚠️ Proposals without (a) data (Step 3) or (b) scope and preliminary design (Step 4) will not be accepted.
Sample Proposal
Below is a sample submission for reference.
ButterFlyer

Title and Authors
- Title: ButterFlyer
- Authors: Pavlov Protovief, Paolo Primopadre, Pablo El Padron
- Contact: pavlos@pleasedonotemailme.com
Background and Motivation
Butterflies are ecologically important and widely recognized as indicators of biodiversity. Identifying butterfly species in the wild can be challenging for non-experts. This project aims to combine computer vision with natural language processing to create an engaging educational tool.
Problem Statement (short)
Develop an application that can identify various species of butterflies in the wild using computer vision and provide educational content through a chatbot interface.
Data Sources
- Source: Open-source databases, user-generated content from platforms like iNaturalist, and field data collection by team members.
- Description: Images of various butterfly species, annotated with species names, geographic location, and date.
- Key attributes: High-resolution labeled images with metadata (species, location, time).
- Relevance: Essential for training a robust computer vision model capable of accurate species identification.
- Data quality concerns: Some images may be mislabeled or low quality. We plan to remove or relabel these and, where needed, supplement with additional data.
Scope and Objectives
- Scope: Species identification via image recognition, plus educational support through a chatbot.
- Objectives:
  - Collect and preprocess a diverse dataset of butterfly images.
  - Develop a computer vision model to identify butterfly species.
  - Implement a scalable backend to handle multiple queries simultaneously.
  - Design an intuitive and user-friendly frontend.
  - Integrate a chatbot for answering user questions about butterflies.
Minimum Components for a Good Project
- Large data: Varied dataset of butterfly images from multiple sources.
- Scalability: Backend must support multiple users simultaneously.
- Complex models: Deep learning–based computer vision for species recognition.
- Computationally expensive inference: Deep vision models are costly to run, so we will optimize inference to minimize latency.
Learning Emphasis
The project emphasizes convolutional neural networks for image recognition and NLP methods for chatbot interaction, both directly related to course concepts.
Application Mock Design
- Interface 1: Camera interface for capturing butterfly images.
- Interface 2: Chatbot interface to provide educational information.
(Wireframe here.)
Research and Development
References go here.
Fun Factor
Combining biodiversity with AI makes this project enjoyable and creative, blending technology with real-world ecological exploration.
Milestones
- Data collection and preprocessing — [Tentative Deadline]
- Computer vision model development — [Tentative Deadline]
- Backend implementation — [Tentative Deadline]
- Frontend development — [Tentative Deadline]
- Chatbot integration — [Tentative Deadline]
- Final testing and deployment — [Tentative Deadline]