Project Milestone 1 (the Promytheus phase): Project Proposals & Team Formation

Key dates:

  • Project proposals due: 09/25
  • Staff feedback: 09/29

Objectives

For the first milestone, your team will propose a project that aligns with your personal, professional, and academic interests and passions.

Allowing you to propose your own projects will enhance your engagement and lead to better learning outcomes. This approach will also foster independence, critical thinking, and creativity, preparing you for real-world scenarios where you may be required to initiate and lead your own projects. Call on your inner data scientist and take charge of your project experience.


Step 1: Create Teams (Groups of 3–4 Students)

Platform for Team Formation:
You may use the Ed platform to find teammates. Alternatively, you may form teams independently.

Team Registration:
Once you have finalized your team, please enter your team name and the names of all team members in this shared spreadsheet.


Step 2: Submit Statement of Work (Project Proposal)

Components of the Statement of Work

Title and Authors

  • Title: An engaging, relevant, and informative title that captures the essence of your project.
  • Authors: Names of all team members and their respective email addresses.

Background and Motivation

  • Provide brief background on the topic you have chosen. Explain why you find it interesting or important, and mention any prior experience, research interests, or readings that have influenced your choice.

Problem Statement (short)

  • Clearly outline the problem or question your project aims to solve in one or two sentences. You will expand on scope and objectives later in Step 4.

Note: In Step 3, you will describe the data you will use to address this problem.


Step 3: Discuss Data Sources

Data is the backbone of any data science project and therefore of any MLOps project, making it crucial to identify appropriate datasets for your work. In your Statement of Work, you must address the following aspects regarding data:

Source of Data

  • Identify where the data comes from (e.g., public repository, generated by the team, etc.).

Description of Dataset

  • Offer a brief overview of what the dataset contains. Is it time-series data, images, textual data, etc.?

Key Attributes

  • Describe the variables or features most relevant to your problem.

Relevance to the Project

  • Explain how the data is suited to solving the problem or question you’ve posed. Why is this dataset useful or relevant?

Data Quality Concerns

  • If applicable, indicate any potential challenges related to data quality (e.g., missing data, inconsistencies, or merging multiple datasets). Mention your preliminary plan to tackle these issues.
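
A quick first pass over the data can help you decide which concerns to mention. The following is only a minimal sketch using the Python standard library, not a required method: the function name summarize_quality, the column names, and the sample rows are all hypothetical and stand in for your own dataset. It counts missing values per column and flags identifiers that appear more than once.

```python
import csv
import io
from collections import Counter


def summarize_quality(rows, key_field):
    """Count missing values per column and duplicate keys in a list of dict rows."""
    missing = Counter()
    keys = Counter()
    for row in rows:
        for column, value in row.items():
            # Treat None, empty strings, and whitespace-only strings as missing.
            if value is None or not str(value).strip():
                missing[column] += 1
        keys[row.get(key_field)] += 1
    # A key seen more than once is reported as a duplicate.
    duplicates = sorted(k for k, n in keys.items() if n > 1 and k is not None)
    return {"rows": len(rows), "missing": dict(missing), "duplicate_keys": duplicates}


# Hypothetical sample data; in practice you would read your own dataset file.
SAMPLE_CSV = """image_id,species,location
001,Danaus plexippus,Boston
002,,Cambridge
002,Papilio glaucus,
003,Vanessa cardui,Somerville
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = summarize_quality(rows, key_field="image_id")
print(report)
```

A summary like this is enough to support one or two sentences in the Statement of Work, for example how many records lack a label and whether you plan to drop, relabel, or supplement them.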

Important Note
Statements of Work that do not include information on available and relevant data will not be accepted.


Step 4: Define Scope and Preliminary Design

The scope of your project is largely up to you and your team. Whether simple or complex, the project should align with the course’s learning objectives. To be considered comprehensive, however, it should ideally include at least a few of the following components:

Minimum Components for a Good Project

  • Large or Heterogeneous Data: Your project should involve a sizable or diverse dataset that requires careful handling and processing.
  • Scalability: Consider how your solution will scale for many users, particularly in the application you intend to build.
  • Complex Models: The project should explore models that are challenging to train, showcasing your understanding of MLOps challenges.
  • Computationally Expensive Inference: If your project involves inference models, they should be computationally intensive to align with real-world challenges.

Scope and Objectives

  • Expand on your Problem Statement from Step 2 by clearly outlining the boundaries of your project and listing the primary goals or outcomes. These should align with the minimum components outlined above.

Learning Emphasis

  • Opt for models and methods that your team understands. The project should reflect your grasp of course concepts.

Application Mock Design

  • Include a preliminary design or sketch for the application you intend to develop. This could range from simple wireframes to a more detailed, clickable prototype.

Research and Development

  • Reference papers, blog posts, or other scholarly materials that support your project and align with your objectives.

Fun Factor

  • The project should also be enjoyable. Choose a topic or approach that makes the process engaging for your team.

Limitations and Risks

  • Discuss any anticipated challenges or limitations, such as data quality issues or technical constraints.

Milestones

  • List key milestones for both your project and application development. Include tentative deadlines if possible.

Deliverables & Submission Guidelines

  • Deliverable: A Statement of Work (SOW) proposal
  • Length: 1–2 pages
  • Format: PDF
  • Submission: Upload via Canvas
  • Team Registration: Ensure your team name and members are listed in the shared spreadsheet

⚠️ Proposals without (a) data (Step 3) or (b) scope and preliminary design (Step 4) will not be accepted.


Sample Proposal

Below is a sample submission for reference:


ButterFlyer

Title and Authors

  • Title: ButterFlyer
  • Authors:
    • Pavlov Protovief
    • Paolo Primopadre
    • Pablo El Padron
  • Contact: pavlos@pleasedonotemailme.com

Background and Motivation

Butterflies are ecologically important and widely recognized as indicators of biodiversity. Identifying butterfly species in the wild can be challenging for non-experts. This project aims to combine computer vision with natural language processing to create an engaging educational tool.

Problem Statement (short)

Develop an application that can identify various species of butterflies in the wild using computer vision and provide educational content through a chatbot interface.

Data Sources

  • Source: Data will be collated from open-source databases, user-generated content from platforms like iNaturalist, and field data collection by team members.
  • Description: Images of various butterfly species, ideally annotated with species names, geographic location, and date.
  • Key Attributes: High-resolution labeled images with metadata (species, location, time).
  • Relevance: Essential for training a robust computer vision model capable of accurate species identification.
  • Data Quality Concerns: Some images may be poorly labeled or low quality. We plan to clean these or supplement with additional data.

Scope and Objectives

Expanded from Problem Statement (Step 2):

  • Species identification via image recognition + chatbot educational support.
  • Objectives:
    1. Collect and preprocess a diverse dataset of butterfly images.
    2. Develop a computer vision model to identify butterfly species.
    3. Implement a scalable backend to handle multiple queries simultaneously.
    4. Design an intuitive and user-friendly frontend.
    5. Integrate a chatbot for answering user questions about butterflies.

Minimum Components for a Good Project

  • Large Data: Varied dataset of butterfly images from multiple sources.

  • Scalability: Backend must support multiple users simultaneously.

  • Complex Models: Deep learning–based computer vision for species recognition.

  • Computationally Expensive Inference: Optimize inference to minimize latency.

Learning Emphasis

The project emphasizes convolutional neural networks for image recognition and NLP methods for chatbot interaction, both directly related to course concepts.

Application Mock Design

  • Interface 1: Camera interface for capturing butterfly images.

  • Interface 2: Chatbot interface to provide educational information.

    (Wireframe here.)

Research and Development

(References go here.)

Fun Factor

Combining biodiversity with AI makes this project enjoyable and creative, blending technology with real-world ecological exploration.

Milestones

  1. Data collection and preprocessing – [Tentative Deadline]
  2. Computer vision model development – [Tentative Deadline]
  3. Backend implementation – [Tentative Deadline]
  4. Frontend development – [Tentative Deadline]
  5. Chatbot integration – [Tentative Deadline]
  6. Final testing and deployment – [Tentative Deadline]