Optimizing Commodity Prices: Anomaly Detection and Maximizing Profitability for Farmers using 2-D Manifold Analysis
Pavlos Protopapas (pavlos@seas.harvard.edu) and Raphael Pellegrin (raphaelpellegrin@g.harvard.edu)
Senecal Capital is a financial risk analytics company with two main areas of expertise: an internal hedge fund and a data science consulting service. It is a startup focused on developing a world-class framework for addressing cross-disciplinary data challenges, with a specific emphasis on issues that arise in financial markets. They utilize a modern, cloud-based, python-based infrastructure to address a wide range of intriguing issues, ranging from portfolio optimization to assisting farmers across the country in obtaining better prices for their grain.
The objective of this capstone project is to create a framework for analyzing data that is meant to lie on 2-dimensional manifolds, also known as surfaces, and identifying abnormalities. These surfaces are commonly found in finance, with examples such as implied volatility and commodity prices by geographic location. The project's initial goal is to assist farmers in determining when grain prices in their local area are particularly low or high, with the ultimate aim of maximizing farm profitability. The project will involve time-series analysis, combinatorial optimization, and software engineering.
Unlike financial assets which have a single price, commodities that have to be physically transported have distinct prices at each location which account for many factors, including but not limited to, local supply & demand; proximity to logistical hubs like railroads, rivers, and ports; and national or regional freight costs. We have 30 years of historical data on prices by latitude and longitude. We assume that these prices should in theory, lie on an underlying manifold. The observed prices vary from the manifold due to some daily noise. The goal is to flag abnormal noise realizations. For those outliers, we need a tool to notify farmers and their advisors to do further research on making a sale or refraining from selling. This data should be delivered via a cloud-based reporting tool which generates interactive charts and PDF reports with actionable recommendations for clients. The student may also investigate building a dashboard or web application to deliver the data. The flagging mechanism should be flexible enough to handle various models (different assumptions about the underlying manifold) which can be increasingly complex as we refine the model. Additional refinements to the model can include taking in historical data on energy prices, freight rates, and pre-selected distances to locations of interest. By leveraging satellite data, one might also get insight about crop quality, weather forecasts, extreme weather events and elevator constructions - factors that might affect future crop prices. Time series analysis. We will leverage 30 years of daily data about grain elevators prices. Some avenues for exploration involve analysing the missing data and trying to fill in gaps. Seasonality trends might also be explored to improve the students’ models. It will also be interesting to try predicting prices’ future values given the historical data by leveraging the latest time series prediction techniques. Optimisation. An important aspect of the project involves combinatorial optimisation, a branch of optimisation where problems can be formulated or reformulated on graph structures. With additional data (such as transportation costs in different counties, road networks) one can build such an optimisation model that can then be used to inform farmers about the best location to sell their crops. Software engineering. To present the models, the students can opt to engineer a dashboard or a website. Desired functionalities include getting tailored statistics for a given location.
Pavlos Protopapas (pavlos@seas.harvard.edu) and Raphael Pellegrin (raphaelpellegrin@g.harvard.edu)
Senecal Capital is a financial risk analytics company with two main areas of expertise: an internal hedge fund and a data science consulting service. It is a startup focused on developing a world-class framework for addressing cross-disciplinary data challenges, with a specific emphasis on issues that arise in financial markets. They utilize a modern, cloud-based, python-based infrastructure to address a wide range of intriguing issues, ranging from portfolio optimization to assisting farmers across the country in obtaining better prices for their grain.
The objective of this capstone project is to create a framework for analyzing data that is meant to lie on 2-dimensional manifolds, also known as surfaces, and identifying abnormalities. These surfaces are commonly found in finance, with examples such as implied volatility and commodity prices by geographic location. The project's initial goal is to assist farmers in determining when grain prices in their local area are particularly low or high, with the ultimate aim of maximizing farm profitability. The project will involve time-series analysis, combinatorial optimization, and software engineering.
Unlike financial assets which have a single price, commodities that have to be physically transported have distinct prices at each location which account for many factors, including but not limited to, local supply & demand; proximity to logistical hubs like railroads, rivers, and ports; and national or regional freight costs. We have 30 years of historical data on prices by latitude and longitude. We assume that these prices should in theory, lie on an underlying manifold. The observed prices vary from the manifold due to some daily noise. The goal is to flag abnormal noise realizations. For those outliers, we need a tool to notify farmers and their advisors to do further research on making a sale or refraining from selling. This data should be delivered via a cloud-based reporting tool which generates interactive charts and PDF reports with actionable recommendations for clients. The student may also investigate building a dashboard or web application to deliver the data. The flagging mechanism should be flexible enough to handle various models (different assumptions about the underlying manifold) which can be increasingly complex as we refine the model. Additional refinements to the model can include taking in historical data on energy prices, freight rates, and pre-selected distances to locations of interest. By leveraging satellite data, one might also get insight about crop quality, weather forecasts, extreme weather events and elevator constructions - factors that might affect future crop prices. Time series analysis. We will leverage 30 years of daily data about grain elevators prices. Some avenues for exploration involve analysing the missing data and trying to fill in gaps. Seasonality trends might also be explored to improve the students’ models. It will also be interesting to try predicting prices’ future values given the historical data by leveraging the latest time series prediction techniques. Optimisation. An important aspect of the project involves combinatorial optimisation, a branch of optimisation where problems can be formulated or reformulated on graph structures. With additional data (such as transportation costs in different counties, road networks) one can build such an optimisation model that can then be used to inform farmers about the best location to sell their crops. Software engineering. To present the models, the students can opt to engineer a dashboard or a website. Desired functionalities include getting tailored statistics for a given location.