Climate Change Solutions AI Project
Pavlos Protopapas (pavlos@seas.harvard.edu) and MarcAntonio Awada (mawada@hbs.edu)
Project Overview
- The goal of this data analytics project is to apply topic modeling (TOPICS) and natural language processing (NLP) to the company or product descriptions in companies’ 10-K filings to identify the extent of their “climate solutions” products
- Timeline: Spring semester, 6 months starting February 2022
- Roles (to be decided)
- Professor George Serafeim/Professor Shirley Lu: manage the research project
- MarcAntonio Awada and Professor Pavlos Protopapas: manage the data science research part of the project and the team; Pavlos manages student work
Climate Solutions
- Definition: The development, deployment, and scaling of new technologies, products, and services in a transition to a low-carbon economy
- Climate solutions examples: renewable energy, electrification of transportation and processes, battery technology, energy and process efficiency, circularity, new agricultural practices, and plant-based protein alternatives to meat
- The climate solutions measure should reflect a spectrum from traditional firms (GM) to pure plays (Tesla), and capture changes over time (GM 2014 to 2021; see Exhibit 1 for examples)
Data Analytics
- Apply TOPICS and NLP to the Item 1 business description in annual 10-K filings
- TOPICS will allow us to classify text into the 11 categories (from the Appendix Lexicon spreadsheet, General to Carbon capture) that capture “climate solutions”. Companies will have different probabilistic weightings across the 11 categories. This would give a good understanding of a company’s climate solution characteristics and how they evolve over time.
- Output: For each firm-year, create a continuous measure (0 to 100) that captures how much of the description is devoted to “climate solutions”, measured as the number of climate solution keywords divided by the total number of words and expressed as a percentage.
- Robustness measure: give higher weight to earlier pages, since mentioning climate solutions earlier in Item 1 suggests the company places greater emphasis on them. A further robustness measure can be restricted to the first two pages or the first 500 words.
- Lexicon: Start with the lexicon used in George’s paper to classify topics that are “climate solutions” (see Attachment 1 and Exhibit 2), and then apply NLP to expand the lexicon
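The output and robustness measures above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline. The function names, the tokenization rule, the choice to count each keyword occurrence once (including multi-word keywords), and the linear position weighting are all assumptions made for illustration.

```python
import re

# Assumed tokenization: lowercase alphanumeric words; hyphenated words stay whole.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def keyword_positions(tokens, lexicon):
    """Return the start index of every lexicon keyword occurrence in tokens.

    Keywords may span several words. Overlapping keywords are each counted.
    """
    phrases = {tuple(tokenize(k)) for k in lexicon}
    phrases.discard(())
    lengths = {len(p) for p in phrases}
    hits = []
    for i in range(len(tokens)):
        for n in lengths:
            if i + n <= len(tokens) and tuple(tokens[i:i + n]) in phrases:
                hits.append(i)
    return hits


def climate_share(text, lexicon, max_words=None):
    """Keyword occurrences per 100 words, optionally on the first max_words words."""
    tokens = tokenize(text)
    if max_words is not None:
        tokens = tokens[:max_words]
    if not tokens:
        return 0.0
    return 100.0 * len(keyword_positions(tokens, lexicon)) / len(tokens)


def weighted_climate_share(text, lexicon):
    """Position-weighted variant (assumed scheme): a word at index i of n
    gets weight 1 - i/n, so earlier mentions count more."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return 0.0
    total = sum(1.0 - i / n for i in range(n))
    hit = sum(1.0 - i / n for i in keyword_positions(tokens, lexicon))
    return 100.0 * hit / total


# Hypothetical example text and lexicon, for illustration only.
sample = "We build electric vehicles and battery storage. We also sell trucks."
lexicon = ["electric vehicles", "battery storage"]
print(climate_share(sample, lexicon))
print(climate_share(sample, lexicon, max_words=500))
print(weighted_climate_share(sample, lexicon))
```

Passing `max_words=500` gives the "first 500 words" robustness variant; a "first two pages" variant would need page boundaries, which this sketch does not model.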
About the data
- Source: Baker Library (contacts: Patrick Clapp, who handles the data; Alex Caracuzzo)
- 167,505 firm-year files in 28 folders, totaling 279 GB
- Firms: all US firms traded on US exchanges subject to SEC reporting requirements
- Years: 2002-2022
- Nature of document: 10-K filings in .htm format (from which Item 1, the business description, is to be extracted)
- Specific text: Item 1 Business Description
- ‘“Business” requires a description of the company’s business, including its main products and services, what subsidiaries it owns, and what markets it operates in. This section may also include information about recent events, competition the company faces, regulations that apply to it, labor issues, special operating costs, or seasonal factors. This is a good place to start to understand how the company operates.’ (source: SEC)
- Usually around 15-20 pages
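Extracting Item 1 from the .htm filings might look something like the sketch below. It is a rough heuristic under stated assumptions, not a tested extraction procedure: it assumes the section starts at a heading like "Item 1. Business" and ends at the next "Item 1A", "Item 1B", or "Item 2" heading, and it keeps the longest candidate so that a short table-of-contents entry is not mistaken for the section. The names `html_to_text` and `extract_item1` are hypothetical. Real filings vary widely in formatting and would need additional handling.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse all whitespace to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


# Assumed heading patterns; actual filings may word or punctuate these differently.
ITEM1_START = re.compile(r"\bitem\s*1\s*[.:\-]?\s*business\b", re.IGNORECASE)
ITEM1_END = re.compile(r"\bitem\s*(?:1a|1b|2)\b", re.IGNORECASE)


def extract_item1(html):
    """Return the longest text span between an Item 1 heading and the next item heading."""
    text = html_to_text(html)
    best = ""
    for start in ITEM1_START.finditer(text):
        end = ITEM1_END.search(text, start.end())
        stop = end.start() if end else len(text)
        section = text[start.end():stop]
        if len(section) > len(best):
            best = section
    return best.strip()
```

An empty result signals that no Item 1 heading was matched, which is a useful flag for filings that need manual review.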
Research Opportunities
- RQ1: Are firms changing their business model to become closer to pure climate solution organizations?
- Establish this result, including validation tests (small sample with specific climate solutions products, CDP data, patents, recruiting)
- RQ2: What are the key drivers of this transition?
- External: political pressure, shareholder proxy votes, green investors
- Internal: technology capability/breakthrough, management incentives
- RQ3: What are the consequences of this transition?
- Financial: Return on sales, stock return, sales growth, R&D or CAPEX over sales
- Non-financial: decarbonization
- Prediction 1: We may observe a non-linear relation in return on sales
- Revenue: grows as demand for climate solutions grows under a 2-degree scenario
- Costs: go up as the firm transitions (e.g., having both EV and ICE), but go down as the firm reaches pure play (e.g., left with EV only)
- Prediction 2: We might observe different consequences with different drivers of transition (external vs internal drivers)
Exhibit 1 Examples
General Motors: Transition from barely mentioning electric vehicles in 2014 to highlighting electric vehicles on the first page of Item 1 in 2021
- 2014: https://investor.gm.com/node/15981/html
- 2021: https://www.sec.gov/Archives/edgar/data/1467858/000146785822000034/gm-20211231.htm
AES: Transition from a “power generation and utility company” in 2012, to “an industry leader in developing and growing the solutions that will enable the transition to low-carbon sources of energy” in 2021
- 2012: https://www.sec.gov/Archives/edgar/data/874761/000119312513077580/d472985d10k.htm
- 2021: https://www.sec.gov/Archives/edgar/data/874761/000087476122000022/aes-20211231.htm
Exhibit 2 Lexicon
Appendix 1 includes 138 keywords relating to climate solutions (187 keywords in total if variations with hyphens are included). The keywords fall into 11 categories: 1 general category, and 10 in the following business areas: agriculture and food; building and housing; carbon capture, utilization, and storage (CCUS); energy generation; energy storage; materials; nature-based solutions; recycling and circularity; and transportation.
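One way the hyphen variations could be generated from the base keyword list is sketched below. The exact variation rule used in Appendix 1 is not stated here, so this assumes the simplest one: every hyphenated keyword also gets a space-separated form. The function name and the sample keywords are hypothetical.

```python
def expand_hyphen_variants(keywords):
    """Return the lowercased keywords plus a space-separated form of each hyphenated one.

    Order is preserved and duplicates are dropped.
    """
    expanded = []
    seen = set()
    for kw in keywords:
        base = kw.strip().lower()
        for form in (base, base.replace("-", " ")):
            if form and form not in seen:
                seen.add(form)
                expanded.append(form)
    return expanded


# Illustrative keywords only, not the actual Appendix 1 lexicon.
print(expand_hyphen_variants(["plant-based protein", "renewable energy"]))
```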