Lessons learned from Machine Learning in Portfolio Management
Now in our third year of optimizing Portfolio Management with Machine Learning, having generated hundreds of thousands of predictions along the journey, we are happy to share some of our amazing successes as well as some dreadful disappointments, and the lessons we learned.
So what can Machine Learning do for Portfolio Management? It can give an early heads-up on relevant issues in project and portfolio management. For example, some of the projects in your portfolio will not be finished within budget. Machine Learning can help you predict the likelihood of a budget overrun for each ongoing project. Besides budget overruns, Machine Learning can also predict the likelihood of not delivering on time, or even the likelihood of cancellation for a project on the current roadmap. It can also be set up to predict the actual value delivered by projects, such as the chance that a project delivers less than, say, 10% of its expected value.
Ever since Nobel laureate Harry Markowitz published his essay on modern portfolio theory, the topic has received a lot of attention. His ideas were later carried over from shares to projects, where portfolio management is common practice nowadays. The reason Markowitz's work was so appealing is that he diversified investments to maximize return for a given level of risk. He showed how this can be visualized as an efficient frontier, mapping the expected return to the corresponding risk.
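To make the concept concrete, here is a minimal sketch of Markowitz's idea in Python. The three assets, their expected returns and the covariance matrix are invented for illustration only: each random portfolio is a point in risk-return space, and the efficient frontier is the upper-left edge of that cloud.

```python
# Minimal sketch of the Markowitz risk/return trade-off.
# The assets, expected returns and covariances below are purely illustrative.
import numpy as np

mu = np.array([0.06, 0.10, 0.14])          # expected returns of three assets
cov = np.array([[0.04, 0.01, 0.00],        # covariance matrix (risk structure)
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

rng = np.random.default_rng(1)
weights = rng.dirichlet(np.ones(3), size=10_000)   # random long-only portfolios

returns = weights @ mu                                             # expected return
risks = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov, weights))   # volatility

# For each risk cap, the best achievable return approximates the frontier
for r_max in (0.20, 0.25, 0.30):
    best = returns[risks <= r_max].max()
    print(f"risk <= {r_max:.2f}: best expected return {best:.3f}")
```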
This key concept from the godfather of portfolio management, the efficient frontier, is hardly known among modern project portfolio managers. But what about the input parameter for this model, the risk component? Many organizations struggle with, or have even given up on, projecting the risk of an investment in a project. Now, decades after Markowitz's publication, we can finally predict the risk of a project, thanks to the introduction of Machine Learning in Portfolio Management.
What does success in Machine Learning in Portfolio Management look like?
With Machine Learning, models can be built to predict, for example, the likelihood of a project budget overrun. These models generate a value between 0 (no budget overrun) and 1 (a budget overrun will occur) for each of your ongoing projects. Knowing this, you could ask yourself: can I trust this value? If a project holds a likelihood of 0.8 according to the model, will it indeed need more budget than allocated? This diagnostic capability of the model can be described by the area under the receiver operating characteristic curve (ROC AUC). This is normally a figure between 0.5 and 1: an ROC AUC of 0.8 is seen as good, above 0.9 as very good, whereas an ROC AUC of 0.5 is like tossing a coin.
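As a minimal sketch of how such a score is computed, assuming scikit-learn and purely synthetic placeholder data in place of real project features:

```python
# Minimal sketch: evaluating a budget-overrun classifier with ROC AUC.
# Assumes scikit-learn; the features and data here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                          # project features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # 1 = budget overrun

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Probabilities between 0 (no overrun) and 1 (overrun), as described above
p_overrun = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, p_overrun))
```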
The models we created in our portfolio management solution, Uffective, delivered ROC AUC values from 0.6 up to 0.9. So what is an example of a useless model (ROC AUC < 0.7), and what of a successful one (ROC AUC > 0.8)?
First, the useless model. An agile Dutch software company, with software development spread over four countries, deploys new software releases nearly daily. After they optimized regression testing using Uffective, they asked us to use the metadata of newly produced software to predict the likelihood that it would create customer-facing issues. The model we created had an ROC AUC of 0.6 and did not help the customer focus on the software with the highest likelihood of introducing failures. The only input used in this model was metadata on the software, such as who specified the user story, who programmed the code, and how long it took to create. To push the ROC AUC up so the model can add value for the customer, we will dive into how often the programmer was interrupted by other topics while programming, and also into the actual code created. These new data points will be added as input features for the Machine Learning model.
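Purely as an illustration of what such a metadata feature table could look like (the column names below are invented, not the customer's actual schema):

```python
# Hypothetical metadata feature table for the release-risk model.
# Column names are invented; they mirror the features mentioned above
# plus the planned interruption-count feature.
import pandas as pd

releases = pd.DataFrame({
    "story_author":   ["alice", "bob", "alice", "carol"],
    "programmer":     ["dave", "erin", "dave", "frank"],
    "build_hours":    [12.5, 3.0, 40.0, 8.5],
    "interruptions":  [4, 0, 11, 2],   # planned new feature: context switches
    "customer_issue": [1, 0, 1, 0],    # label: did the release cause issues?
})
print(releases.head())
```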
For a large telecom operator we created a model to predict the likelihood of cancellation of a project. The model was tuned to reach an ROC AUC above 0.9. Management as well as project leaders can use the output of the model for decision making and for scenario analyses on the project.
How to feed a Machine Learning model: features
The risk of a project is a multidimensional construct. The most basic component is the cancellation of a project, as an aborted project will never deliver the expected value. But budget overruns or delivering later than planned also reduce the expected value. Often neglected, but perhaps the most important risk, is that of finished projects not delivering the expected value.
Normally we start by building a model for the cancellation risk. These models use 20 up to 40 input features, such as budget size, expected value and time to market, as well as the product owner, scrum master and programmers. The models are normally trained on more than 1,000 rows of input data, although we have had successful models with as few as 600 projects.
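A minimal sketch of such a cancellation model, assuming scikit-learn; the feature names and the tiny inline dataset are placeholders for a real history of 1,000+ projects:

```python
# Minimal sketch of a cancellation-risk model on mixed feature types.
# Feature names are illustrative; a real model would use 20-40 of them.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

projects = pd.DataFrame({
    "budget_keur":    [250, 1200, 80, 640],
    "expected_value": [900, 3000, 150, 1100],
    "time_to_market": [6, 18, 3, 12],          # months
    "product_owner":  ["anna", "ben", "anna", "chris"],
    "cancelled":      [0, 1, 0, 1],            # label
})

numeric = ["budget_keur", "expected_value", "time_to_market"]
categorical = ["product_owner"]

model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", "passthrough", numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("clf", GradientBoostingClassifier()),
])
model.fit(projects[numeric + categorical], projects["cancelled"])
print(model.predict_proba(projects[numeric + categorical])[:, 1])
```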
One lesson learned is that the model improves if you include the month in which the project was initiated, as in some companies projects started during the annual budget cycle seem to hold a higher cancellation rate. Another lesson learned is that the ROC AUC improves if you train multiple models depending on the maturity of the project. These models can be focused on the gates (one model for all projects before the first gate, one before the second gate, and so on) but also on the sprint cycle (e.g. before starting or after finishing the first sprint).
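A minimal sketch of both ideas, assuming a pandas DataFrame with a start date and a gate column (all names invented for illustration):

```python
# Sketch: derive the initiation month as a feature and train one model
# per maturity gate. Column names are illustrative, not a real schema.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def train_per_gate(projects: pd.DataFrame) -> dict:
    projects = projects.copy()
    # Month of initiation; budget-cycle months may carry a higher cancel rate
    projects["start_month"] = pd.to_datetime(projects["start_date"]).dt.month

    features = ["budget_keur", "start_month"]
    models = {}
    for gate, group in projects.groupby("gate"):   # one model per gate
        models[gate] = GradientBoostingClassifier().fit(
            group[features], group["cancelled"]
        )
    return models
```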
Models for the more mature projects include features like the number of days on hold and the time needed to mitigate a red signal. In more advanced environments we use the weekly report text from the project leader to calculate the 'sentiment' of that text; this sentiment is then used as a feature.
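As a minimal sketch of such a sentiment feature, here using NLTK's VADER analyzer as one possible off-the-shelf choice (the actual sentiment method used is not specified here):

```python
# Sketch: turn a project leader's weekly status text into a numeric
# sentiment feature. VADER is one off-the-shelf option; the method
# actually used in Uffective is not specified in this article.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

report = "Testing slipped again this week and two key developers are blocked."
sentiment = sia.polarity_scores(report)["compound"]  # value in [-1, 1]
print(f"sentiment feature: {sentiment:.2f}")
```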
We were very surprised to reach an ROC AUC above 0.9, as these models did not evaluate the content of the projects; the metadata alone seemed to hold strong predictive power.
How to use Machine Learning outputs
So after setting up a successful model you can start using the predictions. How do companies use the output of the Machine Learning models?
First, they can open up the predictions to the project leaders, who can then start running scenarios by changing the input features. They can change the participants in the team, but could also rerun the model as if the current red risks were closed. These exercises normally lead to better data quality, as cancelled projects typically hold low data quality. Scenario analyses are supported by the feature importance function, as this helps colleagues focus on the features with the highest impact, both for reducing and for increasing the cancellation likelihood.
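A minimal sketch of such a scenario rerun, continuing the illustrative cancellation model, data and feature lists from the sketch in the features section above; none of this is Uffective's actual API:

```python
# Sketch: scenario analysis with a trained cancellation model.
# Reuses `projects`, `numeric`, `categorical` and `model` from the
# earlier illustrative sketch.
row = projects.iloc[[3]][numeric + categorical]        # one ongoing project

baseline = model.predict_proba(row)[:, 1][0]
scenario = row.assign(product_owner="anna")            # e.g. change the team
what_if = model.predict_proba(scenario)[:, 1][0]
print(f"cancellation likelihood: {baseline:.2f} -> {what_if:.2f}")

# Feature importances highlight where changes move the needle most
print(model.named_steps["clf"].feature_importances_)
```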
The risk prediction can also be used to support decision making: gate keepers who decide on projects and allocate resources receive this new input. It helps them rationalize decision making using both dimensions, expected value as well as risk.
The model can of course be used to generate predictions for all ongoing projects. What is interesting is what happens when the predictions are presented to C-level management. When the list is sorted so that the projects most likely to be cancelled are on top, a 'normal' reaction to the first projects on the list is that the managers agree with the model and also expect those projects to be cancelled. This is an interesting finding: if both a mathematical model and managerial judgement indicate that a project will be cancelled, why not relieve the team from the project and avoid the growth of sunk costs? On the other hand, some pet projects will also appear on the likely-to-cancel list. This disturbs management, and a predictable reaction is that they invite themselves to the steering committee or intervene in other ways in the project. As this reaction is a new variable not known to the model, our Machine Learning engineers point out that it reduces the predictive power of the model.
This is of course true, but it is not an issue for our customers, as most of them are not in the business of improving the ROC AUC but of improving bottom-line results and customer experience.