For those who understand its real-world applications and potential, artificial intelligence is one of the most valuable tools we have today. From disease detection to drug discovery to climate change models, AI continuously delivers the insights and solutions that help us address the most pressing challenges of our time.
In financial services, one of the biggest issues we face is inequality in financial inclusion. While this inequality is due to many factors, the common denominator in each case is likely data (or lack thereof). Data is the lifeblood of most organizations, but especially so for organizations looking to implement advanced automation through AI and machine learning. It is therefore incumbent on financial services organizations and the data science community to understand how models can be used to create a more inclusive financial services landscape.
Lending a hand
Lending is an essential financial service today. It generates revenue for banks and loan providers, but also provides a basic service for individuals and businesses. Loans can provide a lifeline during tough times or be the boost needed for a fledgling start-up. But in each case, the lending risk must be assessed.
Most default risk is now calculated using automated tools, and increasingly this automation is provided by algorithms that significantly speed up decision-making. The data that informs these models is plentiful. But, as with any decision-making algorithm, these models tend to deliver accurate results for the majority group, leaving some individuals and minority groups at a disadvantage, depending on the model used.
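To make the majority-group effect concrete, here is a minimal Python sketch. It assumes toy, hypothetical data and a hypothetical helper, `accuracy_by_group`. It shows how a model can score well overall while failing a smaller group entirely, because the majority dominates the aggregate metric.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy and accuracy computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Toy, hypothetical data: 1 = repaid, 0 = defaulted.
# Group "A" is the majority (8 applicants), group "B" the minority (2).
y_true = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
groups = ["A"] * 8 + ["B"] * 2

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(f"overall accuracy: {overall:.0%}")  # overall accuracy: 80%
print(per_group)  # {'A': 1.0, 'B': 0.0}
```

An 80% headline figure looks respectable, yet every prediction for group "B" is wrong. Reporting metrics per group is what exposes the gap.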
Leaving these customers at a disadvantage is, of course, an unsustainable business model. That is why lenders must consider more nuanced factors to make “the right decision”. Demand for loans is booming, particularly as point-of-sale options such as buy-now-pay-later offer flexible new ways to obtain credit. As a result, the sector is now highly competitive, with traditional lenders, challengers and fintechs all vying for market share. Regulatory and social pressure around fairness and equitable outcomes continues to grow. Organizations that prioritize these principles and codify them into their business and data science models will become increasingly attractive to customers.
Building for equity
When a loan risk model rejects applications, many unsuccessful applicants may implicitly understand the logic behind the decision. They may have applied knowing they were unlikely to meet the acceptance criteria, or they may simply have miscalculated their eligibility. But what happens when an individual, or a member of a minority group, is rejected because they do not fall within the majority group the model was trained on?
Customers don’t have to be data scientists to recognize when an injustice, algorithmic or otherwise, has occurred. Suppose a small business owner has the means to repay a loan but is rejected for no apparent reason. They will rightly be upset at the treatment they have received and may turn to a competitor for the services they need. And if customers from similar backgrounds are also being rejected unfairly, something is likely wrong with the model. The most common explanation is that bias has somehow crept into it.
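One way to check whether creditworthy applicants from a particular background are being turned away more often is to compare false rejection rates by group. The sketch below is illustrative only, and several of its elements are assumptions. The helper name and data are hypothetical. It assumes repayment outcomes are known for every applicant; in practice they are only observed for approved loans, so an audited or historical sample would be needed. The 0.1 tolerance is an arbitrary placeholder.

```python
def false_rejection_rate_by_group(repaid, approved, groups):
    """Per group: share of applicants who repaid (or would have) but were rejected."""
    counts = {}  # group -> (rejected_but_repaid, total_repaid)
    for r, a, g in zip(repaid, approved, groups):
        if not r:
            continue  # only creditworthy applicants are relevant to this metric
        rejected, total = counts.get(g, (0, 0))
        counts[g] = (rejected + int(not a), total + 1)
    return {g: rejected / total for g, (rejected, total) in counts.items()}


# Toy, hypothetical outcomes (1 = repaid / approved, 0 = defaulted / rejected).
repaid = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
approved = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

rates = false_rejection_rate_by_group(repaid, approved, groups)
print(rates)  # {'A': 0.25, 'B': 0.5}

TOLERANCE = 0.1  # hypothetical acceptable gap between groups
gap = max(rates.values()) - min(rates.values())
if gap > TOLERANCE:
    print(f"Warning: false rejection rates differ by {gap:.2f} across groups")
```

In this toy example, creditworthy applicants in group "B" are rejected twice as often as those in group "A". That is the kind of pattern the paragraph above describes.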
Recent history offers several examples: insurance companies using machine learning to set premiums that discriminate against older people, online price discrimination, and even product personalization steering minorities toward higher rates. These egregious errors have cost the companies involved severe reputational damage and customer trust that has been irretrievably lost.
This calls for a shift in priorities within the data science and financial services communities, one that places fair outcomes for everyone above high-performing models that only work for the majority. We need to prioritize people alongside model performance.
Eliminating bias in models
Despite regulations that rightly prevent the use of sensitive information in decision-making algorithms, injustice can creep in through the use of biased data. To illustrate how this is possible, here are five examples of how data bias can occur:
- Missing data – The dataset used is missing certain fields for particular groups of the population.
- Sample bias – The sample datasets chosen to train models do not accurately represent the population the models are intended to serve, which means the models will be largely blind to certain minority groups and individuals.
- Exclusion bias – This is when data is removed or not included because it is deemed unimportant. That’s why strong data validation and diverse data science teams are essential.
- Measurement bias – This occurs when the data collected for training does not accurately reflect real-world conditions for the target population, or when erroneous measurements distort the data.
- Labeling bias – A common pitfall in the data labeling stage of a project, labeling bias occurs when similar types of data are labeled inconsistently. Again, this is largely a validation issue.
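Several of these biases can be screened for with simple checks before any model is trained. The following Python sketch covers missing data, sample bias and labeling bias. Exclusion and measurement bias usually depend on knowing how the data was collected, so they are left out here. The field names, population shares and records are hypothetical. A flagged result is a prompt for review rather than proof of bias: identical features with different outcomes can be legitimate when the features do not capture everything that matters.

```python
from collections import Counter, defaultdict


def missing_rate_by_group(rows, field, group_key):
    """Missing data: fraction of rows in each group where `field` is None."""
    missing, total = Counter(), Counter()
    for row in rows:
        g = row[group_key]
        total[g] += 1
        missing[g] += int(row.get(field) is None)
    return {g: missing[g] / total[g] for g in total}


def representation_gap(rows, group_key, population_share):
    """Sample bias: each group's share of the data minus its share of the population."""
    counts = Counter(row[group_key] for row in rows)
    n = len(rows)
    return {g: counts[g] / n - share for g, share in population_share.items()}


def inconsistent_labels(rows, feature_keys, label_key):
    """Labeling bias: feature combinations that appear with more than one label."""
    labels = defaultdict(set)
    for row in rows:
        labels[tuple(row[k] for k in feature_keys)].add(row[label_key])
    return [key for key, seen in labels.items() if len(seen) > 1]


# Toy, hypothetical training records.
rows = [
    {"group": "A", "income": 52000, "years_employed": 5, "label": 1},
    {"group": "A", "income": 61000, "years_employed": 3, "label": 1},
    {"group": "A", "income": 45000, "years_employed": 2, "label": 0},
    {"group": "A", "income": 45000, "years_employed": 2, "label": 1},
    {"group": "B", "income": None, "years_employed": 4, "label": 0},
]
population_share = {"A": 0.6, "B": 0.4}  # hypothetical reference population

print(missing_rate_by_group(rows, "income", "group"))  # {'A': 0.0, 'B': 1.0}
gaps = representation_gap(rows, "group", population_share)
print({g: round(v, 2) for g, v in gaps.items()})  # {'A': 0.2, 'B': -0.2}
print(inconsistent_labels(rows, ["income", "years_employed"], "label"))  # [(45000, 2)]
```

Running checks like these during exploratory data analysis makes bias part of the routine data review rather than something discovered after deployment.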
While nothing on this list can be described as malicious, it is easy to see how bias can find its way into models if a robust framework that incorporates fairness is not built into a data science project from the outset.
Data scientists and machine learning engineers are used to very specific pipelines that have traditionally prioritized high performance. Data is at the heart of modeling, so we start every data science project by exploring our datasets and identifying relationships, performing exploratory data analysis to understand our data. Next comes the pre-processing stage, where we combine and clean our data before starting the intensive process of feature engineering, which helps us create more useful representations of the data. We then experiment with different models, tune parameters and hyperparameters, validate our models, and repeat this cycle until we achieve the desired performance metrics. Once done, we can productionize and deploy our solutions, which we then maintain in production environments.
It’s a lot of work, but there’s a significant problem this traditional pipeline does not solve. At no point in this cycle is the model’s fairness assessed or the data’s bias thoroughly explored. We need to work with domain experts, including legal and governance teams, to understand what fairness means for the problem at hand and to mitigate bias at the root of our modeling: the data.
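As an illustration of where a fairness check could sit in the validation step, here is a sketch of a hypothetical gate that fails a model on either performance or fairness. It uses the disparate impact ratio, defined as each group’s approval rate divided by a reference group’s. The 0.8 threshold is borrowed from the “four-fifths” rule of thumb in US employment selection guidelines. Both the metric and the thresholds are assumptions that should be agreed with legal and governance experts for the problem at hand.

```python
def approval_rates(approved, groups):
    """Fraction of applicants approved in each group."""
    totals, approvals = {}, {}
    for a, g in zip(approved, groups):
        totals[g] = totals.get(g, 0) + 1
        approvals[g] = approvals.get(g, 0) + int(a)
    return {g: approvals[g] / totals[g] for g in totals}


def disparate_impact(approved, groups, reference):
    """Each non-reference group's approval rate divided by the reference group's."""
    rates = approval_rates(approved, groups)
    ref_rate = rates[reference]
    if ref_rate == 0:
        raise ValueError("reference group has no approvals; ratio is undefined")
    return {g: r / ref_rate for g, r in rates.items() if g != reference}


def validation_failures(accuracy, approved, groups, reference,
                        min_accuracy=0.8, min_ratio=0.8):
    """Hypothetical gate: return a list of failed checks (empty list means pass)."""
    failures = []
    if accuracy < min_accuracy:
        failures.append(f"accuracy {accuracy:.2f} < {min_accuracy}")
    for g, ratio in disparate_impact(approved, groups, reference).items():
        if ratio < min_ratio:
            failures.append(f"group {g}: disparate impact {ratio:.2f} < {min_ratio}")
    return failures


# Toy, hypothetical predictions: group "A" approved 4/5, group "B" approved 2/5.
approved = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

failures = validation_failures(accuracy=0.86, approved=approved,
                               groups=groups, reference="A")
print(failures)  # ['group B: disparate impact 0.50 < 0.8']
```

Here the model clears the accuracy bar but fails the fairness check. A performance-only validation loop would have let this case through.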
Simply understanding how biases can find their way into models is a good start when it comes to creating a more inclusive financial services environment. By checking ourselves against the points above and reevaluating our approach to data science projects, we can seek to create models that work for everyone.
Adam Lieberman is the Head of Artificial Intelligence and Machine Learning at Finastra