Is the credit scoring written in your DNA?

Damian Sadowski, 15 September 2017

On the use of alternative data for assessing creditworthiness.

Summary:

• In the underwriting process, financial institutions analyze whether an applicant is likely to pay off the debt;

• To maximize profit, they should improve credit scoring models on both kinds of error: accepting borrowers who later default (to minimize defaults) and rejecting borrowers who would have repaid (to reduce the false rejection rate and therefore increase outstanding volume and profits);

• Traditional credit scoring of a new loan application is based on credit history, basic socio-demographic data and simple analytical tools (such as logistic regression). Some banks also use the past repayment history of their own loans and the transactional history of current accounts to build better models for their own customers; however, logistic regression is used almost universally;

• Traditional models are quite good at minimizing the default rate; however, the rejection of good borrowers is still a problem. The average approval rate in Poland is around 50%, which could be improved by alternative underwriting solutions (because at least some of the rejections are due to non-traditional income sources), creating huge economic opportunities;

• Dozens of alternative scoring companies have emerged in recent years (the article presents 30+ such companies), having recognized the need for alternative scoring to a) cover the underbanked population and new generations, whose behavior differs greatly from that of older generations and who are not covered by traditional scoring based on credit history, and b) improve the accuracy of current models;

• Alternative scores are based on:

- Behavioral data – behavior as a single entity (transactional data / personality scoring based on psychometric tests and form filling activity / analyses of mobile activity - including GEO data to derive work/home patterns of activity);

- Social data - behavior in the social space (messaging patterns / social interactions / sharing activity and professional profile);

• Alternative data gives insights into the prospects of one’s industry, career progress, ambitions, the steadiness of one’s approach to life and therefore the ability to pay the loan off. What is more, it gives insights into morality and how much one cares about being right and repaying the debt;

• Future sources of data which could be used for improving accuracy of underwriting models:

- genome analyses to provide information both about health condition which may end in default and personality traits responsible for propensity to repay debt;

- brain scanning to reveal character traits responsible for defaults;

- advanced social-graph scoring to assess creditworthiness based on similarity to other network users (closeness to profiles with a similar risk attitude) and on the profiles of close contacts (those we interact with a lot).

The way you move your mouse may tell whether you will repay the loan. Well, maybe not exactly. However, if Facebook can predict your behavior better than your spouse, why can’t we mine social data for insights on creditworthiness? And which other sources of data that traditional banks would never think of might prove useful?

Scoring – why & how

When you apply for a loan, and before it is granted, financial institutions (banks / payday loan companies / FinTechs) assess your creditworthiness to evaluate the likelihood of your default. This usually means the one-year probability of default, because most defaults in unsecured portfolios happen during the first year. Defaults in the second and later years are usually attributed to chance events such as deaths, divorces or random unemployment. On the one hand, lenders want to know whether you will have enough funds to pay the loan off; on the other, whether you are willing to repay debts. The higher the probability that you will default for either reason, the more they charge you to compensate for the cost of risk, and if that probability exceeds a certain threshold (which varies between institutions and depends on their risk appetite), you will get rejected.
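
The pricing and rejection logic described above can be sketched in a few lines of Python. This is only an illustration, not any lender's actual method: the function name `price_loan`, the base rate, the loss-given-default figure and the 15% cut-off are all assumptions made up for the example, and the risk premium is simplified to the expected loss (probability of default times loss given default).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    approved: bool
    annual_rate: Optional[float]  # None when the application is rejected


def price_loan(pd_one_year: float,
               base_rate: float = 0.08,           # hypothetical base rate
               loss_given_default: float = 0.6,   # hypothetical share lost on default
               max_pd: float = 0.15) -> Offer:    # hypothetical risk-appetite cut-off
    """Reject above the cut-off, otherwise add a risk premium to the base rate."""
    if not 0.0 <= pd_one_year <= 1.0:
        raise ValueError("probability of default must be in [0, 1]")
    if pd_one_year > max_pd:
        return Offer(approved=False, annual_rate=None)
    # Simplified risk premium: expected loss per unit of money lent.
    risk_premium = pd_one_year * loss_given_default
    return Offer(approved=True, annual_rate=base_rate + risk_premium)


if __name__ == "__main__":
    for pd in (0.02, 0.10, 0.25):
        print(pd, price_loan(pd))
```

A lender with a higher risk appetite would simply raise `max_pd` and charge more, which is roughly what the short-term lenders mentioned later in the article do.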

Underwriting is a game between financial institutions and borrowers: weak scoring models may cause both sides to lose, while proper models create a win-win situation.

A model is weak when:

  1. It accepts too many borrowers who later default
  2. It rejects too many borrowers who wouldn’t default

While current banking models are quite good at the first one (default rates of consumer credit are around 4%, according to the Polish Credit Bureau), the rejection of good borrowers is still a problem: the average approval rate is around 50%, and many of the remaining 50% of applicants would turn out to be good borrowers, as at least some of the rejections are due to non-traditional income sources.

Alternative finance companies like Creamfinance and Wonga have a much shorter prediction horizon: their models usually focus on the chance of repayment within 1-2 months, sometimes 3-6 months, and their risk appetite is very high, which is reflected in the cost of the loan and in default rates of around 12% (according to the Polish Credit Bureau).

Financial institutions set the cut-off rate and decide whether they want to accept riskier consumers and thereby minimize the false rejection rate, or minimize the risk and thereby reject some borrowers who would turn out to be good.
Using alternative sources of data to improve model accuracy could raise the acceptance rate while keeping the default rate at the same level (increasing the classifier’s “recall”, as they say in the jargon) or reduce the default rate while keeping the same acceptance rate.
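
To make the cut-off trade-off concrete, here is a small sketch that measures the approval rate, the default rate among accepted borrowers and the false rejection rate for a given cut-off. The function name `evaluate_cutoff` and the ten applicants are invented for illustration; real portfolios are of course far larger and the outcome of rejected applicants is normally unknown.

```python
def evaluate_cutoff(applicants, cutoff):
    """applicants: list of (predicted_pd, defaulted) pairs.

    An applicant is accepted when predicted_pd <= cutoff.
    """
    accepted = [(pd, d) for pd, d in applicants if pd <= cutoff]
    rejected = [(pd, d) for pd, d in applicants if pd > cutoff]
    good_total = sum(1 for _, d in applicants if not d)
    good_rejected = sum(1 for _, d in rejected if not d)
    defaults_accepted = sum(1 for _, d in accepted if d)
    return {
        "approval_rate": len(accepted) / len(applicants),
        "default_rate": defaults_accepted / len(accepted) if accepted else 0.0,
        "false_rejection_rate": good_rejected / good_total if good_total else 0.0,
    }


# Made-up sample: (predicted probability of default, did the borrower default?)
SAMPLE = [
    (0.02, False), (0.04, False), (0.06, False), (0.08, True), (0.10, False),
    (0.15, False), (0.20, True), (0.30, False), (0.40, True), (0.60, True),
]

if __name__ == "__main__":
    for cutoff in (0.10, 0.30):
        print(cutoff, evaluate_cutoff(SAMPLE, cutoff))
```

Loosening the cut-off lets more good borrowers through but also more defaulters; a more accurate model changes the predicted probabilities themselves, so the same cut-off yields a better mix.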

 

There are multiple reasons why good borrowers get rejected, among others: changing behavior patterns, lack of credit history, freelance jobs etc. Therefore, models based on alternative data can both improve the accuracy of assessing currently accepted borrowers and, above all, “recycle” the currently rejected ones.

Scoring 1.0 - Traditional scoring – “You paid in the past, you will pay in the future”

For decades, banks have used past credit history in the underwriting process. They compare your credit history with that of people who did and did not repay their debt. Simple analytical tools such as logistic regression reveal correlations between certain patterns of behavior and defaulting. Not only the number and volume of loans and payments due matter, but also how often you have asked banks for a loan and many other credit-related variables (how quickly you apply for additional credit / how much credit is used and repaid or not repaid / etc.). Analyses of credit history have often been supplemented with basic socio-demographic data (level of education / profession / type of employment contract / age / marital status / region of residence) to improve underwriting and enable scoring of people with little or no history.
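
As a rough idea of what a logistic-regression score looks like once it has been fitted, the sketch below turns a few credit-history variables into a probability of default. The feature names, coefficients and intercept are hypothetical and chosen only for illustration; a real scorecard estimates them from historical repayment data.

```python
import math

# Hypothetical coefficients, NOT taken from any real scorecard.
COEFFICIENTS = {
    "late_payments_12m": 0.9,    # number of late payments in the last 12 months
    "credit_inquiries_6m": 0.4,  # number of loan applications in the last 6 months
    "credit_utilization": 1.5,   # share of available credit in use, 0..1
}
INTERCEPT = -3.0


def probability_of_default(features):
    """Apply the logistic function to a weighted sum of the features."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    clean = {"late_payments_12m": 0, "credit_inquiries_6m": 0, "credit_utilization": 0.0}
    risky = {"late_payments_12m": 2, "credit_inquiries_6m": 3, "credit_utilization": 0.9}
    print(probability_of_default(clean), probability_of_default(risky))
```

Each positive coefficient means that a higher value of the variable pushes the predicted probability of default up, which is exactly the kind of correlation described above.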

BONUS: Traditional credit data may be analyzed with advanced machine learning tools to improve the accuracy of models.

Companies: underwrite.ai

For people with sufficient credit history, traditional scoring provides quite accurate results; however, it fails to underwrite people applying for their first credit and does not consider the big picture (e.g. a late payment due to hospitalization is just as bad as a late payment due to negligence).

Scoring 1.1 – Expanded traditional scoring – based on other repayment data

Working on the same principle, companies use bills and repayment data from other sources (insurance / phone bills / rent and utility bills) and apply a similar methodology to compute a credit score. In countries like the US and the UK, these data are often reported to credit bureaus and also affect the traditional score.

Companies: ecredable / prbc

Scoring 2.0 - Alternative scoring – “You are more than just your credit history”

Having recognized the need for alternative scoring, dozens of alternative scoring companies have emerged in recent years.

I have grouped them in 2 main categories:

  1. Behavioral data – how you behave as a single entity (transactional data / personality scoring based on psychometric tests and form filling activity / analyses of mobile activity - including GEO data to derive work/home patterns of activity);
  2. Social data - how you behave in the social space (your messaging patterns / social interaction / sharing activity and professional profile).

Alternative data show much more than just your repayment patterns: they give insights into whether you work in a promising industry, whether your career is progressing rapidly, whether you are ambitious, whether your approach to life is “steady” and whether you will be able to pay the loan off. What is more, they give insights into your morality and how much you care about being right and repaying the debt.

Behavioral data

Transactional data (bank account/e-commerce)

For quite some time, banks have used transactional history (your behavior within the account, including both your income and your spending) for underwriting. This is possible if you are a customer of the bank in question or if you grant access to your account history. Another option is e-commerce history data, which provide information on your buying patterns.

Companies: Sesame Credit / Zest Finance / Credit Kudos / CreditVidya / Float

Personality scoring

Companies perform virtual interviews to measure borrowers’ character, abilities and willingness to repay debt. Based on borrowers’ beliefs, ethics, intelligence, honesty, financial maturity, lifestyle and attitudes, they perform more personalized underwriting. Even the way you navigate through the application form (e.g. whether you read the rules and terms, or where you pause your mouse) or whether you use capital letters may reflect your creditworthiness.

Companies: EFL Global / Shared Lending / aire

Mobile Activity

Mobile phones, due to their intimacy with the user, are believed to be the truest reflection of people’s inner selves. Therefore, companies perform underwriting on many variables, such as: whether applicants use gambling apps, save people in the contact list with full names and surnames, or recharge their phones frequently; how many messages a person receives; or even how much a person travels each day, with GPS data used to derive work/home patterns and additional activity to better understand the person behind the phone.

Companies: Juvo / Tala / Saida / Branch / Tiaxa / First Access

Social data

Interactions with people

Based on either mobile, social media or e-mail interactions (e.g. messages / network of friends and followers / received likes / games activity) or

Interests & Sharing activity

Based on what and how you share & like on Facebook, LinkedIn, Twitter, Instagram and WeChat (representing your interests, beliefs etc.) and, among other things, your browser history, companies try to build a comprehensive understanding of who you are in the social space and whether you possess the traits of a trustworthy borrower. And that is really interesting, because what we usually share is a “better us”: we almost never share sad or bad things, so as to appear better than we really are. But isn’t that true of almost all social situations?

Companies: Lenddo / Kreditech / Friendly Score / Big Data Scoring / China Rapid Finance / Wecash / Hello Soda / Trusting Social / Affirm / Neener Analytics / DemystData / Cignifi

BONUS: A social network may play the guarantor’s role; e.g. “Hong Kong’s Lenddo asks your closest friends to vouch for your trustworthiness, and penalizes their credit scores should your payments go delinquent” (source).

Professional profile

Based mainly on LinkedIn data, companies analyze the professional career, including schools attended, academic performance, areas of study, work experience and professional network, for the underwriting process. That data gives insights into the prospects of the industry one is involved in, career progression, ambitions and therefore the ability to repay the loan.

Companies: earnest / Neo / Upstart

Scoring 3.0 - Future

Basically, any data could be used for credit underwriting; however, there are a few ideas that may prove useful in improving the accuracy of models in assessing the propensity to repay debt:

a) DNA scoring*

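As far as underwriting is concerned, DNA may help in two ways: by providing information about health conditions that may end in default, and about personality traits responsible for the propensity to repay debt.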

b) Brain scanning*

Thanks to MRI (Magnetic Resonance Imaging), we can examine brain structure, and according to research there is a correlation between personality traits and the volume of different brain regions.

c) Advanced social-graph scoring - "You are the average of the five people you most associate with".

Although social scoring models analyze the types and number of interactions, they still fail to capture the idea of multi-layer social underwriting, where your score is influenced by your closest surroundings. The idea behind this is simple and may in some way replace the guarantor’s role, as Lenddo’s vouching system does. In particular, deep learning may be useful in graph analyses, especially of temporal graphs representing social interactions; however, this field of study is still very young.
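
The simplest possible version of this idea, far short of the deep-learning graph models mentioned above, is to blend a person's own score with the interaction-weighted average score of their closest contacts. The sketch below does exactly that; the function name `network_adjusted_score`, the 70/30 split and the sample scores are assumptions made up for illustration.

```python
def network_adjusted_score(own_score, neighbours, own_weight=0.7):
    """Blend a person's score with the scores of their closest contacts.

    neighbours: list of (score, interaction_count) pairs for close contacts.
    own_weight: hypothetical share of the final score that stays personal.
    """
    total_interactions = sum(count for _, count in neighbours)
    if total_interactions == 0:
        return own_score  # no usable network information
    peer_score = sum(score * count for score, count in neighbours) / total_interactions
    return own_weight * own_score + (1.0 - own_weight) * peer_score


if __name__ == "__main__":
    # Two contacts: one with a higher score and many interactions, one lower with few.
    print(network_adjusted_score(600, [(700, 30), (500, 10)]))
```

Contacts you interact with more pull your score harder, which is the "average of the five people you most associate with" intuition in its most naive form.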

Alternative scoring unlocks a multi-billion industry for financial institutions, as they can score borrowers more precisely thanks to a deeper, multi-variable understanding and cover groups that current models fail to score, either due to a lack of credit history (2.5 billion underbanked people in the world) or due to changing credit patterns.

Additionally, modern machine learning techniques may further increase the accuracy of the scores, including ensemble modeling, which is already proving to beat any single algorithm by combining models that operate on different sources of data.
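
One very basic form of ensembling is a weighted average of the default probabilities produced by separate models, each trained on a different data source. The sketch below shows only that averaging step; the function name `ensemble_pd`, the source names and the numbers are hypothetical, and production ensembles typically use more elaborate schemes such as stacking or boosting.

```python
def ensemble_pd(predictions, weights=None):
    """Combine per-source default probabilities into one weighted average.

    predictions: dict mapping a data-source name to a predicted probability.
    weights: optional dict with the same keys; equal weights when omitted.
    """
    if not predictions:
        raise ValueError("need at least one model prediction")
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total_weight = sum(weights[name] for name in predictions)
    return sum(predictions[name] * weights[name] for name in predictions) / total_weight


if __name__ == "__main__":
    preds = {"credit_history": 0.10, "transactions": 0.06, "mobile": 0.08}
    print(ensemble_pd(preds))
    print(ensemble_pd(preds, {"credit_history": 2.0, "transactions": 1.0, "mobile": 1.0}))
```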

For consumers, it will make it possible to get a fair credit offer, even without a long and shining credit history, based on one’s personal traits. It will also eliminate the negative influence of mistakes in the credit history (both intentional and unintentional) and predict current creditworthiness more precisely.

Whichever way the underwriting is going to evolve, I believe it will more likely resemble “Minority Report” than what the traditional credit score used to be.

__________________________

* The author does not take legal and ethical issues into consideration.
Disclaimer: companies listed in each category may take into consideration data from multiple categories and their attribution is based on the research and suitability assessment performed solely by the author.