P2pLoans

Title: Lending Club Loan Interest: Factors Contributing Beyond Those Related to FICO Score

Introduction:

Lending Club is an online peer-to-peer lender that claims to "cut the cost and complexities of bank lending and pass the savings on to borrowers" (how-peer-lending-works.action) [club]. In this study we use data on loans made through Lending Club to find and quantify associations among the items that make up a borrower's profile. The data include 'Amount.Requested', 'Amount.Funded.By.Investors', 'Interest.Rate', 'Loan.Length', 'Loan.Purpose', 'Debt.To.Income.Ratio', 'State', 'Home.Ownership', 'Monthly.Income', 'FICO.Range', 'Open.CREDIT.Lines', 'Revolving.CREDIT.Balance', 'Inquiries.in.the.Last.6.Months' and 'Employment.Length'. Borrower names have been replaced by numbers to protect the innocent. The purpose of this study is to "identify and quantify associations between the interest rate of the loan and the other variables in the data set...taking into account the applicant's FICO score" [prompt].

Methods:

Statistical analysis was carried out with the R statistical software package [r], running as a server on a Linux computer.
Data Collection
For our analysis we used a sample of 2500 loans made through the Lending Club [club] program. The data were supplied in the R '.rda' format. The data were for the most part complete for each of the 2500 samples, with data not available (<NA>) for some characteristics. Overall, out of ~37,000 data items, less than 2%, or approximately 660, were <NA>.
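The share of missing items quoted above can be checked with a few lines of R. This is only a sketch on a made-up four-row data frame: the name `loans` and all values are assumptions for illustration, not the actual sample.

```r
# Toy stand-in for the loans data frame (invented values, not the real sample).
loans <- data.frame(
  Amount.Requested = c(10000, NA, 5000, 12000),
  Monthly.Income   = c(4500, 3200, NA, 6100),
  State            = c("CA", "NY", NA, "TX"),
  stringsAsFactors = FALSE
)

total_items <- nrow(loans) * ncol(loans)  # number of data items (cells)
na_count    <- sum(is.na(loans))          # cells recorded as NA
na_percent  <- 100 * na_count / total_items

# NA count per column, to see which characteristics are incomplete.
na_by_column <- colSums(is.na(loans))
```

Run against the real data frame, `na_count` and `na_percent` would correspond to the "approximately 660" and "less than 2%" figures reported here.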
A small amount of pre-processing of the data was required. FICO data were reported as ranges of integer values, with each range spanning 4 points. Each range was transformed to a single integer value representing the center of that range. Interest rates were converted from character strings to decimal numbers.
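A minimal sketch of the two conversions might look like the following. The exact string formats are assumptions (a trailing percent sign on the rate and a "low-high" FICO range), as are the data frame name `raw` and the derived column name `FICO.Score`. The values are invented.

```r
# Hypothetical raw values, as they might appear before pre-processing.
raw <- data.frame(
  Interest.Rate = c("8.90%", "12.12%", "21.98%"),
  FICO.Range    = c("735-739", "715-719", "690-694"),
  stringsAsFactors = FALSE
)

# Interest rate: strip the percent sign, then convert text to a number.
raw$Interest.Rate <- as.numeric(sub("%", "", raw$Interest.Rate, fixed = TRUE))

# FICO: split "low-high" on the dash and take the midpoint of the two ends.
bounds <- strsplit(raw$FICO.Range, "-", fixed = TRUE)
raw$FICO.Score <- sapply(bounds, function(b) as.integer(mean(as.numeric(b))))
```

Taking the midpoint keeps one number per borrower, so the score can enter a regression as an ordinary numeric predictor.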
Exploratory Analysis
Exploratory analysis first established the strong negative correlation between interest rate and FICO [fico] score, using scatter plots of all the samples. From there, each of the other variables was plotted against interest rate. For non-numeric, qualitative data items, the mean interest rate for each category was determined, and the categories were compared by plotting the data and by fitting a simple regression model [simple].
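The exploratory steps for a qualitative variable can be sketched as follows. The data are synthetic and were generated so that the rate falls with FICO score. Only the column names follow the data set, and `FICO.Score` is assumed to be the derived center-of-range column.

```r
# Synthetic data for illustration only; values are invented.
set.seed(1)
n <- 200
toy <- data.frame(
  FICO.Score     = sample(seq(642, 802, by = 5), n, replace = TRUE),
  Home.Ownership = factor(sample(c("MORTGAGE", "OWN", "RENT"), n, replace = TRUE))
)
toy$Interest.Rate <- 70 - 0.08 * toy$FICO.Score + rnorm(n, sd = 1.5)

# Correlation between interest rate and FICO score (negative by construction).
r_fico <- cor(toy$Interest.Rate, toy$FICO.Score)

# Mean interest rate for each category of a qualitative variable.
mean_by_home <- tapply(toy$Interest.Rate, toy$Home.Ownership, mean)

# Simple regression of interest rate on the same qualitative variable.
# The intercept is the mean of the first category; the other coefficients
# are differences from that category.
fit_home <- lm(Interest.Rate ~ Home.Ownership, data = toy)
```

Because R uses the first factor level as the baseline, `coef(fit_home)` and `mean_by_home` carry the same information in two forms.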
Statistical Modeling
Once candidate characteristics with a strong correlation to interest rate had been identified, multiple regression [multiple] was carried out using the R 'lm' command. The 'lm' command used a formula of the type:
lm(LoansData$Interest.Rate ~ LoansData$FICO.Score + LoansData$Other.Loan.Attribute)
which represents a multiple linear regression model of the type:
<math>I_i = b_0 + b_F F_i + \sum_j{b_j A_{ij}} + e_i</math>
where
<math>I_i</math> is the Interest Rate for the <math>i</math>th sample borrower
<math>b_0</math> is the intercept, the base Interest.Rate common to all borrowers
<math>b_F</math> is the weight of the FICO score <math>F_i</math> of the <math>i</math>th borrower
<math>b_j</math> is the weight of the <math>j</math>th attribute's contribution to Interest Rate
<math>A_{ij}</math> is the value of the <math>j</math>th attribute (besides Interest.Rate and FICO.Score) for the <math>i</math>th borrower
<math>e_i</math> is everything else not accounted for in the model
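As an illustration of how such a model is fitted and read, here is a sketch on synthetic data. The generating coefficients are invented, and `Amount.Requested` stands in for the "other loan attribute". The fitted numbers are not the study's results.

```r
set.seed(42)
n <- 300
toy <- data.frame(
  FICO.Score       = sample(seq(642, 802, by = 5), n, replace = TRUE),
  Amount.Requested = round(runif(n, 1000, 35000))
)
# Invented generating model: rate falls with FICO and rises with amount.
toy$Interest.Rate <- 72 - 0.085 * toy$FICO.Score +
  0.00015 * toy$Amount.Requested + rnorm(n, sd = 1)

# Interest rate modeled on FICO score plus one other attribute.
fit <- lm(Interest.Rate ~ FICO.Score + Amount.Requested, data = toy)

coefs <- summary(fit)$coefficients   # estimate, std. error, t value, p-value
b     <- coefs[, "Estimate"]         # fitted weights, intercept first
p     <- coefs[, "Pr(>|t|)"]         # p-value for each weight
```

With `FICO.Score` in the formula, the weight on `Amount.Requested` describes how the rate changes with the amount for borrowers at the same FICO score, which is the comparison used throughout the results.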
Reproducibility
All analyses were run from an R markdown file, p2pLoan.Rmd, using the full data set.

Results:

FICO score has a strong negative correlation with Interest.Rate (P<.001). Aside from FICO, the clearest association found in the data is the positive correlation between Interest.Rate and either Amount.Funded.By.Investors or Amount.Requested for borrowers with the same FICO score (P<.001). This is surprising: one might assume that borrowers who qualify for larger loans would be offered lower interest rates, but the data do not support that assumption.
Other significant associations for numeric attributes, each for borrowers with the same FICO score and each with P<.001, are the positive correlations between Interest.Rate and Inquiries.in.the.Last.6.Months, Open.CREDIT.Lines and Revolving.CREDIT.Balance.
For qualitative attributes, the most striking association is that 36 month loans almost always have a lower interest rate than 60 month loans for people with similar FICO scores (p<.001). People with similar FICO scores who have mortgages generally have a lower rate than those who rent (P=.01), and there is a barely significant association between interest rate and loan purpose for car loans and for house and small-business loans (p~.05, p~.03).
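A comparison of this kind, a qualitative attribute tested while holding FICO score fixed, can be sketched by adding the attribute as a factor to the regression. The data are synthetic, with an invented 3-point premium for 60 month loans. The level labels "36 months" and "60 months" are assumptions about how Loan.Length is coded.

```r
set.seed(7)
n <- 300
toy <- data.frame(
  FICO.Score  = sample(seq(642, 802, by = 5), n, replace = TRUE),
  Loan.Length = factor(sample(c("36 months", "60 months"), n, replace = TRUE))
)
# Invented generating model: 60 month loans cost 3 points more at equal FICO.
toy$Interest.Rate <- 70 - 0.085 * toy$FICO.Score +
  3 * (toy$Loan.Length == "60 months") + rnorm(n, sd = 1.5)

fit   <- lm(Interest.Rate ~ FICO.Score + Loan.Length, data = toy)
coefs <- summary(fit)$coefficients

# R names the dummy variable by pasting the factor name and the level.
length_effect <- coefs["Loan.Length60 months", "Estimate"]
length_p      <- coefs["Loan.Length60 months", "Pr(>|t|)"]
```

Here `length_effect` estimates the difference in rate between 60 and 36 month loans at a given FICO score, and `length_p` is the kind of p-value quoted in this section.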

Conclusions:

Who knows what FICO puts in its credit-worthiness formula? One could guess that many of the items that correlate with interest rate for the same FICO score are actually attributes that are part of FICO's secret formula. In that case they are not really independent elements of a multiple linear regression model but are already part of FICO. Inquiries.in.the.Last.6.Months, Revolving.CREDIT.Balance and Home.Ownership fall into that category, and their effect is likely encapsulated in the FICO score.
FICO doesn't know what you are going to do once you have your score. So here interest rate is correlated with things that have nothing to do with the FICO score. Amount.Funded.By.Investors, Amount.Requested, Loan.Length and Loan.Purpose have correlations with interest rate that are independent of FICO score, and so they are more interesting and informative.

References

<biblio force=false>

  1. club Lending Club, Suite 300, San Francisco, CA 94105, USA, https://www.lendingclub.com/home.action
  2. r The R Project for Statistical Computing, version 2.15.2 (2012-10-26), "Trick or Treat", http://www.r-project.org/
  3. prompt Data Analysis Project 1, Coursera, Data Analysis, instructor Jeff Leek, winter 2013, https://class.coursera.org/dataanalysis-001/human_grading/view/courses/294/assessments/4/submissions
  4. simple Simple Linear Correlation and Regression, Coursera, Data Analysis, Jeff Leek, week 4 lecture and notes, and http://ww2.coastal.edu/kingw/statistics/R-tutorials/simplelinear.html
  5. multiple Multiple Regression, Coursera, Data Analysis, Jeff Leek, week 4 lecture and notes
  6. fico FICO(tm), a proprietary measure of credit-worthiness, http://www.fico.com/en/Pages/default.aspx

</biblio>