P2pLoans
Title: Lending Club Loan Interest: Factors Contributing Beyond FICO Score
Introduction:
- Lending Club is an online lender that claims to "cut the cost and complexities of bank lending and pass the savings on to borrowers" (how-peer-lending-works.action) [club]. In this study we use data on loans made through Lending Club to find and quantify associations among the items that make up a borrower's profile. The data include 'Amount.Requested', 'Amount.Funded.By.Investors', 'Interest.Rate', 'Loan.Length', 'Loan.Purpose', 'Debt.To.Income.Ratio', 'State', 'Home.Ownership', 'Monthly.Income', 'FICO.Range', 'Open.CREDIT.Lines', 'Revolving.CREDIT.Balance', 'Inquiries.in.the.Last.6.Months' and 'Employment.Length'. All names have been changed to numbers to protect the innocent. The purpose of this study is to "identify and quantify associations between the interest rate of the loan and the other variables in the data set...taking into account the applicant's FICO score" [prompt].
Methods:
- Statistical analysis was carried out with the R statistical software package [r], running as a server on a Linux computer.
Data Collection
- For our analysis we used a sample of 2,500 loans made through the Lending Club [club] program. The data were supplied in R's '.rda' format. Records were mostly complete for each of the 2,500 loans, with some characteristics not available (<NA>). Overall, out of ~37,000 data items, approximately 660 (less than 2%) were <NA>.
- A small amount of pre-processing was required. FICO data was reported as ranges of integer values, with each range containing 4 values. Each range was transformed to a single integer value representing the center of that range. Interest rates were converted from character values to decimal numbers.
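- The loading and pre-processing steps above can be sketched as follows. This is a minimal illustration on made-up rows, not the code used in the study: the toy 'LoansData' values, the "low-high" text format assumed for 'FICO.Range', the trailing "%" assumed on 'Interest.Rate', and the temporary file are all assumptions. 'FICO.Score' is the name the regression formula in this report uses for the transformed column.

```r
# Toy stand-in for the real data set; values and text formats are assumed.
LoansData <- data.frame(
  FICO.Range     = c("735-739", "715-719", "690-694"),
  Interest.Rate  = c("8.90%", "12.12%", "21.98%"),
  Monthly.Income = c(6541.67, NA, 3833.33),
  stringsAsFactors = FALSE
)

# Round-trip through an .rda file to show how such a file is read.
rda_file <- tempfile(fileext = ".rda")
save(LoansData, file = rda_file)
rm(LoansData)
load(rda_file)  # restores the object under its saved name, LoansData

# Count missing items and their share of all data items.
n_missing     <- sum(is.na(LoansData))
share_missing <- mean(is.na(LoansData))

# FICO range -> single integer at the center of the range.
bounds <- strsplit(LoansData$FICO.Range, "-", fixed = TRUE)
LoansData$FICO.Score <- sapply(bounds, function(b) as.integer(mean(as.numeric(b))))

# Interest rate: drop the "%" sign and convert text to a decimal number.
LoansData$Interest.Rate <- as.numeric(sub("%", "", LoansData$Interest.Rate, fixed = TRUE))
```

- Counting the missing items before the conversions keeps the tally tied to the data as delivered, so a failed conversion (which would also produce NA) cannot inflate it.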
Exploratory Analysis
- Exploratory analysis first established the strong negative correlation between interest rate and FICO [fico] score, using scatter plots of all the samples. From there, each of the other variables was plotted against interest rate. For non-numeric, qualitative data items, the mean interest rate for each category was determined and compared, both by plotting the data and by fitting a simple regression model [simple].
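- The numeric side of this exploration might look like the sketch below. The six rows are invented for illustration and the object names ('r_fico', 'category_means', 'fit_simple') are hypothetical; only the column names come from the data set.

```r
# Invented rows; only the column names are taken from the data set.
LoansData <- data.frame(
  Interest.Rate  = c(8.9, 12.1, 15.3, 7.6, 18.5, 13.1),
  FICO.Score     = c(737, 702, 677, 762, 662, 697),
  Home.Ownership = factor(c("MORTGAGE", "RENT", "RENT",
                            "MORTGAGE", "RENT", "MORTGAGE"))
)

# Correlation between interest rate and FICO score.
r_fico <- cor(LoansData$Interest.Rate, LoansData$FICO.Score)

# Mean interest rate within each category of a qualitative variable.
category_means <- tapply(LoansData$Interest.Rate,
                         LoansData$Home.Ownership, mean)

# Simple regression on the same variable: the intercept is the mean of
# the reference category, and each other coefficient is a difference
# from that mean.
fit_simple <- lm(Interest.Rate ~ Home.Ownership, data = LoansData)
coef(fit_simple)
```

- With a single categorical predictor, the regression reproduces the category means, which is why the two comparisons agree.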
Statistical Modeling
- Once the candidate characteristics that correlated strongly with interest rate had been identified, multiple regression was carried out using the R 'lm' command. The 'lm' command used a formula of the type <code>lm(LoansData$Interest.Rate ~ LoansData$FICO.Score + LoansData$Other.Loan.Attribute)</code>
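- As a hedged illustration of a formula of this type, the sketch below fits interest rate against FICO score plus one other attribute. The data are synthetic: the rate is constructed from an exact linear rule (coefficients 70, -0.08 and 0.0002, chosen arbitrarily) so that the fitted coefficients can be checked by eye. 'Amount.Requested' stands in for 'Other.Loan.Attribute'; neither the numbers nor the choice of attribute reflects the study's results.

```r
# Synthetic data: the rate follows an exact, arbitrarily chosen linear rule.
LoansData <- data.frame(
  FICO.Score       = c(662, 677, 697, 702, 737, 762),
  Amount.Requested = c(10000, 5000, 20000, 8000, 15000, 12000)
)
LoansData$Interest.Rate <- 70 - 0.08 * LoansData$FICO.Score +
  0.0002 * LoansData$Amount.Requested

# Multiple regression: interest rate on FICO score plus one other attribute.
fit <- lm(Interest.Rate ~ FICO.Score + Amount.Requested, data = LoansData)

# Intercept and the two slopes; the FICO slope is the change in rate per
# FICO point with the other attribute held fixed.
coef(fit)
```

- Passing the data frame through the 'data' argument is equivalent to prefixing each term with 'LoansData$', and it lets 'predict' be applied to new data frames with the same column names.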
Reproducibility
Results:
Conclusions:
References
<biblio force=false>
- club Lending Club, Suite 300 San Francisco, CA 94105, USA https://www.lendingclub.com/home.action
- r The R Project for Statistical Computing, version 2.15.2 (2012-10-26) -- "Trick or Treat" http://www.r-project.org/
- prompt Data Analysis Project 1, Coursera, Data Analysis, instructor Jeff Leek, winter 2013, https://class.coursera.org/dataanalysis-001/human_grading/view/courses/294/assessments/4/submissions
- simple Simple Linear Correlation and Regression, Coursera, Data Analysis, Jeff Leek, week 4 lecture and notes, and http://ww2.coastal.edu/kingw/statistics/R-tutorials/simplelinear.html
- multiple Multiple Regression, Coursera, Data Analysis, Jeff Leek, week 4 lecture and notes
- fico FICO(tm), a proprietary measure of credit-worthiness http://www.fico.com/en/Pages/default.aspx
</biblio>