Difference between revisions of "P2pLoans"

From Wiki2

Revision as of 21:57, 17 February 2013

Title: Lending Club Loan Interest: Factors Contributing Beyond FICO Score

Introduction:

The Lending Club is an online bank that claims to "cut the cost and complexities of bank lending and pass the savings on to borrowers" (how-peer-lending-works.action) <cite>club</cite>. In this study we use data on loans made by Lending Club to find and quantify associations among the items that make up a borrower's profile. These data include information on 'Amount.Requested', 'Amount.Funded.By.Investors', 'Interest.Rate', 'Loan.Length', 'Loan.Purpose', 'Debt.To.Income.Ratio', 'State', 'Home.Ownership', 'Monthly.Income', 'FICO.Range', 'Open.CREDIT.Lines', 'Revolving.CREDIT.Balance', 'Inquiries.in.the.Last.6.Months' and 'Employment.Length'. All names have been changed to numbers to protect the innocent. The purpose of this study is to "identify and quantify associations between the interest rate of the loan and the other variables in the data set...taking into account the applicant's FICO score" <cite>prompt</cite>.

Methods:

Statistical analysis techniques were applied using the R <cite>r</cite> statistical software package, running as a server on a Linux computer.
Data Collection
For our analysis we used a sample of 2,500 loans made through the Lending Club <cite>club</cite> program. The data were supplied in R's '.rda' format. The data were for the most part complete for each of the 2,500 samples, with values not available (<NA>) for some characteristics. Overall, out of ~37,000 data items, fewer than 2%, or approximately 660, were <NA>.
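The missing-value tally described above can be sketched in R roughly as follows. This is a minimal illustration, not the code actually used for the analysis: the small `loans` data frame and its values are made up as a stand-in, since the name of the '.rda' file is not given here; with the real data the same calls would be applied after `load()`.

```r
# Hypothetical stand-in for the loans data. The real data set would be
# read with load() from the '.rda' file, whose name is not stated here.
loans <- data.frame(
  Interest.Rate     = c(8.9, 12.1, NA, 15.3),
  Monthly.Income    = c(6500, NA, 4200, 3100),
  Employment.Length = c("5 years", "< 1 year", NA, "10+ years"),
  stringsAsFactors  = FALSE
)

total_items <- nrow(loans) * ncol(loans)  # number of data items (cells)
na_count    <- sum(is.na(loans))          # number of items that are <NA>
na_fraction <- na_count / total_items     # share of items that are <NA>

# Missing items broken down by characteristic (column)
na_per_column <- colSums(is.na(loans))
```

Applied to the full sample, `na_count` and `na_fraction` would correspond to the figures quoted above (approximately 660 items, a little under 2% of ~37,000).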
Exploratory Analysis

Exploratory analysis first identified a strong negative correlation between interest rate and FICO <cite>fico</cite> score, using scatter plots of all the samples. From there, each of the other variables was plotted against interest rate. For non-numeric, qualitative data items, the mean interest rate for each category was determined and compared by plotting the data and by fitting a simple regression model <cite>simple</cite>.
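The exploratory steps above might look something like the following in R. This is only a sketch under stated assumptions: the data frame is a made-up example, and `FICO.Score` is a hypothetical numeric column (the data set itself provides 'FICO.Range', which would first need converting to a number, as would 'Interest.Rate' if it is stored as text).

```r
# Hypothetical example data; values and the numeric FICO.Score column
# are assumptions made for illustration only.
loans <- data.frame(
  Interest.Rate    = c(15.0, 13.0, 11.0, 9.0, 7.0, 6.0),
  FICO.Score       = c(660, 680, 700, 720, 740, 760),
  Home.Ownership   = c("RENT", "RENT", "MORTGAGE", "RENT", "MORTGAGE", "OWN"),
  stringsAsFactors = FALSE
)

# Strength and direction of the linear association with FICO score
fico_cor <- cor(loans$FICO.Score, loans$Interest.Rate)

# Simple linear regression of interest rate on FICO score
fit_fico <- lm(Interest.Rate ~ FICO.Score, data = loans)

# Mean interest rate for each category of a qualitative variable
rate_by_home <- tapply(loans$Interest.Rate, loans$Home.Ownership, mean)

# Simple regression on the qualitative variable: the intercept is the mean
# of the baseline category and the other coefficients are differences from it
fit_home <- lm(Interest.Rate ~ Home.Ownership, data = loans)
```

The corresponding plots could be produced with `plot(loans$FICO.Score, loans$Interest.Rate)` for the scatter plot and `boxplot(Interest.Rate ~ Home.Ownership, data = loans)` for the per-category comparison.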

Statistical Modeling
Reproducibility

Results:

Conclusions:

References

<biblio force=false>

  1. club Lending Club, Suite 300 San Francisco, CA 94105, USA https://www.lendingclub.com/home.action
  2. r The R Project for Statistical Computing, version 2.15.2 (2012-10-26) -- "Trick or Treat" http://www.r-project.org/
  3. prompt Data Analysis Project 1, Coursera, Data Analysis, instructor - Jeff Leek, winter 2013, https://class.coursera.org/dataanalysis-001/human_grading/view/courses/294/assessments/4/submissions
  4. simple Simple Linear Correlation and Regression, Coursera, Data Analysis, Jeff Leek, week 4 lecture and notes and Data http://ww2.coastal.edu/kingw/statistics/R-tutorials/simplelinear.html
  5. multiple Multiple Regression, Coursera, Data Analysis, Jeff Leek, week 4 lecture and notes
  6. fico FICO(tm), a proprietary measure of credit-worthiness http://www.fico.com/en/Pages/default.aspx

</biblio>