A brief word of caution. The list gives the ten words most likely to appear in loans that paid, without looking at the loans that didn't pay. For all we know, these words might have appeared with the same frequency in loans that didn't pay. Their use in loans that paid may simply be due to their use in all loans, and might have nothing to do with whether or not the loan pays. We'd have to look at the loans that didn't pay to see.
A good word of caution, but I think you have misunderstood the method I used to gather the words (and, looking at my blog, I see that I didn't explain it anywhere). Instead of looking at the frequency of words in all Paid loans, as I believe you are thinking, for this study I looked at how likely a loan was to be paid back when a given word was used in it.
Allow me to elaborate:
For every loan created successfully on Prosper before the end of 2007 I created a list of all of the words used in the Title and Description of the loan. For every instance of a word in a loan that was Paid I added 1 to a running total of PaidInstances. For every instance of a word in a loan that had any other status I added 1 to a running total of UnpaidInstances.
I then calculated the percentage for the word with the formula: PaidInstances / (PaidInstances + UnpaidInstances)
(Which is to say: PaidInstances / TotalWordUsage)
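To make the counting concrete, here is a minimal Python sketch of the tally and the ratio described above. It is not the code I actually ran: the tokenizer, the function names, the "Defaulted" status label and the three toy loans are all made up for illustration, and it assumes each loan is a simple (title, description, status) tuple.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def count_instances(loans):
    """Tally every word instance into Paid and not-Paid running totals."""
    paid_instances = Counter()
    unpaid_instances = Counter()
    for title, description, status in loans:
        words = tokenize(title + " " + description)
        if status == "Paid":
            paid_instances.update(words)
        else:
            # Any status other than Paid counts as unpaid.
            unpaid_instances.update(words)
    return paid_instances, unpaid_instances

def paid_ratio(word, paid_instances, unpaid_instances):
    """PaidInstances / (PaidInstances + UnpaidInstances) for one word."""
    total = paid_instances[word] + unpaid_instances[word]
    return paid_instances[word] / total if total else None

# Hypothetical toy data, not real Prosper listings.
toy_loans = [
    ("Need help", "help with bills", "Paid"),
    ("Payday loan", "help me pay off a payday loan", "Defaulted"),
    ("Consolidation", "pay off credit cards", "Paid"),
]
paid_counts, unpaid_counts = count_instances(toy_loans)
print(paid_ratio("help", paid_counts, unpaid_counts))
print(paid_ratio("payday", paid_counts, unpaid_counts))
```

Note that the denominator is the word's total usage across all loans, Paid or not, which is why a word that is simply common everywhere does not automatically score high.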
I reduced the list to words which had been used at least 1000 times in the loan set, sorted it from words that were most often in Paid loans to words that were least often in Paid loans, and compared that list to the overall likelihood of any loan being paid back.
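The filter-and-sort step might look roughly like the sketch below. Again this is an illustration under assumptions, not my original code: the function names are invented, the counts are toy numbers, and the threshold is lowered to 2 in the example only so the tiny data set has something to show (the real cutoff was 1000).

```python
def rank_words(paid_instances, unpaid_instances, min_uses=1000):
    """Keep words used at least min_uses times, sorted by Paid share (highest first)."""
    ranked = []
    for word in set(paid_instances) | set(unpaid_instances):
        paid = paid_instances.get(word, 0)
        total = paid + unpaid_instances.get(word, 0)
        if total >= min_uses:
            ranked.append((word, paid / total, total))
    # Highest Paid share first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked

def overall_paid_rate(statuses):
    """Baseline: share of all loans whose status is Paid."""
    return sum(1 for s in statuses if s == "Paid") / len(statuses)

# Hypothetical toy counts, not real Prosper data.
toy_paid = {"help": 2, "pay": 1, "bills": 1}
toy_unpaid = {"help": 1, "pay": 1, "payday": 2}
ranking = rank_words(toy_paid, toy_unpaid, min_uses=2)
baseline = overall_paid_rate(["Paid", "Defaulted", "Paid"])

for word, share, uses in ranking:
    print(f"{word}: {share:.2%} paid over {uses} uses (baseline {baseline:.2%})")
```

Words above the baseline show up more often in loans that were paid than an average loan would suggest, and words below it show up less often.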
I found the word 'lender' at the top, with loans containing the word having been Paid 68.96% of the time. I found the word 'payday' at the bottom, with loans containing the word having been Paid only 38.89% of the time. (This compares to an average Paid percentage, across all loans, of about 61% for this time period.)
Now, I think there is an argument to be made that it would have been better to count each word a maximum of once for each listing. What I did measures use of the word itself more than it measures use of the word in the listing ("help, help, help, help!" in one listing counts 4 times, instead of just once). But I think the best choice really depends on what you're trying to do with the information.
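The difference between the two counting choices can be shown with a small Python sketch. This is only an illustration: the function name, the once_per_listing flag, the "Defaulted" label and the two toy listings are all hypothetical, and the tokenizer is an assumed one.

```python
import re
from collections import Counter

def count_words(loans, once_per_listing=False):
    """Tally words into Paid / not-Paid totals, per instance or once per listing."""
    paid, unpaid = Counter(), Counter()
    for text, status in loans:
        words = re.findall(r"[a-z']+", text.lower())
        if once_per_listing:
            # Collapse repeats so each word counts at most once per listing.
            words = set(words)
        (paid if status == "Paid" else unpaid).update(words)
    return paid, unpaid

def paid_share(word, paid, unpaid):
    total = paid[word] + unpaid[word]
    return paid[word] / total if total else None

# Hypothetical toy listings.
toy = [("help, help, help, help!", "Paid"), ("please help", "Defaulted")]

by_instance = count_words(toy)
by_listing = count_words(toy, once_per_listing=True)

print(paid_share("help", *by_instance))  # 4 Paid instances out of 5
print(paid_share("help", *by_listing))   # 1 Paid listing out of 2
```

In this toy case one enthusiastic listing pulls the per-instance figure up to 80%, while the per-listing figure stays at 50%, which is the trade-off described above.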