Final Report Review--gad87 #12

Open
gloriadazevedo opened this issue Dec 6, 2016 · 0 comments

Nice job, guys! I really liked some of the plots that you show for overfitting/underfitting and the line through your listing points. It was a good idea to use cross-validation and lasso regularization to get a sparse model, so that you're not overfitting the model to the training data before evaluating it on the test data. That should also decrease the overall variance of the (linear) model. I'm not sure why K nearest neighbors was implemented with all of the data; maybe that part can be rewritten to be clearer. In the future, you could also use cross-validation to try different values for the number of neighbors and pick the best one for the data. In addition, I'm curious why you used decision trees to pick variables for the linear regression instead of a better-known method such as best subset selection or forward/backward stepwise selection, since you have a lot of variables. Just some computational considerations to think about as well.
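To make the suggestion about the number of neighbors concrete, here is a minimal sketch of choosing k by cross-validation. It is not the project's code: the function names (`knn_predict`, `cv_mse`, `choose_k`), the single made-up feature, and the synthetic prices are all assumptions for illustration, and it uses only the Python standard library.

```python
import random

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x (one feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_mse(xs, ys, k, n_folds=5):
    """Mean squared error of k-NN regression, averaged over held-out folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(n_folds):
        # Every n_folds-th point is held out; the rest are used for fitting.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        sq = [(knn_predict(tx, ty, xs[i], k) - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / n_folds

def choose_k(xs, ys, candidates, n_folds=5):
    """Return the candidate k with the lowest cross-validated MSE, plus all scores."""
    scores = {k: cv_mse(xs, ys, k, n_folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic stand-in data (hypothetical): one feature and a noisy price.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [50 + 12 * x + rng.gauss(0, 10) for x in xs]

best_k, scores = choose_k(xs, ys, candidates=[1, 3, 5, 10, 20])
print("best k:", best_k)
for k, mse in sorted(scores.items()):
    print(f"k={k:>2}  cv mse={mse:.1f}")
```

The key point is that each candidate k is scored only on points the model did not see while fitting, so the chosen k is not biased toward memorizing the training data.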

Potential next steps:
I think it would be interesting to investigate the price of listings over time, since I think the data all came from one day (in July and on a holiday weekend!). That way you can capture seasonal effects in listing prices (prices go up as demand goes up, but the number of listings also increases if people travel away from their homes). Other interesting variables would be location (maybe by neighborhood or proximity to a major transit stop) and text mined from the reviews, since those are at least two things that I look for when looking at Airbnb listings. In addition, other measures of error would be interesting, like the share of predictions "outside of 10% range" or just the absolute value of the percentage error of the predicted price. Finally, maybe you can investigate fitting some nonlinear models such as polynomials to capture some of the nonlinearity between actual and predicted price.
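The two error measures mentioned above can be sketched in a few lines. The function names and the four example prices are hypothetical, and the 10% band is taken to mean "predicted price differs from the actual price by more than 10% of the actual price", which is one reasonable reading of the suggestion.

```python
def absolute_percentage_errors(actual, predicted):
    """Per-listing |predicted - actual| / |actual|; assumes no actual price is zero."""
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]

def mean_absolute_percentage_error(actual, predicted):
    """Average of the per-listing absolute percentage errors."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)

def share_outside_band(actual, predicted, band=0.10):
    """Fraction of listings whose prediction misses the actual price by more than `band`."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(1 for e in errors if e > band) / len(errors)

# Made-up prices purely for illustration.
actual_prices = [100, 200, 80, 150]
predicted_prices = [105, 150, 80, 180]

mape = mean_absolute_percentage_error(actual_prices, predicted_prices)
outside = share_outside_band(actual_prices, predicted_prices, band=0.10)
print(f"MAPE: {mape:.3f}")
print(f"share outside 10% band: {outside:.2f}")
```

Unlike squared error, both measures are relative to the listing's price, so a $20 miss on a $50 listing counts for more than a $20 miss on a $400 listing.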
