Nice job guys! I really liked some of the plots you show for overfitting/underfitting and the line through your listings points. It was a good idea to use cross validation and lasso regularization to get a sparse model, so that you're not overfitting to the training data before evaluating on the test data. That should also decrease the overall variance of the (linear) model. I'm not sure why K Nearest Neighbors was implemented with all the data; maybe that part can be rewritten to be clearer. In the future, you can also use cross validation to try different values for the number of neighbors and pick the best one for the data. In addition, I'm curious why you used decision trees to pick variables for the linear regression instead of a better-known method such as best subset selection or forward/backward stepwise selection, since you have a lot of variables. There are some computational considerations to think about there as well.
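To make the suggestion about the number of neighbors concrete, here is a minimal sketch of picking k by cross validation. It is not your implementation: the function names (`knn_predict`, `cv_mse`, `choose_k`), the candidate values of k, and the synthetic one-feature data are all assumptions for illustration, and it uses only the Python standard library rather than whatever package you used.

```python
import random

def knn_predict(train_X, train_y, x, k):
    """Predict a target by averaging the targets of the k nearest training points."""
    # Rank training points by squared Euclidean distance to the query point x.
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_mse(X, y, k, n_folds=5):
    """Estimate the mean squared error of k-NN regression with n_folds-fold CV."""
    n = len(X)
    squared_errors = []
    for fold in range(n_folds):
        # Hold out every n_folds-th point, offset by the fold number.
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            pred = knn_predict(train_X, train_y, X[i], k)
            squared_errors.append((pred - y[i]) ** 2)
    return sum(squared_errors) / len(squared_errors)

def choose_k(X, y, candidates, n_folds=5):
    """Return the candidate k with the lowest cross-validated MSE, plus all scores."""
    scores = {k: cv_mse(X, y, k, n_folds) for k in candidates}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in data (an assumption, not the listings data):
# one feature and a noisy linear "price".
rng = random.Random(0)
X = [(rng.uniform(0, 10),) for _ in range(100)]
y = [50 + 20 * x[0] + rng.gauss(0, 15) for x in X]

best_k, scores = choose_k(X, y, candidates=[1, 3, 5, 10, 20])
print("best k:", best_k)
```

The key point is that each candidate k is scored only on held-out folds, so the model is never evaluated on points it was fit to.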
Potential next steps:
I think it would be interesting to investigate the price of listings over time, since I think the data all came from one day (in July and on a holiday weekend!). That way you can capture seasonal effects in listing prices (prices go up as demand goes up, but the number of listings also increases when people travel away from home). Other interesting variables would be location (maybe by neighborhood or proximity to a major transit stop) and text mined from the reviews, since those are at least two things that I look for when looking at Airbnb listings. Other measures of error would also be interesting, like the share of predictions "outside of 10% range" or just the absolute value of the percentage error of the predicted price. Finally, maybe you can investigate fitting some nonlinear models, such as polynomials, to capture some of the nonlinearity between actual and predicted price.
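As a rough illustration of the two error measures mentioned above, here is a small sketch. The function names and the four example prices are made up for illustration (they are not from your data), and I'm assuming "outside of 10% range" means the predicted price differs from the actual price by more than 10% of the actual price.

```python
def absolute_percentage_errors(actual, predicted):
    """Return |predicted - actual| / |actual| for each pair of prices."""
    if any(a == 0 for a in actual):
        raise ValueError("actual prices must be nonzero")
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]

def mean_absolute_percentage_error(actual, predicted):
    """Average absolute percentage error, as a fraction (0.1 means 10%)."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)

def share_outside_tolerance(actual, predicted, tolerance=0.10):
    """Fraction of predictions that miss the actual price by more than `tolerance`."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(e > tolerance for e in errors) / len(errors)

# Hypothetical nightly prices, for illustration only.
actual = [100.0, 200.0, 80.0, 150.0]
predicted = [105.0, 170.0, 80.0, 180.0]

mape = mean_absolute_percentage_error(actual, predicted)
outside = share_outside_tolerance(actual, predicted)
print(f"MAPE: {mape:.1%}, outside 10% range: {outside:.0%}")
```

Both measures are relative to the actual price, so a $20 miss on a $50 listing counts for more than a $20 miss on a $500 listing, which squared error does not capture.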