Materials for Applied Data Science profile course INFOMDA2

Battling the curse of dimensionality
1. Download the corn data and store it in your assignment folder.
2. Pick a property (Moisture, Oil, Starch, or Protein) to predict.
3. Split your data into a training (80%) and test (20%) set.
4. Use the function plsr from the package pls to estimate a partial least squares model that predicts the property from the NIR spectroscopy measurements in the training data. Make sure that the features are on the same scale. Use leave-one-out cross-validation (within plsr) to estimate out-of-sample performance.
5. Find out which component best predicts the property you chose. Explain how you did this.
6. Create a plot with the wavelength on the x-axis and the strength of the loading for this component on the y-axis. Explain which wavelengths are most important for predicting the property you are interested in.
7. Pick the number of components to include in the model based on the “one standard deviation” rule (selectNcomp()). Create predictions for the test set using the resulting model.
8. Compare your PLS predictions to a LASSO linear regression model where lambda is selected by cross-validation with the one standard deviation rule (using …).
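The steps above can be sketched in R roughly as follows. This is a minimal sketch, not a model answer. It assumes the pls and glmnet packages are installed. Because the layout of the downloaded corn file is not described here, the code runs on simulated stand-in data, and the names corn_df, NIR, y and wavelength are hypothetical. Ranking components by the absolute correlation between their training scores and the property (step 5) is only one possible justification. The use of cv.glmnet with s = "lambda.1se" for the LASSO step is an assumed choice, since the tool named in step 8 is cut off.

```r
# Sketch of steps 3-8 on SIMULATED stand-in data (not the real corn file).
library(pls)     # plsr(), selectNcomp(), scores(), loadings()
library(glmnet)  # cv.glmnet(), an assumed choice for the LASSO step

set.seed(1)

# ---- Hypothetical stand-in for the corn data ------------------------------
# 80 samples, 100 collinear "spectral" features driven by 3 latent factors.
# Replace X, y and wavelength with the real NIR matrix and chosen property.
n <- 80
p <- 100
wavelength <- seq(1100, 2498, length.out = p)  # hypothetical wavelength grid (nm)
latent   <- matrix(rnorm(n * 3), n, 3)
load_mat <- matrix(rnorm(3 * p), 3, p)
X <- latent %*% load_mat + matrix(rnorm(n * p, sd = 0.1), n, p)
y <- drop(latent %*% c(2, -1, 0.5)) + rnorm(n, sd = 0.2)
corn_df <- data.frame(y = y, NIR = I(X))  # whole spectrum as one matrix column

# ---- Step 3: 80/20 train/test split ----------------------------------------
train_idx <- sample(n, size = round(0.8 * n))
train <- corn_df[train_idx, ]
test  <- corn_df[-train_idx, ]

# ---- Step 4: PLS on standardised features with leave-one-out CV ------------
pls_fit <- plsr(y ~ NIR, ncomp = 10, data = train,
                scale = TRUE, validation = "LOO")

# ---- Step 5: one way to rank individual components --------------------------
# Absolute correlation between each component's training scores and y.
comp_cor  <- cor(unclass(scores(pls_fit)), train$y)
best_comp <- which.max(abs(comp_cor))

# ---- Step 6: loadings of that component against wavelength ------------------
plot(wavelength, loadings(pls_fit)[, best_comp], type = "l",
     xlab = "Wavelength (nm)", ylab = "Loading",
     main = paste("PLS component", best_comp))

# ---- Step 7: number of components via the one-sigma rule -------------------
# selectNcomp() may suggest 0 components; keep at least 1 so predict() has a model.
ncomp_1se <- max(1, selectNcomp(pls_fit, method = "onesigma"))
pls_pred  <- drop(predict(pls_fit, newdata = test, ncomp = ncomp_1se))

# ---- Step 8: LASSO with lambda from the one-standard-error rule ------------
X_train <- unclass(train$NIR)  # plain numeric matrices for glmnet
X_test  <- unclass(test$NIR)
cv_lasso   <- cv.glmnet(X_train, train$y, alpha = 1)  # alpha = 1 is the LASSO
lasso_pred <- drop(predict(cv_lasso, newx = X_test, s = "lambda.1se"))

# ---- Compare test-set mean squared error -----------------------------------
mse <- function(obs, pred) mean((obs - pred)^2)
test_mse <- c(PLS = mse(test$y, pls_pred), LASSO = mse(test$y, lasso_pred))
print(test_mse)
```

Storing the spectra as a single matrix column wrapped in I() mirrors how the pls package organises its own gasoline example, so the formula y ~ NIR uses every wavelength at once. On the real corn data, the simulated block would be replaced by the downloaded spectra and the chosen property.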
Hand in a zipped folder containing:
- a .Rmd file with your answers and clean, commented code chunks

Make sure the .Rmd runs without error upon unzipping!