Materials for Applied Data Science profile course INFOMDA2 Battling the curse of dimensionality.
In this group assignment, your goal is to compare a “standard” analysis to a high-dimensional data analysis on a real dataset. The final product is a 2500-word report, and there is an intermediate round of peer feedback.
Examples of a standard analysis versus high-dimensional analysis could be:
You will do this assignment in groups of three. You are responsible for creating a group. We suggest having at least weekly meetings. When you first meet up, talk about the following things:
You are welcome to find your own dataset! Choose something you find interesting, e.g., from your own experience or background. The dataset should be amenable to high-dimensional techniques, meaning the number of features (variables, columns) should be large relative to the number of examples (participants, rows). There are many potential sources of data; as examples, you can look at https://r-universe.dev/datasets (filter by fields descending), https://archive.ics.uci.edu/, or https://datasetsearch.research.google.com/.
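As a rough check of whether a candidate dataset is high-dimensional in the sense described above, you can compare its number of columns to its number of rows. The sketch below uses simulated data in place of a real dataset; the sizes (50 rows, 200 features) are arbitrary assumptions for illustration, not requirements of the assignment. It also illustrates why a standard linear regression runs into trouble when there are more features than rows.

```r
# Simulated stand-in for a real dataset: n = 50 rows, p = 200 features.
# These sizes are arbitrary and chosen only for illustration.
set.seed(1)
n <- 50
p <- 200
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rnorm(n)

# Ratio of features to rows. As a rule of thumb assumed here, a ratio
# near or above 1 points to a high-dimensional setting.
p_over_n <- ncol(x) / nrow(x)
print(p_over_n)

# Ordinary least squares cannot uniquely estimate more coefficients than
# there are rows: lm() reports NA for the coefficients it cannot identify.
fit <- lm(y ~ x)
n_unidentified <- sum(is.na(coef(fit)))
print(n_unidentified)
```

For your own data, replace `x` with the feature columns of your dataset; `nrow()` and `ncol()` work on data frames as well as matrices.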
Week | Suggested task | Deadline |
---|---|---|
1 | Make groups, start looking for dataset, decide how to work together, download template folder | |
2 | Choose, load, and explore dataset, start writing report introduction | |
3 | Decide on methods to compare (maybe look ahead in course), try to find code implementations | |
4 | Write down methods section, outline the results and conclusion section with keywords | December 6th, 10:45: Hand-in draft report |
5 | Give peer feedback, properly start writing code | December 13th, 17:00: Hand-in peer feedback |
6 | Write code, start writing results, incorporate feedback and write response letter | |
Christmas break | | |
7 | Generate results, plots. Write results section, start polishing code and check reproducibility | |
8 | Write discussion, polish everything | January 17th, 10:45: Hand-in final report |
You will produce a report of approximately 2500 words. For the report,
we have a pre-set structure and word limits per section. The structure
can be found in the template itself, here: report_template.pdf.
We suggest you download the template folder to get started with the report as well as the code and project files. This folder has the same structure as the one you will hand in. The template folder can be downloaded here: example folder.
You will produce code to do your research project. This code should be
reproducible, legible, and follow a good style (e.g.,
style.tidyverse.org). If you send your
folder to us, we should be able to re-run your code and understand what
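it does without extensive debugging or searching! One crucial part is to
use an RStudio Project (.Rproj file), so that file paths in your code
resolve relative to the project folder rather than to your own computer.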
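A minimal sketch of what the start of a reproducible script inside an RStudio Project might look like. The folder name `data` and the file name `my_dataset.csv` are hypothetical placeholders, not part of the template. The point is that paths are written relative to the project folder and that any randomness is seeded.

```r
# Build the path relative to the project root (the folder holding the
# .Rproj file). "data" and "my_dataset.csv" are hypothetical names.
data_path <- file.path("data", "my_dataset.csv")

# Read the data only if the file is present, so the sketch runs anywhere.
if (file.exists(data_path)) {
  dat <- read.csv(data_path)
}

# Fix the random seed before any step that uses random numbers
# (e.g., train/test splits or cross-validation folds), so that
# re-running the script reproduces the same result.
set.seed(123)
first_draw <- rnorm(3)
set.seed(123)
second_draw <- rnorm(3)
print(identical(first_draw, second_draw))

# Record the R version and loaded packages alongside the results.
print(sessionInfo())
```

Avoid absolute paths and calls to `setwd()`, since they tie the code to one machine.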
This assignment includes a peer feedback round. Each group will provide feedback on the draft report of one other group. This will be done according to our pre-defined rubric (rubric under construction).
Should a group miss their draft hand-in deadline, they will not be included in the peer feedback round and will not be able to receive any points for peer feedback quality.
Your final submission will include a short letter indicating how the peer feedback was incorporated in the final report.
The exact grading rubric will be made clearer in the coming weeks, but you can expect approximately the following division: