INFOMDA2

Materials for the Applied Data Science profile course INFOMDA2: Battling the curse of dimensionality.

Group assignment: Is High-Dimensional Analysis Worth it?

In this group assignment, your goal is to compare a “standard” analysis to a high-dimensional data analysis on a real dataset. The final product is a 2500-word report, with an intermediate round of peer feedback on a draft.

Examples of a standard analysis versus high-dimensional analysis could be:

Groups

You will do this assignment in groups of three. You are responsible for creating a group. We suggest having at least weekly meetings. When you first meet up, talk about the following things:

Dataset

You are welcome to find your own dataset! Choose something you find interesting, e.g., from your own experience or background. The dataset should be amenable to high-dimensional techniques, meaning the number of features / variables / columns should be large relative to the number of examples / participants / rows. There are many potential sources of data; as examples, you can look at https://r-universe.dev/datasets (filter by fields descending), https://archive.ics.uci.edu/, or https://datasetsearch.research.google.com/.
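
To check whether a candidate dataset fits this description, a minimal sketch in base R could compare the number of columns to the number of rows. The helper name `dimension_ratio` and the simulated `toy` data are illustrative assumptions, not part of the course materials, and the course does not prescribe any particular threshold for the ratio.

```r
# Hypothetical helper: report rows (n), columns (p), and the ratio p / n.
dimension_ratio <- function(data) {
  n <- nrow(data)
  p <- ncol(data)
  c(n = n, p = p, ratio = p / n)
}

# Simulated stand-in for a real dataset: 20 rows and 100 columns.
set.seed(123)
toy <- as.data.frame(matrix(rnorm(20 * 100), nrow = 20, ncol = 100))

# Here n = 20, p = 100, so the ratio is 5: more features than observations.
result <- dimension_ratio(toy)
result
```

For a real dataset, count only the columns you intend to use as features, since identifiers and outcome variables would inflate `p`.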

Schedule

| Week | Suggested task | Deadline |
|------|----------------|----------|
| 1 | Make groups, start looking for dataset, decide how to work together, download template folder | |
| 2 | Choose, load, and explore dataset, start writing report introduction | |
| 3 | Decide on methods to compare (maybe look ahead in course), try to find code implementations | |
| 4 | Write down methods section, outline the results and conclusion section with keywords | December 6th, 10:45: Hand-in draft report |
| 5 | Give peer feedback, properly start writing code | December 13th, 17:00: Hand-in peer feedback |
| 6 | Write code, start writing results, incorporate feedback and write response letter | |
| | Christmas break | |
| 7 | Generate results, plots. Write results section, start polishing code and check reproducibility | |
| 8 | Write discussion, polish everything | January 17th, 10:45: Hand-in final report |

Assignment components

Report

You will produce a report of approximately 2500 words. For the report, we have a pre-set structure and word limits per section. The structure can be found in the template itself, here: report_template.pdf.

We suggest you download the template folder to get started with the report as well as the code and project files. This folder has the same structure as the one you will hand in. The template folder can be downloaded here: example folder.

Code

You will produce code to do your research project. This code should be reproducible, legible, and follow a good style (e.g., style.tidyverse.org). If you send your folder to us, we should be able to re-run your code and understand what it does without extensive debugging or searching! One crucial part is to use an RStudio Project (.Rproj file).
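
As an illustration of what this can look like in practice, the top of an analysis script might resemble the sketch below. It uses base R only. The `data` folder, the file name `my_dataset.csv`, and the seed value are hypothetical placeholders, not requirements of the template.

```r
# Minimal sketch of the start of an analysis script (base R only).
# Assumption: the script is run from the project root, which is the
# working directory RStudio sets when you open the .Rproj file.

# Fix the random seed so steps that involve randomness can be re-run
# with identical results. The value itself is arbitrary.
set.seed(2024)

# Build paths relative to the project root instead of hard-coding an
# absolute path that only exists on one computer.
data_path <- file.path("data", "my_dataset.csv")  # hypothetical file

if (file.exists(data_path)) {
  dat <- read.csv(data_path)
} else {
  message("No file at ", data_path, "; replace the placeholder path.")
}

# Demonstration: resetting the same seed reproduces the same draws.
set.seed(2024)
first_draw <- rnorm(3)
set.seed(2024)
second_draw <- rnorm(3)
identical(first_draw, second_draw)  # TRUE

# Record the R version and loaded packages alongside the results.
sessionInfo()
```

Relative paths combined with an RStudio Project mean the folder can be moved or sent to someone else without editing the code.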

Peer feedback

This assignment includes a peer feedback round. Each group will provide feedback on the draft report of one other group. This will be done according to our pre-defined rubric.

Should a group miss their draft hand-in deadline, they will not be included in the peer feedback round and will not be able to receive any points for peer feedback quality.

Response letter

Your final submission will include a short letter indicating how the peer feedback was incorporated in the final report.

Grading

The rubric for the assignment can be found here.

The assignment will determine 25% of your overall grade for the course, with the following division of points: