Materials for Applied Data Science profile course INFOMDA2 *Battling the curse of dimensionality*.

The data to be analyzed in this exercise can be found in the following file.

The data in this file constitute a contingency table of counts, the
classic 1949 Great Britain five-by-five son’s by father’s occupational
mobility table. Import the data into `R`. The warning message that might
show up when using the function `read.table()` can be ignored.
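As a minimal sketch of the import step, the snippet below writes a tiny table of made-up counts to a temporary file and reads it back with `read.table()`. The temporary file and its contents are assumptions for illustration only; they stand in for the course file, whose name and layout are not reproduced here.

```r
# Hypothetical stand-in for the course file: a 2 x 3 table of made-up counts.
tmp <- tempfile(fileext = ".txt")
writeLines(c("10 20 30", "40 50 60"), tmp)

X <- read.table(tmp)   # header = FALSE by default, so columns are named V1, V2, V3
X <- as.matrix(X)      # a numeric matrix is convenient for prop.table() later on

dim(X)
```

Converting the data frame to a matrix is a design choice for this sketch, not a requirement of the exercise.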

The rows of the data table correspond to five different categories of father’s occupation and the columns to the same five categories of son’s occupation. The cells on the main diagonal of the table refer to fathers and sons with the same occupational category. This group is important because it represents the sons who did not move: the share of counts off the diagonal measures the total amount of mobility exhibited by the sons. The categories for both nominal variables are:

- upper nonmanual (UN; self-employed professionals, salaried professionals, managers, nonretail salespersons)
- lower nonmanual (LN; proprietors, clerical workers, retail salespersons)
- upper manual (UM; manufacturing craftsmen, other craftsmen, construction craftsmen)
- lower manual (LM; service workers, other operatives, manufacturing operatives, manufacturing laborers, other laborers)
- farm (F; farmers and farm managers, farm laborers)

If the table is called `X`, then the row and column labels can be
assigned by executing

```
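# Row labels: father's occupational category (suffix F).
rownames(X) <- c('UN F','LN F','UM F','LM F','F F')
# Column labels: son's occupational category (suffix S).
colnames(X) <- c('UN S','LN S','UM S','LM S','F S')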
```

Obtain the correspondence table using the function `prop.table()`. Use
the function `sum()` to check whether the sum of all elements of the
correspondence table equals one. The matrix of row profiles can be
obtained by using the argument `margin = 1` in the function
`prop.table()` and the matrix of column profiles by using the argument
`margin = 2`. Use the functions `rowSums()` and `colSums()` to check
whether the sums of the profiles are all equal to one. Install and load
the R package `ggpubr` and execute `ggballoonplot(X, fill = 'value')`
to visualize the correspondence table using a balloon plot. One of the R
packages for correspondence analysis is `ca`. Install and load this
package.
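The `prop.table()` steps described above can be sketched in base R as follows. The 3 x 3 table and its labels are made up for illustration and are not the GB mobility data. The last line computes the share of counts on the main diagonal, that is, the proportion of sons in the same category as their father.

```r
# Toy 3 x 3 table of made-up counts (NOT the GB mobility data).
X <- matrix(c(30, 10,  5,
              10, 40, 15,
               5, 15, 70),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A F", "B F", "C F"),
                            c("A S", "B S", "C S")))

P            <- prop.table(X)              # correspondence table
row_profiles <- prop.table(X, margin = 1)  # each row divided by its row total
col_profiles <- prop.table(X, margin = 2)  # each column divided by its column total

sum(P)                  # should equal one
rowSums(row_profiles)   # every row profile should sum to one
colSums(col_profiles)   # every column profile should sum to one

sum(diag(P))            # share of counts on the main diagonal
```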

**1. Apply a correspondence analysis to the GB mobility table. The
function to be used is ca().**

**2. Explore the arguments and values of the function ca() using
?ca. Obtain the row and column standard coordinates.**

**3. Use the function summary() to determine the proportion of total
inertia explained by the first two extracted dimensions.**

**4. Use the function plot() to obtain a symmetric map.**

**5. Use the argument map='rowprincipal' to obtain an asymmetric map
with principal coordinates for rows and standard coordinates for
columns.**
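To make the quantities in items 2, 3 and 5 concrete, here is a rough base R sketch that computes them by hand from the singular value decomposition of the standardized residuals. It is not the internal code of the `ca` package, and the toy table is made up; all object names (`row_std`, `row_principal`, and so on) are chosen for this illustration only.

```r
# Toy 3 x 3 table of made-up counts (NOT the GB mobility data).
X <- matrix(c(30, 10,  5,
              10, 40, 15,
               5, 15, 70),
            nrow = 3, byrow = TRUE)

P        <- X / sum(X)          # correspondence table
row_mass <- rowSums(P)          # row masses
col_mass <- colSums(P)          # column masses

# Standardized residuals: (P - r c') scaled by the square roots of the masses.
expected <- outer(row_mass, col_mass)
S <- (P - expected) / sqrt(expected)

dec <- svd(S)
K   <- min(dim(X)) - 1          # maximum number of non-trivial dimensions
sv  <- dec$d[seq_len(K)]        # singular values

inertia       <- sv^2                      # principal inertias
total_inertia <- sum(inertia)              # equals chi-squared / n
explained     <- inertia / total_inertia   # proportion explained per dimension

# Standard coordinates: singular vectors rescaled by the masses.
row_std <- dec$u[, seq_len(K), drop = FALSE] / sqrt(row_mass)
col_std <- dec$v[, seq_len(K), drop = FALSE] / sqrt(col_mass)

# Principal coordinates: standard coordinates times the singular values.
row_principal <- sweep(row_std, 2, sv, "*")

round(explained, 3)
```

An asymmetric map with rows in principal and columns in standard coordinates would plot `row_principal` against `col_std`. The sign of each dimension is arbitrary, so coordinates from another implementation may be mirrored.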

For the lab exercises, you will use the file

The data form a two-way contingency table that can be used to analyze the economic activity of the Polish population in relation to gender and level of education in the second quarter of 2011. The rows of the table refer to different levels of education:

- tertiary (E1),
- post-secondary (E2),
- secondary (E3),
- general secondary (E4),
- basic vocational (E5),
- lower secondary, primary and incomplete primary (E6).

The columns refer to combinations of economic activity and gender:

- full-time employed females (A1F),
- part-time employed females (A2F),
- unemployed females (A3F),
- economically inactive females (A4F),
- full-time employed males (A1M),
- part-time employed males (A2M),
- unemployed males (A3M),
- economically inactive males (A4M).

Import the data into R and respond to the following items.

**6. Give the rows 1 to 6 the labels E1 to E6, respectively. Give the
columns 1 to 4 the labels A1F to A4F, and the columns 5 to 8 the labels
A1M to A4M, respectively. Give a visualization of the correspondence
matrix.**

**7. Give the proportion of full-time employed females with secondary
level of education.**

**8. Give the matrices of row profiles and column profiles.**

**9. What is the conditional proportion of full-time employed females
given tertiary level of education and what is the conditional proportion
of full-time employed males given tertiary level of education?**

**10. What is the conditional proportion of females with the lowest
level of education given economically inactive? What is the conditional
proportion of males with the lowest level of education given
economically inactive?**
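Items 7, 9 and 10 differ in what is conditioned on: a joint proportion comes from the correspondence matrix, a proportion given an education level comes from a row profile, and a proportion given an activity category comes from a column profile. The sketch below shows the three lookups on a made-up 2 x 4 table that borrows a few of the labels above; the counts are invented and do not come from the Polish data.

```r
# Made-up counts for two education levels and four activity categories.
X <- matrix(c(50, 10, 60,  5,
              20, 90, 30, 45),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("E1", "E6"),
                            c("A1F", "A4F", "A1M", "A4M")))

P            <- prop.table(X)              # joint proportions
row_profiles <- prop.table(X, margin = 1)  # conditional on education level
col_profiles <- prop.table(X, margin = 2)  # conditional on activity category

P["E1", "A1F"]             # joint: share of the whole table in cell (E1, A1F)
row_profiles["E1", "A1F"]  # share of A1F among those with education level E1
col_profiles["E6", "A4F"]  # share of E6 among those in category A4F
```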

**11. Apply a correspondence analysis to the data. How large is the
total inertia?**

**12. Set the desired minimum proportion of explained inertia to .85.
How many underlying dimensions are sufficient? What is the proportion of
inertia explained by this number of dimensions?**
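One way to read item 12 is as a threshold on the cumulative proportion of explained inertia. The sketch below computes the principal inertias by hand for a made-up 3 x 4 table and picks the smallest number of dimensions whose cumulative proportion reaches the threshold. The table and the object names `cum_explained` and `n_dim` are assumptions for illustration.

```r
# Toy 3 x 4 table of made-up counts (NOT the Polish labour data).
X <- matrix(c(40, 10,  5,  5,
              10, 30, 20, 10,
               5, 10, 25, 40),
            nrow = 3, byrow = TRUE)

P        <- X / sum(X)
expected <- outer(rowSums(P), colSums(P))

# Singular values of the standardized residuals; only min(I, J) - 1 are non-trivial.
K  <- min(dim(X)) - 1
sv <- svd((P - expected) / sqrt(expected))$d[seq_len(K)]

cum_explained <- cumsum(sv^2) / sum(sv^2)   # cumulative proportion of inertia

threshold <- 0.85
n_dim <- which(cum_explained >= threshold)[1]   # smallest sufficient dimension

n_dim
cum_explained[n_dim]
```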

**13. Give the symmetric map for the final solution.**