[SOLVED] algorithm Project Preamble

Project Preamble
Classification
Choose a response (i.e. target) variable and a set of feature variables, apply two different classification techniques (e.g. decision trees vs. logistic regression), and compare the models' performance. Discuss how the feature variables could be better selected and how best to select a threshold value for logistic regression.
For example, using the National Enrollment data, we can predict whether an enrollment is Interstate, HomeState, or International; predict whether the enrollment is Go8 or not; or, given the name of an institution, a degree (Narrow_FOE), etc., predict whether the average ATAR is high (e.g. above 80) or not. Note that, depending on which prediction question you would like answered, you may need to tidy the data into the right shape. We can start with single-variate models and then move on to multi-variate models, following the same processes demonstrated in the lecture slides. A good demonstration of the following investigations is expected:
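As a rough illustration of threshold selection, the sketch below sweeps candidate thresholds over predicted probabilities and keeps the one with the highest F1 score. The labels and probabilities are made up for illustration, the function names are hypothetical, and F1 is only one possible criterion (an assumption here); accuracy, a cost-weighted measure, or a point on the ROC curve may suit a given question better.

```python
def confusion_counts(y_true, probs, threshold):
    """Count TP, FP, TN, FN when predicting 1 for probabilities >= threshold."""
    tp = fp = tn = fn = 0
    for label, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs):
    """Try every distinct predicted probability as a threshold; keep the best F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(probs)):
        tp, fp, tn, fn = confusion_counts(y_true, probs, t)
        score = f1_score(tp, fp, fn)
        if score > best_f1:  # strict comparison keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Made-up validation labels and model probabilities (illustrative only).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.20, 0.90]

best_t, best_f1 = best_threshold(y_true, probs)
print(f"best threshold: {best_t}, F1: {best_f1:.3f}")
```

The threshold should be chosen on a validation set rather than the test set, so that the final performance estimate is not biased by the selection.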
understanding what a null model would look like in this context.
understanding what a saturated model would look like for the dataset (e.g. different target values for the same feature values).
aggregating, sub-setting, sampling or reshaping the data for better data preparation if necessary.
transforming the categorical variables into numerical ones for single-variate model selection.
using various measures to select a good combination of variables for multi-variate models.
evaluating models.
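Several of the points above can be sketched together. The example below is a minimal illustration, not the lecture's method: the null model predicts the overall positive rate for every row, and a single-variate model turns a categorical feature into a number by replacing each level with a smoothed positive rate for that level. The feature levels, the target (high ATAR or not), the data values and the function names are all hypothetical.

```python
import math
from collections import defaultdict


def null_model(y):
    """Null model: the same prediction for every row, the overall positive rate."""
    return sum(y) / len(y)


def single_variable_model(x, y, smoothing=1.0):
    """Map each categorical level to a smoothed P(y = 1 | level).

    Smoothing pulls levels with few rows towards the null-model rate.
    """
    base = null_model(y)
    positives = defaultdict(float)
    totals = defaultdict(float)
    for level, label in zip(x, y):
        positives[level] += label
        totals[level] += 1
    return {
        level: (positives[level] + smoothing * base) / (totals[level] + smoothing)
        for level in totals
    }


def log_likelihood(y, probs):
    """Sum of log-probabilities assigned to the observed labels (higher is better)."""
    eps = 1e-12  # guards against log(0)
    return sum(
        math.log(max(p, eps)) if label == 1 else math.log(max(1 - p, eps))
        for label, p in zip(y, probs)
    )


# Made-up rows: institution group (feature) and high-ATAR flag (target).
x = ["Go8", "Go8", "Go8", "Other", "Other", "Other", "Other", "Go8"]
y = [1, 1, 1, 0, 0, 1, 0, 0]

base_rate = null_model(y)
level_probs = single_variable_model(x, y)

ll_null = log_likelihood(y, [base_rate] * len(y))
ll_single = log_likelihood(y, [level_probs[level] for level in x])
print(f"null: {ll_null:.3f}  single-variate: {ll_single:.3f}")
```

A feature is worth keeping only if its single-variate model beats the null model on held-out data; the comparison here is on the training rows purely to keep the sketch short.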
Clustering
Choose or compute a set of feature variables to apply a clustering algorithm, visualise the clustering results.
Taking the National Enrollment dataset as an example, we can obtain numerical aggregations for each institution, year and degree (Narrow_FOE): the average ATAR, the average age, the number of international students, the number of home_state students, the number of inter_state students, the female/male ratio, and so on. When clustering, you can choose different k values (i.e. the number of clusters) based on your intuition. For example, if you think the clusters correspond to individual institutions, you may need 41 clusters; if you think the clusters correspond to Go8 versus non-Go8 universities, you may choose k=2. You will also need to demonstrate how to select the best k, in a similar way to the examples in the relevant lecture slides.
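One common way to compare k values is to plot the total within-cluster sum of squares (WSS) against k and look for an "elbow". The sketch below is a plain-Python k-means under stated assumptions: the two-dimensional points are made up (they stand in for already-scaled aggregated features), the function names are hypothetical, and a deterministic farthest-point initialisation is swapped in for the usual random starts so that the result is reproducible.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far (deterministic, an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]  # keep the old centroid if a cluster empties
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    wss = sum(
        squared_distance(p, centroids[i])
        for i, members in enumerate(clusters)
        for p in members
    )
    return centroids, clusters, wss


# Made-up, already-scaled feature pairs forming three loose groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
    (9.0, 1.0), (9.1, 1.2), (8.8, 0.9), (9.2, 0.8),
]

wss_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(f"k={k}: WSS={wss:.3f}")
```

On this toy data the WSS drops sharply up to k=3 and only marginally afterwards, which is the elbow pattern to look for. With real aggregates, features on different scales (e.g. ATAR versus student counts) should be standardised first, otherwise the largest-scale feature dominates the distances.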
Marking Criteria
Data Preparation (10%): Proficient use of data handling functions (e.g. pipes) or packages to construct clean and tidy training and testing datasets for classification. Sensible aggregation and transformation to obtain the right data for clustering.
Classification (10%): Good, thorough comparison of two different types of classification models, with sensible interpretations of the performance measures, in context and with the right domain intuition.
Clustering (10%): Good implementation of clustering with adequate investigations, demonstrations and explanation of the effect of hyper-parameters (e.g. k for k-means) in these unsupervised techniques.
