
COMP5318 Machine Learning and Data Mining

1. Objective

The objective of this assignment is to apply machine learning and data mining methods to solve a real problem. You should compare at least three techniques, at least one of which is not taught in this course (e.g. AdaBoost, Random Forest, XGBoost, ADTree).

2. Instructions

2.1 Datasets

CIFAR-100, classification, https://www.cs.toronto.edu/~kriz/cifar.html
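As a starting point, the "CIFAR-100 python version" archive from the page above can be read with the standard pickle module. The sketch below assumes that layout: each batch is a pickled dict with bytes keys b"data", b"fine_labels" and b"coarse_labels", where every row of b"data" stores 1024 red, then 1024 green, then 1024 blue values in row-major order. The helper name load_cifar100_split and the path "cifar-100-python/train" are illustrative assumptions, not part of the assignment.

```python
import pickle

import numpy as np


def unpickle(path):
    """Load one pickled CIFAR batch; encoding='bytes' keeps the dict keys as bytes."""
    with open(path, "rb") as fo:
        return pickle.load(fo, encoding="bytes")


def load_cifar100_split(path):
    """Return (images, fine_labels, coarse_labels) for one CIFAR-100 split.

    Assumes each row of b"data" is 3072 uint8 values: 1024 red, 1024 green,
    1024 blue, each channel a 32x32 image stored row by row.
    """
    batch = unpickle(path)
    data = np.asarray(batch[b"data"], dtype=np.uint8)
    # (N, 3072) -> (N, channel, height, width) -> (N, height, width, channel)
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    fine = np.asarray(batch[b"fine_labels"], dtype=np.int64)
    coarse = np.asarray(batch[b"coarse_labels"], dtype=np.int64)
    return images, fine, coarse
```

For example, `X_train, y_train, _ = load_cifar100_split("cifar-100-python/train")` (hypothetical extraction path) would give channel-last images and the 100 fine-grained labels used for classification.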

Note that if the dataset is too large to process, you may pre-process it or train on a subset of it. Any such choice must be clearly explained in your report.
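One way to train on part of the data without skewing the label distribution is a stratified subsample. This is a minimal sketch using scikit-learn's train_test_split with stratify; the helper name stratified_subset and the fraction value are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_subset(X, y, fraction, seed=0):
    """Keep `fraction` of the samples while preserving per-class proportions."""
    X_sub, _, y_sub, _ = train_test_split(
        X, y, train_size=fraction, stratify=y, random_state=seed
    )
    return X_sub, y_sub
```

With CIFAR-100's 500 training images per class, `fraction=0.2` would keep 100 images per class. Whatever fraction is used should be reported, since it directly affects the accuracy figures.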

2.2 Assignment tasks

Choose a dataset from the list above.

Try different Machine Learning methods (at least 3) and compare their performance. At least one of the techniques you use should NOT have already been covered in the course material. You should experiment and clearly discuss the design decisions that help you achieve higher performance and speed. The design options should consider the following aspects:

Choosing an appropriate model and its complexity

Using pre-processing techniques on the datasets (e.g. clustering, feature extraction, etc.)

Computer infrastructure (e.g. parallelizing, speeding-up your code, etc.)

Ease of prototyping (e.g. implementation approach, choice of algorithms and libraries)
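Several of these aspects can be combined in a single scikit-learn pipeline. The sketch below assumes images have already been flattened to one row per sample (e.g. `X.reshape(len(X), -1)`), uses PCA as the feature-extraction step, a random forest as an example of a technique possibly not covered in the course, and `n_jobs=-1` to parallelise tree building across all CPU cores. The helper name build_pca_forest and its default values are illustrative assumptions, not tuned settings.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_pca_forest(n_components=100, n_estimators=200, seed=0):
    """Standardise features, project onto `n_components` principal components,
    then fit a random forest whose trees are built in parallel (n_jobs=-1)."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        RandomForestClassifier(
            n_estimators=n_estimators, n_jobs=-1, random_state=seed
        ),
    )
```

Keeping pre-processing inside the pipeline means it is refitted on each training fold only, which avoids leaking test-fold information during cross-validation.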

You are expected to fine tune each algorithm and explain why one approach outperforms the others.
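Fine-tuning can be automated with a cross-validated grid search. This is a minimal sketch assuming an SVM on PCA features; the parameter grid, the 3-fold inner split and the helper name tune_svm are illustrative assumptions, and the grid requires at least 50 input features because of the `pca__n_components` values.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm(X, y, seed=0):
    """Grid-search PCA size and SVM hyperparameters, evaluating candidates in parallel."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(random_state=seed)),
        ("svm", SVC()),
    ])
    # Illustrative grid only; real ranges should come from your own experiments.
    grid = {
        "pca__n_components": [20, 50],
        "svm__C": [1.0, 10.0],
        "svm__gamma": ["scale", 0.001],
    }
    search = GridSearchCV(
        pipe,
        grid,
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=seed),
        scoring="accuracy",
        n_jobs=-1,  # evaluate candidate settings on all CPU cores
    )
    search.fit(X, y)
    return search
```

`search.best_params_` and `search.cv_results_` give the material needed to explain why one setting outperforms another.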

Since you are expected to use more complex models that have not been discussed in the lectures, you may use most external open-source libraries, such as scikit-learn, pandas, Keras, TensorFlow, PyTorch, Theano, Caffe2, or their equivalents in Python 3, to write your own classifiers. If you need any other external library, please post on Piazza for confirmation.

You are only allowed to use Python 3 on Jupyter Notebook in this assignment.

3. Report

The report must be organised in a similar way to research papers, and include the following:

In the abstract, succinctly describe the rest of your report.

The introduction section should present the dataset that you chose, discuss its relevance in diverse applications, and give an overview of the methods you used.

You are expected to include a section on previous work, explaining successful techniques applied to the same or similar datasets and how they differ from yours.

The next section should discuss the methods you have adopted. Explain the theory behind each of them and discuss your design choices. This part should at least include pre-processing approaches and machine learning techniques used.

The experiments section should present results and comparisons for the implemented algorithms. Include the runtime, as well as the hardware and software specifications of the computer used for performance evaluation. Then comment meaningfully on the results of your experiments and reflect on your design choices.
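Runtime and basic machine details can be captured from inside the notebook so that they match the run being reported. This sketch uses only the standard library; the helper names system_info and timed are hypothetical, and `platform.processor()` may return an empty string on some systems, in which case the CPU model should be looked up manually.

```python
import os
import platform
import sys
import time


def system_info():
    """Collect basic hardware/software details to report alongside runtimes."""
    return {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be "" on some platforms
        "cpu_count": os.cpu_count(),
    }


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For instance, `model, fit_seconds = timed(model.fit, X_train, y_train)` records training time, since scikit-learn's `fit` returns the fitted estimator.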

The conclusion should sum up your results and suggest meaningful future work.

The references section includes all references cited in your report, formatted in a consistent way.

3.1 Evaluation metrics

You should compare the algorithms with a 10-fold cross validation exercise.

Classification task: When evaluating different classifiers, include accuracy, precision, recall and confusion matrix.
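The 10-fold evaluation and the required classification metrics can be computed in one loop, fitting each model only once per fold. This is a minimal sketch assuming NumPy arrays for X and y and macro-averaged precision and recall (the averaging scheme is an assumption; state whichever you use in the report). The helper name evaluate_10fold is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold


def evaluate_10fold(model, X, y, seed=0):
    """Stratified 10-fold CV returning mean/std of accuracy, macro precision and
    macro recall, plus a confusion matrix summed over all test folds."""
    labels = np.unique(y)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    fold_scores = {"accuracy": [], "precision": [], "recall": []}
    cm = np.zeros((len(labels), len(labels)), dtype=np.int64)
    for train_idx, test_idx in cv.split(X, y):
        fitted = clone(model).fit(X[train_idx], y[train_idx])  # fresh copy per fold
        y_true, y_pred = y[test_idx], fitted.predict(X[test_idx])
        fold_scores["accuracy"].append(accuracy_score(y_true, y_pred))
        fold_scores["precision"].append(
            precision_score(y_true, y_pred, average="macro", zero_division=0)
        )
        fold_scores["recall"].append(
            recall_score(y_true, y_pred, average="macro", zero_division=0)
        )
        cm += confusion_matrix(y_true, y_pred, labels=labels)
    summary = {k: (float(np.mean(v)), float(np.std(v))) for k, v in fold_scores.items()}
    return summary, cm
```

For 100 classes the confusion matrix is 100 x 100, so it is usually easier to read as a heatmap than as a table.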

Regression task: For regression problems, include Mean Square Error (MSE) and Negative Log Likelihood for the predictions (NLL):

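$$\mathrm{NLL} = -\log p(y \mid \mathcal{D}, \mathbf{x}) = \frac{1}{2}\log\left(2\pi\sigma^2\right) + \frac{\left(y - \bar{f}(\mathbf{x})\right)^2}{2\sigma^2}$$

where $y$ is the actual value to be predicted, $\mathcal{D}$ is the training dataset, $\mathbf{x}$ is a query point, and $\bar{f}(\mathbf{x})$ and $\sigma^2$ are the prediction mean and variance respectively.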
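For a regression model that outputs a predictive mean and variance per query point, both metrics take only a few lines of NumPy. This sketch assumes the per-point Gaussian NLL, 0.5·log(2πσ²) + (y − f̄(x))²/(2σ²), averaged over the test points; the averaging and the helper names mse and gaussian_nll are assumptions for illustration.

```python
import numpy as np


def mse(y_true, y_mean):
    """Mean squared error between targets and predicted means."""
    y_true = np.asarray(y_true, dtype=float)
    y_mean = np.asarray(y_mean, dtype=float)
    return float(np.mean((y_true - y_mean) ** 2))


def gaussian_nll(y_true, y_mean, y_var):
    """Average Gaussian negative log-likelihood; y_var must be strictly positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_mean = np.asarray(y_mean, dtype=float)
    y_var = np.asarray(y_var, dtype=float)
    per_point = 0.5 * np.log(2 * np.pi * y_var) + (y_true - y_mean) ** 2 / (2 * y_var)
    return float(np.mean(per_point))
```

Unlike MSE, the NLL also penalises over- or under-confident variance estimates, so a model can have a good MSE but a poor NLL.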

3.2 Report layout

Please follow the format of the MS-Word report template provided.

Length: ideally 10 to 15 pages, up to a maximum of 25 pages, with a [-10] penalty for each additional page beyond 25.

4. Submission

4.1 Proceed to Canvas and upload all files separately, as follows:

Report (a PDF file)

You must include an appendix that provides detailed steps on how to successfully run your code, including the installation of any external libraries required to execute it.

Code (.ipynb files)

Your code should be written as one or more .ipynb files. Keep the algorithm and parameters that yield the best result in a code file separate from all the other algorithms; in this case there would be 2 code files to submit.

Alternatively, you can have one code file per method/algorithm, i.e. 3 code files for 3 algorithms.

Note: Do NOT submit the dataset. In case your model takes significant time to train, you should submit the trained model as well for evaluation.
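For scikit-learn estimators and pipelines, a trained model can be saved once and reloaded for evaluation without retraining. This sketch assumes joblib (installed alongside scikit-learn); the helper names save_model and load_model and the file name "best_model.joblib" are hypothetical. Deep-learning frameworks provide their own save functions (for example Keras `model.save`), which should be used for those models instead.

```python
import joblib


def save_model(model, path, compress=3):
    """Persist a fitted scikit-learn estimator or pipeline to disk (compressed)."""
    joblib.dump(model, path, compress=compress)


def load_model(path):
    """Reload an estimator saved with save_model, ready for predict()."""
    return joblib.load(path)
```

A typical flow is `save_model(best_pipeline, "best_model.joblib")` after training, then `load_model("best_model.joblib").predict(X_test)` in the submitted notebook. The appendix should note the library versions, since pickled models are not guaranteed to load across different scikit-learn versions.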

Code (PDF files of .ipynb code)

Every .ipynb code file must be saved as a PDF document and included in your submission e.g. if there are 2 .ipynb code files, you should also submit 2 PDF documents, one for each corresponding .ipynb file.

4.3 Your submission should include the report and all code files. A plagiarism checker will be used.

4.4 Clearly provide instructions on how to run your code in the appendix of the report.

4.5 Provide hyperlinks to the datasets you used and to any external open-source libraries used for the experiments and analysis, together with the library versions (e.g. PyTorch 1.2).
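Library versions can be read from the installed package metadata rather than typed by hand. This sketch uses the standard-library importlib.metadata module (Python 3.8+); the helper name library_versions is hypothetical, and the names passed in must be distribution names as used by pip (e.g. "scikit-learn", "torch"), which can differ from import names.

```python
from importlib.metadata import PackageNotFoundError, version


def library_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions
```

For example, `library_versions(["numpy", "scikit-learn", "torch"])` returns a dictionary that can be copied into the report appendix.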

4.6 A penalty of MINUS 20 (twenty) percent applies for each day after the due date. The maximum delay is 5 (five) days, after which the submission will no longer be accepted.

4.7 The rubric is available in Canvas. Please review it carefully.

FAQ

Can we use transfer learning from the VGG16 neural network model in Keras to build our CNN model?
Transfer learning is allowed.

Are we allowed to use (and submit) .py files for our classifier implementation, provided we comment our code appropriately and describe how to run it, or must we use .ipynb files?
Use and submit your code as .ipynb files, as per the assignment instructions. You also need to submit the PDF version of the code files.

Can we use OpenCV, scikit-image and SVM in sklearn for assignment 2?
Yes.

The rubric says "Code runs and classifies within a feasible time" but does not define a specific duration. Can you clarify what a feasible time is?
Your code should run in less than 1 hour when executed on a standard computer.
