FYS-STK4155 Project2-Defining the data sets to analyze yourself

Defining the data sets to analyze yourself

For project 3, you can propose your own data sets that relate to your research interests, or simply use existing data sets from, for example,

  1. Kaggle
  2. The University of California at Irvine (UCI) with its machine learning repository.

The approach to the analysis of these new data sets should follow to a large extent what you did in projects 1 and 2. That is:

  1. Whether you end up with a regression or a classification problem, you should employ at least two of the methods we have discussed: linear regression (including Ridge and Lasso), Logistic Regression, Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Support Vector Machines, and Decision Trees, Random Forests, Bagging and Boosting. You could, for example, explore all of the approaches from decision trees, via bagging and voting classifiers, to random forests, boosting and finally XGBoost. If you wish to venture into convolutional neural networks, recurrent neural networks or other extensions of neural networks, feel free to do so.

For boosting, feel free to write your own code as well.
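If you do write your own boosting code, a minimal starting point could be discrete AdaBoost with decision stumps (single-feature threshold splits) as weak learners, written with NumPy only. This sketch assumes binary labels coded as -1/+1. The function names (`stump_predict`, `fit_stump`, `adaboost_fit`, `adaboost_predict`) and the toy data are hypothetical illustrations, not course-provided code.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Predict +1 where polarity * (x_feature - threshold) >= 0, else -1."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Brute-force search for the stump with the smallest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost; y must contain labels -1 and +1 (assumption)."""
    w = np.full(len(y), 1.0 / len(y))         # uniform initial sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # guard against log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)      # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble

def adaboost_predict(X, ensemble):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(X, f, thr, pol) for a, f, thr, pol in ensemble)
    return np.where(score >= 0, 1, -1)

# Hypothetical toy data: a diagonal boundary that no single stump can represent
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
ensemble = adaboost_fit(X, y, n_rounds=50)
print("training accuracy:", np.mean(adaboost_predict(X, ensemble) == y))
```

Each round re-weights the samples so that the next stump concentrates on the points the ensemble currently gets wrong. The brute-force threshold search scales quadratically with the number of samples per feature, so it is only suitable for small data sets.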

  1. For project 3, feel free to use your own code from projects 1 and 2, to write your own code for SVMs and/or decision trees/random forests/bagging/boosting, or to use the available functionality of Scikit-Learn, TensorFlow, etc.
  2. The estimates you used and tested in projects 1 and 2 should also be included, that is the from week 43 and/or the textbook by Yadav et al.

    The basic structure of your project

    Here follows a suggested setup for how to structure your report and analyze the data you have chosen.

    Part a)

    The first part deals with structuring and reading the data, much along the same lines as done in projects 1 and 2. Explain how the data are produced and place them in a proper context.
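One possible starting point for part a) loads a data set, checks for missing values and class balance, and sets up a train/test split with feature scaling. This sketch uses the Wisconsin breast cancer data bundled with scikit-learn (a copy of the UCI data set) only as a stand-in for your own data. For a CSV file you would read the data with `pandas.read_csv` instead. The 80/20 split and the standardization are assumptions, not requirements.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data set; for your own data, e.g. df = pandas.read_csv("your_data.csv")
df = load_breast_cancer(as_frame=True).frame   # 30 feature columns plus "target"
print(df.shape)
print("missing values:", int(df.isna().sum().sum()))

X = df.drop(columns="target").to_numpy()
y = df["target"].to_numpy()
print("samples per class:", np.bincount(y))

# Hold out a test set first, stratified so both parts keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit the scaler on the training data only, so no test information leaks in
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
```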

    Part b)

    You need to include at least two central algorithms, or alternatively explore methods ranging from decision trees to bagging, random forests and boosting. Explain the basics of the methods you have chosen to work with. This will be your theory part.
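For the tree-based route, you could compare the progression from a single decision tree via bagging to random forests and boosting with scikit-learn, using 5-fold cross-validated accuracy as the estimate. The data set, the hyperparameters (tree depth, number of estimators) and the fold count are illustrative assumptions. XGBoost is left out because it requires the separate `xgboost` package.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in classification data

# From a single tree, via bagging and random forests, to two boosting variants
models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

def compare(models, X, y, folds=5):
    """Mean and standard deviation of k-fold cross-validated accuracy per model."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results

if __name__ == "__main__":
    for name, (mean, std) in compare(models, X, y).items():
        print(f"{name:>17s}: {mean:.3f} +/- {std:.3f}")
```

Reporting the spread across folds alongside the mean makes it easier to judge whether differences between the ensemble methods are meaningful.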

    Part c)

    Then describe your algorithm, its implementation, and the tests you have performed.

    Part d)

    Then present your results and findings, link them with existing literature, and more.

    Part e)

    Finally, here you should present a critical assessment of the methods you have studied and link your results with the existing literature.

    Solving partial differential equations with neural networks

    For this variant of project 3, we assume that you have some background in solving partial differential equations with finite difference schemes. We will study the diffusion equation in one dimension and solve it both with a standard explicit scheme and with neural networks.

    For the explicit scheme, you can study, for example, chapter 10 of the lecture notes in Computational Physics or alternative sources. For solving ordinary and partial differential equations with neural networks, the lectures by Kristine Baluka Hein in this course are highly recommended.

    For the machine learning part you can use your own code from project 2 or the functionality of, for example, TensorFlow/Keras.

    Part a), setting up the problem

    The physical problem can be that of the temperature gradient in a rod of length $L$, with one end at $x=0$, described by the diffusion equation

    $$\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t}, \qquad t>0,\quad x\in[0,L],$$

    or, in shorthand, $u_{xx} = u_t$. The initial condition is specified at $t=0$, with $L=1$ the length of the rod. At the left end the boundary condition is

    $$u(0,t) = 0, \qquad t\geq 0.$$

    The function $u(x,t)$ can be the temperature gradient of a rod. As time increases, the solution approaches a linear variation with $x$. With a forward difference in time and a centered difference in space, the derivatives are approximated by

    $$u_t \approx \frac{u(x,t+\Delta t)-u(x,t)}{\Delta t} = \frac{u(x_i,t_j+\Delta t)-u(x_i,t_j)}{\Delta t}$$

    and

    $$u_{xx} \approx \frac{u(x_i+\Delta x,t_j)-2u(x_i,t_j)+u(x_i-\Delta x,t_j)}{\Delta x^2}.$$

    Write down the algorithm and the equations you need to implement. Find also the analytical solution to the problem.
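To sketch the algorithm, substitute the two difference approximations into the diffusion equation. Writing $u_i^j \equiv u(x_i,t_j)$ with $x_i = i\Delta x$ and $t_j = j\Delta t$, this gives an explicit update for every interior point, with $u_0^j = 0$ from the boundary condition at $x=0$. For the analytical solution we additionally assume that the right end is held at zero, $u(L,t)=0$, which is not stated in the text. Under that assumption, separation of variables gives a Fourier sine series whose coefficients $B_n$ follow from the initial condition $u(x,0)$.

$$\frac{u_i^{j+1}-u_i^j}{\Delta t} = \frac{u_{i+1}^j-2u_i^j+u_{i-1}^j}{\Delta x^2}
\quad\Longrightarrow\quad
u_i^{j+1} = \alpha\, u_{i-1}^j + (1-2\alpha)\, u_i^j + \alpha\, u_{i+1}^j, \qquad \alpha = \frac{\Delta t}{\Delta x^2}$$

$$u(x,t) = \sum_{n=1}^{\infty} B_n \sin\left(\frac{n\pi x}{L}\right) e^{-(n\pi/L)^2 t}, \qquad B_n = \frac{2}{L}\int_0^L u(x,0)\,\sin\left(\frac{n\pi x}{L}\right) dx$$

For the hypothetical choice $u(x,0)=\sin(\pi x)$ with $L=1$, only $B_1 = 1$ is nonzero and the solution reduces to $u(x,t)=\sin(\pi x)\,e^{-\pi^2 t}$.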

    Part b)

    Implement the explicit scheme algorithm and perform tests of the solution for $\Delta x = 1/100$ using $\Delta t/\Delta x^2 \leq 1/2$.

    Study the solutions at two time points $t_2$ where $u(x,t_2)$ is almost linear, close to the stationary state.
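A minimal NumPy sketch of the explicit scheme for part b) could look as follows. The text does not give the initial condition, so the sketch assumes the hypothetical choice $u(x,0)=\sin(\pi x)$ with $L=1$ and $u=0$ at both ends, for which the analytical solution is $\sin(\pi x)e^{-\pi^2 t}$. The two evaluation times are also placeholders. The function refuses time steps that violate $\Delta t/\Delta x^2 \leq 1/2$.

```python
import numpy as np

def explicit_diffusion(u0, dx, dt, n_steps):
    """Forward Euler in time, centred differences in space, for u_t = u_xx.
    Both end values are held at zero (assumed Dirichlet boundaries)."""
    alpha = dt / dx**2
    if alpha > 0.5:
        raise ValueError(f"dt/dx^2 = {alpha:.3f} exceeds the stability limit 1/2")
    u = u0.copy()
    u[0] = u[-1] = 0.0
    for _ in range(n_steps):
        # the right-hand side is built from the old time level before assignment
        u[1:-1] = alpha * u[:-2] + (1 - 2 * alpha) * u[1:-1] + alpha * u[2:]
    return u

L, dx = 1.0, 1 / 100
dt = 0.5 * dx**2                          # right at the stability limit
x = np.linspace(0, L, int(round(L / dx)) + 1)
u0 = np.sin(np.pi * x)                    # hypothetical initial condition

def analytic(x, t):
    """Exact solution for the assumed u(x,0) = sin(pi x), u(0,t) = u(1,t) = 0."""
    return np.sin(np.pi * x) * np.exp(-np.pi**2 * t)

for t_target in (0.02, 0.3):              # placeholder choices for two time points
    n = int(round(t_target / dt))
    u = explicit_diffusion(u0, dx, dt, n)
    err = np.max(np.abs(u - analytic(x, n * dt)))
    print(f"t = {n * dt:.3f}: max |u - u_exact| = {err:.2e}")
```

Under these assumptions the stationary state is $u=0$, so both snapshots decay towards it. For other initial or boundary conditions, only `u0`, the boundary assignment and `analytic` need to change.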

    Part c) Neural networks

    Now study the lecture notes on solving ODEs and PDEs with neural networks, and use either your own code from project 2 or the functionality of TensorFlow/Keras to solve the same equation as in part b). Discuss your results and compare them with the standard explicit scheme. Also include the analytical solution and compare with it.
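One way to set up part c) without TensorFlow is a trial-function formulation. Assuming the initial condition $u(x,0)=\sin(\pi x)$ with $L=1$ and zero boundary values (hypothetical, since the text does not state them), the trial function $g(x,t) = (1-t)\sin(\pi x) + x(1-x)\,t\,N(x,t;P)$ satisfies these conditions for any network $N$. Training therefore only has to drive the residual $g_{xx}-g_t$ to zero at a set of collocation points. This sketch hand-codes the derivatives of a one-hidden-layer sigmoid network and minimizes the mean squared residual with SciPy's BFGS, which uses finite-difference gradients here. The network size, collocation grid and optimizer are hypothetical choices. With TensorFlow you would obtain the derivatives by automatic differentiation instead.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable logistic sigmoid

H = 10  # number of hidden units (hypothetical choice)

def network_terms(P, x, t):
    """N and its derivatives N_t, N_x, N_xx for a one-hidden-layer sigmoid network."""
    wx, wt, b, v = P[:H], P[H:2 * H], P[2 * H:3 * H], P[3 * H:]
    s = expit(np.outer(x, wx) + np.outer(t, wt) + b)   # shape (points, H)
    ds = s * (1 - s)                                    # sigma'
    d2s = ds * (1 - 2 * s)                              # sigma''
    return s @ v, ds @ (v * wt), ds @ (v * wx), d2s @ (v * wx**2)

def trial(P, x, t):
    """g = (1-t) sin(pi x) + x(1-x) t N, which meets the assumed IC/BCs for any P."""
    N = network_terms(P, x, t)[0]
    return (1 - t) * np.sin(np.pi * x) + x * (1 - x) * t * N

def residual(P, x, t):
    """g_xx - g_t, which vanishes wherever g solves u_xx = u_t."""
    N, Nt, Nx, Nxx = network_terms(P, x, t)
    g_t = -np.sin(np.pi * x) + x * (1 - x) * (N + t * Nt)
    g_xx = -(1 - t) * np.pi**2 * np.sin(np.pi * x) + t * (
        -2 * N + 2 * (1 - 2 * x) * Nx + x * (1 - x) * Nxx)
    return g_xx - g_t

# Collocation points on a 10 x 10 grid in [0, 1] x [0, 0.5]
xg, tg = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 0.5, 10))
x_c, t_c = xg.ravel(), tg.ravel()

def loss(P):
    return np.mean(residual(P, x_c, t_c) ** 2)

rng = np.random.default_rng(1)
P0 = rng.normal(scale=0.5, size=4 * H)
res = minimize(loss, P0, method="BFGS", options={"maxiter": 200})

exact = np.sin(np.pi * x_c) * np.exp(-np.pi**2 * t_c)
print("loss:", loss(P0), "->", res.fun)
print("max |g - exact| on the grid:", np.max(np.abs(trial(res.x, x_c, t_c) - exact)))
```

To answer part c), compare `trial(res.x, x, t)` with $\sin(\pi x)e^{-\pi^2 t}$ and with the explicit scheme on the same grid.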

    Part d) Solving eigenvalue problems

    Follow the discussion in the work of Yi et al. in the article from Computers and Mathematics with Applications 47, 1155 (2004), and use your differential equation solver with neural networks, set up a simple square, real and symmetric https://www.uio.no/english/services/it/education/canvas/ with your normal UiO username and password.

  3. Upload only the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.
  4. In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.
  5. Finally, we encourage you to collaborate. Optimal working groups consist of 2-3 students. You can then hand in a common report.

    Software and needed installations

    If you have Python installed (we recommend Python 3) and you feel fairly familiar with installing different packages, we recommend that you install the following Python packages via pip as

    1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow

    For Python3, replace pip with pip3.

    See below for a discussion of tensorflow and scikit-learn.

    For OS X users we also recommend, after having installed Xcode, installing Homebrew (brew). Brew allows for a seamless installation of additional software via, for example,

    1. brew install python3

    Linux users, on any of its many distributions such as the widely popular Ubuntu, can use pip as well and simply install Python as

    1. sudo apt-get install python3 (or python for python2.7)

    and so on.

    If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distributions which set up all relevant dependencies for Python, namely

    1. Anaconda. Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing that aims to simplify package management and deployment. Package versions are managed by the package management system conda.
    2. Enthought Canopy is a Python distribution and analysis environment for scientific and analytic computing, available for free and under a commercial license.

    Popular software packages written in Python for ML are

    These are all freely available at their respective GitHub sites. They encompass communities of developers in the thousands or more. And the number of code developers and contributors keeps increasing.

    1999-2020, Data Analysis and Machine Learning FYS-STK3155/FYS4155: http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html. Released under the CC Attribution-NonCommercial 4.0 license.
