Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
January 12, 2015
Today:
• What is machine learning?
• Decision tree learning
• Course logistics
Readings:
• “The Discipline of ML”
• Mitchell, Chapter 3
• Bishop, Chapter 14.4
Machine Learning:
Study of algorithms that
• improve their performance P
• at some task T
• with experience E
well-defined learning task: <P, T, E>

Learning to Predict Emergency C-Sections
[Sims et al., 2000]
9714 patient records, each with 215 features
Learning to classify text documents
spam vs. not spam

Learning to detect objects in images
(Prof. H. Schneiderman)
Example training images for each orientation
Learn to classify the word a person is thinking about, based on fMRI brain activity

Learning prosthetic control from neural implant
[R. Kass, L. Castellanos, A. Schwartz]
Machine Learning – Practice
Mining Databases
Text analysis
Speech Recognition
Control learning
Object recognition
• Support Vector Machines
• Bayesian networks
• Hidden Markov models
• Deep neural networks
• Reinforcement learning
• …

Machine Learning – Theory
Other theories for:
• Reinforcement skill learning
• Semi-supervised learning
• Active student querying
• …
PAC Learning Theory (supervised concept learning) relates:
• # of examples (m)
• error rate (ε)
• representational complexity (|H|)
• failure probability (δ)
(a standard bound relating these quantities is sketched after the list below)
… also relating:
• # of mistakes during learning
• learner’s query strategy
• convergence rate
• asymptotic performance
• bias, variance
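One standard result of this kind, stated here for reference (it is not spelled out on the slide): for a consistent learner over a finite hypothesis space H, if the number of training examples satisfies
m ≥ (1/ε) (ln|H| + ln(1/δ)),
then with probability at least 1 − δ, every hypothesis in H that is consistent with the training data has true error at most ε.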

Machine Learning in Computer Science
•Machine learning already the preferred approach to –Speech recognition, Natural language processing
–Computer vision
–Medical outcomes analysis
–Robot control –…
[Diagram: ML applications as a growing niche within all software applications]
•This ML niche is growing
–Improved machine learning algorithms
–Increased volume of online data
–Increased demand for self-customizing software
Tom’s prediction: ML will be the fastest-growing part of CS this century
[Diagram: machine learning draws on computer science, statistics, economics and organizational behavior, evolution, animal learning (cognitive science, psychology, neuroscience), and adaptive control theory]

What You’ll Learn in This Course
•The primary Machine Learning algorithms
–Logistic regression, Bayesian methods, HMM’s, SVM’s, reinforcement learning, decision tree learning, boosting, unsupervised clustering, …
• How to use them on real data
– text, image, structured data
– your own project
•Underlying statistical and computational theory
•Enough to read and understand ML research papers
Course logistics

Machine Learning 10-601
website: www.cs.cmu.edu/~ninamf/courses/601sp15
Faculty
• Maria Balcan
• Tom Mitchell
TAs
• Travis Dick
• Kirstin Early
• Ahmed Hefny
• Micol Marchetti-Bowick
• Willie Neiswanger
• Abu Saparov
Course assistant
• Sharon Cavlovich
See webpage for
• Office hours
• Syllabus details
• Recitation sessions
• Grading policy
• Honesty policy
• Late homework policy
• Piazza pointers
• …
Highlights of Course Logistics
On the wait list?
• Hang in there for the first few weeks
Homework 1
• Available now, due Friday
Grading:
•30% homeworks (~5-6)
•20% course project
•25% first midterm (March 2)
•25% final midterm (April 29)
Academic integrity:
•CheatingàFail class, be expelled from CMU
Late homework:
•full credit when due
•half credit next 48 hrs
•zero credit after that
•we’ll delete your lowest HW score
•must turn in at least n-1 of the n homeworks, even if late
Being present at exams:
•You must be there – plan now.
•Two in-class exams, no other final

Maria-Florina Balcan: Nina
• Foundations for Modern Machine Learning

E.g., interactive, distributed, life-long learning
Theoretical Computer Science, especially connections between learning theory & other fields

[Diagram: machine learning theory and its connections to approximation algorithms, control theory, game theory, mechanism design, discrete optimization, and matroid theory]
Travis Dick
• When can we learn many concepts from mostly unlabeled data by exploiting relationships between concepts?
• Currently: geometric relationships

Kirstin Early
• Analyzing and predicting energy consumption
• Reduce costs/usage and help people make informed decisions
Predicting energy costs
from features of home and occupant behavior
Energy disaggregation:
decomposing total electric signal into individual appliances
Ahmed Hefny
• How can we learn to track and predict the state of a dynamical system only from noisy observations?
• Can we exploit supervised learning methods to devise a flexible, local-minima-free approach?
[Figure: observations (oscillating pendulum) and the extracted 2D state trajectory]

Micol Marchetti-Bowick
How can we use machine learning for biological and medical research?
• Using genotype data to build personalized models that can predict clinical outcomes
• Integrating data from multiple sources to perform cancer subtype analysis
• Structured sparse regression models for genome-wide association studies
[Figure: gene expression data with dendrogram; samples weighted by genetic relatedness]
Willie Neiswanger
• If we want to apply machine learning algorithms to BIG datasets…
• How can we develop parallel, low-communication machine learning algorithms?
• Such as embarrassingly parallel algorithms, where machines work independently, without communication.

Abu Saparov
•How can knowledge about the world help computers understand natural language?
• What kinds of machine learning tools are needed to understand sentences?
“Carolyn ate the cake with a fork.” “Carolyn ate the cake with vanilla.”
person_eats_food(consumer: Carolyn, food: cake, instrument: fork)
person_eats_food(consumer: Carolyn, food: cake, topping: vanilla)
Tom Mitchell
How can we build never-ending learners? Case study: the Never-Ending Language Learner (NELL) runs 24×7 to learn to read the web.
See http://rtw.ml.cmu.edu
[Plots: # of beliefs vs. time (5 years); reading accuracy (mean avg. precision of top 1000 beliefs) vs. time (5 years)]

Function Approximation and Decision tree learning
Function approximation
Problem Setting:
• Set of possible instances X
• Unknown target function f : X → Y
• Set of function hypotheses H = { h | h : X → Y }
Input:
• Training examples { <x(i), y(i)> } of unknown target function f (superscript: ith training example)
Output:
• Hypothesis h ∈ H that best approximates target function f

Simple Training Data Set
Day Outlook Temperature Humidity Wind PlayTennis?
A decision tree for f : <Outlook, Temperature, Humidity, Wind> → PlayTennis?
Each internal node: tests one discrete-valued attribute Xi
Each branch from a node: selects one value for Xi
Each leaf node: predicts Y (or P(Y | X ∈ leaf))
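For reference, the standard PlayTennis tree from Mitchell (Chapter 3) has this shape:
Outlook = Sunny → test Humidity: High → No, Normal → Yes
Outlook = Overcast → Yes
Outlook = Rain → test Wind: Strong → No, Weak → Yes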

Decision Tree Learning
Problem Setting:
• Set of possible instances X
– each instance x in X is a feature vector
– e.g., <Outlook=rain, Temperature=hot, Humidity=low, Wind=weak>
• Unknown target function f : X → Y
– Y = 1 if we play tennis on this day, else 0
• Set of function hypotheses H = { h | h : X → Y }
– each hypothesis h is a decision tree
– the tree sorts x to a leaf, which assigns y
Decision Tree Learning
Problem Setting:
• Set of possible instances X
– each instance x in X is a feature vector x = <x1, x2 … xn>
• Unknown target function f : X → Y
– Y is discrete-valued
• Set of function hypotheses H = { h | h : X → Y }
– each hypothesis h is a decision tree
Input:
• Training examples { <x(i), y(i)> } of unknown target function f
Output:
• Hypothesis h ∈ H that best approximates target function f

Decision Trees
Suppose X = <X1, …, Xn>
where the Xi are boolean-valued variables
How would you represent Y = X2 ∧ X5 ?  Y = X2 ∨ X5 ?
How would you represent Y = X2X5 ∨ X3X4(¬X1) ?
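One way to see the answer, as a minimal sketch (not from the slides; the dict encoding of an example is an assumption): each tree is just a nested sequence of single-attribute tests.

# Tree for Y = X2 AND X5: test X2 at the root; only the X2=1 branch needs to test X5.
def y_and(x):                  # x is a dict of boolean attributes, e.g. {"X2": 1, "X5": 0, ...}
    if x["X2"] == 1:
        return 1 if x["X5"] == 1 else 0
    return 0

# Tree for Y = X2 OR X5: the X2=1 branch is already a "1" leaf; only the X2=0 branch tests X5.
def y_or(x):
    if x["X2"] == 1:
        return 1
    return 1 if x["X5"] == 1 else 0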

Top-Down Induction of Decision Trees [ID3, C4.5, Quinlan]
node = Root
Main loop:
1. A ← the “best” decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort training examples to the leaf nodes
5. If training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
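A minimal, self-contained Python sketch of this main loop (illustrative only: the dict-of-dicts tree encoding, the attribute names, and the PlayTennis target default are assumptions, not from the slides; entropy and information gain are defined on the following slides):

import math
from collections import Counter

def entropy(labels):
    # H(Y) = - sum_i p_i log2 p_i over the empirical label distribution
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    # Gain(S, A) = H(Y) - sum_v P(A=v) * H(Y | A=v)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return entropy([ex[target] for ex in examples]) - remainder

def id3(examples, attributes, target="PlayTennis"):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:              # perfectly classified: return a leaf
        return labels[0]
    if not attributes:                     # no attributes left: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    # 1-2. pick the "best" attribute for this node (highest information gain)
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    # 3-5. one descendant per value of `best`; sort examples down and recurse
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree

Calling id3(examples, ["Outlook", "Temperature", "Humidity", "Wind"]) on a list of example dicts would return a nested dict such as {"Outlook": {"Overcast": "Yes", ...}}.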
Sample Entropy
S is a sample of training examples; p+ is the proportion of positive examples in S and p− the proportion of negative examples. Entropy measures the impurity of S:
H(S) = − p+ log2 p+ − p− log2 p−

Entropy
Entropy H(X) of a random variable X: H(X) = − Σ_i P(X=i) log2 P(X=i)
H(X) is the expected number of bits needed to encode a randomly drawn value of X (under most efficient code)
Why? Information theory:
• The most efficient possible code assigns −log2 P(X=i) bits to encode the message X=i
• So, the expected number of bits to code one randomly drawn X is:
H(X) = − Σ_i P(X=i) log2 P(X=i), where the sum runs over the n possible values of X
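For example (not on the slide): a fair binary X has H(X) = 1 bit, while a skewed X with P(X=1) = 0.75 has H(X) = −0.75 log2 0.75 − 0.25 log2 0.25 ≈ 0.81 bits, reflecting the shorter average code an optimal encoder can achieve when one value is more probable.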
Entropy
Entropy H(X) of a random variable X:
H(X) = − Σ_i P(X=i) log2 P(X=i)
Specific conditional entropy H(X|Y=v) of X given Y=v:
H(X|Y=v) = − Σ_i P(X=i | Y=v) log2 P(X=i | Y=v)
Conditional entropy H(X|Y) of X given Y:
H(X|Y) = Σ_v P(Y=v) H(X|Y=v)
Mutual information (aka Information Gain) of X and Y:
I(X, Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)

Information Gain is the mutual information between input attribute A and target variable Y.
Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A:
Gain(S, A) = H_S(Y) − H_S(Y | A)
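A worked instance (using the counts from the textbook’s 14-day PlayTennis data, since the full table is not reproduced above): suppose S contains 9 positive and 5 negative examples, and Wind splits S into Weak (6+, 2−) and Strong (3+, 3−). Then
H_S(Y) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.940
H_S(Y | Wind=Weak) ≈ 0.811, H_S(Y | Wind=Strong) = 1.0
Gain(S, Wind) = 0.940 − (8/14)(0.811) − (6/14)(1.0) ≈ 0.048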
Simple Training Data Set
Day Outlook Temperature Humidity Wind PlayTennis?

Final Decision Tree for f : <Outlook, Temperature, Humidity, Wind> → PlayTennis?
Each internal node: tests one discrete-valued attribute Xi
Each branch from a node: selects one value for Xi
Each leaf node: predicts Y
Which Tree Should We Output?
• ID3 performs a heuristic search through the space of decision trees
• It stops at the smallest acceptable tree. Why?
Occam’s razor: prefer the simplest hypothesis that fits the data

Why Prefer Short Hypotheses? (Occam’s Razor)
Argument in favor:
• There are fewer short hypotheses than long ones
→ a short hypothesis that fits the data is less likely to be a statistical coincidence
→ it is highly probable that a sufficiently complex hypothesis will fit the data
Argument opposed:
• There are also fewer hypotheses with a prime number of nodes and attributes beginning with “Z”
• What’s so special about “short” hypotheses?

Overfitting
Consider a hypothesis h and its
• error rate over training data: error_train(h)
• true error rate over all data: error_true(h)
We say h overfits the training data if error_train(h) < error_true(h)
Amount of overfitting = error_true(h) − error_train(h)

Reduced-error pruning:
• Split the data into training and validation sets
• Create a tree that classifies the training set correctly
• Then, while further pruning helps: evaluate the impact on validation-set accuracy of pruning each possible node, and greedily remove the one whose removal most improves validation-set accuracy (a sketch of this procedure follows)
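A minimal sketch of that procedure, written against the nested-dict trees produced by the id3 sketch earlier (a bottom-up variant; the majority-vote leaf replacement and the PlayTennis target default are assumptions):

from collections import Counter

def predict(tree, x):
    # Walk the nested-dict tree to a leaf label (leaves are plain labels).
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(x[attr])
    return tree

def accuracy(tree, data, target="PlayTennis"):
    return sum(predict(tree, ex) == ex[target] for ex in data) / len(data)

def reduced_error_prune(tree, train, validation, target="PlayTennis"):
    if not isinstance(tree, dict):
        return tree                                   # already a leaf
    attr = next(iter(tree))
    # Prune the subtrees first (bottom-up), routing data down each branch.
    for value, child in tree[attr].items():
        sub_train = [ex for ex in train if ex[attr] == value]
        sub_val = [ex for ex in validation if ex[attr] == value]
        if sub_train:
            tree[attr][value] = reduced_error_prune(child, sub_train, sub_val, target)
    # Consider replacing this whole node with a majority-vote leaf over `train`.
    leaf = Counter(ex[target] for ex in train).most_common(1)[0][0]
    if validation and accuracy(leaf, validation, target) >= accuracy(tree, validation, target):
        return leaf
    return tree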

You should know:
• Well-posed function approximation problems:
– Instance space, X
– Sample of labeled training data { <x(i), y(i)> }
– Hypothesis space, H = { f : X → Y }
• Learning is a search/optimization problem over H
– Various objective functions
• minimize training error (0-1 loss)
• among hypotheses that minimize training error, select smallest (?)
• Decision tree learning
– Greedy top-down learning of decision trees (ID3, C4.5, …)
– Overfitting and tree/rule post-pruning
– Extensions…
Questions to think about (1)
•ID3 and C4.5 are heuristic algorithms that search through the space of decision trees. Why not just do an exhaustive search?

Questions to think about (2)
• Consider target function f : <x1, x2> → y, where x1 and x2 are real-valued and y is boolean. What is the set of decision surfaces describable with decision trees that use each attribute at most once?
Questions to think about (3)
•Why use Information Gain to select attributes in decision trees? What other criteria seem reasonable, and what are the tradeoffs in making this choice?

Questions to think about (4)
• What is the relationship between learning decision trees and learning IF-THEN rules?
