COMP3308/3608, Lecture 5
ARTIFICIAL INTELLIGENCE
Introduction to Machine Learning. K-Nearest Neighbor. Rule-Based Algorithms: 1R
Reference: Russell and Norvig, p. 693-697, 738-741; Witten, Frank, Hall and Pal, ch. 1-2, ch. 4: p. 91-96, 135-141
, COMP3308/3608 AI, week 5, 2022 1
Assignment 1 COMP3308
The first three students who finished the assignment are:
They will receive certificates. Congratulations! Very well done!
Assignment 1 COMP3308 (2)
How do we know?
When you decode the secret message, there are instructions to post a specific phrase on the discussion board
If you are not the first, second or third to finish, do not worry! You have done an amazing job and should be very proud of your search skills!
Introduction to Machine Learning (ML)
What is learning and ML?
Classification of ML methods
K-nearest neighbor
Learning rules: 1R algorithm
Learning and Machine Learning
Machine Learning (ML) is the area of AI that is concerned with writing computer programs that can learn from:
examples
domain knowledge
user feedback
ML is the core of AI: without an ability to learn, a system cannot be considered intelligent
What does it mean to learn? What do you understand by learning? When are you sure that you have learned something?
Definitions of learning
Definitions of learning from dictionary:
1) To get knowledge of something by study, experience, or being taught
2) To become aware by information or from observations
3) To commit to memory
4) To be informed of, ascertain; to receive instruction
Learning is making useful changes in our minds ( )
But when talking about computers (i.e. ML), these definitions have shortcomings!
Learning Definitions: Shortcomings
1) and 2): impossible to test whether learning has been achieved or not
How do you know if a machine has got knowledge of something?
Or if it has become aware? Whether computers can be aware or conscious is a philosophical issue
3) and 4): committing to memory, receiving instructions
These sound too passive; they are trivial tasks for computers
You can receive instructions and memorize things without being able to benefit from them, e.g. not able to apply new knowledge to new situations
Learning: Operational Definition
Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time ( )
Computers learn when they change their behavior in a way that makes them perform better in the future
Ties learning to performance rather than knowledge
Types of ML
Three main types:
Supervised
Unsupervised
Reinforcement
Fourth type: associations learning, developed within the database community in the early 90s
Supervised ML Tasks: Examples
Classification task: recognizing post codes (= recognizing digits)
Given: Handwritten digits and their corresponding label (class)
Task: Build a classifier that can recognise new handwritten digits
For the image:
5 people wrote the numbers from 0 to 9
1 example = 1 handwritten digit (50 in total)
each example is labelled with the digit it represents (0, 1, ..., or 9) => there are 10 classes
image ref: http://www-inst.eecs.berkeley.edu/~cs188/fa06/projects/classification/4/writeup/img2.gif
Regression task: predicting the exchange rate of AUD
Given: data from previous years (economic indicators, political events), with their corresponding exchange rate
Task: Build a model to predict the exchange rate for future days
The difference between classification and regression is in the type of the class (target) variable: nominal vs numeric
Supervised Learning: Definition
Given: a set of pre-classified (labelled) examples {x, y}, where x is the input vector and y is the target output
Task: learn a function (classifier, model) which encapsulates the information in these examples (i.e. maps x->y) and can be used predictively
i.e. to predict the value of one variable (y) from the known values of other variables (x)
Why is it called supervised?
2 types of supervised learning
Classification: the variable to be predicted is categorical (i.e. its values belong to a pre-specified, finite set of possibilities)
Regression: the variable to be predicted is numeric
Examples of supervised algorithms: 1R, k-NN, DTs, NB, neural networks (perceptron, backpropagation), SVM
Classification Ex.1
Input data: each example is an input vector with 4 features plus the target class (play)

outlook   temp.  humidity  windy  play
sunny     hot    high      false  no
sunny     hot    high      true   no
overcast  hot    high      false  yes
rainy     mild   high      false  yes
rainy     cool   normal    false  yes
rainy     cool   normal    true   no
overcast  cool   normal    true   yes
sunny     mild   high      false  no
sunny     cool   normal    false  yes
rainy     mild   normal    false  yes
sunny     mild   normal    true   yes
overcast  mild   high      true   yes
overcast  hot    normal    false  yes
rainy     mild   high      true   no

Model (classifier): we can learn different types of models
model 1: decision tree [figure]
model 2: rules
if outlook=sunny then play=no
elseif outlook=overcast then play=yes
elseif outlook=rainy then play=yes
model 3: [figure]
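Model 2 can be written directly as code. The Python sketch below applies the three outlook rules to the outlook and play values of the 14 weather examples (the standard weather data from Witten et al.) and counts how many of them the rules classify correctly. The names EXAMPLES and predict_play are illustrative and not part of the lecture.

```python
# Outlook and play values of the 14 weather examples (standard weather data
# from Witten et al.); the other three attributes are not used by the rules.
EXAMPLES = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]


def predict_play(outlook):
    """Model 2: one rule per value of the outlook attribute."""
    if outlook == "sunny":
        return "no"
    elif outlook == "overcast":
        return "yes"
    elif outlook == "rainy":
        return "yes"
    raise ValueError(f"unknown outlook value: {outlook}")


# Count the training examples on which the rules agree with the true class.
correct = sum(predict_play(outlook) == play for outlook, play in EXAMPLES)
print(f"{correct} of {len(EXAMPLES)} training examples classified correctly")
```

On this data the rules classify 10 of the 14 training examples correctly, so even a very simple model captures much of the pattern.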
Classification Ex.2: Driving Motor Vehicles
ALVINN, Pomerleau et al., 1993
Driving a van along a highway
Uses a neural network classifier
Input vectors: derived from the 30x32 image (black and white values)
Outputs (classes): 32 classes, corresponding to the turning directions (left, straight, right) in different degrees
1 labelled example is: input vector + class label (turning direction)
Video: The Machine That Changed the World, https://www.youtube.com/watch?v=oPpMp60vCMY
Early NN: minutes 39-41; ALVINN and NetTalk: minutes 41-46
Classification: More Examples
Banking 1: Is a mortgage application a good or bad credit risk?
Banking 2: Is a credit card transaction fraudulent or not?
Medicine: Is a particular disease present or not?
Law: Was a given will written by the real deceased person or by somebody else?
Security: Is a given behavior a possible terrorist threat?
Regression Example
Task: Predict computer performance
Data: CPU performance data
model 1: linear regression
PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
model 2: regression tree [figure]
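A linear regression model like the one for PRP is just a weighted sum of the attribute values plus a constant. The sketch below evaluates it for one machine. It assumes the coefficients as published by Witten et al., i.e. a negative intercept and a negative CHMIN weight; the attribute values of the example machine are made up for illustration.

```python
# Coefficients of the linear regression model for the CPU performance data.
# Assumption: negative intercept and negative CHMIN weight, as in Witten et al.
INTERCEPT = -56.1
WEIGHTS = {
    "MYCT": 0.049,
    "MMIN": 0.015,
    "MMAX": 0.006,
    "CACH": 0.630,
    "CHMIN": -0.270,
    "CHMAX": 1.46,
}


def predict_prp(machine):
    """Predicted PRP = intercept + weighted sum of the attribute values."""
    return INTERCEPT + sum(w * machine[name] for name, w in WEIGHTS.items())


# Hypothetical machine description, made up for illustration only.
machine = {"MYCT": 125, "MMIN": 256, "MMAX": 6000,
           "CACH": 256, "CHMIN": 16, "CHMAX": 128}
print(round(predict_prp(machine), 3))
```

The output is a number rather than a class label, which is what makes this a regression task and not a classification task.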
More Regression Examples
Predict electricity demand in the next hour from previous demands
Predict retirement savings from current savings and market indicators
Predict the house prices in Sydney in 2030
Predict the sales of a new product based on advertising expenditure
Predict wind velocity based on temperature, humidity, pressure
Reinforcement Learning
Each example has a score (grade) instead of correct output
Much less common than supervised learning
Most suited to control systems applications
Unsupervised Learning (Clustering)
Given: a collection of input vectors x; no target outputs y are given
Task: group (cluster) the input examples into a finite number of clusters so that the examples
from each cluster are similar to each other
from different clusters are dissimilar to each other
Examples of clustering algorithms: k-means, nearest neighbor, hierarchical clustering
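To make the clustering task concrete, here is a minimal sketch of k-means, the first algorithm named above. It assumes squared Euclidean distance as the similarity measure and uses the first k points as initial centroids (real implementations usually choose them at random). The data points are made up, and the code is an illustration, not the lecture's own implementation.

```python
def kmeans(points, k, iterations=10):
    """Very small k-means: returns (centroids, cluster index of each point)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign every point to its closest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[i] = distances.index(min(distances))
        # Step 2: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Made-up 2-D input vectors forming two visibly separate groups; no labels.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

Note that no target outputs are used anywhere: the grouping comes only from the distances between the input vectors.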
Clustering Example
Customer profiling
A department store wants to segment its customers into groups and create a special catalogue for each group. The attributes for the grouping included customers' income, location and physical characteristics (age, height, weight, etc.).
Clustering was used to find clusters of similar customers
A catalogue was created for each cluster based on the cluster characteristics and mailed to each customer
Associations Learning
Find relationships in data
Market-basket analysis: find combinations of items that typically occur together
Sequential analysis: find frequent sequences in data
Market-Basket Analysis Example
Uses the information about what customers buy to give us insight into who they are and why they make certain purchases
Ex.1. A grocery store owner is trying to decide whether to put bread on sale. He generates association rules and finds what other products are typically purchased with bread. A particular type of cheese is sold 60% of the time the bread is sold and a jam is sold 70% of the time. Based on these findings, he decides:
1) to place some cheese and jam at the end of the aisle where the bread is
2) not to put any of these 3 items on sale at the same time.
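The percentages in Ex.1 are what association rule mining usually calls the confidence of a rule: of all transactions that contain bread, the fraction that also contain the other item. The sketch below computes it on a made-up list of transactions chosen to reproduce the 60% and 70% figures; the function name and the data are illustrative.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


# Made-up shopping baskets: 10 contain bread, 6 of those also contain cheese
# and 7 of those also contain jam.
transactions = [
    {"bread", "cheese", "jam"}, {"bread", "cheese", "jam"},
    {"bread", "cheese", "jam"}, {"bread", "cheese", "jam"},
    {"bread", "cheese"}, {"bread", "cheese"},
    {"bread", "jam"}, {"bread", "jam"}, {"bread", "jam"},
    {"bread"},
    {"milk", "jam"}, {"milk"},
]

print(confidence(transactions, "bread", "cheese"))
print(confidence(transactions, "bread", "jam"))
```

Baskets without bread do not affect the result, because the rule only makes a claim about purchases that include bread.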
Frequent Sequences Example
Goal: Given a sequence of events, find frequent sub-sequences
These patterns are similar to market-basket analysis but the relationship is based on time
Ex. 1: The webmaster of company X periodically analyses the web page log data to determine how the users browse the web pages. He finds that in 70% of the cases the users of page A follow one of the following patterns:
A->D->B->C
A->E->B->C
=> A->C is a frequent pattern
=> he then decides to add a link from page A to page C
Ex.2: Finding sub-sequences in DNA data for particular species
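The web log analysis in Ex. 1 can be sketched as a simple count: among the sessions that visit page A, how many reach page C at some later point? The sessions below are made up to match the 70% figure, and the helper name visits_later is illustrative.

```python
def visits_later(session, first, later):
    """True if page `later` is visited at some point after page `first`."""
    if first not in session:
        return False
    return later in session[session.index(first) + 1:]


# Made-up browsing sessions (ordered lists of visited pages).
sessions = [
    ["A", "D", "B", "C"], ["A", "D", "B", "C"],
    ["A", "D", "B", "C"], ["A", "D", "B", "C"],
    ["A", "E", "B", "C"], ["A", "E", "B", "C"], ["A", "E", "B", "C"],
    ["A", "D"], ["A", "E", "F"], ["A", "B"],
]

# Fraction of the sessions visiting A in which C is reached afterwards.
from_a = [s for s in sessions if "A" in s]
fraction = sum(visits_later(s, "A", "C") for s in from_a) / len(from_a)
print(fraction)
```

Unlike market-basket analysis, the order of the items matters here: a session that visits C before A would not count.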
More ML Applications
Fraud detection
Health care: medical insurance fraud, inappropriate medical treatment
Credit card services, phone card and retail fraud
Data: historical transactions and other data
Sport: analyzing game statistics (shots blocked, assists and fouls) to gain competitive advantage
When player X is on the floor, player Y's shot accuracy decreases from 75% to 30%
Astronomy
JPL and the Palomar Observatory discovered 22 quasars using ML
Web applications
Mining web logs to discover customer preferences and behavior, analyze the effectiveness of web marketing, improve web site organization
Why is ML Important?
adapted from http://www.site.uottawa.ca/~nat/Courses/csi5387.html
Some tasks cannot be defined well, except by examples
e.g. recognizing people (men vs women), handwritten digits, etc.
The amount of knowledge available about certain tasks is too big for explicit encoding into rules, or too difficult to extract from experts
e.g. medical diagnosis: easier to learn from cases (symptoms -> diagnosis)
Need for adaptation
Humans often produce machines that do not work as well as desired in the environments in which they are used
Environments change over time, e.g. a spam email filter: the characteristics of spam email change over time
Relationships and correlations can be hidden within large amounts of data. ML and Data Mining may be able to find these relationships.
Machine Learning vs Data Mining
Data Mining (DM): search for hidden patterns in large datasets; these patterns should be meaningful, useful and actionable
Most of the techniques used for DM have been developed in ML
DM deals with large and multidimensional data, ML not necessarily
DM is applied ML
Motivation for DM
Data explosion: huge databases, due to automated data collection tools and mature database technology
examples: supermarket transaction data, credit card usage data, mobile usage data, government statistics, molecular databases, medical records, Wikipedia and other large text collections, etc.
We are drowning in data but starving for knowledge!
What jobs will disappear in the 21st century?
mail carriers
insurance and real estate agents, auto dealers
prison guards
stockbrokers
teachers
orthodontists
CEOs
truckers
housekeepers
What will be the 10 hottest jobs of the 21st century?
pharmers (vaccine-carrying tomato)
gene programmers
tissue engineers
hot-line handyman (remote diagnostics)
Turing testers
narrowcasters (personalised ads)
data miners
Source: Time magazine, June 26, 2000
If you were born in 2012... you would work in Data Mining
SMH, 6 April 2012, http://www.smh.com.au/lifestyle/life/whats-the-future-baby-20120405-1wfez.html
My schooling will become more interesting as I go, as today's digital natives grow up to become teachers. They'll know how to use all the gadgets at their disposal to make learning easier, fun and compatible with my short attention span. I'll always be switched on. I'll crowdsource my big decisions, taking votes among my closest 30 or so net friends. I'll do a university degree of course - just about everyone will. I'll probably work in a knowledge-based service industry which will depend on mining data from customer transactions in unimaginable volumes to determine which services to provide to whom, where and when.
Side question: Technology vs chalk & talk teaching: which one is better?
https://www.openlearning.com/educationist/ChalkAndTalkOrTechnologyDoIHaveAChoicePartOne
The 10 Toughest Jobs To Fill In 2016: Data Scientist
Forbes magazine, 24 September 2015, https://www.forbes.com/sites/susanadams/2015/09/24/the-10-toughest-jobs-to-fill-in-2016/#d665d4a6fcca
With the explosion of big data and the need to track it, employers keep on hiring data scientists. But qualified candidates are in short supply. The field is so new, the Bureau of Labor Statistics doesn't even track it as a profession. Yet thousands of companies, from startups that analyze credit card data in order to target marketing and advertising campaigns, to giant corporations like Ford Motor and Price WaterhouseCoopers, are bringing on scores of people who can take gigantic data sets and wrestle them into usable information. As an April report from technology market research firm Forrester put it, "Businesses are drowning in data but starving for insights."
The 10 Toughest Jobs To Fill In 2017: Data Scientist (again)
Forbes magazine, 8 February 2017, https://www.forbes.com/sites/karstenstrauss/2017/02/08/the-toughest-jobs-to-fill-in-2017/#44c245ee7f14
One job that made the list this year, as it did last year, is data scientist. Says Kensing: "Universities now are just starting to integrate specific majors for that field. It's got a high growth outlook but right now it's still a burgeoning field." According to the numbers, the data scientist occupation has a 16% growth outlook over the next eight years, and right now the median annual salary for that position is more than $128,000.
Top Emerging Jobs in 2020
Forbes magazine, 5 January 2020, https://www.forbes.com/sites/louiscolumbus/2020/01/05/ai-specialist-is-the-top-emerging-job-in-2020-according-to-linkedin/?sh=1d32f6c37495
Artificial Intelligence Specialist: Artificial Intelligence and Machine Learning have both become synonymous with innovation, and LinkedIn data shows that's more than just buzz. Hiring growth for this role has grown 74% annually in the past 4 years and encompasses a few different titles within the space that all have a very specific set of skills despite being spread across industries, including artificial intelligence and machine learning engineer. According to Indeed, Machine Learning Engineer job openings grew 344% between 2015 to 2018 and have an average base salary of $146,085.
Data Scientist: LinkedIn is seeing a 37% annual increase in demand for Data Scientists and related technical positions today. Data Science is another field that has topped the LinkedIn Emerging Jobs list for three years running. It's a specialty that's continuing to grow significantly across all industries.
Classification: K-Nearest Neighbor Algorithm
Classification Setup Again
Given: a set of pre-labelled examples
14 examples
4 attributes (features, variables): outlook, temperature, humidity and windy
the class is play (values: yes, no)
Task: Build a model (classifier) that can be used to predict the class of new (unseen) examples
e.g. predict the class (yes or no) of outlook=sunny, temp=hot, humidity=low, windy=true
Examples used to build the model are called training data
Success is measured empirically on another set called test data
Test data hasn't been used to build the classifier; it is also labelled
Performance measure: accuracy, the proportion of correctly classified test examples
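The accuracy measure is simple to compute once the classifier's predictions on the test data are available. A minimal sketch, with a made-up test set of 5 labelled examples and hypothetical predictions:

```python
def accuracy(predicted, actual):
    """Proportion of test examples whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical true classes of 5 test examples and a classifier's predictions.
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]
print(accuracy(predicted, actual))
```

Here 4 of the 5 test examples are classified correctly, giving an accuracy of 0.8. The test examples must not have been used to build the classifier, otherwise the estimate is too optimistic.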
Nominal and Numeric Attributes
2 types of attributes (features):
numeric (continuous): their values are numbers
nominal (categorical): their values belong to a pre-specified, finite set of possibilities
outlook   temp.  humidity  windy  play
sunny     hot    high      false  no
sunny     hot    high      true   no
overcast  hot    high      false  yes
rainy     mild   high      false  yes
rainy     cool   normal    false  yes
rainy     cool   normal    true   no
overcast  cool   normal    true   yes
sunny     mild   high      false  no
sunny     cool   normal    false  yes
rainy     mild   normal    false  yes
sunny     mild   normal    true   yes
overcast  mild   high      true   yes
overcast  hot    normal    false  yes
rainy     mild   high      true   no