Introduction
CS1210 Computer Science I: Fundamentals
Homework 4: The Naive Bayes Classification Algorithm
Due Friday, December 13 at 11:59PM
In this assignment, we're going to construct a classifier system, that is, a machine learning system that can predict an outcome for a new record by indirect comparison to other records. The specific classifier we'll be building is a Naive Bayes classifier, that is, a probabilistic classifier that predicts an output for a new record based on observed outputs from similar records. Naive Bayes systems are really simple, and are useful mostly because they are exceedingly fast. They are also well suited to domains with categorical data, that is, attributes that take multiple values that are not necessarily ordered, like the travel-to-school attribute, whose values are car, bus, walk, and skateboard.
An example should help make this algorithm clear. Let's say I told you that only 24 of 100 current US Senators are women. If I picked a Senator at random, would you guess that that Senator was a man or a woman?
We can estimate the probability using frequencies: 24/100 are W and 76/100 are M, so P(W) = 0.24 and P(M) = 0.76. I would certainly guess M, and be right most of the time. Note that this estimate is exact, because we are not sampling from the population of Senators but actually counting the entire population. Normally, we would sample a few members of the population under study and then estimate the probability in the larger population from the frequency counts obtained on the sample, thereby implicitly assuming the sample is representative of the population at large.
Now say that I point out that 53 of 100 current US Senators are Republicans, leaving 47 Democrats (we'll count the two Independents as Democrats, since Bernie Sanders and Angus King caucus with the Democrats). If I were to pick a Senator at random, would you guess that the Senator is a Democrat or a Republican? Using the same idea as before, I would be inclined to guess R, but only just slightly so, given that P(D) = 0.47 and P(R) = 0.53. But what if I told you that the Senator I picked at random was a woman? Would this change your guess as to whether the Senator was a Democrat?
It should! Of the 24 women Senators, fully 17 are Democrats, so P(D|W) = 17/24 ≈ 0.71, much larger than the P(D) = 0.47 I would estimate if I had no knowledge about the gender of the selected Senator. Here, we use P(D|W) to represent a conditional probability, that is, the probability that my randomly selected Senator is a Democrat given that (or conditioned on the fact that) they are female.
Now what if I again pick a Senator at random and ask you to estimate the probability that the selected Senator is both a Democrat and a woman, P(D, W)? Normally, the joint probability of two events happening together is the product of their individual probabilities. So, we might guess that P(D, W) = P(D) × P(W) = 0.47 × 0.24 ≈ 0.11. But more intuitively, we can see the answer should be 17/100, or 0.17, since there are 17 Democratic women out of a total population of 100 Senators. Why the discrepancy? Because, as we have already seen, these two events are not independent: if they were, then women Democrats and women Republicans should occur at the same rate.
Instead, in this case where the events are not independent, the joint probability P(D, W) is related mathematically to the conditional probability P(D|W) as follows:
P(D, W) = P(W) × P(D|W)
If we plug in the numbers, we see 0.24 × 0.71 ≈ 0.17, which is the same as the intuitive answer we got from counting the 17 Democratic women in the US Senate. Likewise, P(D, W) = P(D) × P(W|D), so solving 0.47 × P(W|D) = 0.17 yields P(W|D) ≈ 0.36, the probability that a known Democratic Senator is also female. These equations hint at some deeper relationship between P(D), P(W), P(D|W), P(W|D) and P(D, W).
Thomas Bayes, an 18th-century English statistician and clergyman, introduced his Bayes' Rule, which establishes the mathematical relationship between the probabilities of events that are not independent of one another:
P(A|B) = P(B|A) × P(A) / P(B)
or, in our case:
P(W|D) = P(D|W) × P(W) / P(D)
Again, with our numbers, we see:
P(W|D) = (0.71 × 0.24) / 0.47 ≈ 0.36
which we can confirm numerically, as 17/47 Democratic Senators, or 36%, are women.
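To make these relationships concrete, here is a short Python sketch (not part of the assignment) that recomputes the Senate numbers above and confirms Bayes' Rule; the counts are taken directly from the example:

total, women, dems, dem_women = 100, 24, 47, 17   # Senate counts from the example

p_w = women / total                # P(W) = 0.24
p_d = dems / total                 # P(D) = 0.47
p_d_given_w = dem_women / women    # P(D|W) = 17/24, about 0.71

p_dw = p_w * p_d_given_w           # joint P(D, W), about 0.17 = 17/100

# Bayes' Rule: P(W|D) = P(D|W) * P(W) / P(D)
p_w_given_d = p_d_given_w * p_w / p_d
print(p_dw, p_w_given_d)           # about 0.17 and 0.36 (= 17/47)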
Another Example
Now let's consider another, more complex, example. Imagine you own a lousy car that only starts half the time. Over time, you make the following observations:
P(S) = 0.5 (car won't start)
P(B) = 0.1 (battery is dead)
P(G) = 0.2 (gas tank is empty)
P(S|G) = 1 (car won't start without gas)
P(S|B) = 1 (car won't start with a bad battery)
Here we might think of G and B as two alternative diseases and S as an observable symptom.
We'd like to compute P(G|S) and P(B|S) so you can take corrective action (filling the tank or recharging the battery, respectively). Using Bayes' Rule:
P(G|S) = P(G) × P(S|G) / P(S) = (0.2 × 1) / 0.5 = 0.4
P(B|S) = P(B) × P(S|B) / P(S) = (0.1 × 1) / 0.5 = 0.2
By taking the more probable diagnosis here, P(G|S), we conclude that the car won't run because it is out of gas. Note that when comparing the two diseases, we can really just compare the numerators and ignore dividing by P(S), as we are only interested in which one is larger.
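As a sanity check, here is a tiny Python sketch (again, not part of the assignment) that reproduces these two posteriors:

p_s, p_b, p_g = 0.5, 0.1, 0.2   # P(S), P(B), P(G) from the observations above
p_s_given_g = 1.0               # P(S|G): the car won't start without gas
p_s_given_b = 1.0               # P(S|B): the car won't start with a bad battery

p_g_given_s = p_g * p_s_given_g / p_s   # = 0.4
p_b_given_s = p_b * p_s_given_b / p_s   # = 0.2
print(p_g_given_s, p_b_given_s)         # out of gas is the more probable diagnosis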
Now let's assume that P(R) is the probability that the radio doesn't work. Note that while we know that the radio won't work if the battery is dead, we also know that the radio can still work if we are out of gas.
P(S, R) = 0.2 (won't start and a hinky radio: your lucky day!)
P(S, R|G) = 0.3
P(S, R|B) = 1 (radio won't work without power!)
Here, the values of P(G|S, R) and P(B|S, R) would give the most probable cause of failure given the combination of observed events.
P(G|S, R) = P(G) × P(S, R|G) / P(S, R) = (0.2 × 0.3) / 0.2 = 0.3
P(B|S, R) = P(B) × P(S, R|B) / P(S, R) = (0.1 × 1) / 0.2 = 0.5
In this case, it's clearly more likely that the battery is the issue if both the radio is out and the car doesn't start.
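Extending the sketch above with the radio evidence (again, just a hedged check of the arithmetic, not part of the assignment):

p_sr = 0.2                        # P(S, R): won't start and the radio is out
p_sr_given_g = 0.3                # P(S, R|G)
p_sr_given_b = 1.0                # P(S, R|B): radio won't work without power

p_g_given_sr = 0.2 * p_sr_given_g / p_sr   # P(G|S, R) = 0.3
p_b_given_sr = 0.1 * p_sr_given_b / p_sr   # P(B|S, R) = 0.5
print(p_g_given_sr, p_b_given_sr)          # now the dead battery is more likely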
Of course, the problem here is that as I add more and more observable conditions (like in HW3, where you know about age, foot length, Internet access, superpowers, and so on), you need to estimate the joint and conditional probabilities for every combination of observables. And because we may have very few samples of very complicated conditions, it is impossible to estimate the above probabilities with frequencies alone. A single disease with n different True/False observable symptoms would require O(2^n) estimates from data:
P(combination of symptoms)
P(combination of symptoms|D)
The Naive Bayes algorithm makes a single assumption that reduces this burden. By assuming that each symptom is an independent event, we can estimate the different values as follows:
P(S1, S2) = P(S1) × P(S2)
P(S1, S2|D) = P(S1|D) × P(S2|D)
These independence assumptions will make the system less reliable if they are violated; however, in practice, the result is often not sufficiently bad to compromise the relative likelihood when comparing alternative diseases.
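To see what the assumption buys us in code, here is a hedged Python sketch; the per-symptom conditionals below are made-up numbers, purely for illustration:

# Hypothetical conditionals P(symptom|D) for one disease D (made-up numbers)
p_symptom_given_d = {"fever": 0.9, "cough": 0.7, "rash": 0.1}

# Naive Bayes approximates the joint P(fever, cough, rash|D) as the
# product of the individual conditionals: n estimates instead of 2^n.
p_joint_given_d = 1.0
for p in p_symptom_given_d.values():
    p_joint_given_d *= p
print(p_joint_given_d)   # 0.9 * 0.7 * 0.1 = 0.063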
Predicting Superpowers
In this homework, you will build a Naive Bayes system that can be used to predict one variable or attribute from a subset of the other attributes. So, for example, imagine you were asked what superpower a 160 inch tall, right-handed male who likes basketball, drinks tea, and prefers English class would most likely desire. Freezing time? Flying? Invisibility? Telepathy?
To answer this question, you would compare the probability estimate for each of the possible superpowers (the outcome) based on the combination of known attributes (the evidence) under the conditional independence assumption. Bayes' Rule tells us that:
P(outcome|evidence) = P(evidence|outcome) × P(outcome) / P(evidence)
so the approach is simply to compare P(lotteryA|evidence) with P(lotteryB|evidence) and then predict whichever one has the higher value. Note that when computing P(lotteryA|evidence) and P(lotteryB|evidence), they will both have the same denominator, P(evidence); since we are only interested in whichever one is larger, we can skip the denominator altogether and simply compare their respective P(evidence|outcome) × P(outcome) values directly.
What makes this algorithm naive? Well, the naive part comes from the assumption that P(evidence), which is really the probability of a number of individual pieces of evidence occurring together, can be approximated by the product of the probabilities of each individual piece of evidence. In other words:
P(evidence|outcome) = P(e1, e2, ..., eN|outcome) ≈ P(e1|outcome) × ... × P(eN|outcome)
This is a critical assumption, because it means you don't have to consider all 2^N combinations of e1 ... eN values, rather just each individual one. That's a huge savings, especially when you consider that some ei's are not yes/no, but rather have more possible values, meaning the base of that 2^N exponent is actually larger than 2.
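Putting the pieces together, here is a hedged sketch of the scoring step; the outcome names, priors, and conditionals are made-up illustrations, not values from the actual data:

# Made-up priors P(outcome) and conditionals P(e_i|outcome) for two outcomes
priors = {"invisibility": 0.4, "telepathy": 0.6}
conditionals = {
    "invisibility": {"tea": 0.5, "basketball": 0.2},
    "telepathy":    {"tea": 0.3, "basketball": 0.4},
}
evidence = ["tea", "basketball"]

# Score each outcome by P(outcome) * product of P(e_i|outcome),
# skipping the shared denominator P(evidence)
scores = {}
for outcome, prior in priors.items():
    score = prior
    for e in evidence:
        score *= conditionals[outcome][e]
    scores[outcome] = score

print(sorted(scores, key=scores.get, reverse=True))   # most likely outcome first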
What to Do
This homework consists of just two functions. Download the template file, which contains part of my solution to homework 3 (the getData function that reads in a data file). You will implement two functions, train and predict. The train function takes the output of getData and the attribute you wish to predict (e.g., Superpower) and returns a dictionary containing all the elements you need to make a prediction (e.g., all the P(ei|outcome) and P(outcome) values). The predict function takes the output of the train function and a sample input consisting of a new record, and returns a list of possible outcomes ordered by their likelihood.
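A minimal sketch of what train and predict might look like, assuming getData returns a list of records, each a dictionary mapping attribute names to values (that format, the model dictionary's layout, and the zero-probability handling are all assumptions to adapt to the actual template):

from collections import Counter, defaultdict

def train(data, target):
    # Count outcomes and how often each (attribute, value) pair co-occurs
    # with each outcome of the target attribute.
    outcome_counts = Counter(record[target] for record in data)
    cond_counts = defaultdict(Counter)   # (attribute, value) -> outcome counts
    for record in data:
        outcome = record[target]
        for attribute, value in record.items():
            if attribute != target:
                cond_counts[(attribute, value)][outcome] += 1
    n = len(data)
    return {"target": target,
            "priors": {o: c / n for o, c in outcome_counts.items()},
            "conditionals": {k: {o: c / outcome_counts[o] for o, c in ctr.items()}
                             for k, ctr in cond_counts.items()}}

def predict(model, sample):
    # Score each outcome by P(outcome) * product of P(e_i|outcome),
    # then return the outcomes ordered from most to least likely.
    scores = {}
    for outcome, prior in model["priors"].items():
        score = prior
        for attribute, value in sample.items():
            if attribute != model["target"]:
                # Unseen (attribute, value, outcome) combinations score 0 here;
                # a real solution might smooth these counts instead.
                score *= model["conditionals"].get((attribute, value), {}).get(outcome, 0)
        scores[outcome] = score
    return sorted(scores, key=scores.get, reverse=True)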