[Solved] Reinforcement Learning - Homework 3


Exploration in Reinforcement Learning (theory)

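1 UCB

We find ourselves in the setting of multi-armed bandits. For each arm $j$ and time $t$, let $i_k$ denote the arm pulled at time $k$ and $X_{i_k,k}$ the observed reward, and define

$$S_{j,t} = \sum_{k=1}^{t} X_{i_k,k}\,\mathbf{1}(i_k = j), \qquad N_{j,t} = \sum_{k=1}^{t} \mathbf{1}(i_k = j), \qquad \hat\mu_{j,t} = \frac{S_{j,t}}{N_{j,t}}.$$

The question is whether $\hat\mu_{j,t}$ is an unbiased estimator of $\mu_j$. At first sight, one could interpret $\hat\mu_{j,t}$ as the simple sample mean of arm $j$, which would make it unbiased. However, this only holds if the samples are independent and identically distributed (iid) and the number of pulls does not depend on them. That is not the case in the online, on-policy learning of UCB: whether an arm is pulled depends on the previous samples, so $N_{j,t}$ is correlated with the observed rewards, and one can expect the estimate to be biased. To show that $\hat\mu_{j,t}$ is not unbiased in general, we consider a simple case and compute its bias analytically.

Let us consider the setting of Bernoulli bandits as in section 3 with $K = 2$ binary arms of parameters $\mu_1$ and $\mu_2$. One pulls the arm $i_t$ such that

$$i_t \in \arg\max_j \; \hat\mu_{j,t} + U(N_{j,t}, \delta).$$

We assume that ties are broken uniformly at random. The UCB exploration term is infinite for $t \in \{1, 2\}$, so both arms are pulled successively. At $t = 3$, both arms have been pulled once and share the same exploration bonus $U(1, \delta)$, so the arm with the larger observed reward is pulled again (each arm with probability $1/2$ if the two rewards are equal). We look at the sample mean estimates $\hat\mu_{1,3}$ and $\hat\mu_{2,3}$ after the third action.

Write $X_1$ and $X_2$ for the first rewards of arms 1 and 2, and $X'$ for the reward of the third pull. Conditioning on $(X_1, X_2)$: if arm 1 is pulled again, $\hat\mu_{1,3} = (X_1 + X')/2$ with $\mathbb{E}[X'] = \mu_1$; if arm 2 is pulled, $\hat\mu_{1,3} = X_1$. Hence

$$\mathbb{E}[\hat\mu_{1,3}] = \mu_1(1-\mu_2)\,\frac{1+\mu_1}{2} + \mu_1\mu_2\left(\frac{1}{2}\cdot\frac{1+\mu_1}{2} + \frac{1}{2}\right) + (1-\mu_1)(1-\mu_2)\,\frac{1}{2}\cdot\frac{\mu_1}{2} + (1-\mu_1)\mu_2\cdot 0 = \frac{\mu_1(3+\mu_1)}{4},$$

so that

$$\mathbb{E}[\hat\mu_{1,3}] - \mu_1 = -\frac{\mu_1(1-\mu_1)}{4},$$

which is nonzero whenever $0 < \mu_1 < 1$. Symmetrically, $\mathbb{E}[\hat\mu_{2,3}] = \mu_2(3+\mu_2)/4$. The estimator $\hat\mu_{j,t}$ is therefore biased in general (here, downward).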
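As a sanity check on the two-armed Bernoulli example (both arms pulled once, then the arm with the larger UCB index pulled at $t = 3$, ties broken uniformly at random), the sketch below computes $\mathbb{E}[\hat\mu_{1,3}]$ exactly by enumerating the first two rewards, compares it with the closed form $\mu_1(3+\mu_1)/4$, and runs a Monte Carlo simulation of UCB. The bonus $U(N, \delta) = \sqrt{\log(1/\delta)/(2N)}$ in the hypothetical helper `exploration_bonus` is an assumption (a common Hoeffding-style choice), not necessarily the one used in the homework; at $t = 3$ both arms have $N = 1$, so its exact form does not affect the decision. All function names are illustrative.

```python
import math
import random
from fractions import Fraction
from itertools import product


def exploration_bonus(n_pulls, delta=0.05):
    """Assumed U(N, delta): infinite for unpulled arms, Hoeffding-style otherwise."""
    if n_pulls == 0:
        return math.inf
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_pulls))


def run_ucb(mus, horizon, rng, delta=0.05):
    """Run UCB on Bernoulli arms and return the sample means mu_hat_{j,t}."""
    k = len(mus)
    sums = [0.0] * k  # S_{j,t}
    counts = [0] * k  # N_{j,t}
    for _ in range(horizon):
        indices = [
            (sums[j] / counts[j] if counts[j] else 0.0)
            + exploration_bonus(counts[j], delta)
            for j in range(k)
        ]
        best = max(indices)
        # Break ties uniformly at random among arms with the maximal index.
        arm = rng.choice([j for j in range(k) if indices[j] == best])
        reward = 1.0 if rng.random() < mus[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
    return [s / n if n else float("nan") for s, n in zip(sums, counts)]


def exact_mean_arm1_at_t3(mu1, mu2):
    """E[mu_hat_{1,3}] by enumerating the first rewards (x1, x2) of both arms."""
    expectation = Fraction(0)
    for x1, x2 in product((0, 1), repeat=2):
        p = (mu1 if x1 else 1 - mu1) * (mu2 if x2 else 1 - mu2)
        if x1 > x2:
            p_arm1 = Fraction(1)
        elif x1 < x2:
            p_arm1 = Fraction(0)
        else:
            p_arm1 = Fraction(1, 2)  # tie at t = 3, broken at random
        # Arm 1 pulled again: mu_hat_{1,3} = (x1 + X') / 2 with E[X'] = mu1.
        # Arm 2 pulled instead: mu_hat_{1,3} = x1.
        expectation += p * (p_arm1 * (x1 + mu1) / 2 + (1 - p_arm1) * x1)
    return expectation


if __name__ == "__main__":
    mu1, mu2 = Fraction(1, 2), Fraction(1, 3)
    exact = exact_mean_arm1_at_t3(mu1, mu2)
    print("exact E[mu_hat_1,3] =", exact, "bias =", exact - mu1)  # 7/16, -1/16
    rng = random.Random(0)
    n_runs = 100_000
    estimate = sum(run_ucb([0.5, 1 / 3], 3, rng)[0] for _ in range(n_runs)) / n_runs
    print("Monte Carlo estimate:", round(estimate, 4))
```

With $\mu_1 = 1/2$ the exact expectation is $7/16$, a bias of $-1/16$ whatever the value of $\mu_2$, and the Monte Carlo estimate should land close to $0.4375$. Exact rational arithmetic (`fractions.Fraction`) is used for the enumeration so the result can be compared with the closed form without rounding error.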
