Statistical Decision Theory

Proposition 3 (Bayes decision rule). The Bayes decision rule under the prior \pi is given by the estimator

T(x) \in \arg\min_{a\in\mathcal{A}} \int L(\theta,a)\pi(d\theta|x). \ \ \ \ \ (3)

However, the risk is a function of \theta, and it is hard to compare two risk functions directly. The elements of decision theory are quite logical and perhaps even intuitive: in the simplest situation, a decision maker must choose the best decision from a finite set of alternatives when there are two or more possible future events, called states of nature, that might occur.

Definition (Loss). A loss function L(\theta,\hat{\theta}): \Theta\times\hat{\Theta}\rightarrow {\mathbb R} measures the discrepancy between \theta and \hat{\theta}.

Proof: Instead of the original density estimation model, we consider a Poissonized sampling model \mathcal{M}_{n,P}, where the observation under \mathcal{M}_{n,P} is a Poisson process (Z_t)_{t\in [0,1]} on [0,1] with intensity nf(t). The rest then follows from the triangle inequality.

Theorem 10. If s>1/2, we have \lim_{n\rightarrow\infty} \Delta(\mathcal{M}_n, \mathcal{N}_n)=0.

The next theorem shows that the multinomial and Poissonized models are asymptotically equivalent, which means that it does no harm to consider the more convenient Poissonized model for analysis, at least asymptotically.
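When the prior is supported on finitely many points and the action space is discretized, the minimization in (3) can be carried out directly. Below is a minimal numerical sketch; the two-point prior, the Binomial model, and all numbers are hypothetical illustrations. Under squared-error loss, the Bayes rule (3) reduces to the posterior mean.

```python
import numpy as np
from math import comb

# Hypothetical two-point prior on theta, with X ~ Binomial(10, theta) observed.
thetas = np.array([0.3, 0.7])
prior = np.array([0.5, 0.5])
n, x = 10, 7  # observed 7 successes out of 10 trials

# Posterior pi(theta | x) via Bayes' rule.
lik = np.array([comb(n, x) * t**x * (1 - t)**(n - x) for t in thetas])
post = prior * lik
post /= post.sum()

# Bayes rule (3): minimize the posterior expected loss over a grid of actions.
actions = np.linspace(0, 1, 1001)
loss = (thetas[None, :] - actions[:, None]) ** 2   # L(theta, a) = (theta - a)^2
post_risk = loss @ post                            # posterior expected loss of each a
a_star = actions[np.argmin(post_risk)]

# Under squared loss the minimizer is the posterior mean.
assert abs(a_star - post @ thetas) < 1e-3
```

The grid search is deliberately naive; it makes the argmin in (3) explicit rather than using the closed-form posterior mean.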
A crucial observation here is that it may be easier to transform between models \mathcal{M} and \mathcal{N}, in particular when \mathcal{N} is a randomization of \mathcal{M}. The main idea is to use randomization (i.e., Theorem 5) to obtain an upper bound on Le Cam's distance, and then apply Definition 4 to deduce useful results (e.g., to carry over an asymptotically optimal procedure in one model to other models). In the linear regression example, the quantity of interest is \theta, and the loss function may be chosen to be the prediction error L(\theta,\hat{\theta}) = \mathop{\mathbb E}_{\theta} (y-x^\top \hat{\theta})^2 of the linear estimator f(x) = x^\top \hat{\theta}. Consider a discrete probability vector P=(p_1,\cdots,p_k) with p_i\ge 0, \sum_{i=1}^k p_i=1. In nonparametric statistics, the problems of density estimation, regression, and estimation in Gaussian white noise are all asymptotically equivalent, under certain smoothness conditions; here, however, the Gaussian white noise model should take a different form. Bayesian inference is an important technique in statistics, and especially in mathematical statistics; Bayesian updating is particularly important in the dynamic analysis of a sequence of data.
Similar to the proof of Theorem 8, we have \Delta(\mathcal{M}_n, \mathcal{M}_{n,P})\rightarrow 0, and it remains to show that \Delta(\mathcal{N}_n, \mathcal{M}_{n,P})\rightarrow 0. The concept of model deficiency is due to Le Cam (1964), where the randomization criterion (Theorem 5) was proved. The specific structure of (P_\theta)_{\theta\in\Theta} is typically called a model or experiment, for the parameter \theta can represent different model parameters or theories to explain the observation X. \Box

In the special case where Y=T(X)\sim Q_\theta is a deterministic function of X\sim P_\theta (thus Q_\theta=P_\theta\circ T^{-1} is the push-forward measure of P_\theta through T), we have the following result. Statistical decision theory is a framework for inference for any formally defined decision-making problem: it gives ways of comparing statistical procedures. Given such a kernel \mathsf{K} and a decision rule \delta_\mathcal{N} based on model \mathcal{N}, we simply set \delta_\mathcal{M} = \delta_\mathcal{N} \circ \mathsf{K}, i.e., transmit the output through kernel \mathsf{K} and apply \delta_\mathcal{N}. Let \mathcal{M}_n, \mathcal{N}_n be the density estimation model and the Gaussian white noise model in (11), respectively. The target may be to estimate the density f at a point, the entire density, or some functional of the density. In the optimization example, the action space \mathcal{A} may just be the entire domain [-1,1]^d, and the loss function L is the optimality gap. Decision rules in problems of statistical decision theory can be deterministic or randomized.
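The Rao–Blackwell idea behind sufficiency can be checked numerically: conditioning an estimator on a sufficient statistic T(X) never increases the risk. A small Monte Carlo sketch under a Bernoulli model (all parameters below are hypothetical); the sample sum T is sufficient, and \mathop{\mathbb E}[X_1 \mid T] = T/n is the Rao–Blackwellized estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 0.4, 20, 20000

X = rng.binomial(1, theta, size=(trials, n))  # X_1, ..., X_n i.i.d. Bernoulli(theta)
T = X.sum(axis=1)                             # sufficient statistic T = sum of X_i

crude = X[:, 0].astype(float)                 # unbiased but noisy estimator X_1
rb = T / n                                    # E[X_1 | T] = T/n (Rao-Blackwellized)

mse_crude = np.mean((crude - theta) ** 2)     # Monte Carlo risk of X_1
mse_rb = np.mean((rb - theta) ** 2)           # Monte Carlo risk of T/n
assert mse_rb < mse_crude                     # conditioning never increases risk
```

The exact risks are \theta(1-\theta) = 0.24 for X_1 and \theta(1-\theta)/n = 0.012 for T/n, so the improvement is by a factor of n.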
Decision theory, in statistics, is a set of quantitative methods for reaching optimal decisions. A solvable decision problem must be capable of being tightly formulated in terms of initial conditions and choices or courses of action, with their consequences. The Bayes rule is considered the ideal pattern classifier and is often used as a benchmark for other algorithms. The mapping (12) is one-to-one and can thus be inverted as well. Here the parameter set \Theta={\mathbb R}^p is a finite-dimensional Euclidean space, and therefore we call this model parametric. Decision theory is used in a diverse range of applications, from guiding investment strategies in finance to designing control systems in engineering. "Statistical" denotes reliance on a quantitative method.

2. Decision Theory. 2.1 Basic Setup. The basic setup in statistical decision theory is as follows: we have an outcome space \mathcal{X} and a class of probability measures \{P_\theta : \theta\in\Theta\}, and observations X\sim P_\theta, X\in\mathcal{X}.

Remark 1. Experienced readers may have noticed that these are the wavelet coefficients under the Haar wavelet basis, where superscripts 1 and 2 stand for father and mother wavelets, respectively.

Definition 1 (Risk). Under the above notations, the risk of the decision rule \delta under loss function L and the true parameter \theta is defined as

R_\theta(\delta) = \iint L(\theta,a)P_\theta(dx)\delta(x,da). \ \ \ \ \ (1)

In the next lecture it will be shown that regular models are always close to some Gaussian location model asymptotically, and thereby the classical asymptotic theory of statistics can be established (see also AoS, Chap. 13).
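The risk in Definition 1 can be made concrete by Monte Carlo: fix \theta, simulate X\sim P_\theta many times, and average the loss of the estimator T(X). A sketch for a Gaussian location model with squared-error loss (the values of \theta, \sigma, and n are hypothetical); for the sample mean the exact risk is \sigma^2/n, which the simulation should recover:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, trials = 2.0, 1.0, 25, 50000

# X = (X_1, ..., X_n) with X_i ~ N(theta, sigma^2); estimator T(X) = sample mean.
X = rng.normal(theta, sigma, size=(trials, n))
est = X.mean(axis=1)

# Monte Carlo estimate of R_theta(T) under L(theta, a) = (theta - a)^2.
risk = np.mean((est - theta) ** 2)

# For the sample mean the exact risk is sigma^2 / n = 0.04 here.
assert abs(risk - sigma**2 / n) < 1e-3
```

The same loop works for any estimator and loss; only the two lines computing `est` and the loss need to change.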
\lim_{n\rightarrow\infty} \Delta(\mathcal{M}_n, \mathcal{N}_n)=0, \qquad \lim_{n\rightarrow\infty} \varepsilon_n=0. \ \ \ \ \ (7)

The total variation distance admits the representation

\|\mathcal{N}_P- \mathcal{N}_P' \|_{\text{TV}} = \mathop{\mathbb E}_m \mathop{\mathbb E}_{X^n} \|P_n^{\otimes m} - P^{\otimes m} \|_{\text{TV}}. \ \ \ \ \ (8)

Recalling that D_{\text{\rm KL}}(P\|Q) = \int dP\log \frac{dP}{dQ}, we have

\begin{array}{rcl} \mathop{\mathbb E}_{X^n} \|P_n^{\otimes m} - P^{\otimes m} \|_{\text{TV}} & \le & \mathop{\mathbb E}_{X^n}\sqrt{\frac{1}{2} D_{\text{KL}}(P_n^{\otimes m}\|P^{\otimes m} ) }\\ &=& \mathop{\mathbb E}_{X^n}\sqrt{\frac{m}{2} D_{\text{KL}}(P_n\|P ) } \\ &\le& \mathop{\mathbb E}_{X^n}\sqrt{\frac{m}{2} \chi^2(P_n,P ) }\\ &\le& \sqrt{\frac{m}{2} \mathop{\mathbb E}_{X^n}\chi^2(P_n,P ) }, \end{array}

where the first inequality is Pinsker's inequality, the equality is the tensorization of the KL divergence, the second inequality uses D_{\text{KL}}\le\chi^2, and the last step is Jensen's inequality. The present form is taken from Torgersen (1991). Note that the Markov condition \theta-Y-X is the usual definition of sufficient statistics, and also gives the well-known Rao–Blackwell factorization criterion for sufficiency. Decision theory deals with methods for determining the optimal course of action when a number of alternatives are available and their consequences cannot be forecast with certainty; it is the science of making optimal decisions in the face of uncertainty. There is no proper notion of noise for general (especially non-additive) statistical models; even if a natural notion of noise exists for certain models, it is not necessarily true that the model with smaller noise is always better. Under the maximax criterion, the decision maker chooses the alternative which results in the largest payoff.
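The chain of bounds used in (8), namely Pinsker's inequality, tensorization of KL, and D_KL \le \chi^2, can be sanity-checked numerically on a pair of small discrete distributions (chosen arbitrarily here):

```python
import numpy as np

# Two arbitrary discrete distributions on a 3-point alphabet.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

tv = 0.5 * np.abs(P - Q).sum()            # total variation ||P - Q||_TV
kl = np.sum(P * np.log(P / Q))            # D_KL(P || Q)
chi2 = np.sum((P - Q) ** 2 / Q)           # chi^2 divergence chi^2(P, Q)

# Pinsker: TV <= sqrt(KL/2); then KL <= chi^2 gives the next step of the chain.
assert tv <= np.sqrt(kl / 2) <= np.sqrt(chi2 / 2)

# Tensorization: D_KL(P x P || Q x Q) = 2 * D_KL(P || Q).
P2 = np.outer(P, P).ravel()
Q2 = np.outer(Q, Q).ravel()
kl2 = np.sum(P2 * np.log(P2 / Q2))
assert abs(kl2 - 2 * kl) < 1e-12
```

This does not prove the inequalities, of course, but it is a cheap guard against sign or constant errors when manipulating such chains.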
We repeat the iteration \log_2 \sqrt{n} times (assuming \sqrt{n} is a power of 2), so that finally we arrive at a vector of length m/\sqrt{n} = n^{1/2-\varepsilon} consisting of sums. Moreover,

\mathop{\mathbb E}_{X^n}\chi^2(P_n,P ) = \sum_{i=1}^k \frac{\mathop{\mathbb E}_{X^n} (\hat{p}_i-p_i)^2 }{p_i} = \sum_{i=1}^k \frac{p_i(1-p_i)}{np_i} = \frac{k-1}{n}. \Box

Historically, the primary emphasis of decision theory may be found in the theory of testing hypotheses, originated by Neyman and Pearson (J. Neyman and E. S. Pearson, "The testing of statistical hypotheses in relation to probabilities a priori"); the extension of their principle to all statistical problems was proposed by Wald. Since s'>1/2, we may choose \varepsilon to be sufficiently small (i.e., 2s'(1-2\varepsilon)>1) to make H^2(\mathsf{K}P_{\mathbf{Y}^{(2)}}, P_{\mathbf{Z}^{(2)}}) = o(1). When \delta(x,da) is a point mass \delta(a-T(x)) for some deterministic function T:\mathcal{X}\rightarrow \mathcal{A}, we will also call T(X) an estimator, and the risk in (1) becomes the expected loss \mathop{\mathbb E}_\theta L(\theta,T(X)). Consequently, letting \mathsf{K} be the overall transition kernel of the randomization, the inequality H^2(\otimes_i P_i, \otimes_i Q_i)\le \sum_i H^2(P_i,Q_i) gives the desired bound. The main result in this section is that, when s>1/2, these models are asymptotically equivalent.
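The identity \mathop{\mathbb E}_{X^n}\chi^2(P_n,P) = (k-1)/n can be verified by simulation; the alphabet size, sample size, and the uniform choice of P below are arbitrary (the identity holds for any P with full support):

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, trials = 5, 200, 20000
P = np.full(k, 1 / k)                     # uniform distribution on k symbols

# Draw n samples per trial; P_hat is the empirical distribution P_n.
counts = rng.multinomial(n, P, size=trials)
P_hat = counts / n
chi2 = ((P_hat - P) ** 2 / P).sum(axis=1)

# E chi^2(P_n, P) = (k - 1)/n = 0.02 for these parameters.
assert abs(chi2.mean() - (k - 1) / n) < 1e-3
```

Replacing `P` with any other full-support vector leaves the average unchanged, matching the fact that the answer (k-1)/n does not depend on P.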
\sup_{\theta\in\Theta} \frac{1}{2}\int_{\mathcal{A}} \left| \int_{\mathcal{X}} \delta_\mathcal{M}^\star(x,da)P_\theta(dx) - \int_{\mathcal{Y}} \delta_\mathcal{N}(y,da)Q_\theta(dy)\right| \le \varepsilon. \ \ \ \ \ (6)

In what follows I hope to distill a few of the key ideas in Bayesian decision theory; Bayesian decision theory is a fundamental statistical approach to the problem of pattern classification. Consider the regression model with x_i \sim P_X and y_i|x_i\sim \mathcal{N}(x_i^\top \theta, \sigma^2). A central quantity to measure the quality of a decision rule is the risk in the following definition. In the remainder of this lecture, I will give some examples of models whose distance is zero or asymptotically zero. Here \|P-Q\|_{\text{\rm TV}} := \frac{1}{2}\int |dP-dQ| is the total variation distance between probability measures P and Q. The equivalence between nonparametric regression and Gaussian white noise models (Theorem 10) was established in Brown and Low (1996), where both the fixed and random designs were studied; see also Brown, Carter, Low, and Zhang. \Box
To overcome this difficulty, a common procedure is to consider a Poissonized model \mathcal{N}_n, where we first draw a Poisson random variable N\sim \text{Poisson}(n) and then observe i.i.d. samples X_1,\cdots,X_N\sim P. The randomization leaves half of the components unchanged at each iteration and applies the above transformations to the other half. Besides statistical estimation with various action spaces and loss functions, the decision-theoretic framework can also incorporate some non-statistical examples, given later. The probabilities of type I and type II errors can be quantified (in terms of estimated probability, cost, or expected value) and can be reduced by information acquired through experimentation. There are many excellent textbooks on this subject.
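Poissonization can be simulated directly. Its key payoff is that drawing N\sim\text{Poisson}(n) first makes the bin counts independent \text{Poisson}(np_i) variables, which a quick simulation supports by matching means and variances (the distribution P below is a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 100, 30000
P = np.array([0.2, 0.3, 0.5])             # hypothetical discrete distribution

# Poissonized sampling: first N ~ Poisson(n), then N i.i.d. draws from P.
N = rng.poisson(n, size=trials)
counts = np.array([rng.multinomial(m, P) for m in N])

# Each bin count is then Poisson(n * p_i): mean and variance both equal n * p_i.
assert np.allclose(counts.mean(axis=0), n * P, rtol=0.05)
assert np.allclose(counts.var(axis=0), n * P, rtol=0.05)
```

Under plain multinomial sampling the variance of each count would be np_i(1-p_i) < np_i, so the matching mean and variance is a fingerprint of the Poissonized model.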
Usually one does not know in advance which alternative is the best one, so some exploration is required. Statistical decision theory dates back to Wald (1950); its central target is to propose a decision rule for a given statistical model with small risks. Sufficiency is in fact a special case of model deficiency, and deficiency can be thought of as approximate sufficiency. In the density estimation model, let X_1,\cdots,X_n be i.i.d. samples from some 1-Lipschitz density f, and let (X_1,\cdots,X_n) be the output of the kernel. To show that \mathcal{M}_n and \mathcal{N}_n are mutual randomizations, we would like to find a bijective mapping Y_i \leftrightarrow Z_i independently for each i. Methods and tools are available to organize evidence, evaluate risks, and compare decision rules.
Much of this theory has been developed since the 1990s; see the monograph by Le Cam and Yang (1990). Here W\sim \text{Uniform}([-1/2,1/2]) is an independent auxiliary variable, and \mathop{\mathbb E}_{X^n} takes the expectation with respect to the samples X^n\sim P used to upper bound the total variation distance in (8). Here \pi(\cdot|x) denotes the posterior distribution of \theta under the prior \pi (assuming the existence of a regular posterior). These two models are in fact non-equivalent if s\le 1/2 (Brown and Low, 1998). In the drug company example, the states of nature could be defined as low demand and high demand, and the question is how much of the drug to produce.
The proof of Theorem 12 is purely probabilistic and involved, and is omitted here. Theorem 12 gives a non-asymptotic result between these two models; sufficiency is a special case, and deficiency can again be thought of as approximate sufficiency. Throughout we assume that the loss function is non-negative and upper bounded by one. In practice, one would like to find an optimal decision rule for a given statistical model, though "optimal" may refer to different things in different circumstances, e.g., the minimax or the Bayes risk. The asymptotic equivalence between these nonparametric models was established in Brown et al. For textbook treatments, see Lehmann and Romano (2006) and "The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation"; this material forms part of the elementary course for graduate students in statistics. Many lectures given later will be devoted to appropriate minimax risks. We are now ready to describe the randomization procedure, where m\in{\mathbb N} and s is the smoothness parameter.
Part III: Statistical Decision Theory. Here f is a density supported on [0,1], and X_1,\cdots,X_n are i.i.d. samples from f; the corresponding finite-dimensional model is the multinomial model \mathcal{M}_n, which models the i.i.d. sampling process. We show that \mathbf{Y} and \mathbf{Z} are mutual randomizations by applying the triangle inequality, and the resulting bound goes to zero uniformly in P, as desired. Later lectures will apply this decision-theoretic framework to information-theoretic lower bounds. As a final example, suppose a drug company wants to sell a new pain reliever; then the question is how much of the drug to produce.