Dissertations and Theses - Statistics
http://hdl.handle.net/2142/17362
Tue, 30 Jun 2015 19:47:50 GMT
http://hdl.handle.net/2142/72584
Bayesian Inference in Nonparametric Logistic Regression
We consider the problem of regressing a dichotomous response variable on a predictor variable. Our interest is in modelling the probability of occurrence of the response as a function of the predictor variable, and in inferences about the estimated function. The log-odds (logit) of the probability is estimated nonparametrically, using generalized smoothing splines.
For purposes of inference, a partially improper stochastic process prior, proposed by Wahba (1978), is specified on the logit. The posterior is rather complicated and a number of questions need to be addressed.
We study the properties of the posterior distribution and give necessary and sufficient conditions under which it is a proper probability measure. These conditions are shown to be equivalent to those for the existence of the posterior mode (which is the smoothing spline estimator) and of the maximum likelihood estimator (m.l.e.) defined on the subspace of prior impropriety, or the null space of the prior precision. A simple test for the case of polynomial regression in one dimension is also derived.
A Gaussian approximation to the posterior has been proposed by other authors. Our results on the tail behavior of the posterior suggest there are problems with this, but a more complete answer requires finite-sample calculations, a computationally imposing task. We use Monte Carlo importance sampling for analysis of the posterior. The posterior quantities estimated using this approach include posterior moments and pointwise and simultaneous posterior credibility bands. A comparison of these quantities with those based on the Gaussian approximation provides an assessment of how well the Gaussian approximation works for finite sample sizes. The frequentist properties of the inferences based on the Bayesian model can also be investigated using this approach.
This Bayesian model with a partially improper prior is mathematically equivalent to the more classical generalized random effects models that have been used. The implications of these results for the logistic regression model with random effects are discussed.
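The importance sampling step mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the thesis's spline model: it assumes a one-parameter linear logit, logit p(x) = b x, with a proper normal prior on b, and uses a wider normal proposal. The data, the prior scale, the proposal, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def log_likelihood(b, xs, ys):
    """Bernoulli log-likelihood for the assumed model logit p(x) = b * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b * x
        # y*eta - log(1 + exp(eta)), written in a numerically stable form
        total += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def log_normal_pdf(b, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    z = (b - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def importance_sample_posterior(xs, ys, prior_sd=2.0, prop_sd=3.0,
                                n_draws=5000, seed=0):
    """Self-normalized importance sampling estimate of posterior mean and variance of b."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, prop_sd) for _ in range(n_draws)]
    # log weight = log likelihood + log prior - log proposal
    log_w = [
        log_likelihood(b, xs, ys)
        + log_normal_pdf(b, 0.0, prior_sd)
        - log_normal_pdf(b, 0.0, prop_sd)
        for b in draws
    ]
    # subtract the maximum before exponentiating to avoid underflow
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    mean = sum(wi * b for wi, b in zip(w, draws))
    var = sum(wi * (b - mean) ** 2 for wi, b in zip(w, draws))
    return mean, var, w


# Hypothetical toy data: larger x tends to go with y = 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
mean, var, weights = importance_sample_posterior(xs, ys)
print("posterior mean of b:", mean)
print("posterior variance of b:", var)
```

The same weighted draws could be reused for other posterior summaries, such as quantiles for pointwise credibility bands; a proposal with heavier tails than the posterior keeps the weights from being dominated by a few draws.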
Statistics
Fri, 01 Jan 1993 00:00:00 GMT
http://hdl.handle.net/2142/72583
On Logistic Regression Approach to Survival Data and Power Divergence Statistics for Life Tables
Efron (1988) proposes the use of standard logistic regression techniques to estimate hazard rates and survival curves from survival data. These techniques allow statisticians to use parametric regression modeling on survival data in a flexible way that provides both estimates and standard errors. In the first part of this thesis, large sample properties of this logistic regression method are developed. It is shown that, under some regularity conditions, the hazard rate and survival function estimators are consistent and asymptotically normal. An extension of Efron's method to regression models is proposed, and its asymptotic properties are also examined.
The second part of the thesis introduces two classes of power divergence statistics for life tables. The first class is similar to the power divergence family of Cressie and Read (1984) for the analysis of contingency tables. The second class is easier to interpret geometrically than the first. However, the two classes are asymptotically equivalent. A relatively complete large sample theory, including consistency and asymptotic normality, is provided.
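For orientation, the Cressie and Read (1984) power divergence family for multinomial counts, which the first class of statistics is said to resemble, can be sketched as below. This is the standard contingency-table form, 2/(λ(λ+1)) Σ O_i[(O_i/E_i)^λ − 1], not the life-table statistics developed in the thesis; the function name and the example counts are hypothetical.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic for observed vs. expected counts.

    lam = 1 gives Pearson's chi-square (when totals agree), lam -> 0 gives the
    likelihood ratio statistic G^2, and lam -> -1 gives the modified
    likelihood ratio statistic.
    """
    pairs = list(zip(observed, expected))
    if abs(lam) < 1e-12:
        # limiting case lam -> 0: 2 * sum O * log(O / E); cells with O = 0 contribute 0
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if abs(lam + 1.0) < 1e-12:
        # limiting case lam -> -1: 2 * sum E * log(E / O); requires every O > 0
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)


# Hypothetical counts with equal observed and expected totals.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
pearson = power_divergence(observed, expected, 1.0)
g_squared = power_divergence(observed, expected, 0.0)
cressie_read = power_divergence(observed, expected, 2.0 / 3.0)
print(pearson, g_squared, cressie_read)
```

Members of the family with different λ are typically close to one another when observed and expected counts are close, which is the finite-sample face of their asymptotic equivalence.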
Statistics
Fri, 01 Jan 1993 00:00:00 GMT
http://hdl.handle.net/2142/72582
Contributions to Statistical Problems Related to Microarray Data
Microarrays are a high-throughput technology for measuring gene expression. The analysis of microarray data raises many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes factors to identify differentially expressed genes. Second, we study the cellular differentiation process and propose a statistical test for detecting early differentiation genes. Third, we propose a model-based method for the cellular differentiation problem.
Statistics
Thu, 01 Jan 2009 00:00:00 GMT
http://hdl.handle.net/2142/71503
Statistical Aspects of a New Latent Trait Model
Current achievement and aptitude test modeling (item response theory) is based on the overly optimistic assumption of local independence: that an examinee's responses to different test questions will be independent conditional on the latent trait (ability) being measured by the questions. A more realistic account is presented here, based on Stout's (1988a, 1988b) notion of essential independence, in which the average covariance between the examinee's responses is small but not zero.
Essential independence is seen to be more natural psychometrically and more amenable to statistical tests of model fit than local independence. A new theorem proved here shows that the principal difference between local and essential independence is conditional association, a property introduced by Holland & Rosenbaum (1986).
Estimation procedures may still be developed under local independence as long as they are subsequently examined and calibrated for practical use under essential independence. This transition from local to essential independence is illustrated with two useful estimation procedures.
First, a computationally simple latent trait distribution estimator, motivated under local independence, is shown to be consistent for estimating the latent distribution in an examinee population under the most general essential independence model. Pilot simulations show that this estimator should be useful even when local independence holds.
Second, we examine the behavior of the maximum likelihood estimator $\vartheta_{\rm J}$, computed from the likelihood under local independence, when in fact only essential independence holds. Under a technical strengthening of essential independence which is psychometrically innocuous, we show that $\vartheta_{\rm J}$ continues to be consistent for the latent trait. If we require the average inter-item covariance to go to zero like 1/(test length) and impose a global controlling condition on the questions such as $\varphi$-mixing or association, $\vartheta_{\rm J}$ is asymptotically normal and efficient. The central role of "proportion correct" and its variants in driving the behavior of latent trait estimators is also illustrated.
Education, Tests and Measurements; Statistics; Psychology, Psychometrics
Fri, 01 Jan 1988 00:00:00 GMT
http://hdl.handle.net/2142/71502
Sequential Confidence Sets With Beta-Protection in the Presence of Nuisance Parameters
In this thesis we study sequential procedures for constructing one-sided and bounded sequential confidence sets with $\beta$-protection and coverage probability at least $1 - \alpha$ for the mean of a distribution in the presence of nuisance parameters.
Let $\{X_n : n = 1, 2, \dots\}$ be i.i.d. $p$-variate ($p \geq 1$) random variables with distribution $P_\theta$, $\theta \in \Theta$. The parameter space $\Theta$ is some abstract set for which various choices will be made. The mean $\mu = \mu(\theta)$ of $P_\theta$ is the parameter of interest, and the rest of $\theta$ will be regarded as a nuisance parameter. In the simplest case $\mu$ is real valued and there is given an imprecision function $\delta(\mu) > 0$ and error probabilities 0 $$ 1). In the univariate case several asymptotic properties of the stopping time are obtained in various limiting situations.
Statistics
Thu, 01 Jan 1987 00:00:00 GMT
http://hdl.handle.net/2142/71501
Sequential Confidence Sets With Guaranteed Coverage Probability and Beta-Protection in Multiparameter Families
We consider a class of invariant sequential procedures for constructing one-sided and two-sided confidence sets for a parameter $\gamma$ in $R^k$, with the property that they have coverage probability at least $1 - \alpha$ and probability of covering a certain set of false values at most $\beta$. In addition, a method is proposed that is capable of generating a wide variety of sequential confidence sets (not necessarily equivariant), and some of its properties are investigated. The asymptotic properties of the stopping time are studied, and the limiting values of the error probabilities are found as the parameter approaches the boundary points. Applications are made to the problem of simultaneous confidence sets for the mean and variance of a normal random variable and for its multivariate analogue.
Statistics
Thu, 01 Jan 1987 00:00:00 GMT
http://hdl.handle.net/2142/71500
Improving Inadmissible Hypothesis Testing Procedures in Exponential Family Statistical Models (Lrt, Pointwise Compactness, One-Sided Alternatives)
This thesis is devoted to constructing tests of hypotheses that dominate a given test which violates certain conditions such as convexity or monotonicity. Many authors, for example Birnbaum (1955), Matthes and Truax (1967), Ferguson (1967), Eaton (1970) and Marden (1981, 1982), have worked on this type of testing problem. They found that a necessary condition for a test to be admissible is that its acceptance region should satisfy certain convexity and monotonicity conditions. These results are not constructive, however, to the extent that one does not know what test(s) dominate a given inadmissible one.
In this thesis the researcher provides a method of constructing a test which strictly dominates one that violates the convexity or the monotonicity conditions. The construction method exploits the case where the parameter is real valued, in which case Ferguson's (1967) construction in one dimension can be used by conditioning on the sufficient statistics of the nuisance parameters. A sequence of tests can be obtained iteratively, each one strictly dominating the previous test. The convergence of this sequence is investigated. It is shown that the relative convex hulls of the acceptance regions decrease. The researcher can only conjecture that the limiting test is "best" in the sense that it cannot be improved any further. The conjecture is resolved for the case of a discrete dominating measure.
Moreover, the Likelihood Ratio Test (L.R.T.) conjecture, proposed by J. Marden in an NSF proposal (1982), is addressed and partially solved in this thesis. The L.R.T. conjecture states that the more restrictions that are placed on the alternative space, the higher the power of the L.R.T.
Statistics
Wed, 01 Jan 1986 00:00:00 GMT
http://hdl.handle.net/2142/68341
Smallest Simultaneous Confidence Sets, Using Sufficiency and Invariance, With Applications to Manova and Gmanova
The thesis first describes how sufficiency and invariance considerations can be applied in problems of confidence set estimation to reduce the class of set estimators under investigation. Let $X$ be a random variable taking values in with distribution $P_\theta$, $\theta \in \Theta$, and suppose a confidence set is desired for $\gamma = \gamma(\theta)$, where $\gamma$ takes values in $\Gamma$. The main tool used is the representation of a randomized set estimator as a function $\phi$ : x $\Gamma$ $\to$ $\{0, 1\}$. Sufficiency is defined in terms of the family $\{P_{\theta,\gamma} : (\theta, \gamma) \in \Theta \times \Gamma\}$ of distributions on x $\Gamma$, where $P_{\theta,\gamma}$ is $P_\theta$ supported on x $\{\gamma\}$. Let $T$: x $\Gamma$ $\to$ and let ('T) be the family of induced distributions on . Let $S$ be a function defined on which is sufficient for ('T). Then the class of randomized set estimators based on $S$ is essentially complete among those based on $T$ provided the risk function depends only on $E_\theta \phi(X, \gamma)$, $(\theta, \gamma) \in \Theta \times \Gamma$. In applications of interest the above definition of sufficiency is equivalent to the usual one given in terms of $\{P_\theta : \theta \in \Theta\}$. If $G$ is an invariance group acting on the problem and $T$ above is a maximal invariant under $G$, then the principle of invariance allows one to restrict attention to set estimators based on the invariantly sufficient function $S$. Moreover, if $G$ acts transitively on $\Theta$, then $S$ is an invariant pivotal quantity.
Wijsman (1980, Multivariate Analysis V) describes a method for generating smallest simultaneous confidence sets $\{A_i\}$ for a family of parametric functions $\{\psi_i(\gamma)\}$ starting from a confidence set $C_0$ for $\gamma$. In addition the method determines the confidence set $C_1 \supseteq C_0$ with respect to which the family $\{A_i\}$ is exact; i.e., $\{\gamma \in C_1(X)\} = \{\psi_i(\gamma) \in A_i(X)\ \forall i\}$. Confidence sets $C_1$ satisfying the above equation for some family $\{A_i\}$ are termed self-reproducing.
The thesis extends Wijsman's method to include randomized set estimators and applies the method to find all self-reproducing set estimators defined in terms of an invariantly sufficient function, together with the corresponding smallest simultaneous set estimators. The applications are given in the general multivariate analysis of variance (GMANOVA) model of Potthoff and Roy under full group reduction and in the MANOVA model under both full and partial group reduction. The parameter $\gamma$ is a $q \times p$ matrix $M$, and the families of parametric functions considered are $\{a'M,\ a \in R^q\}$, $\{Mb,\ b \in R^p\}$, $\{a'Mb\}$, and $\{\mathrm{tr}\, N'M,\ N : q \times p\}$. As an example, the confidence set determined by the extension of Roy's maximum root criterion to GMANOVA was found to be self-reproducing for all the above families.
The confidence set determined by the step-down procedure in MANOVA is not self-reproducing. Thus its confidence coefficient gives a conservative lower bound on the probability of simultaneous coverage by the induced family of simultaneous confidence sets. The thesis develops an improved approximation for this probability.
Statistics
Thu, 01 Jan 1981 00:00:00 GMT
http://hdl.handle.net/2142/68340
Some Results on Classifying an Observation Into One of Several Multivariate Normal Populations With Equal Covariance Matrices
Statistics
Mon, 01 Jan 1979 00:00:00 GMT
http://hdl.handle.net/2142/65060
Stopping Time of Invariant Sequential Probability Ratio Tests in Multivariate Analysis of Variance
Statistics
Tue, 01 Jan 1974 00:00:00 GMT