Thursday, 30-01-2020 - 14:15, 603
Data-Driven Kaplan-Meier One-Sided Two-Sample Tests
In this talk, we discuss existing approaches from the literature to detecting stochastic ordering of two survival curves, and we pose and solve a novel testing problem for it. Specifically, the null hypothesis asserts the lack of ordering, while the alternative expresses its existence. The proposed test statistic is a functional of the standardized two-sample Kaplan-Meier process, sampled at a randomly selected number of random points (the observed survival times in the pooled sample), and it exploits the information contained in a specially defined one-sided weighted log-rank statistic. It automatically weighs the magnitude and sign of its components, which makes it a sensible procedure for the testing problem considered. As a result, the corresponding test asymptotically controls the errors of both kinds at the specified significance level α. A simulation study shows that the errors are also satisfactorily controlled for finite sample sizes. Furthermore, in comparison with the best and most popular tests, the new solution turns out to be a promising procedure that improves upon them. A real data analysis confirms these findings.
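For orientation, the Kaplan-Meier estimator on which the two-sample process is built can be sketched as follows. This is only a minimal illustration of the standard product-limit estimator, not the test statistic from the talk; the function name `kaplan_meier` and the toy data are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : observed times (event or censoring time)
    events : 1 if the event was observed, 0 if the observation was censored
    Returns the distinct event times and the estimated survival at each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                    # still under observation just before t
        deaths = np.sum((times == t) & (events == 1))   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical toy sample: five subjects, two of them censored.
km_times, km_surv = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Computing the estimator separately in each sample and comparing the two curves at the pooled observed times gives the raw ingredients of a two-sample Kaplan-Meier comparison; the standardization and data-driven selection described in the abstract are not reproduced here.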
Thursday, 23-01-2020 - 14:15, 603
On irrepresentable condition for LASSO and SLOPE estimators
The irrepresentable condition is a well-known condition for sign recovery by LASSO.
In this talk we introduce a similar condition for model recovery by SLOPE.
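As a reminder of the LASSO side, the irrepresentable condition is commonly stated as a bound of 1 on the sup-norm of X_Sc^T X_S (X_S^T X_S)^{-1} sign(β_S), where S is the support of β. The sketch below only evaluates that quantity for a given design; the function name and the toy design are hypothetical, and the SLOPE analogue introduced in the talk is not reproduced.

```python
import numpy as np

def irrepresentable_value(X, beta):
    """Sup-norm of X_Sc^T X_S (X_S^T X_S)^{-1} sign(beta_S), with S = supp(beta).

    Assumes X_S has full column rank. The LASSO irrepresentable condition
    asks for this value to be at most 1 (strictly below 1 in its strong form).
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    S = beta != 0
    if not S.any() or S.all():
        return 0.0                       # nothing to check: empty support or no inactive columns
    XS, XSc = X[:, S], X[:, ~S]
    w = np.linalg.solve(XS.T @ XS, np.sign(beta[S]))
    return float(np.max(np.abs(XSc.T @ XS @ w)))

# Hypothetical design: the third column is correlated with the first two.
X_toy = np.array([[1.0, 0.0, 0.8],
                  [0.0, 1.0, 0.8]])
value_same_sign = irrepresentable_value(X_toy, [1.0, 1.0, 0.0])   # signs (+, +)
value_opp_sign = irrepresentable_value(X_toy, [1.0, -1.0, 0.0])   # signs (+, -)
```

In this toy design the condition fails when both active coefficients share a sign and holds when their signs differ, which illustrates that the condition depends on the sign vector as well as on X.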
Thursday, 16-01-2020 - 14:15, 603
Finding structured estimates in matrix regression problems
Classical scalar-response regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. In medical applications, however, regressors often take the form of multi-dimensional arrays. For example, one may be interested in using MRI imaging to identify which brain regions are associated with a health outcome. Vectorizing the two-dimensional image arrays is an unsatisfactory approach, since it destroys the inherent spatial structure of the images and can be computationally challenging. We present an alternative approach - regularized matrix regression - where the matrix of regression coefficients is defined as the solution to a specific optimization problem. The method, called SParsity Inducing Nuclear Norm EstimatoR (SpINNEr), simultaneously imposes two penalty types on the regression coefficient matrix - the nuclear norm and the lasso norm - to encourage a low-rank matrix solution that also has entry-wise sparsity. A novel implementation of the alternating direction method of multipliers (ADMM) is used to build a fast and efficient numerical solver. Our simulations show that SpINNEr outperforms other methods in estimation accuracy when the response-related entries (representing the brain's functional connectivity) are arranged in well-connected communities. SpINNEr is applied to investigate associations between HIV disease-related outcomes and functional connectivity in the human brain.
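The combination of the two penalties can be made concrete with a small sketch that evaluates an objective of this type. The exact SpINNEr formulation (loss scaling, penalty weights, additional covariates) is not given in the abstract, so the form below is an assumption, and the function name `spinner_style_objective` is hypothetical. No ADMM solver is implemented here.

```python
import numpy as np

def spinner_style_objective(y, A, B, lam_nuc, lam_l1):
    """Least-squares loss plus nuclear-norm and entry-wise l1 penalties on B.

    y : (n,) scalar responses
    A : (n, p, q) matrix-valued covariates, one p-by-q matrix per subject
    B : (p, q) coefficient matrix
    Assumed form: 0.5 * sum_i (y_i - <A_i, B>)^2 + lam_nuc * ||B||_* + lam_l1 * sum |B_jk|
    """
    fitted = np.tensordot(A, B, axes=([1, 2], [0, 1]))   # Frobenius inner products <A_i, B>
    loss = 0.5 * np.sum((y - fitted) ** 2)
    nuclear = np.linalg.norm(B, ord="nuc")               # sum of singular values (low-rank penalty)
    l1 = np.sum(np.abs(B))                               # entry-wise sparsity penalty
    return loss + lam_nuc * nuclear + lam_l1 * l1

# Hypothetical toy data with a perfect fit, so only the penalties contribute.
B_toy = np.diag([1.0, 2.0])
A_toy = np.stack([np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])])
y_toy = np.tensordot(A_toy, B_toy, axes=([1, 2], [0, 1]))
obj_toy = spinner_style_objective(y_toy, A_toy, B_toy, lam_nuc=1.0, lam_l1=0.5)
```

The nuclear norm pushes the solution toward low rank while the l1 term pushes individual entries to zero, which is why the two are imposed together when a sparse, low-rank coefficient matrix is expected.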
Thursday, 05-12-2019 - 14:15, 603
Statistical challenges in mass spectrometry data analysis: shared peptides
Mass spectrometry (MS) is one of the most important technologies for the study of proteins. MS experiments generate massive amounts of complex data that require advanced pre-processing and careful statistical analysis.
In the bottom-up approach to MS, peptides - smaller segments of proteins - enter the mass spectrometer, so measurements are made at the peptide level.
Because of this, one of the problems in protein quantification based on MS is the presence of peptides that can be assigned to multiple proteins.
Such peptides are referred to as shared or degenerate peptides.
Since it is not obvious how to assign the abundance of shared peptides to proteins, they are often discarded from the analysis. This leads to a loss of a substantial amount of data.
In this talk, I will first present the basics of mass spectrometry data analysis. Then, I will review existing methods for handling shared peptides.
I will finish with a summary of our progress on improving the methodology of protein quantification with shared peptides and the related statistical challenges.
The talk is based on an ongoing collaboration with Tomasz Burzykowski (Hasselt University) and Jurgen Claesen (Belgian Nuclear Research Centre).
Thursday, 14-11-2019 - 14:15, 603
On the Model Selection Properties and Uniqueness of the Lasso and Related Estimators
Ulrike Schneider (Vienna University of Technology)
We investigate the model selection properties of the Lasso estimator in finite samples with no conditions on the regressor matrix X. We show that the set of covariates the Lasso estimator may potentially choose in high dimensions (where the number of explanatory variables p exceeds the sample size n) depends only on X and the given penalization weights. This set of potential covariates can be determined through a geometric condition on X and may be small (of cardinality at most n). Related to the geometric conditions in our considerations, we also provide a necessary and sufficient condition for uniqueness of the Lasso solutions. Finally, we discuss how these results carry over to other model selection procedures such as SLOPE.
Thursday, 07-11-2019 - 14:15, 603
Selection of colored saturated Gaussian models
Piotr Graczyk (Université d'Angers)
Tuesday, 29-10-2019 - 14:15, 605
Analysis of HDX-MS data: a pristine land for bioinformatics
Michał Burdukiewicz (MI2 DataLab, PW)
Hydrogen-deuterium exchange monitored by mass spectrometry (HDX-MS) has recently become a staple tool in studies of protein structure. The main application of this technique is to compare the structure of a protein altered by several factors (so-called states). The statistical frameworks introduced so far address the screening part of the analysis, i.e., the search for significant differences between states, but miss the post-screening phase of the analysis. We critically evaluate existing models and point out their strengths and weaknesses. Additionally, we provide a novel solution to a multi-state comparison problem where the region of interest inside the protein structure is already well-defined.
Thursday, 24-10-2019 - 14:15, 603
Counting faces of random polytopes and applications
Abstract in the attachment
Thursday, 17-10-2019 - 14:15, 603
Statistical inference with missing values
Missing data exist in almost all areas of empirical research. There are various reasons why missing data may occur, including survey non-response, unavailability of measurements, and lost data. In this presentation, I will share my experience with parametric estimation in the presence of missing covariates, based on likelihood methods and the Expectation-Maximization algorithm. Then I will focus on recent results in a supervised learning setting, for performing logistic regression with missing values. We illustrate the method on a dataset of severely traumatized patients from Paris hospitals to predict the occurrence of hemorrhagic shock, a leading cause of early preventable death in severe trauma cases. The methodology is implemented in the R package misaem.
Wednesday, 07-11-2018 - 14:15, 711/712
Topics on stochastic optimization and long-time approximation of stochastic processes
Stochastic optimization is a way of approximating minima of deterministic functions by a stochastic approach. I will begin my talk with some background on this topic and on the Robbins-Monro algorithm. Then, I will state some recent non-asymptotic results about the Ruppert-Polyak algorithm, which is an averaged version of the Robbins-Monro algorithm. In the last part, I will briefly introduce the problem of long-time approximation of diffusion processes and its link with the approximation of Gibbs distributions. I will conclude with some statistical applications of these methods. This talk is based on collaborations with Sébastien Gadat and Gilles Pagès.
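The relation between the Robbins-Monro recursion and its averaged version can be sketched in one dimension as follows. This is a minimal illustration under assumed choices: the step size c / n^alpha, the quadratic toy objective, and all names are hypothetical and are not taken from the talk.

```python
import numpy as np

def robbins_monro_averaged(grad_oracle, theta0, n_steps, c=1.0, alpha=0.75, seed=0):
    """Run Robbins-Monro and return (last iterate, running average of the iterates).

    grad_oracle(theta, rng) must return a noisy, unbiased estimate of the
    gradient at theta. Step sizes are gamma_n = c / n**alpha (an assumed choice).
    """
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    avg = 0.0
    for n in range(1, n_steps + 1):
        gamma = c / n ** alpha
        theta -= gamma * grad_oracle(theta, rng)   # Robbins-Monro update
        avg += (theta - avg) / n                   # averaging of the iterates
    return theta, avg

def noisy_grad(theta, rng):
    # Gradient of f(theta) = 0.5 * (theta - 2)^2 observed with standard Gaussian noise.
    return (theta - 2.0) + rng.normal()

last_iterate, averaged_iterate = robbins_monro_averaged(noisy_grad, theta0=0.0, n_steps=5000)
```

Both outputs estimate the minimizer 2 of the toy objective; the averaged iterate is the quantity studied under the Ruppert-Polyak name and is typically less noisy than the last iterate when slowly decreasing step sizes are used.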