Research Activities

Areas of Research Interest

I am interested in solving practical problems in statistics and related fields (e.g., applied probability, computer science, mathematics, actuarial science, and engineering). My publications include five books, six book chapters, and more than 100 articles (see recent publications or the complete CV). The methods for outlier detection, the Influence measure, and the Potential-Residual Plot have been implemented in several statistics packages (e.g., Data Desk, Stata, and SYSTAT). My areas of research interest include:

Probability and Statistical Science:

Computer Science:

Mathematics:

Finance and Actuarial Science:

Engineering:

Interdisciplinary:


Regression Analysis


Robust Statistics and Outlier Detection

Although it is customary to assume that data are homogeneous, in fact they often contain outliers or subgroups. Scientists and philosophers have recognized for at least 380 years that real data are not homogeneous and that the identification of outliers is an important step in the progress of scientific understanding. Methods that deal with robust estimation and outlier detection are presented in the following articles:

Robust Regression Methods:

Detection of Outliers in Large Data Sets:

Detection of Outliers in Multivariate Data:

Detection of Outliers in Regression Data:

Graphical Methods for the Detection of Outliers:


Parameter and Quantile Estimation


Fatigue and Lifetime Data Analysis


Reliability of Engineering Structures


Extreme Value Distributions


Matrix Algebra


Perturbed Eigenvalue Problem


Generalized Inverses


Statistical Analysis of Employment Discrimination Data


Probability


• Expert systems and probabilistic reasoning

Neural and Functional Networks


Bayesian and Markov Networks


Data Mining and Visualization


Software Available


S-PLUS Code:

function(X) {
# -----------------------------------------------------------------
#  Hadi, Ali S. (1994), "A Modification of a Method for the
#  Detection of Outliers in Multivariate Samples," Journal of the
#  Royal Statistical Society (B), 56(2), 393-396.
# -----------------------------------------------------------------
  n <- dim(X)[1]
  p <- dim(X)[2]
  h <- trunc((n + p + 1)/2)
  id <- 1:n
  r <- p
  out <- 0
  cf <- (1 + ((p + 1)/(n - p)) + (2/(n - 1 - (3*p))) )^2
# cf <- (1 + ((p + 1)/(n - p)) + (1/(n - p - h)) )^2
  alpha <- 0.05
  tol <- max(10^-(p+5), 10^-12)
# -----------------------------------------------------------------
# **  Compute Mahalanobis distance
# -----------------------------------------------------------------
  C <- apply(X, 2, mean)
  S <- var(X)
  if (det(S) < tol) stop("covariance matrix is singular or nearly singular")
  D <- mahalanobis(X, C, S)
  mah.out <- 0
  cv <- qchisq(1-(alpha/n), p)
  for (i in 1:n) if (D[i] >= cv) mah.out <- cbind(mah.out, i)
  mah.out <- mah.out[-1]
  mah <- sqrt(D)
  Xbar <- C
  Covariance <- S
# ----------------------------------------------------------------
#  **  Step 0
# ----------------------------------------------------------------
#  **  Compute Di(Cm, Sm)
  C <- apply(X, 2, median)
  C <- t(array(C, dim = c(p, n)))   # n x p matrix; every row is the median vector
  Y <- X - C
  S <- ((n - 1)^-1)*(t(Y) %*% Y)
  D <- mahalanobis(X, C[1, ], S)
  Z <- sort.list(D)
# ----------------------------------------------------------------
#  **  Compute Di(Cv, Sv)
  repeat {
    Y <- X[Z[1:h], ]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
       D <- mahalanobis(X, C, S)
       Z <- sort.list(D)
       break
    } else h <- h + 1
  }
# ----------------------------------------------------------------
#  **  Step 1
# ----------------------------------------------------------------
  repeat {
    r <- r + 1
    if ( h < r) break
    Y <- X[Z[1:r],]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
       D <- mahalanobis(X, C, S)
       Z <- sort.list(D) }
    }
# ----------------------------------------------------------------
#  **  Step 3
# ----------------------------------------------------------------
#  **  Compute Di(Cb, Sb)
  repeat {
    Y <- X[Z[1:h],]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
       D <- mahalanobis(X, C, S)
       Z <- sort.list(D)
       if (D[Z[h + 1]] >= (cf*qchisq(1-(alpha/n), p))) {
            out <- Z[(h + 1):n]
            break
       } else {
            h <- h + 1
            if (n <= h) break
       }
    } else {
       h <- h + 1
       if (n <= h) break
    }
  }
  D <- sqrt(D/cf)
  dst <- cbind(id, mah, D)
  Outliers <- out
  Cb <- C
  Sb <- S
  Distances <- dst
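  # Return a named list; a multi-argument return() is not portable.
  return(list(Xbar = Xbar, Covariance = Covariance, mah.out = mah.out,
              Outliers = Outliers, Cb = Cb, Sb = Sb, Distances = Distances))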
}
# ----------------------------------------------------------------
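
The first stage of the function above (classical Mahalanobis distances compared against a Bonferroni-adjusted chi-square cutoff) can be tried in isolation. What follows is a minimal sketch in R, which shares most of its syntax with S-PLUS. The helper name classical.outliers, the simulated data, and the planted outlier are assumptions made for illustration and are not part of the original code. The sketch covers only the classical step, not the robust Steps 0 to 3.

```r
# Hypothetical helper (not in the original code): flag rows whose squared
# classical Mahalanobis distance reaches the Bonferroni-adjusted cutoff.
classical.outliers <- function(X, alpha = 0.05) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  C <- colMeans(X)               # sample mean vector
  S <- var(X)                    # sample covariance matrix
  D <- mahalanobis(X, C, S)      # squared distances from the mean
  cv <- qchisq(1 - alpha/n, p)   # same cutoff form as in the function above
  list(distances = sqrt(D), cutoff = sqrt(cv), outliers = which(D >= cv))
}

set.seed(1)                          # illustrative data only
X <- matrix(rnorm(200), ncol = 2)    # 100 simulated bivariate observations
X[1, ] <- c(10, 10)                  # one planted gross outlier in row 1
res <- classical.outliers(X)
print(res$outliers)                  # indices of the flagged rows
```

A single gross outlier such as the planted one is typically caught by the classical distances. When several outliers cluster together they can inflate the mean and covariance enough to hide one another, which is the situation the robust steps in the function above are meant to handle.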
			