## Research Activities

## Areas of Research Interest

I am interested in solving practical problems in statistics and related fields (e.g., applied probability, computer science, mathematics, actuarial science, and engineering). My publications include five books, six book chapters, and more than 100 articles (see the recent publications or download the complete CV). My methods for outlier detection, the influence measure, and the Potential-Residual Plot have been implemented in several statistical packages (e.g., Data Desk, Stata, and SYSTAT). My research interests include:

### Probability and Statistical Science:

- Regression analysis
- Regression diagnostics
- Multivariate analysis
- Robust statistics and outlier detection
- Statistical computing and graphics
- Parameter and quantile estimation
- Fatigue and lifetime data analysis
- Extreme value distributions
- Statistical analysis of employment discrimination data
- Probability

### Computer Science:

- Uncertainty in artificial intelligence
- Expert systems and probabilistic reasoning
- Neural and functional network models
- Bayesian and Markov network models

### Mathematics:

### Finance and Actuarial Science:

- Hadi, A. S., El Naggar, A. A., and Abdel Bary, M. N. (2010), "Two Proposals for Enhancing Global Optimal Portfolios," *Proceedings of the 22nd Annual Conference on Statistics and Computer Modeling in Human and Social Sciences*, Cairo University, 28–53.

### Engineering:

### Interdisciplinary:

## Regression Analysis

- Hadi, A. S. (2011), "Ridge and Surrogate Ridge Regressions," in *International Encyclopedia of Statistical Science* (Miodrag Lovric, Ed.), New York: Springer, Part 18, 1232–1234.
- Castillo, E., Castillo, C., Hadi, A. S., and Sarabia, J. M. (2009), "Combined Regression Models," *Computational Statistics*, 24, 37–66.
- Castillo, E., Hadi, A. S., and Minguez, R. (2009), "Diagnostics for Nonlinear Regression," *Journal of Statistical Computation and Simulation*, 79, 1109–1128.
- Castillo, E., Castillo, C., Hadi, A. S., and Minguez, R. (2008), "Duality and Local Sensitivity Analysis in Least Squares, Minimax, and Least Absolute Values," *Journal of Statistical Computation and Simulation*, 78, 887–909.
- Castillo, E., Castillo, C., Hadi, A. S., and Sarabia, J. M. (2008), "Local Sensitivity Analysis in Estimation Problems," *Journal of Computational and Graphical Statistics*, 17, 703–725.
- Chatterjee, S. and Hadi, A. S. (2006), *Regression Analysis by Example* (4th Edition), New York: John Wiley and Sons.
- Chatterjee, S. and Hadi, A. S. (1988), *Sensitivity Analysis in Linear Regression*, New York: John Wiley and Sons.

## Robust Statistics and Outlier Detection

Although it is customary to assume that data are homogeneous, real data often contain outliers or distinct subgroups. Scientists and philosophers have recognized for at least 380 years that real data are not homogeneous and that identifying outliers is an important step in advancing scientific understanding. The following articles present methods for robust estimation and outlier detection:

### Robust Regression Methods:

- Hadi, A. S., Imon, A. H. M. R., and Werner, M. (2009), "Detection of Outliers," *The Wiley Interdisciplinary Reviews: Computational Statistics*, 1, 57–70. Available online at http://www3.interscience.wiley.com/journal/122511038/issue
- Billor, N., Chatterjee, S., and Hadi, A. S. (2006), "A Re-Weighted Least Squares Method for Robust Regression Estimation," *American Journal of Mathematical and Management Sciences*, 26, 229–252. (Executable program available.)

- Kondylis, A. and Hadi, A. S. (2006), "Derived Components Regression Using the BACON Algorithm,"
*Computational Statistics and Data Analysis*, 51, 556–569.

- Castillo, E., Hadi, A. S., Lacruz, B., and Sarabia, J. M. (2001), "Regression Diagnostics for the Least Absolute Value and the Minimax Methods," *Communications in Statistics: Theory and Methods*, 30, 6, 1197–1225.
- Hadi, A. S. and Luceño, A. (1997), "Maximum Trimmed Likelihood Estimators: A Unified Approach, Examples, and Algorithms," *Computational Statistics & Data Analysis*, 25, 251–272.
- Hadi, A. S. and Simonoff, J. S. (1994), "Improving the Estimation and Outlier Identification Properties of the Least Median of Squares and Minimum Volume Ellipsoid Estimators," *Parisankhyan Samikkha*, 1, 61–70.

### Detection of Outliers in Large Data Sets:

- Imon, R. and Hadi, A. S. (2008), "Identification of Multiple Outliers in Logistic Regression," *Communications in Statistics – Theory and Methods*, 37, 1697–1709.
- Billor, N., Hadi, A. S., and Velleman, P. F. (2000), "BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators," *Computational Statistics & Data Analysis*, 34, 279–298. Contact me for a copy of the paper and computer programs.

### Detection of Outliers in Multivariate Data:

- Hadi, A. S. and Nyquist, H. (1999), "Frechet Distance as a Tool for Diagnosing Elliptically Symmetric Multivariate Data," *Linear Algebra and Its Applications*, 289, 183–201.
- Hadi, A. S. (1994), "A Modification of a Method for the Detection of Outliers in Multivariate Samples," *Journal of the Royal Statistical Society, Series B*, 56, 393–396. This method has been implemented in Stata (hadimvo) and in SYSTAT; an S-PLUS implementation is listed at the end of this page.
- Gould, W. and Hadi, A. S. (1993), "Identifying Multivariate Outliers," *Stata Technical Bulletin*, 11, 2–5. (A Stata implementation of the method in the preceding paper.)
- Hadi, A. S. (1992), "Identifying Multiple Outliers in Multivariate Data," *Journal of the Royal Statistical Society, Series B*, 54, 761–771.
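As a point of reference, the classical (non-robust) rule flags observation i when its squared Mahalanobis distance from the sample mean, computed with the sample covariance matrix, exceeds the 1 − α/n quantile of the chi-square distribution with p degrees of freedom. This is also the first stage of the S-PLUS listing at the end of this page (its `mah.out` result). Because the mean and covariance are estimated from all observations, a group of outliers can pull them toward itself and mask its own members, which is the problem that robust procedures such as those in Hadi (1992, 1994) are designed to address. The minimal R sketch below shows only that classical baseline, not Hadi's procedure; the helper name `classical_screen`, the seed, and the simulated data are illustrative assumptions.

```r
# Classical (non-robust) Mahalanobis screen: a minimal sketch, NOT Hadi's method.
# classical_screen is a hypothetical helper name used only for illustration.
classical_screen <- function(X, alpha = 0.05) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  # Squared Mahalanobis distances from the sample mean, using the sample covariance
  d2 <- mahalanobis(X, colMeans(X), cov(X))
  # Bonferroni-type chi-square cutoff, the same one used in the S-PLUS listing
  cutoff <- qchisq(1 - alpha / n, df = p)
  which(d2 >= cutoff)
}

set.seed(1)
X <- matrix(rnorm(300), ncol = 3)   # 100 simulated clean points in 3 dimensions
X[1, ] <- c(20, 20, 20)             # plant one gross outlier in row 1
flagged <- classical_screen(X)
print(flagged)
```

With a single gross outlier the classical rule will typically flag it; planting several nearby outliers instead is a simple way to see masking in action.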

### Detection of Outliers in Regression Data:

- Billor, N., Chatterjee, S., and Hadi, A. S. (2006), "A Re-Weighted Least Squares Method for Robust Regression Estimation," *American Journal of Mathematical and Management Sciences*, 26, 229–252. (Executable program available.)
- Hadi, A. S. and Simonoff, J. S. (1997), "A More Robust Outlier Identifier for Regression Data," *Bulletin of the International Statistical Institute*, 281–282. Contact me or Jeff Simonoff for computer programs.
- Hadi, A. S. and Simonoff, J. S. (1993), "Procedures for the Identification of Multiple Outliers in Linear Models," *Journal of the American Statistical Association*, 88, 1264–1272. (Code available.)
- Hadi, A. S. (1992), "A New Measure of Overall Potential Influence in Linear Regression," *Computational Statistics & Data Analysis*, 14, 1–27. The proposed influence measure has been implemented in Data Desk.

### Graphical Methods for the Detection of Outliers:

- Moustafa, R. E. and Hadi, A. S. (2006), "Fast and Efficient Graphs for Exploring Massive, Hyperdimensional Data," *Computing Science and Statistics*, 38.
- Moustafa, R. E. and Hadi, A. S. (2006), "Visualizing Hyper-Dimensional Data with the L1-L2 Plot," *Proceedings of the Hawaii International Conference on Statistics, Mathematics and Related Fields*, 1282–1291.
- Billor, N., Chatterjee, S., and Hadi, A. S. (2006), "A Re-Weighted Least Squares Method for Robust Regression Estimation," *American Journal of Mathematical and Management Sciences*, 26, 229–252. Contact me for computer programs.
- Dodge, Y. and Hadi, A. S. (1999), "Simple Graphs and Bounds for the Elements of the Hat Matrix," *Journal of Applied Statistics*, 26, 817–823.
- Hadi, A. S. (1992), "A New Measure of Overall Potential Influence in Linear Regression," *Computational Statistics & Data Analysis*, 14, 1–27. The proposed Potential-Residual Plot has been implemented in Data Desk.
- Hadi, A. S. (1993), "Graphical Methods for Linear Models," Chapter 23 in *Handbook of Statistics: Computational Statistics* (C. R. Rao, Ed.), Vol. 9, North-Holland Publishing Company, 775–802.
- Hadi, A. S. (1990), "Two Graphical Displays for the Detection of Potentially Influential Subsets in Regression," *Journal of Applied Statistics*, 17, 313–327.

## Parameter and Quantile Estimation

- Amin, Z. and Hadi, A. S. (2009), "Fitting the Exponentiated Weibull Distribution to Failure Time Data," *Proceedings of the Tenth Islamic Countries Conference on Statistical Sciences (ICCS-X)*, Volume I, 424–250.
- Castillo, E., Castillo, C., and Hadi, A. S. (2008), "Sensitivity Analysis in Ordered Parameters Models," *Journal of Statistical Planning and Inference*, 138, 1556–1576.

- Castillo, E., Hadi, A. S., Lacruz, B., and Sarabia, J. M. (2001), "Constrained Mixture Distributions," *Metrika*, 55, 247–269.
- Castillo, E., Hadi, A. S., and Sarabia, J. M. (1998), "A Method for Estimating Lorenz Curves," *Communications in Statistics, Theory and Methods*, 27, 2037–2063.
- Castillo, E. and Hadi, A. S. (1997), "Fitting the Generalized Pareto Distribution to Data," *Journal of the American Statistical Association*, 92, 1609–1620.
- Hadi, A. S. and Luceño, A. (1997), "Maximum Trimmed Likelihood Estimators: A Unified Approach, Examples, and Algorithms," *Computational Statistics & Data Analysis*, 25, 251–272.
- Castillo, E., Hadi, A. S., and Sarabia, J. M. (1997), "Fitting Continuous Bivariate Distributions to Data," *The Statistician*, 46, 355–369.
- Castillo, E. and Hadi, A. S. (1995), "A Method for Estimating Parameters and Quantiles of Continuous Distributions of Random Variables," *Computational Statistics & Data Analysis*, 20, 421–439.
- Castillo, E. and Hadi, A. S. (1994), "Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution," *Environmetrics*, 5, 417–432.
- Castillo, E. and Hadi, A. S. (1994), "Parameters and Quantiles Estimation for Continuously Distributed Random Variables," *Proceedings of the Statistical Computing Section*, American Statistical Association, 284–289.

## Fatigue and Lifetime Data Analysis

- Castillo, E., Fernández-Canteli, A., Hadi, A. S., and Lopez-Aenlle, M. (2007), "A Fatigue Model with Local Sensitivity Analysis," *Fatigue and Fracture of Engineering Materials and Structures*, 30, 149–168.
- Castillo, E., Fernández-Canteli, A., and Hadi, A. S. (1999), "On Fitting a Fatigue Model to Data," *International Journal of Fatigue*, 21, 97–106.
- Castillo, E. and Hadi, A. S. (1995), "Modeling Life-Time Data with Application to Fatigue Models," *Journal of the American Statistical Association*, 90, 1041–1054.

## Reliability of Engineering Structures

- Minguez, R., Conejo, A. J., and Hadi, A. S. (2007), "Non-Gaussian State Estimation in Power Systems," in *Advances in Mathematical and Statistical Modeling* (B. C. Arnold, N. Balakrishnan, J. M. Sarabia, and R. Minguez, Eds.), Birkhäuser Boston, Inc., 141–156.
- Minguez, R., Castillo, E., and Hadi, A. S. (2005), "Solving the Inverse Reliability Problem Using Decomposition Techniques," *Structural Safety*, 27, 1–23.
- Minguez, R., Castillo, E., and Hadi, A. S. (2004), "Sensitivity Analysis in Reliability Based Optimization," in *Proceedings of the Fourth International Conference on Mathematical Methods in Reliability*, Santa Fe, New Mexico, June 21–25, 2004, World Scientific Publishing, Series on Quality, Reliability and Engineering Statistics.

## Extreme Value Distributions

- Castillo, E., Hadi, A. S., Balakrishnan, N., and Sarabia, J. M. (2005), *Extreme Value and Related Models with Applications in Engineering and Science*, John Wiley & Sons.
- Castillo, E. and Hadi, A. S. (1997), "Fitting the Generalized Pareto Distribution to Data," *Journal of the American Statistical Association*, 92, 1609–1620.
- Castillo, E. and Hadi, A. S. (1994), "Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution," *Environmetrics*, 5, 417–432.
- "Point Estimation of the Parameters of Two Families of Generalized Logistic Distributions," Manuscript in preparation.

## Matrix Algebra

- Moustafa, R. E. and Hadi, A. S. (2010), "A Method for Regularization of Ill-Posed Systems," *Proceedings of the First International Conference on Mathematics and Statistics, American University of Sharjah*, UAE, 100383, AUS-ICMS’10-1–5.
- Hadi, A. S. (1996), *Matrix Algebra as a Tool*, Belmont, CA: Duxbury Press. ISBN: 0-534-23712-6.

## Perturbed Eigenvalue Problem

- "Modified Matrix Eigenvalue Problem for Real Symmetric Matrices With Applications in Statistics," Manuscript under review.
- Hadi, A. S. and Wells, M. T. (1990), "Assessing the Effects of Multiple Rows on the Condition Number of a Matrix," *Journal of the American Statistical Association*, 85, 786–792.
- Hadi, A. S. (1988), "Diagnosing Collinearity-Influential Observations," *Computational Statistics & Data Analysis*, 7, 143–159.
- Hadi, A. S. and Velleman, P. F. (1987), "Diagnosing Near Collinearities in Least Squares Regression," discussion of "Collinearity and Least Squares Regression" by G. W. Stewart, *Statistical Science*, 2, 93–98.
- Hadi, A. S. (1987), "The Influence of a Single Row on the Eigenstructure of a Matrix," *Proceedings of the Statistical Computing Section, American Statistical Association*, 85–90.

## Generalized Inverses

- Hadi, A. S. and Wells, M. T. (1991), "Minimum Distance Method of Estimation and Testing When Statistics Have Limiting Singular Multivariate Normal Distribution," *Sankhya*, Vol. 53, Series B, Part 2, 257–267.
- Hadi, A. S. and Wells, M. T. (1990), "A Note on Generalized Wald's Test," *Metrika*, 37, 309–315.

## Statistical Analysis of Employment Discrimination Data

- Hadi, A. S. and Jersky, B. (1990), "How Fair Can Employers Be?" *Communications in Statistics: Theory and Methods*, A19, 12, 4545–558.

## Probability

- Castillo, E. and Hadi, A. S. (2000), "Some Probability Concepts for Engineers," in *Handbook of Industrial Automation* (E. L. Hall and R. L. Shell, Eds.), New York: Marcel Dekker, 1–32.

## Expert Systems and Probabilistic Reasoning

- Hadi, A. S. (2011), "Expert Systems," in *International Encyclopedia of Statistical Science* (Miodrag Lovric, Ed.), New York: Springer, Part 5, 480–482.
- Castillo, E., Gutiérrez, J. M., and Hadi, A. S. (1997), *Expert Systems and Probabilistic Network Models*, New York: Springer-Verlag. ISBN: 0-387-94858-9. Translated into Spanish and published by the Spanish Academy of Engineers.

## Neural and Functional Networks

- Castillo, E. and Hadi, A. S. (2006), "Functional Networks," in *Encyclopedia of Statistical Sciences* (Samuel Kotz, N. Balakrishnan, Campbell B. Read, and Brani Vidakovic, Eds.), 4, 2573–2583.

- El-Sebakhy, E. A., Hadi, A. S., and Faisal, K. A. (2007), "Iterative Least Squares Functional Networks Classifier," *IEEE Transactions on Neural Networks*, 18, 844–850.
- Castillo, E. and Hadi, A. S. (2004), "Functional Networks," in *Encyclopedia of Statistical Sciences* (N. Balakrishnan, C. Read, S. Kotz, and B. Vidakovic, Eds.), 4, 2573–2583.
- Castillo, E., Gutiérrez, J. M., Hadi, A. S., and Lacruz, B. (2001), "Some Applications of Functional Networks in Statistics and Engineering," *Technometrics*, 43, 10–24. This paper received the 2001 *Technometrics* Invited Paper Award and was presented as such at the Joint Annual Meetings of five statistical societies in Atlanta, GA (August 7, 2001).
- Castillo, E., Hadi, A. S., and Lacruz, B. (2001), "Optimal Transformations in Multiple Linear Regression Using Functional Networks," *Proceedings of the International Work-Conference on Artificial and Natural Neural Networks (IWANN 2001)*, in *Lecture Notes in Computer Science 2084, Part I*, 316–324.
- Castillo, E., Cobo, A., Gómez Nesterkín, and Hadi, A. S. (1999), "A General Framework for Functional Networks," *Networks*, 35, 70–82.

## Bayesian and Markov Networks

- Hadi, A. S. and Sakr, A. (2009), "A Boosted Approach for Bayesian Network Structure Learning," *Proceedings of the 21st Annual Conference on Statistics and Computer Modeling in Human and Social Sciences*, Cairo University, 138–157.
- Castillo, E. and Hadi, A. S. (2006), "Bayesian Networks," in *Encyclopedia of Statistical Sciences* (Samuel Kotz, N. Balakrishnan, Campbell B. Read, and Brani Vidakovic, Eds.), 1, 425–435.

- Castillo, E. and Hadi, A. S. (2006), "Markov Networks," in *Encyclopedia of Statistical Sciences* (Samuel Kotz, N. Balakrishnan, Campbell B. Read, and Brani Vidakovic, Eds.), 7, 4535–4546.

## Data Mining and Visualization

- Moustafa, R. E., Hadi, A. S., and Symanzik, J. (2010), "Multi-Class Data Exploration Using Space Transformed Visualization Plots," *Journal of Computational and Graphical Statistics* (in press).
- Moustafa, R. E. and Hadi, A. S. (2009), "Grand Tour and Andrews Plot," *The Wiley Interdisciplinary Reviews: Computational Statistics*, 1, 245–250.
- Moustafa, R. E. and Hadi, A. S. (2006), "Fast and Efficient Graphs for Exploring Massive, Hyperdimensional Data," *Computing Science and Statistics*, 38.
- Moustafa, R. E. and Hadi, A. S. (2006), "Visualizing Hyper-Dimensional Data with the L1-L2 Plot," *Proceedings of the Hawaii International Conference on Statistics, Mathematics and Related Fields*, 1282–1291.
- Billor, N., Hadi, A. S., and Velleman, P. F. (2000), "BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators," *Computational Statistics & Data Analysis*, 34, 279–298. Contact me for a copy of the paper and computer programs.

## Software Available

- To see which **statistical computer codes** are available, examine the individual articles in the list of publications.
- To download software related to the **expert systems and artificial intelligence** areas, visit the AI Research Group Site.
- Below is S-PLUS code for the detection of outliers in multivariate data.

## S-PLUS Code

```
# Hadi (1994) multivariate outlier detection.
# The name hadi.mvo is illustrative; the original listing was an
# unnamed function(X). X is an n x p numeric matrix or data frame.
hadi.mvo <- function(X) {
  # -----------------------------------------------------------------
  # Hadi, Ali S. (1994), "A Modification of a Method for the
  # Detection of Outliers in Multivariate Samples," Journal of the
  # Royal Statistical Society (B), 56, 393-396.
  # -----------------------------------------------------------------
  X <- as.matrix(X)
  n <- dim(X)[1]
  p <- dim(X)[2]
  h <- trunc((n + p + 1)/2)         # initial basic-subset size
  id <- 1:n
  r <- p
  out <- integer(0)                 # outlier indices (none found yet)
  if (n - 1 - 3*p <= 0) stop("need n > 3p + 1 observations")
  # Correction factor for the distances
  cf <- (1 + ((p + 1)/(n - p)) + (2/(n - 1 - (3*p))))^2
  # cf <- (1 + ((p + 1)/(n - p)) + (1/(n - p - h)))^2
  alpha <- 0.05
  cv <- qchisq(1 - (alpha/n), p)    # Bonferroni-type chi-square cutoff
  tol <- max(10^-(p + 5), 10^-12)   # determinant below tol = singular
  # -----------------------------------------------------------------
  # ** Compute classical Mahalanobis distances
  # -----------------------------------------------------------------
  C <- apply(X, 2, mean)
  S <- var(X)
  if (det(S) < tol) stop("covariance matrix of X is singular")
  D <- mahalanobis(X, C, S)         # squared distances
  mah.out <- which(D >= cv)         # points flagged by the classical rule
  mah <- sqrt(D)
  Xbar <- C
  Covariance <- S
  # ----------------------------------------------------------------
  # ** Step 0
  # ----------------------------------------------------------------
  # ** Compute Di(Cm, Sm): center at the coordinatewise median
  C <- apply(X, 2, median)
  Y <- sweep(X, 2, C)               # subtract the median from each column
  S <- (t(Y) %*% Y)/(n - 1)
  D <- mahalanobis(X, C, S)
  Z <- sort.list(D)
  # ----------------------------------------------------------------
  # ** Compute Di(Cv, Sv): mean and covariance of the h closest points
  repeat {
    Y <- X[Z[1:h], , drop = FALSE]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
      D <- mahalanobis(X, C, S)
      Z <- sort.list(D)
      break
    }
    h <- h + 1
    if (h >= n) stop("no nonsingular initial subset found")
  }
  # ----------------------------------------------------------------
  # ** Step 1: grow the basic subset from p + 1 points up to h points
  # ----------------------------------------------------------------
  repeat {
    r <- r + 1
    if (h < r) break
    Y <- X[Z[1:r], , drop = FALSE]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
      D <- mahalanobis(X, C, S)
      Z <- sort.list(D)
    }
  }
  # ----------------------------------------------------------------
  # ** Step 3: enlarge the subset until the (h + 1)-th smallest distance
  # ** reaches the corrected cutoff; the remaining points are outliers
  # ----------------------------------------------------------------
  # ** Compute Di(Cb, Sb)
  repeat {
    Y <- X[Z[1:h], , drop = FALSE]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
      D <- mahalanobis(X, C, S)
      Z <- sort.list(D)
      if (D[Z[h + 1]] >= cf*cv) {
        out <- Z[(h + 1):n]
        break
      }
    }
    h <- h + 1
    if (n <= h) break
  }
  D <- sqrt(D/cf)                   # corrected robust distances
  dst <- cbind(id, mah, D)
  # return() takes a single value in R, so bundle the results in a list
  return(list(Xbar = Xbar, Covariance = Covariance, mah.out = mah.out,
              Outliers = out, Cb = C, Sb = S, Distances = dst))
}
```