To compare the performance of different implementations of the libraries used for matrix operations (BLAS and LAPACK), the following R code was used:
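```r
library(microbenchmark)

n <- 100

# Operation under test: the product of A with its transpose, A %*% t(A)
prodAxA <- function(A) {
  A %*% t(A)
}

# mtime[i] holds the median run time (in nanoseconds) for an i x i matrix
mtime <- numeric(n)
for (i in seq_len(n)) {
  A <- matrix(runif(i^2), nrow = i, ncol = i)
  # microbenchmark() evaluates the expression 100 times (its default);
  # the $time column of its result is in nanoseconds
  mtime[i] <- median(microbenchmark(prodAxA(A), times = 100)$time)
}
```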
All libraries were rebuilt with -O3 optimization for the Core2 processor family on the Debian GNU/Linux operating system, following the standard rebuild procedure provided by the package maintainers.
The performance of the matrix operation $A A^{T}$ (the function prodAxA above) was measured in nanoseconds, 100 times for each square matrix size from 1 to 100, and the median time for each size was recorded.
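Because the benchmark code is identical for every run and only the linked libraries change, it may help to confirm which BLAS and LAPACK a given R session has actually loaded before timing anything. This is a minimal sketch and not part of the measurements described here; it assumes R 3.4.0 or later, where `sessionInfo()` stores the library paths in its `BLAS` and `LAPACK` components.

```r
# Report which BLAS/LAPACK libraries this R session is using.
# Assumption: R >= 3.4.0, where sessionInfo() records their paths
# (on older versions these components are NULL and print nothing).
si <- sessionInfo()
cat("BLAS:   ", si$BLAS, "\n")
cat("LAPACK: ", si$LAPACK, "\n")

# La_version() returns the version string of the LAPACK in use
cat("LAPACK version:", La_version(), "\n")
```

Running this once per library configuration makes it easier to attach each set of timings to the right implementation.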

According to the results presented in the following figure, OpenBLAS performed best.
OpenBLAS was 76% faster than the original implementation and 36% faster than the ATLAS implementation.
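Median execution time in nanoseconds at matrix size 100:

| Size | BLAS (original), LAPACK (original) | BLAS (ATLAS), LAPACK (ATLAS) | BLAS (OpenBLAS), LAPACK (ATLAS) |
|------|------------------------------------|------------------------------|---------------------------------|
| 100  | 1540240                            | 542153                       | 357314.5                        |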
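As a rough cross-check, the relative differences at matrix size 100 can be computed directly from the medians in the table. This sketch assumes "faster" means the reduction in median time relative to each baseline, and also reports the plain speedup factor; the vector name `medians` is only illustrative.

```r
# Median times (ns) at matrix size 100, copied from the table
medians <- c(original = 1540240, atlas = 542153, openblas = 357314.5)

# Relative reduction in median time achieved by OpenBLAS versus each baseline
reduction <- 1 - medians[["openblas"]] / medians[c("original", "atlas")]

# Speedup factor: baseline median time divided by OpenBLAS median time
speedup <- medians[c("original", "atlas")] / medians[["openblas"]]

print(round(100 * reduction, 1))  # original 76.8, atlas 34.1 (percent less time)
print(round(speedup, 2))          # original 4.31, atlas 1.52
```

The two measures answer different questions: the reduction says how much of the baseline time is saved, while the speedup says how many times faster OpenBLAS runs.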
The measurements were performed on an Intel(R) Core(TM)2 Duo CPU T8300 @ 2.40GHz processor, with the CPU frequency scaling governor set to "performance".