Linear algebra algorithms can be written in terms of standard matrix-vector operations. These operations can be optimized for particular hardware, so one can increase performance by using an optimized BLAS library. In practice this means that using an optimized BLAS in your work is a must. A nice introduction to BLAS is on Wikipedia.
A quick overview of the available functions is in the LAPACK User Guide.
On Netlib there is the reference BLAS implementation, but it is slow. Its goal is just to demonstrate the BLAS functions. It can be a quick solution at the start, when you do not have an optimized BLAS library yet, or when you have problems with one. The BLAS interface was originally developed in Fortran, but a C interface is also available.
Usually you do not need this code either, as it is already included in the optimized BLAS library.
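To make concrete what a "standard matrix-vector operation" is, here is a minimal sketch in C of what the level 2 routine DGEMV computes for a non-transposed matrix: y := alpha*A*x + beta*y. The function name naive_dgemv and its argument list are my own assumptions for illustration. This is not the reference code from Netlib, and unlike the real routine it has no transpose option and no vector strides.

```c
#include <assert.h>
#include <stddef.h>

/* Naive sketch of y := alpha*A*x + beta*y.
 * A is m-by-n and stored column by column (as in Fortran),
 * with leading dimension lda >= m.
 * Hypothetical helper for illustration, not part of any BLAS library. */
void naive_dgemv(int m, int n, double alpha, const double *a, int lda,
                 const double *x, double beta, double *y)
{
    assert(m >= 0 && n >= 0 && lda >= m);

    /* Scale y by beta; for beta == 0 the old content of y is ignored. */
    for (int i = 0; i < m; ++i)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    /* Add alpha * x[j] times column j of A.
     * The inner loop walks through memory contiguously. */
    for (int j = 0; j < n; ++j) {
        const double t = alpha * x[j];
        for (int i = 0; i < m; ++i)
            y[i] += t * a[(size_t)j * lda + i];
    }
}
```

With a library that provides the C interface, the same computation should correspond to the call cblas_dgemv(CblasColMajor, CblasNoTrans, m, n, alpha, a, lda, x, 1, beta, y, 1). An optimized BLAS returns the same result as the loops above but organizes the work around the caches and vector units of the target processor, which is where the speedup comes from.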
Optimized BLAS libraries
I have been working for quite a while with ATLAS 3.6
It is free but you have to compile it. This could be a good exercise to test your skills in software engineering.
Currently I am using Intel MKL
It is good but it is a commercial product and costs some money.
I have also once tried AMD ACML
Unlike Intel MKL, it is free. Well, you need to sign up to download it, and if you want to distribute it with your code, you have to fill out the license agreement.
Another popular optimized BLAS library is Goto BLAS