<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Matrix Multiplication</title>
	<atom:link href="http://matrixprogramming.com/2008/01/matrixmultiply/feed" rel="self" type="application/rss+xml" />
	<link>http://matrixprogramming.com/2008/01/matrixmultiply</link>
	<description>Compiling numerical libraries</description>
	<lastBuildDate>Tue, 15 May 2012 02:26:33 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Evgenii Rudnyi</title>
		<link>http://matrixprogramming.com/2008/01/matrixmultiply/comment-page-1#comment-855</link>
		<dc:creator>Evgenii Rudnyi</dc:creator>
		<pubDate>Tue, 06 Dec 2011 22:10:03 +0000</pubDate>
		<guid isPermaLink="false">http://1.rudnyi.peterhost.ru/?p=312#comment-855</guid>
		<description>You are right that it would be good to check all possible optimization flags. In this study I limited myself to -O3 only.

In my view, though, it is unlikely that compiler-level optimization alone will match the performance of an optimized BLAS. If you try it, please report your results here.</description>
		<content:encoded><![CDATA[<p>You are right that it would be good to check all possible optimization flags. In this study I limited myself to -O3 only.</p>
<p>In my view, though, it is unlikely that compiler-level optimization alone will match the performance of an optimized BLAS. If you try it, please report your results here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bit</title>
		<link>http://matrixprogramming.com/2008/01/matrixmultiply/comment-page-1#comment-854</link>
		<dc:creator>bit</dc:creator>
		<pubDate>Tue, 06 Dec 2011 21:28:54 +0000</pubDate>
		<guid isPermaLink="false">http://1.rudnyi.peterhost.ru/?p=312#comment-854</guid>
		<description>I see one potential flaw here that is not addressed. When you compiled your programs, did you enable auto-vectorization and auto-parallelization? I noticed that your compiler flags did not include any CPU-specific optimizations or anything like that. I know from experience that enabling SSE2 optimizations and auto-vectorization of these kinds of loops decreases run time significantly. The Python modules would certainly have been compiled with this in mind, and for a fair comparison you would need to enable these types of optimizations. If you do this, I think you will find that they all run at about the same speed.

If I can find it, I have a list of compiler flags for this, and the test could then be rerun with them.</description>
		<content:encoded><![CDATA[<p>I see one potential flaw here that is not addressed. When you compiled your programs, did you enable auto-vectorization and auto-parallelization? I noticed that your compiler flags did not include any CPU-specific optimizations or anything like that. I know from experience that enabling SSE2 optimizations and auto-vectorization of these kinds of loops decreases run time significantly. The Python modules would certainly have been compiled with this in mind, and for a fair comparison you would need to enable these types of optimizations. If you do this, I think you will find that they all run at about the same speed.</p>
<p>If I can find it, I have a list of compiler flags for this, and the test could then be rerun with them.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

