Friday, May 2, 2008

Pay per VPU Application Development

So-called "Cloud Computing" (aka Cluster Computing, aka Grids, aka Utility Computing, etc.) is even making the morning news these days, where it is presented as putting a supercomputer's worth of virtual processor units just a click away from your home PC. I don't know too many home PC users who need a supercomputer, but even if they did, and such a service were readily available, how competitive would it be given the plummeting cost of multicore processors for PCs?

A more realistic case can be made for application developers, and in fact that is the pitch from Interactive Supercomputing, who offer their Linux clusters for applications involving graphics rendering, statistical analysis, and data mining, as well as the usual lineup of scientific and engineering applications. In particular, they are set up to port applications written in Matlab, Python, R and Java.

"A common use for a cluster is data processing. For example, if your workstation takes 10 hours to process your data set, it might take just 1 hour to process the same set using 10 nodes on the TTI cluster. The data is partitioned into 10 smaller units, each cluster node processes one unit. Our service makes it easy to do this."

This sounds familiar. When I was at Xerox PARC, we had on site a 65,536-processor CM-2 from Thinking Machines Corp. (now defunct). I was doing some simulations for a new network protocol. The CM-2 was front-ended by a Sun workstation, and the only language available at that time for writing CM-2 programs was *LISP (the TMC parallel version of LISP). After coding up my simulation in *LISP, I was very excited to see how it would blaze on the CM-2. I don't remember the exact times, but it was something on the order of 10 hours. Huh? I could probably have run it overnight on the SPARCstation in a similar timeframe.

I immediately got the on-site Apps Engineer from TMC to take a look at my code. He suggested replacing the RNG with native CM-2 code (called PARIS). That helped, but it didn't even improve the runtime by a factor of 10. Still disappointed, I got him to take another look. Over several rounds of this, he converted significant chunks of my *LISP to PARIS. I don't remember the best runtime, because the whole experience was dominated by the pain of having learnt *LISP and then undoing a lot of it to get performance anywhere near that expected of a parallel machine. I would have been just as happy to write my simulation in C and let it run for several nights on the SPARCstation. Thus, I vowed never again to be sucked in by the lure of parallel programming, a vow maintained to this day.

How about cost? Massive cycles and porting assistance were free to me at Xerox. Interactive Supercomputing offers tiered pricing, starting at $2.77 per core-hour and falling to $1.35 per core-hour with a monthly subscription. Some rough calculations, assuming 4 cores, show how those prices unfold between 2,000 core-hours per month (their minimum) and 20,000 core-hours per month (their maximum).


In the second row, 3,000 core-hrs/mth is roughly equivalent to 4 cores running full tilt, 24x7.
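For anyone who wants to redo that arithmetic, here is the same back-of-the-envelope calculation as a few lines of Python. The only published numbers are the two per-core-hour rates and the 2,000 to 20,000 core-hour range; the intermediate tier rates aren't known to me, so this just applies the two endpoint rates to a handful of monthly volumes:

```python
# Rough monthly-cost arithmetic from the published per-core-hour rates.
PAY_AS_YOU_GO = 2.77   # $/core-hour, entry tier
SUBSCRIPTION  = 1.35   # $/core-hour, monthly subscription tier

CORES = 4
FULL_TILT = CORES * 24 * 31   # ~2976 core-hrs: 4 cores, 24x7 for a month

for core_hours in (2_000, 3_000, 10_000, 20_000):
    print(f"{core_hours:>6} core-hrs/mth: "
          f"${core_hours * PAY_AS_YOU_GO:>9,.0f} pay-as-you-go, "
          f"${core_hours * SUBSCRIPTION:>9,.0f} subscription")

print(f"4 cores running flat out: ~{FULL_TILT} core-hrs/mth")
```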

How does this compare with DIY? Pricing for the Intel Core 2 Extreme QX9650 (4 cores) @ 3 GHz lies in the range $1000-$1500 for either the 65 nm or 45 nm parts. Integrated into a box, you can expect to pay somewhere in the range of $5000 to $10,000, depending on disk and RAM. So, your own local hardware would pay for itself, at least once over, in less than a month. But that's just the (possibly optimistic) hardware side. You still have to port your software to make full use of that hardware, and if you can do it yourself, then it's free. Otherwise, there could be a lot of pain involved (of the CM-2 type), and that's the real edge that Interactive Supercomputing is offering, for the moment.
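The payback claim follows from the same numbers. A quick sketch, assuming the 3,000 core-hrs/mth "4 cores flat out" case from above and the $5,000 to $10,000 box prices just quoted; your mileage will vary with the actual tier you land in:

```python
# Back-of-the-envelope payback: how long before a DIY quad-core box costs less
# than renting the equivalent core-hours?
BOX_COSTS = (5_000, 10_000)      # $ range for a built-out QX9650 workstation
CORE_HRS_PER_MONTH = 3_000       # ~4 cores running flat out
RATES = {"pay-as-you-go": 2.77, "subscription": 1.35}   # $/core-hour

for name, rate in RATES.items():
    monthly = CORE_HRS_PER_MONTH * rate
    for box in BOX_COSTS:
        print(f"{name}: ${monthly:,.0f}/mth rented -> "
              f"${box:,} box pays for itself in {box / monthly:.1f} months")
```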
