## Tuesday, March 31, 2009

### Learning to Live with Virtualization

Comment at ArsTechnica:
"One of the big things that I've learned and that's been recently reinforced for me, is that you need a whole mindset change about how you build servers, how you evaluate what goes into your standard builds, how you monitor...

Think, for example, about a daemon that wakes up once an hour to check on something (it doesn't matter what...). Once an hour's not much, right? Except you have 100 guests, so that's once every 36 seconds, on average. Still sound like a lightweight process? Still need to be in your standard build? Likewise a daemon that eats 100 MB of RAM, or blindly installing packages because you "might need to use gcc some time". Yeah, maybe you might. Is it worth the storage you just ate?"

Couldn't agree more; especially the part about different monitoring requirements for VMMs.
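
The "once an hour" arithmetic in that comment is easy to check. Here is a minimal sketch, assuming the guests' wake-ups are spread evenly over the hour (the best case for the host); the function names are mine, not anything from the ArsTechnica thread, and the RAM total simply applies the same multiplication to the 100 MB daemon.

```python
def mean_wakeup_interval(period_s: float, guests: int) -> float:
    """Average gap (seconds) between daemon wake-ups as seen by the host.

    Assumes wake-ups are evenly spread across the period (hypothetical best case).
    """
    if guests <= 0:
        raise ValueError("guests must be positive")
    return period_s / guests


def aggregate_ram_mb(per_guest_mb: float, guests: int) -> float:
    """Total RAM (MB) consumed when every guest runs the same daemon."""
    return per_guest_mb * guests


if __name__ == "__main__":
    guests = 100
    print("mean gap between wake-ups (s):", mean_wakeup_interval(3600, guests))
    print("aggregate daemon RAM (MB):", aggregate_ram_mb(100, guests))
```

If the guests happen to be synchronized (e.g., cron jobs firing on the hour), the host sees a burst of 100 wake-ups at once rather than an even trickle, which is arguably worse.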

## Monday, March 30, 2009

### Conficker Worm Defense for Enterprises

A blog post over at ZDNet describes how enterprise data-ops can use network scanners to detect and disarm the Conficker Worm vulnerability on Microsoft Windows platforms, before it potentially wakes up on April 1st and begins to disable your anti-virus software.

One wonders whether the ZDNet writer was suffering from subliminal puritanism. The name of the worm is Conficker, not ConfLicker (sic). Presumably, the name is derived from the conjunction of Con, as in confidence trick (to enable it) and ficken, the German F-bomb (when it disables you).

Confession: Aye, 'twas I who corrected the ZDNet scribe. :-)

## Sunday, March 29, 2009

### More Guerrilla Boot Camp Classes in 2009

## Tuesday, March 24, 2009

### Slacker DBs in the Cloud Base

In my view, another reason Larry Ellison diss'd Cloud Computing last year is that he's afraid of how it might negatively impact sales of the ORACLE RDBMS. (This, even though he promoted "Thin Clients" a decade ago while completely overlooking the infrastructure needed to support them: aka the cloud.) Why? Most of the world's data is not in relational form, and never will be. More importantly, Google knows this. (Think MapReduce.)

One of the first people to see this coming was relational database academic, Joe Hellerstein at UCB. In his 2001 talk entitled "We Lose" (PDF slides), slide 5 contains the gist of his prescient observations:

• Grassroots use Filesystems, not DBs
• Grassroots use App servers, not ORDBs
• Grassroots write Java, PERL, Python, PHP, ... NOT SQL!

He defines Grassroots as: "Hackers. But also DBMS engineers, Berkeley grads, Physicists, etc."

Now, somewhere in between are Slacker Databases: "Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere, offering far greater simplicity than SQL, may have a better way of storing data for Web apps." Hellerstein was more right than he could've known.

### Nine-Day NetBooks

I claim the "NetBook" (whatever the hell that really is) will turn out to be a 9-day wonder: less utility than a laptop, too big to put in your pocket. End of story. :-)

## Monday, March 23, 2009

### Sprint Looks Beyond Cellphones

Very interesting piece in the WSJ about Sprint's strategy to offset its inability to sign up enough new cellphone subscribers. The main idea is to sell wireless capacity to manufacturers of gadgets, e.g., GPS devices, automobile dashboard computers, etc.

A largely unknown factoid, reported in this piece, is that Sprint handles wireless book downloads for the Kindle reader at Amazon.com.

From another perspective, it's interesting to note that the same capacity planning paradigms (e.g., queueing theory, scheduling algorithms, game theory) can be applied to both data networks and manufacturing systems. The umbrella term is operations research.

### Streaming Hadoop Data Into R Scripts

Along the lines of Mongo Measurement Requires Mongo Management, the HadoopStreaming package on CRAN provides utilities for applying R scripts to Hadoop streaming.

Hadoop has been deployed on Amazon's EC2. See our more recent ACM article, "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion about scalability issues.

### Higgs Slapping Starts Early

As I said in my A.A. Michelson Award acceptance speech, the search for the Higgs boson could turn out to be the 21st century null-experiment that supersedes the 19th century Michelson-Morley search for the aether. The big difference is in the amount of data that will be generated by the LHC, viz., 15 PB per year.

Since finding the Higgs in all those data will be like searching for the proverbial "needle," the pressure is on to justify the investment in the European machine (LHC-CMS for $10B) at CERN and the lack of investment by the U.S. Congress in the Texas Supercollider (SSC for $12B); much less than a bank bailout today. The proxy for the SSC is the aging machine at Fermilab. Because of the pressure to see something, I fully expect a lot of false positives to be reported and that will inevitably degenerate into arguments over confidence intervals for the data; just the kind of thing we discuss in the GDAT class next August.

However, I didn't expect things to really heat up until the LHC comes back online in the summer, after repairs to the collapsed superconducting magnets. In the meantime, however, the global economy has also collapsed and Fermilab is hurting for funds. So, while the LHC is down for the count, the Fermilab DZero experiment is looking for the Higgs and getting in the news by setting some bounds on the energy ranges where the Higgs might live. Without getting into too much detail, the above diagram shows that the plausible range for the Higgs mass (mH) is 114 GeV < mH < 185 GeV (according to Fermilab). For reference, your analog TV set produces electrons that hit the screen with an energy of about 30 keV. Mass and energy are directly related by Einstein's famous equation E = mc², where c is the speed of light in vacuo.
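
To put those numbers on a common scale, here is a rough sketch that converts the quoted Higgs mass bounds into kilograms via E = mc² and compares them with the 30 keV TV-tube electron. The constants are the standard SI values for the electron-volt and the speed of light; the function names are just illustrative.

```python
EV_TO_JOULE = 1.602176634e-19   # joules per electron-volt (SI definition)
C = 299_792_458.0               # speed of light in vacuum, m/s


def gev_to_kg(energy_gev: float) -> float:
    """Mass equivalent (kg) of an energy given in GeV, using m = E / c^2."""
    return energy_gev * 1e9 * EV_TO_JOULE / C**2


def ratio_to_crt_electron(energy_gev: float, crt_kev: float = 30.0) -> float:
    """How many times larger the given energy is than a CRT electron's energy."""
    return (energy_gev * 1e9) / (crt_kev * 1e3)


if __name__ == "__main__":
    for bound_gev in (114.0, 185.0):  # Fermilab bounds quoted in the post
        print(f"{bound_gev:.0f} GeV -> {gev_to_kg(bound_gev):.3e} kg, "
              f"{ratio_to_crt_electron(bound_gev):.2e} x a 30 keV electron")
```

In other words, the quoted window sits several million times above the energy of the electrons hitting an old TV screen.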

This opportunistic move has set off a slapfest between some physicists at Fermilab and CERN. If it's this ugly now, I don't know where it's going to go when those gaps close down to zero; apart from the obvious escape route that the Higgs is much heavier than 250 GeV.

## Sunday, March 22, 2009

### Twitts of the World, Unite!

The great thing about email is, you can ignore it. One of the things I can't stand about Skype and IM is that, by design, they are very intrusive (or can be), to the point where I can't think straight. I'm slow, so I need a lot of uninterrupted time to think. Thus, I've held the same opinion, a fortiori, about Twitter. The very name has been its own aversion, for me.

## Friday, March 20, 2009

### Gmail 5 Second Retrieve Reprieve

I've been waiting for someone to come up with this (cache it first) implementation for email. I'm not sure 5 seconds is long enough.

## Thursday, March 19, 2009

### IBM Might Swallow the Sun

"Shares of Sun Microsystems, which makes the Java software that runs many Internet applications, were up 78.9 percent after reports that it was in talks to be acquired by I.B.M. Shares of Sun ended at $8.89. I.B.M. was down 1 percent, to $91.95."

I heard this rumor at the Portland CMG meeting yesterday. Apparently, Sun has been quietly "looking for a date" for some time. Presumably, IBM's main interest is in Java IP. Will Solaris replace AIX (under the covers)?

I had a long-standing theory that Sun Microsystems would be bought by Fujitsu Corp to simply milk Solaris service contracts for the next 10 years. It's not interesting innovation, but it is a business. Sun has always managed to have enough cash in the bank to be able to forestall such a move, but now, they're out of gas.

Update: Why an IBM purchase of Sun would make sense (cnet)

## Wednesday, March 11, 2009

### Treemap Visualization of Disk Volumes

GrandPerspective is a FOSS tool for Mac OS X that provides a treemap visualization of file layout on a disk. I created the treemap below from an 80 GB disk on my G4 tower Mac, which has both Mac OS X files (left) and WinXP files (right), the latter being a copy from the disk of my recently deceased Sony laptop. It certainly gives new meaning to the term disk blocks.

It's quite striking to see the greater number of larger aggregations of files on the Mac side vs. the many smaller files on the XP side. I guess that's why we don't need to do "defragging" on Macs. :-)

## Monday, March 9, 2009

### i-Screen, u-Screen, Vee All Screen for Which Screen?

When I first came to the USA, it quickly became apparent that there was no such thing as plain ice cream. You had to specify what flavor, what combination of flavors, what kind of cone, what you wanted on top of it, and so on. This is all enshrined in the song "I scream, You scream, We all scream for Ice Cream". Coming from England, I was not used to dealing with such a wide spectrum of choices for such a simple thing as ice cream. And England had the worst ice cream I had ever tasted, made from hydrogenated vegetable oils; margarine, basically. But it only took a few "experiments" to catch on to the more complex American approach.

## Monday, March 2, 2009

### Michelson Comes Home to California

At the CMG conference in Las Vegas last December, I was presented with the A.A. Michelson Award. It actually consists of 2 pieces: a framed citation, which you can see (and hear President Cathy Nolan reading) in the video of the ceremony, and a wooden plaque with lots of brass bits on it; including a ruler for performance measurement. :-)

