Monday, April 25, 2011

Thanks, Budget Cuts, and an Inside Look


I would like to begin this blog with a heartfelt thanks to all of you who have contributed to the UW weather prediction research fund noted in the upper right of this blog and here. Over one hundred of you have contributed both small and larger amounts, and these funds (several thousand dollars so far) will help keep our regional weather research and our real-time local weather prediction models running. Others contributed to the student scholarship fund, which is extraordinarily important as tuition zooms and financial pressure on our students increases.

It is really impressive that there is a community of Northwest residents who are so committed to keeping the collection of regional weather data, state-of-the-art weather models, and local forecasting research going in this period of cutbacks and retrenchment.

And the support is truly needed. This month, for example, I learned that the funding I had gotten from the NWS for over a decade is ending because they are cutting their university funding in half. This was the money I was going to use to assimilate the new coastal radar into our local weather prediction efforts! Anyway, with your support I will work to keep our efforts going.

But since you are investors in our regional modeling, let me give you a little glimpse "behind the curtain" to see what you are supporting.

The local weather prediction computing facility is in the atmospheric sciences building on the UW campus. I have about two hundred processors, most of them on dual quad-core servers (8 processors per board or node), and over 300 terabytes (trillion bytes) of disk storage. And to keep the thing running when lights flicker, there are uninterruptible power supplies (UPSs). Some of these processors can communicate through high-speed (40 gigabit per second) interconnects. All this stuff is packed into several clusters in very special racks (more on this later). Here is an example of one of the clusters:


This computer facility is one of the greenest around! The UW didn't have enough funds to give us a decent air conditioning system, so we came up with our own approach...we blow all the hot air outside and bring in cool air from the outside to make it up! To make this work we got fancy enclosures with BIG fans inside them. The fans pull air across the processors and then out big ducts to the exterior of the building. Here are two pictures of our Rube Goldberg setup:

Ok, it is not pretty. And the UW AC guys didn't like the looks of it, but it really works! It is as if all the heat from the computers doesn't exist. We have a big intake in one of the computer rooms that brings in the cool outside air. Only when the outside air is hot (a VERY, VERY short period here in Seattle!) do we need AC, and then the weak units the university gave us, plus an auxiliary unit we bought, do the trick. Why use energy to cool hot air when you can get rid of it? If we were really clever we would redirect the computer air to the heat ducts of the building and no external heat would be needed. Is anyone designing buildings to do this? They should.

You know the WRF model I am always showing on this blog? Generally we run this on 64 processors simultaneously; in other words, the code is parallelized so that the problem can be split efficiently across many processors. And fast communication between the processors speeds this up immensely.
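For the curious, here is a rough sketch of what "splitting the problem" can look like. It is only an illustration, not WRF's actual code: the 400 x 300 grid size, the 8 x 8 processor layout, and the function names `decompose` and `edge_points` are all assumptions made up for this example. The idea is that each processor gets one rectangular tile of the forecast grid, and the points along each tile's edge are the ones whose values have to be traded with neighboring processors, which is where fast interconnects help.

```python
def decompose(nx, ny, px, py):
    """Split an nx-by-ny grid into px*py rectangular tiles, one per processor.

    Returns a list of (rank, x_start, x_end, y_start, y_end) tuples,
    where the ranges are half-open: [x_start, x_end) and [y_start, y_end).
    """
    def split(n, parts):
        # Spread n points over `parts` chunks; the first `extra` chunks get one more.
        base, extra = divmod(n, parts)
        bounds, start = [], 0
        for i in range(parts):
            size = base + (1 if i < extra else 0)
            bounds.append((start, start + size))
            start += size
        return bounds

    tiles = []
    for j, (y0, y1) in enumerate(split(ny, py)):
        for i, (x0, x1) in enumerate(split(nx, px)):
            tiles.append((j * px + i, x0, x1, y0, y1))
    return tiles


def edge_points(tile):
    """Count the grid points on the perimeter of a tile (at least 2 x 2 in size)."""
    _, x0, x1, y0, y1 = tile
    width, height = x1 - x0, y1 - y0
    return 2 * (width + height) - 4


# Hypothetical 400 x 300 grid spread over 64 processors in an 8 x 8 layout.
tiles = decompose(400, 300, 8, 8)
print(len(tiles), "tiles; first tile:", tiles[0])
print("edge points on first tile:", edge_points(tiles[0]))
```

Each tile here holds far more interior points than edge points, so most of the work stays local to a processor, but the edges still have to be exchanged as the forecast steps forward.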

These computers not only run the forecast models, but also host web servers and meet other needs. This is not an inexpensive enterprise, and that is why your help is so valuable. Disks continuously fail, UPS batteries die, backup tapes are continuously needed, and we find that we have to replace the processors roughly every five years. Our system programmers, Dave Warren and Harry Edmon, are marvelous at keeping the system going and it rarely fails (at least 99% availability). And the department charges us per processor for support.
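To put that availability figure in perspective, here is a small back-of-the-envelope calculation. It is just arithmetic on the 99% number above, assuming a 365-day year; the function name `max_downtime_hours` is made up for this illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def max_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Return the most downtime (in hours) allowed at a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


hours = max_downtime_hours(0.99)
print(f"{hours:.1f} hours per year, or about {hours / 24:.2f} days")
```

So 99% availability allows at most about 88 hours, or a bit over three and a half days, of downtime in a year.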

The amount of data moving through the system is amazing. Every day we bring in hundreds of gigabytes of weather data (NWS models, satellite and radar data, surface obs) and we acquire the data from over 70 local weather networks...all in real time. The models we run produce hundreds of gigabytes more a day. And the graphics you see on the web...we produce over 50,000 images a day! A tape and removable disk backup system allows us to save key observations and data, again at more expense.

And then there are the people... right now I have 3 staff members who spend much of their time developing improved weather prediction systems and maintaining this enterprise. And of course the students who are doing their theses on understanding weather systems and developing future technologies, like figuring out how to get the maximum benefit from the new coastal radar.

This all started in 1995 with a single-processor computer.

Again, thanks for all your help...cliff

19 comments:

Natalie said...

"If we were really clever we would redirect the computer air to the heat ducts of the building and no external heat would be needed. Is anyone designing buildings to do this?"

We're totally doing this! I'm the electrical side of things, but I've seen my HVAC colleagues use gadgets called Heat Recovery Units in a lot of buildings lately. From what I understand, it uses the warm outgoing air from a building to warm up the incoming air, reducing heating costs.

Brooks said...

"Uninterpretable" Power Supplies? What, like they speak a dead language?

Rick said...

It's really a shame that UW can't get that equipment into a decent data center, especially since those servers serve such a critical function.

Fortunately our mild climate is ideal for outside air cooling of computing equipment and it is a technique that is used widely throughout the region. It's just typically done without the octopus of cooling hoses.

Thank you for a look at the backend of things, Prof. Mass.

kaezi said...

This is very cool. As manager of an operations department overseeing several hundred servers I can certainly appreciate this. I take it the somewhat jury rigged HVAC must change during the hotter weeks during the summer? In any case, thanks for the insight! I always enjoy rack photos, and more creative solutions to datacenter ops.

Eric Jain said...

Interesting! I like the "uninterpretable power supplies" ;-)

seawallrunner said...

uninterpretable power supplies (UPSs)

I think you mean uninterruptible power supplies :)

Cliff Mass Weather Blog said...

Thanks to those who read this blog more carefully than I do. This shows you the dangers of not proofing carefully after running a spell check! For the spell checker it was certainly uninterpretable!...cliff

Robert Okrie said...

Get blanking panels into your open rack spaces! ;-) Never ceases to amaze this old geek how much useful work is done in such small physical spaces today. Crunch on!

Nicky said...

Fun! Thanks for showing us the details.

Doug said...

Nothing wrong w/outside ventilation as long as you can handle potential contamination. I was once responsible for a 20KW FM transmitter plant in Pittsburgh, cooled using external air. At the time Pittsburgh's air was still pretty filthy w/stuff that was not only grubby but corrosive; it was imperative to use efficient filtration. Snow ingestion could be a problem as well; a trap with a change in direction helped with that.

Meanwhile, up in Bellingham at yet another station in my past, the AM plant reject heat was "officially" mixed w/inside HVAC for studios and offices during winter. No other heat input was required during the daytime, when the station's license permitted full power operation.

Thanks for the reminder on fundraising; finally remembered to ante up...

orv said...

Rick, there's a little internal politics going on here. The University is trying to consolidate research computing onto a central cluster run by UW Information Technologies, an independent department that provides networking services to campus and funds itself by charging fees for services to other departments. UW IT has their own data center. For that reason the administration has become extremely resistant to building new data centers elsewhere on campus.

David Cuthbert said...

Full disclosure: I work at Amazon, though not on EC2/AWS.

Have you investigated any of the utility computing (ahem... "cloud"... but I do dislike that term) services out there?

Unless you're relying on a custom interconnect (10 Gbps, InfiniBand, etc.) to keep communication between the nodes saturated, your setup should be easily ported to something like EC2 or Rackspace. Both of these are just generic nodes you spin up (or down, as per your needs), installed with whatever OS and application image you want. This would free you from the business of replacing failed disks, broken fans, running wires, etc., and let you focus on research.

I'm less sure about the fit with software-platform services (Windows Azure, Google App Engine). From what I understand, those impose certain constraints on how you write your software (which may or may not be an issue).

(I would, of course, lean on a vendor to provide these services at a severely discounted academic research rate.)

Patrick said...

If we were really clever we would redirect the computer air to the heat ducts of the building and no external heat would be needed. Is anyone designing buildings to do this? They should.

Using the hot air from computers to heat the building used to be fairly common during the mainframe era. Even at UW, the Academic Computer Center was built with little or no heat source other than the computers within. When the Cyber was retired, the building heating system had to be renovated.

So, how hot can it be outside without overheating your computers? And what do you do when that happens?

Guy said...

The visual of your cooling systems was the laugh of my day. It was just too wondrously galumphy.

Rick said...

@orv No doubt about that, after seeing some of the "Data centers" around campus I can see why they're hesitant to build more inefficient spaces.

@Patrick Servers are actually pretty resilient. More so than manufacturers let on. MSFT ran a cabinet of servers outside under a small awning for a period of time and Intel has done some similar more controlled testing as well. You'd be surprised.

Patrick said...

Rick, they can tolerate abuse for a while. But if you're depending on donations to replace them when they fail prematurely, that's not how you should treat them. Or if you want to keep your 99% uptime.

Way back there was a DEC-20 mainframe. It was supposed to be air conditioned to 65 degrees, but one summer the A/C failed and it was about a week before spare parts could be obtained. The computer center would turn the computer on in the evening, leave the outside doors open, and let it run until it crashed. So it was determined experimentally that the DEC-20 crashes when the machine room air temp. reaches 82 degrees.

John said...

When the Boeing Computer Services Bellevue data center opened in 1982 (IIRC) all the office heat came from the computers in the data center. We were freezing initially because they hadn't fired up all the machines (which included a Cray 1) so no heat in the offices. So this is a great idea, but not a new one.

JewelyaZ said...

I'm glad the UW weather set-up has its own servers not "in the cloud" -- no matter how appropriate the resulting slogan might be.

I was severely unimpressed by Amazon's recent cloud downtime... Foursquare was hosed for almost three days. I thought the whole point of the cloud was AVOIDING that sort of downtime?

The City of Seattle uses a modified filtered-outside-air system for cooling its big server room. They have to use actual A/C for <10 days most years, so it does save substantial energy input and therefore money.

The employee who gave me my data center tour also showed me how the heat was recycled into the building heating system. Very cool.

I haven't donated yet because things are really tight for us... I'm currently unemployed... but it's top of my list as soon as I have a steady income again. I think the work Cliff is doing is vital.

Robert said...

Visiting an ADP site outside of Boston many years ago I found that they heated their building with waste heat from their banks of Digital Equipment PDP-10s. It was their only heat source, and this was bitter cold Boston. In winter, outside air was used to cool the processors.

And the 747 factory in Everett neither heats nor cools, but controls the temperature by opening and closing the giant doors at either end of the building.