January 28, 2013

Fixing the National Weather Service's Computer Gap

In previous blogs, I have documented the profound inadequacy of the computational resources used for operational numerical weather prediction by the National Weather Service (NWS) and the serious implications this deficiency has for the quality of weather forecasts in the U.S. I have described how the world-leading European Centre for Medium-Range Weather Forecasts (ECMWF) now has more than ten times the computer power of the U.S. Environmental Modeling Center (EMC), how U.S. skill in global prediction is in second or third place, and how the lack of computer resources is crippling the NWS's ability to move forward in probabilistic prediction, the next major area of development.
(Reminder:  EMC, part of the NWS, is the operational weather prediction entity of the U.S.)

I have talked to many people about my blogs and assessment, including meteorologists, both inside and outside the NWS, and highly placed managers and administrators in the NWS:  there is essentially no disagreement that we have a serious problem in numerical weather prediction, and that lack of computer power is a major cause but not the only one.

The new NOAA Fairmont Computer Center has far more capability than EMC's computer center

It is time to fix the NWS's operational computer deficiency and this blog will describe how it can be done within a year using funds that are already appropriated.  But it will take leadership and a willingness to do things a different way.  And an end to highly dysfunctional relationships in NOAA and the NWS.  This is going to be a very frank assessment of the current situation and will get somewhat technical in places...so please forgive me or skip this blog if you find it tedious.

The Problem is Worse Than I Thought

When U.S. Senator Maria Cantwell learned about the lack of computer power for U.S.  numerical weather prediction at a luncheon I attended, she asked an important question of the head of the NWS:  how can this be when Congress has appropriated large amounts of funds for weather and climate computers?  He did not answer, but the answer is clear: nearly all of these resources have been unavailable for weather prediction--most are used for climate studies.

But the problem is deeper and more disturbing than that... other groups in NOAA are securing bigger computers than the national operational center, EMC.  And some of these groups are actively working to acquire computer resources for themselves rather than EMC.  A good case is the NOAA Earth System Research Lab (ESRL) in NOAA's Office of Oceanic and Atmospheric Research (OAR).  This lab is tasked with doing research to support operational numerical weather prediction (NWP) in the NWS, even though it is not in the NWS.  As noted in an earlier blog, this is a crazy organizational situation, with those running operational NWP in the U.S. unable to control the research that supports them.  ESRL has been able to find funds for very large supercomputers (the "jet" machines) that far eclipse what EMC has to work with.  ESRL has established essentially operational capabilities and wants to expand them further (they called it Regular Research).  Amazingly, two high administrators in OAR/ESRL told me that I should not be working to secure big computers for EMC but rather should get them for THEM!  I was really taken aback by their attitude.  And recently the Hurricane Forecast Improvement Program (HFIP) received a large computer resource (placed at ESRL) and HFIP is using it for operational global and hurricane-scale simulations.

NOAA ESRL in Boulder houses supercomputers more capable than those used by the U.S.'s main numerical weather prediction facility

So we have the nutty situation in which operational NWP is starved for computer resources, undermining progress in weather prediction, while climate studies have massive supercomputers available and NOAA fosters active competitors in its organization that are doing essentially operational weather prediction with far greater resources than EMC, the U.S. operational center.   This screams about poor leadership and management in NOAA.

The other problem is that the NWS is wasting a substantial amount of the limited computer power it does have today.  The graphic below shows how the NWS is using their current computer.  A lot of it does not make sense.  Time is on the X axis (entire day) and the Y axis is the number of nodes (a node is a collection of processors) used.  The various colors represent different models or simulations run on this computer.

The red color on the lower portion, the largest use of the computer, is for the Climate Forecast System, in which they run seasonal forecasts.  But they run these forecasts FOUR TIMES A DAY, which makes no sense.  Why run a seasonal simulation that often?  In contrast, running the global model (the GFS), shown by the dirty green color, only takes a small part of the computer.  Furthermore, they run the GFS out FOUR TIMES A DAY to 384 h--why do they do that?  Most other big centers find it useful to run out that far only twice a day.  I could find no objective proof in the literature or elsewhere that such frequent runs are useful.  I can go into more detail, but the bottom line is that the use of EMC's computer is inefficient and not well thought out.  A lot is done for legacy reasons.  A rational evaluation of cost and benefit would clearly change allocations substantially.  But even if they used the current small computers rationally, they don't have enough to do what needs to be done.

Production Schedule for EMC's Computer

What do they really need?

      For EMC to serve the nation in a reasonable way, I believe they need the computational resources to do the following:

(1)  Run a global ensemble system at 12-15 km resolution (currently they are at roughly 50 km).  (Remember, an ensemble is when you run a model many times with different starting points and model physics; this allows one to get at the uncertainties in a forecast.)  This ensemble needs to be running the best physics possible, unlike the inferior physics used in the current U.S. global ensemble system.
(2)  Run convection-resolving high-resolution ensembles over the U.S. (1-4 km resolution).  Currently, the U.S. ensemble system is at 16 km resolution.  Many of the runs of the current system use inferior physics to save computer time.
(3)  Run a rapid-update system (like ESRL's HRRR) at 3 km resolution.  Eventually, (2) and (3) should be combined.
(4)  Lowest priority but useful.  Run a global model at 2-4 km resolution.
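
For readers who want to see the ensemble idea in concrete form, here is a minimal sketch in Python.  It is NOT an NWS or EMC system: it uses the classic Lorenz-63 toy equations as a stand-in for a weather model, perturbs only the starting points (real ensembles also vary the model physics), and tracks how far the members drift apart.  The function names, member count, and perturbation size are all assumptions chosen for illustration.

```python
import random
import statistics


def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 toy system one step with fourth-order Runge-Kutta."""

    def tendency(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        # Return the state s moved a distance h along the tendency k.
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = tendency(state)
    k2 = tendency(shifted(state, k1, dt / 2))
    k3 = tendency(shifted(state, k2, dt / 2))
    k4 = tendency(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def run_ensemble(n_members=20, n_steps=2000, perturbation=1e-3, seed=0):
    """Run many copies of the toy model from slightly different starting points.

    Returns the ensemble spread (standard deviation of x across members)
    after each step; a growing spread signals growing forecast uncertainty.
    """
    rng = random.Random(seed)
    base = (1.0, 1.0, 20.0)  # hypothetical "best guess" initial state
    members = [
        tuple(v + rng.gauss(0.0, perturbation) for v in base)
        for _ in range(n_members)
    ]
    spread = []
    for _ in range(n_steps):
        members = [lorenz_step(m) for m in members]
        spread.append(statistics.pstdev([m[0] for m in members]))
    return spread


if __name__ == "__main__":
    spread = run_ensemble()
    for step in (0, 499, 999, 1999):
        print(f"step {step + 1:4d}: spread in x = {spread[step]:.4f}")
```

The point of the sketch is the cost structure: every ensemble member is a full model run, so a 20-member ensemble costs roughly 20 times a single forecast at the same resolution.  That is why ensembles are so demanding of computer power.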

Doubling resolution takes about 8 times the computer power.    My back of the envelope calculation is that the above is doable if EMC had 5-10 petaflops of computer power (well within the range of recently acquired machines by others).  The plan below will give it to EMC for operational use and maintain high reliability.
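
The factor of 8 can be written out as a short calculation.  This is a back-of-the-envelope sketch under a stated assumption: cost scales with the cube of the resolution ratio (twice the grid points in each of the two horizontal directions, plus a time step that must be halved), with vertical resolution and model physics held fixed.  The function name and the exponent default are mine, not anything EMC uses.

```python
def cost_factor(old_km, new_km, exponent=3):
    """Rough multiplier in computer power needed to refine grid spacing.

    Assumes cost grows as (old_km / new_km) ** exponent.  The default
    exponent of 3 reflects two horizontal dimensions plus a shorter
    time step; vertical levels and physics are assumed unchanged.
    """
    if old_km <= 0 or new_km <= 0:
        raise ValueError("grid spacings must be positive")
    return (old_km / new_km) ** exponent


if __name__ == "__main__":
    # Grid spacings taken from the list of requirements above.
    cases = [
        ("global ensemble, 50 km -> 15 km", 50, 15),
        ("global ensemble, 50 km -> 12.5 km", 50, 12.5),
        ("U.S. ensemble, 16 km -> 4 km", 16, 4),
        ("U.S. ensemble, 16 km -> 1 km", 16, 1),
    ]
    for label, old_km, new_km in cases:
        print(f"{label}: about {cost_factor(old_km, new_km):,.0f}x the computer power")
```

Under this assumption, halving the grid spacing costs 8 times as much, and going from 50 km to 12.5 km (two halvings) costs 64 times as much, before counting any improvement in physics or any increase in the number of ensemble members.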

How to Fix the Problem Quickly

First, EMC needs to get their house in order and reduce the waste in their current schedule, which I estimate is roughly 25% of their current computer.

EMC will get an upgrade this summer of their two 0.07 petaflop machines (the vendor is IBM, one operational and one backup) to 0.2 petaflops.  This is helpful, but not nearly enough.  Congress just passed the Hurricane Sandy relief bill for roughly 50 billion dollars.  Within this bill is 25 million for enhanced hurricane weather prediction and data assimilation and 50 million for hurricane research...money that is going to NOAA.  One thing we learned is that good global weather prediction is the key for hurricane forecasting--that is why the European Center Global Model was the best during Sandy.  So you want to help hurricane forecasting?  USE ALL OF THE 25 MILLION TO UPGRADE EMC's COMPUTER RESOURCES.

The German weather service just purchased a 23 million dollar CRAY supercomputer that dwarfs what the U.S. NWS now uses.

Use the 25 million in Sandy money to acquire (EMC likes to lease) ONE big machine, a computer with 1-3 petaflops or more.  My discussions with several computer vendors suggest that the NWS might be surprised about how much they could get for 25 million.  Perhaps as high as 5-10 petaflops if they play their cards right.  I believe this machine could possess at least 99% reliability and folks in the NWS computer hierarchy agree.  (Hell...I have a cluster I use for weather forecasting that maintains such reliability and I do it on a shoestring, surely they can as well!).  The recently acquired NOAA Fairmont machine can serve as backup for the new EMC computer, as well as being available for development and research.

Thus, the operational load can be split between the current IBM system, which will increase in size again in 2015 to roughly one petaflop, and the new system purchased with Sandy money.  Using these new resources wisely, NWS operations can jump to world leadership capability in numerical weather prediction and radically improve the products provided to U.S. users.

Additional Fixes

There is little doubt EMC could quickly take advantage of the increased computer resources (I have confirmed this by talking with their leadership).  However, as noted earlier, the problems in U.S. numerical weather prediction are deeper than lack of supercomputers (although fixing that deficiency would be a good start).  Management and leadership failures have abounded.  To address these problems, immediate attention should be given to the following items:

1.  Establish a numerical weather prediction advisory board for EMC that provides recommendations from experts in the entire community.  A big part of the problem is that the National Weather Service folks have been too isolated from the rest of the meteorological community.  They serve the nation but have generally been unenthusiastic about getting guidance and advice from their users, the private sector, and the research community.  This has led NWS EMC to second/third tier status and must change.  For years, U.S. National Academy committees and others have recommended that EMC establish a representative advisory committee that would act as an active partner.  NWS management has pushed back on this and has done nothing.  Enough is enough....this advisory committee should be established immediately and should serve as a sounding board for deciding on which models are run, how they are run, the computer resources needed, and more.

Japan's weather supercomputer (peak 0.85 petaflops) is roughly ten times larger than the U.S. EMC's.

2.  Restructure NWP research and development in NOAA/NWS.  The current separation of  weather prediction research from operations has been a continuing disaster and must end.  NOAA leadership finally must deal with this mess.   Moving EMC into NOAA and combining with OAR/ESRL under one manager might work.  Or move ESRL folks into the NWS under EMC. 

3.  Establish a comprehensive verification program for U.S. models.  To improve weather forecasting models you must know their strengths and weaknesses.  The NWS/EMC model verification program is very weak and superficial.  If you want to see how bad things are, check their very poor model verification web pages.  Ask a simple question:  how well do the models verify over the NW?  Better over the mountains or lowlands?  Or how has forecast skill over California changed during the past few decades?  You will be disappointed, I guarantee you.  A lot of the statistics are monthly, making it impossible to determine the trends in model skill.

NOAA money supports the Developmental Testbed Center, which I know quite a bit about (I have been chair of their Science Advisory Board).  The dream was that folks could provide new research innovations that would be tested in an operational-like environment for a wide range of cases.  If successful, they would go into operations.  Sounds good?  After nearly a decade and millions of dollars, this is a dream that never seems to happen.  The DTC should take the testbed role seriously.  Now.

Taiwan's weather bureau has a computer twice as fast as EMC's and has purchased one over 15 times as powerful.

4.  Support a model improvement research program.  The U.S. has the largest meteorological research community in the world, with universities like the UW doing cutting edge research on numerical weather prediction and related topics.  NOAA/NWS have failed to take advantage of this huge community, maintaining a minuscule extramural research program.  Any new research funds go right into NOAA coffers.  This must change.  Let's start with the 50 million in Sandy research money and use most of it for extramural, university-based research.  NWS/NOAA extramural weather model research should be targeted to the most acute needs of the National Weather Service modeling efforts.  Trust me, money speaks in the research community.

5.  Create a strategic plan with community input and do it.   Currently, there is NO comprehensive and detailed strategic plan by the National Weather Service on the improvement of numerical weather prediction.   This contrasts with foreign meteorological services (such as our neighbors, the Canadians), who have laid out detailed and aggressive roadmaps of their future direction.   You can't go far without a map.  The NWS needs one and the community should be at the table when it is constructed.

6.  Provide decent documentation of what U.S. modeling centers are doing.

Want to figure out the details of the models run by the U.S.?  Good luck.  It is pretty much impossible to do so by going to EMC or its parent NCEP's web sites.  Scanty, out-of-date material is all you will find.  Amusingly, what you WILL find is their response to "certain blogs."  You can't imagine whose.

Let me be blunt: the state of operational U.S. numerical weather prediction is an embarrassment to the nation and it does not have to be this way.  Taiwan, Germany, England, the European Center, Canada, and other nations have more computer power for their weather prediction services.  Our nation has had inferior numerical weather prediction for too long.  New computers are an obvious and relatively easy first step, because they make everything possible.  For the price of a single warplane we could have greatly improved weather prediction that would save lives and property.  Congress and the American public should not accept delays in action.  If this issue were placed before a real leader like President Lincoln, asking him when we should act, I can imagine what he would say (click on the arrow at the bottom of the picture to find out):





16 comments:

  1. Interesting post Cliff. I don't know enough about it to weigh in. But one observation I'll venture is that it seems similar to some of the turf wars between intelligence agencies before 911 with the redundancy, poor communications, and insular culture.

  2. Cliff, your leadership here is greatly appreciated by me and my family for many years now, and I'm sure, countless others as well. Maria Cantwell is one of the best elected officials we have, and I'm so glad you've got her ear! I want her to establish the numerical weather prediction advisory board you're speaking of, and gee, I think YOU should be its chair. NOW (as you say). On a more pedestrian note, there are a number of typos in this post.. you say "enthusiastic and getting" in one place where you mean "about" and you use "universities like the U.S. doing cutting edge" when I'm sure you meant UW. Whatever, but you might want to fix them. :-)
    Is the development of Probcast stalled out? Did the study you guys did about users of Probcast reveal anything useful? As a weather-info consumer, I believe that you're on to a good presentation format with Probcast, and I would love to see it strengthened and expanded.
    Thanks for all you do.

  3. Suppose we were to grant that this fellow's science were sound, or just leave that question aside

    http://wattsupwiththat.com/2013/01/27/expert-predicts-monsoon-britain/

    In your opinion, would the direction he supposes the climate is going have similar implications for the PacNW? ARkStorms for CA?
    Have we too, had a period of less intense flooding since the 1960s?

  4. I have no credentials other than being a federal public servant for 25 years. I learned that being a "voice of reason" may get you some changes, but you become a cultural outsider at the 'Senior Executive Level' unless you can show them how your proposal will benefit them personally in their career desires. I also learned that it is rare for any government manager to turn down money even if they know it could be used better elsewhere. I envision billions in unused equipment collecting dust on many shelves and racks throughout many federal agencies.
    As someone without any credentials here, other than 25 years of troubleshooting systems and processes, I can't imagine how we as a country can lead the world in Climate Change Science without cooperation between agencies. It seems that Climate Scientists would reach out to oceanographic scientists and atmospheric scientists in a spirit of collegiality and cooperation, supporting each other rather than competing for the most dollars. How can a climate modeling program be tested and verified without checking with the weather guys? It seems that we should cooperatively work to prove that we can forecast in the short term, say a goal of 90% accuracy over a 30 day period, before we can have any confidence in a 50-100 year prediction. I believe this issue is so important that you should bring regional people together first; say, Senators, Congressional Reps, and representatives from Oceans, Atmosphere, and Long Term Modelers from the PAC NW, and test if that group can cooperate and come to a workable plan. Then from that nucleus use what you have learned, especially politically and diplomatically, and take it national, then global. Perhaps a smart and capable leader will arise from the soup to take this to the next step. Maybe that person is Congressman Reichert or Senator Cantwell or perhaps they will come from Oregon or Idaho. Without weather prediction being vastly better supported we will not be able to prepare for the effects of anthropogenic or naturally caused climate change. I do salute you, Professor Mass, for your strongly voiced and consistent convictions and hope you can become a voice in national media to alert all Taxpayers to this ridiculous system and how easy it should be to sort out. Charles Walsh

  5. Is there anyone we can contact to support this effort?

  6. Cliff, this is an excellent analysis!

    Judy Curry

  7. This seems important! What can, "We the people" do about this??

  8. Careful Cliff..Using Sandy Money. NJ's Governor will send over some of the "boys" to adjust you!

  9. Mr. Mass:

    As an IT professional, I found your breakdown and explanation of the issues facing the NWS to be cogent and well-reasoned. While reading your suggestions for how to address these issues, I noticed that you did not suggest any "strong" technology fixes.

    Would it make sense for the NWS to re-purpose the principles behind folding@home, LHC@home, or rosetta@home and deploy a crowdsourced weather modeling platform? Granted, you'd need approximately 250,000 machines running at 2.5GHz each to get the rough equivalent of a petaflop, but it'd possibly alleviate the resource constraint, if not eliminate it.

  10. With what you've pointed above regarding the majority of the main funds appropriated available for computer-power being channeled into climate studies / - as opposed to more evenly distributed between weather and climate, it would appear leastwise, that this is most likely the result of a more naive perception having been adopted of what either whether will or may be needed, ultimately, toward dealing with what will or may occur, more climate related. — i.e. different main, more "extreme" / severe weather events.

    For myself, I'd say, that where working to "sell" the idea that both weather and climate study warrant a more substantial stream of funding, this main correlation should be pointed to more strongly. — i.e. the more "finite" element(s, of whatever more "climate" related assessment. And with more work being done then, to shift this focus, to the benefits of a system set up to generate better weather prediction, more specifically.

    Beyond this, and where considering the more basic problem of not (there "never" being) enough funding for important issues looked at more generally, .. as with Bill Gates' having adopted "World Hunger" as an if, more "philanthropic", endeavor and project, perhaps whomever is in charge these days, over at "Apple", might be called upon to consider this issue as one of interest, even noteworthy.

  11. Well said!! Cliff

    I worked there for 7 years and exposed a security hole in their system and I was let go by the NCO - very stubborn and bureaucratic. A complete change of leadership there is needed. IMHO

  12. Seems to me the next step to improve the situation is have an agency sponsor an NRC/NAS study of what should be done. Or get a good workshop, with 50-100 or so reputable weather folks, emergency managers, and computer people, convened to write a white paper.

    Posting here will only go so far - no one is likely to act without some sort of community consensus.


  13. We have been developing a crisis in weather forecasting for well more than a decade. Back as far as 1995, we could see ECMWF forecasts emerging as higher quality and more robust than U.S. products. This emerging gap was attributed to budget, visitors programs, lack of operational mission, computers, etc. However, a fundamental distinction is that ECMWF is a mission-focused organization that has integrated research and operations together with organizational attention to science-based, validated products. They focus on absorbing research and data, much of which is from the U.S., into their system for the benefit of their products. They can focus their in-house research on the glue that is needed to produce excellent, science-based products. They have ownership of their budget and can direct that budget to accomplish their mission. They have an internalized incentive structure that focuses their people’s efforts on their products. They know to invest in software and to spend on computers. They know how to satisfy their customers. They have benefitted from stability, strong management, and focused leadership.

    Based on the success of ECMWF, many advisory panels have offered advice to U.S. weather-science agencies: buy computers, start visitors programs, engage the community, get more advice from excellent scientists, etc. It used to be stated that a goal at NASA and NOAA should be to “stop the fly-over-Washington syndrome” (to Reading). These well-intentioned recommendations not only are patches, but any incremental benefit in the short-term comes at increased long-term costs of ignoring the underlying, systemic causes of the problems. So a fundamental need is to enable the managers of U.S. modeling and data activities to work coherently, with focus, with stability, with balanced investments and expenditures to develop validated, science-based products. We have been bootstrapping solutions for more than a decade – an unsustainable approach. We must cease to perpetuate the destructive and unnecessary divide between “research” and “operations” - a divide that is maintained by our funding models and many advisory panels. We must find a way to reduce the divisive and destructive competition that places agencies, laboratories in agencies, and divisions in laboratories in hostile relationships – hostility that is perpetuated because it has emerged as the strategy that works.

    Here is a concrete example. I was responsible for computational resources to support routine delivery of forecast-assimilation products to support NASA’s aircraft and space missions. Vendors for the computing hardware companies worked the political environment in such a way as to, de facto, earmark dollars that would provide to my organization specific pieces of hardware – let’s say a supercomputer. If this happened, then dollars would be removed from my budget. I would, therefore, have a supercomputer, with no budget to run it correctly, and perhaps that was incompatible with the software and data subsystem of my forecast-assimilation system. As a manager I found this situation untenable. As a manager, I knew the balance of expenditures and investments that I needed to make to provide a science-validated product. However, it was difficult to maintain this balance in the presence of a whole portfolio of external interests that could often, through budget and influence, reach deep into my organization. Progress on this requires us to be cognizant of the entire portfolio of the expenditures, investments, and relationships that are needed to provide robust weather and environmental products. This is not simply a matter of advocating for more money. More money can, in fact, make things worse.

  14. Technical comment: it makes no sense to run a "seasonal forecast" 4 times a day. However, it may make sense to schedule 4 runs per day of the large number of ensembles (hindcasts and forecasts) that are required to generate a seasonal forecast. This may be what the schedule is.

    In terms of efficiency of computer usage, combining research and NWP, and in particular combining climate and NWP has been shown to be a good move at a number of operational centres. Additionally, the use of supercomputer resources for operational purposes meshes nicely with research jobs or chunks of longer climate jobs fitting between the gaps of the operational schedule.

  15. I'm only a hobbyist but I am fascinated by the science of numerical prediction. This post explains a lot of what I've observed even from a layman's perspective and it makes me sad. Seems like a lot of dedicated people are being stymied by a terrible system of waste and mismanagement at the highest levels.

    I hope the new director can get this straightened out.

    Great blog by the way!

  16. Cliff,

    Love your blog. I got to thinking a bit as I sit here in Western Kansas at 10 PM, wondering if the thunderstorms approaching from the West over Colorado and Nebraska are going to strengthen or weaken. Frankly I have no idea, and most of the time the Models have no idea either. From a completely selfish standpoint, the entertainment value of getting up every 1/2 hour through the night to look at the radar images would be ruined by a more accurate model.


    Obviously the economic benefits for better short term forecasting should be driving more funds now, but for tonight I don't want anybody to spoil the end of the movie.

