Introduction
This study was initiated due to the lack of data concerning the accuracy of weather forecasts. The purpose of the study was to determine how accurate weather forecasts really are, and how accuracy differs between weather forecast providers. The author also personally wanted to know who provided the most accurate forecasts. In speaking with weather forecast providers and curious individuals, it became apparent that there has been no comprehensive, ongoing, large-scale accuracy analysis of Internet weather forecast providers. This study aims to be the largest ongoing accuracy analysis of Internet weather forecast providers. This report details the results of the first six months of this study.
Comparison Criteria Selection
High temperature accuracy was chosen as the basis for this study because of its numerical precision and its consistent definition across providers. The only aspects of a general forecast that are expressed as numbers, and are therefore easy for computers to work with, are the high and low temperatures, precipitation probability, and precipitation amount. All other predictions, from cloud cover to wind, are usually expressed in human terms.
Precipitation probability and amount were removed from consideration because not all Internet weather forecast providers supply that information in their summary forecasts as a numeric value. Low temperature was removed after careful consideration because it was found, after the study began, that different providers mean different things by it. Some providers consider the low for a given day to be the low that occurred that morning and the previous night. Others consider the low for a given day to be the overnight low for the coming evening. Finally, some National Weather Service climatological summary reports document the low that occurred in the 24-hour period from midnight to midnight. For these reasons, the high temperature forecast was determined to be the most precise, unambiguous measure of a forecast’s accuracy.
Data Collection Methodology
Each night at 10 p.m. Eastern, from January through June, forecasts were collected from the six major Internet weather forecast providers that met all selection criteria. Intellicast and MyForecast use the station call sign as the forecast identifier; for the others, the zip code from the table above was used to retrieve the forecast. The next-day, two-day-out, and three-day-out forecasts were collected. The daily climatological summary from the National Weather Service Climate Prediction Center’s Climate Operations Branch was used as the official observational record for each day.
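As a rough illustration of this collection step, the sketch below uses a hypothetical fetch_forecast helper and provider table; the actual scripts, URLs, and page parsing used in the study are not shown here.

```python
from datetime import date

# Hypothetical provider table (not the study's actual configuration):
# Intellicast and MyForecast look up forecasts by station call sign;
# the remaining providers are queried by zip code.
PROVIDERS = {
    "Intellicast": "call_sign",
    "MyForecast": "call_sign",
    # ...remaining providers would be keyed to "zip_code"
}

def fetch_forecast(provider, identifier, days_out):
    """Hypothetical stand-in for the HTTP fetch/scrape of one forecast.

    Returns the forecast high temperature, or None if the site was
    unavailable or the page could not be parsed.
    """
    return None  # placeholder

def collect_nightly(cities):
    """Collect 1-, 2-, and 3-day-out high temperature forecasts for all cities."""
    today = date.today()
    records = []
    for city in cities:  # each city dict carries its name, call sign, and zip code
        for provider, id_kind in PROVIDERS.items():
            identifier = city[id_kind]
            for days_out in (1, 2, 3):
                high = fetch_forecast(provider, identifier, days_out)
                if high is not None:  # missing forecasts are simply not recorded
                    records.append((today, city["name"], provider, days_out, high))
    return records
```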
Table 2 lists the number of forecasts retrieved for each provider for the period January 1, 2003, through June 30, 2003, along with the number of forecasts compared to actual observations. There were 181 days in the period and 20 cities in the study, each with a 1-, 2-, and 3-day-out forecast for each day. The maximum number of forecasts that could therefore be retrieved for each Internet weather forecast provider is 181 days times 3 forecasts per day times 20 cities, or 10,860 individual forecasts.
There were a total of 181 days times 20 cities, or 3,620, possible observation records. Of those, 3,414 observations were actually collected, or 94.31% of the total possible. Reasons a forecast or observation could not be retrieved include network problems, site unavailability, and other technical problems. Unisys in particular was challenging, as not all of its forecasts included the high and low temperatures. Even with those challenges, on average over 85% of possible forecasts were collected and scored against actual observations, a total of 55,754 individual forecasts, making this the largest accuracy study of Internet weather forecast providers to date.
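These totals follow from simple arithmetic; here is a quick check in Python using the figures quoted above:

```python
days, cities, horizons = 181, 20, 3

max_forecasts = days * horizons * cities   # 181 * 3 * 20 = 10,860 per provider
max_observations = days * cities           # 181 * 20 = 3,620

collected = 3_414
coverage = collected / max_observations    # 0.9431..., i.e. 94.31%

print(max_forecasts, max_observations, f"{coverage:.2%}")
```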
Comparison Methodology
The high temperature forecast error was calculated by subtracting the forecast high temperature from the observed high temperature and squaring the result. This was done for the one-, two-, and three-day-out forecasts. The squared errors were then averaged for each provider, and the square root of that average taken, to derive a single error figure per provider. This result is called the root-mean-square error, or RMS error. RMS error was used because it reflects the spread of the errors as well as their size: because errors are squared before averaging, a few large misses are penalized more heavily than consistent small errors. This approach has its root in the customer experience: a customer of a forecast would rather see forecasts that were mostly right most of the time than right-on sometimes and dead-wrong others. With RMS error, a lower number indicates greater forecast accuracy.
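The calculation can be sketched in a few lines of Python; the sample values below are illustrative only, not study data.

```python
import math

def rms_error(forecast_highs, observed_highs):
    """Root-mean-square error between forecast and observed high temperatures."""
    squared = [(obs - fcst) ** 2 for fcst, obs in zip(forecast_highs, observed_highs)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative only: a consistently close forecaster scores better than one
# that is spot-on sometimes and far off at other times.
consistent = rms_error([70, 71, 72, 73], [72, 73, 74, 75])  # always off by 2 -> 2.0
erratic    = rms_error([72, 80, 74, 60], [72, 73, 74, 75])  # two big misses -> ~8.3
```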
Additionally, the number of high temperature forecasts with an absolute error (the absolute value of the difference between the forecast and the observation) within three degrees was counted. This count was then divided by the provider’s total number of forecasts to derive the percentage of forecasts within three degrees of actual. This percentage serves as a measure of how many forecasts could be considered “correct”, a threshold chosen because the “three-degree guarantee” is becoming popular among television meteorologists.
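A corresponding sketch of the three-degree measure, assuming “within three degrees” means an absolute error of three degrees or less:

```python
def percent_within_three(forecast_highs, observed_highs):
    """Share of forecasts whose absolute error is three degrees or less."""
    hits = sum(
        1 for fcst, obs in zip(forecast_highs, observed_highs)
        if abs(obs - fcst) <= 3  # assumes the boundary value of 3 counts as "correct"
    )
    return 100.0 * hits / len(forecast_highs)
```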