2.6 Summary of comparisons

One goal of this assessment is to determine whether measurements from a variety of platforms and instrumental techniques are comparable. The discussion of specific instrument comparisons presented in this chapter shows that there is significant variability among instruments even when they sample similar conditions. Some types of comparisons are difficult because of fundamental differences in the types of measurement, as noted in the discussion of upper tropospheric humidity measurements from satellite and in situ instruments. Where spatial resolution is important, a sufficient number of coincidences may improve the statistics when comparing disparate instrumental techniques. However, fairly small sample sizes are considered in many of the comparisons presented, and many of the differences are therefore not statistically significant. This is demonstrated by comparing the absolute value of the mean difference with the RMS difference, as plotted in several figures in this chapter. In Figure 2.28, the RMS difference is much larger than the absolute value of the mean difference, indicating that random errors make up a large part of the measured instrumental differences. Such random errors may be due to geophysical variability (none of the measurements are exactly coincident in time or space) or to instrumental noise.
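The argument above rests on the relation between the mean and RMS differences: for paired values, the mean square difference equals the squared mean difference plus the variance of the differences, so an RMS difference much larger than |mean| means the scatter dominates any bias. The following Python sketch uses hypothetical paired values. The two-standard-error significance check is a rough criterion assumed here for illustration (it treats the pairs as independent) and is not the procedure used in this chapter.

```python
import math

def difference_stats(a, b):
    """Return (mean, rms, std) of the paired differences a - b.

    std is the population standard deviation, so rms**2 == mean**2 + std**2.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    return mean, rms, std

def mean_is_significant(mean, std, n, k=2.0):
    """Rough check: is |mean| larger than k standard errors of the mean?

    std / sqrt(n - 1) equals the sample standard deviation divided by sqrt(n).
    Assumes independent pairs (an illustrative assumption).
    """
    if n < 2:
        raise ValueError("need at least two pairs")
    return abs(mean) > k * std / math.sqrt(n - 1)

# Hypothetical coincident values (ppmv) from two instruments.
inst_a = [4.1, 4.6, 5.2, 5.0, 4.4, 4.9]
inst_b = [4.3, 4.2, 5.0, 5.5, 4.1, 5.0]
mean, rms, std = difference_stats(inst_a, inst_b)
print(f"mean = {mean:+.3f} ppmv, RMS = {rms:.3f} ppmv")
print("mean difference significant:", mean_is_significant(mean, std, len(inst_a)))
```

With these values the RMS difference is roughly twenty times the mean difference, and the mean difference does not pass the rough significance check.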

However, even though many of the differences noted are within stated instrumental errors, certain biases between specific instruments do appear. One difficulty in assessing the magnitude of those biases is that a complete set of comparisons does not exist. Thus, a question to consider is whether one instrument can be used as a transfer standard to assess the relationship between two other instruments. This may be possible for the lower stratospheric measurements. For the tropospheric measurements, however, geophysical variability and sampling issues dominate, making this a much more difficult problem.

Of the sensors compared in this chapter, only the frostpoint instruments have been used extensively both in the troposphere and stratosphere. For this reason, the summary given below is divided into tropospheric and stratospheric sections.

2.6.1 Stratospheric comparisons

All the stratospheric measurement comparisons discussed in this chapter are summarised in this section in terms of percentage differences. Any direct comparisons given in terms of mixing ratio differences have been converted to percentage differences for the purposes of this section.
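Converting a reported mixing ratio difference into a percentage requires a reference value. A minimal Python sketch follows, assuming the percentage is taken relative to the HALOE mixing ratio in the same layer (consistent with the tables, which express differences from HALOE). The specific reference values used in each conversion are not given here, so the numbers below are hypothetical.

```python
def percent_difference(instrument_ppmv, haloe_ppmv):
    """Percent difference of an instrument from HALOE: 100 * (X - H) / H."""
    if haloe_ppmv <= 0:
        raise ValueError("reference mixing ratio must be positive")
    return 100.0 * (instrument_ppmv - haloe_ppmv) / haloe_ppmv

def mixing_ratio_difference_to_percent(delta_ppmv, haloe_ppmv):
    """Convert a reported difference (instrument minus HALOE, ppmv) to percent."""
    return percent_difference(haloe_ppmv + delta_ppmv, haloe_ppmv)

# Hypothetical: an instrument reads 0.25 ppmv above a 5.0 ppmv HALOE layer mean.
print(mixing_ratio_difference_to_percent(0.25, 5.0))
```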

In many cases, differences are consistent across instruments. Since direct comparisons do not exist between all pairs of instruments, a third instrument was used as a transfer standard to determine the full set of relationships between instruments. In some cases, this technique works well. A good example is provided by the Harvard Lyman-α, NOAA-AL Lyman-α and JPL TDL airborne instruments discussed in section 2.2. The NOAA-AL-Harvard comparisons indicate a difference of ~15%, with the Harvard instrument reading larger values. The NOAA-AL-JPL comparisons indicate a difference of ~16%, with the JPL TDL instrument reading larger values. From this, one would derive a JPL-Harvard difference of 1%, with the JPL TDL measurements larger. The POLARIS comparisons actually indicate a JPL-Harvard difference of 1% with the opposite sign. However, considering the data scatter, the JPL-Harvard difference is essentially zero at cruise altitude, which is also what would be deduced using the NOAA-AL Lyman-α as a transfer standard. In other cases, using a third instrument as a transfer standard does not work as well. One such example involves the NOAA-CMDL frostpoint, MLS, and HALOE instruments. In the 60 to 100 hPa layer, MLS differs from HALOE by ~5%, with HALOE larger. The NOAA-CMDL instrument differs from MLS by ~3%, with NOAA-CMDL larger, leading one to conclude that the NOAA-CMDL instrument would read ~2% less than HALOE in this layer. However, the direct comparison between the HALOE and NOAA-CMDL instruments shows NOAA-CMDL 12% greater than HALOE. Because of such inconsistencies, both the direct comparisons and the ranges deduced using a third instrument as a transfer standard will be presented.
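The transfer-standard reasoning above can be written as simple arithmetic on percentage differences. The Python sketch below shows the first-order (additive) estimate used in the text alongside a ratio form that compounds the percentages exactly; the function names are hypothetical, and the ratio form is offered only as a check on the approximation.

```python
def chain_additive(a_vs_t, t_vs_c):
    """A relative to C (percent) from A vs T and T vs C, by adding percentages."""
    return a_vs_t + t_vs_c

def chain_exact(a_vs_t, t_vs_c):
    """A relative to C (percent), compounding the two ratios instead."""
    return 100.0 * ((1 + a_vs_t / 100.0) * (1 + t_vs_c / 100.0) - 1)

def via_common_additive(a_vs_t, c_vs_t):
    """A relative to C when both were compared with the same transfer instrument T."""
    return a_vs_t - c_vs_t

def via_common_exact(a_vs_t, c_vs_t):
    """Ratio form of via_common_additive."""
    return 100.0 * ((1 + a_vs_t / 100.0) / (1 + c_vs_t / 100.0) - 1)

# JPL TDL and Harvard, both relative to NOAA-AL (T): +16% and +15%.
print(via_common_additive(16, 15), round(via_common_exact(16, 15), 2))
# NOAA-CMDL vs MLS (+3%) chained with MLS vs HALOE (-5%).
print(chain_additive(3, -5), round(chain_exact(3, -5), 2))
```

The JPL-Harvard case gives 1% (additive) or about 0.87% (ratio form), and the NOAA-CMDL case gives -2% or about -2.15%. The two forms differ by much less than the measured inconsistency: neither comes close to the direct HALOE-NOAA-CMDL value of +12%. This is why both direct and indirect values are tabulated.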

One problem with the consistency of using an instrument as a transfer standard is that there may be instrumental errors that change with time. These could be due to instrument degradation, or to changes in atmospheric conditions that affect remote sensing retrievals, such as aerosol loading, temperature, or interfering gas corrections. Problems of this type (aerosol in particular) likely affect the HALOE and SAGE II retrievals. Based on comparisons with MLS, it appears that HALOE problems related to aerosols had greatly diminished by 1993. This is fortunate, because HALOE comparisons are possible with most of the stratospheric measurements. For this reason, HALOE will be considered the baseline, and all other instruments will be compared with it. In some cases, the percentage differences vary with altitude across a layer (the HALOE - NOAA-CMDL comparisons shown in Figure 2.21 are one such example). These gradients will be averaged over to give a single value for each table entry. Because the satellite measurements are reported to be less accurate below the ~100 hPa level, all comparisons will be presented above that level. The layer from 60-100 hPa contains the best overlap of satellite- and balloon-borne measurements. The 10-50 hPa and 1-10 hPa layers are also considered. By necessity, the in situ aircraft instruments will not be included in these comparisons.
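Averaging a gradient in the percentage differences over a layer, as described above, can be sketched as follows. This is a minimal sketch under stated assumptions: the profile is hypothetical, levels are weighted equally (a log-pressure or mass weighting would be equally defensible), and a level lying exactly on a shared boundary such as 10 hPa is counted in both adjacent layers.

```python
def layer_mean(pressures_hpa, pct_diffs, p_top, p_bottom):
    """Equal-weight mean of percent differences at levels with p_top <= p <= p_bottom."""
    selected = [d for p, d in zip(pressures_hpa, pct_diffs) if p_top <= p <= p_bottom]
    if not selected:
        raise ValueError("no levels fall inside the layer")
    return sum(selected) / len(selected)

LAYERS_HPA = {"1-10 hPa": (1.0, 10.0), "10-50 hPa": (10.0, 50.0), "60-100 hPa": (60.0, 100.0)}

# Hypothetical profile of percent differences from HALOE with a vertical gradient.
pressures = [1, 2, 5, 10, 20, 30, 50, 70, 100, 150]
diffs = [2.0, 3.0, 4.0, 5.0, 4.0, 6.0, 8.0, 10.0, 14.0, 25.0]

for name, (p_top, p_bottom) in LAYERS_HPA.items():
    print(name, layer_mean(pressures, diffs, p_top, p_bottom))
```

The 150 hPa value is excluded from every layer, mirroring the decision to present comparisons only above the ~100 hPa level.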

Tables 2.5-2.7 show the direct comparisons with HALOE and the indirect comparisons deduced from the information presented in this chapter. The indirect comparisons are presented to show the range possible from the full set of measurements considered; they are constructed as described at the beginning of this section. Figure 2.72 summarises the range of measurements for three pressure layers. The mean values are plotted as symbols, and the range bars map out the extremes from the indirect comparisons. A small range does not necessarily translate into a small uncertainty for the estimate: it may mean that the comparisons are all consistent, or it may simply reflect a dearth of indirect comparisons. For POAM III and SAGE II, no indirect comparisons were available; therefore, the range given is the internal sunrise/sunset difference deduced for each instrument.
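The construction of each symbol and range bar can be sketched as below. The rules encoded here follow the text and the Figure 2.72 caption, with two stated assumptions: when an instrument has more than one direct value (for example sunrise and sunset), the symbol is taken as their mean, and the bar spans only the indirect values when any exist. The function name and the second example are hypothetical.

```python
from statistics import mean

def summary_point(direct=(), indirect=()):
    """Symbol and range bar for one instrument in one layer (percent from HALOE).

    direct: direct comparisons with HALOE (e.g. sunrise and sunset values).
    indirect: comparisons deduced through transfer instruments.
    The symbol is the mean of the direct values, or of the indirect values
    when no direct comparison exists. The bar spans the indirect values,
    or the direct values when no indirect comparison exists.
    """
    direct, indirect = list(direct), list(indirect)
    if not direct and not indirect:
        raise ValueError("no comparisons for this instrument")
    symbol = mean(direct) if direct else mean(indirect)
    bar = indirect if indirect else direct
    return symbol, min(bar), max(bar)

# POAM III, 1-10 hPa (Table 2.5): NH sunrise -5%, SH sunset +7%, no indirect values.
print(summary_point(direct=[-5, 7]))
# Hypothetical instrument with only indirect comparisons.
print(summary_point(indirect=[3.0, 6.0, 9.0]))
```

Returning the bar extremes without the number of contributing comparisons hides exactly the caveat noted above: a short bar may reflect only one or two indirect values.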

In the 1-10 hPa layer (Table 2.5, top panel of Figure 2.72), 11 instruments are considered. Coincident comparisons with HALOE were possible for all the instruments except SAGE II, for which climatological comparisons were included. All instruments are remote sounders, operating from satellite, high-altitude balloon, space shuttle, and ground-based platforms. At this level, aerosol contamination problems for the solar occultation instruments should be minimal, and geophysical variability should also be low, making this layer ideal for these sorts of comparisons. The absolute range of both direct and indirect comparisons with HALOE is -5% to +17%. The mean is +5% and the median is +6%. The mean comparisons (symbols in Figure 2.72) of 9 of the 11 instruments cluster between 0% and 10% greater than HALOE. All instruments considered for the 1-10 hPa layer agree within their stated levels of accuracy.
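The layer statistics quoted above (range, mean, median, and the number of instruments whose symbols fall in a band) can be computed as in the following Python sketch. The text does not state whether the mean and median are taken over all comparisons or over the plotted symbols; computing them over all direct and indirect comparisons is an assumption here, and the input values are hypothetical.

```python
from statistics import mean, median

def layer_statistics(comparisons, symbols, cluster=(0.0, 10.0)):
    """Summarise one pressure layer.

    comparisons: all direct and indirect percent differences from HALOE.
    symbols: one plotted (mean) value per instrument.
    cluster: (low, high) band used to count clustered instruments.
    """
    low, high = cluster
    return {
        "range": (min(comparisons), max(comparisons)),
        "mean": mean(comparisons),
        "median": median(comparisons),
        "clustered": sum(low <= s <= high for s in symbols),
        "instruments": len(symbols),
    }

# Hypothetical layer: five instruments, eight comparisons in total.
stats = layer_statistics(
    comparisons=[-4.0, 0.0, 2.0, 5.0, 6.0, 8.0, 9.0, 16.0],
    symbols=[-4.0, 2.0, 5.0, 8.0, 12.0],
)
print(stats)
```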

In the 10-50 hPa layer (Table 2.6, middle panel of Figure 2.72), 13 instruments are considered. Only climatological comparisons were possible between HALOE and SAGE II, and only indirect comparisons, using other instruments as transfer standards, were possible for MIPAS. This layer allows comparisons among satellite and balloon-borne remote sounders and in situ balloon-borne instruments. The range of both direct and indirect comparisons with HALOE is -30% to +27%. However, this wide range is a consequence of the large range in the HALOE-LMD frostpoint comparison. If the LMD frostpoint comparisons are neglected, the range is reduced to -8% to +16%. The mean of the comparisons in this layer is +3%, and the median is +4%. The mean comparisons of 11 of the 13 instruments fall between -2% and +8%. On average, the instruments considered for the 10-50 hPa layer also agree within their stated levels of accuracy.
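How strongly a single instrument can set the overall range, as the LMD frostpoint comparisons do in this layer, is easy to check by recomputing the range with that instrument excluded. A minimal Python sketch with hypothetical instrument names and values:

```python
def comparison_range(comparisons, exclude=()):
    """Min and max percent difference over all instruments not in `exclude`."""
    values = [v for name, vals in comparisons.items() if name not in exclude for v in vals]
    if not values:
        raise ValueError("no comparisons left after exclusion")
    return min(values), max(values)

# Hypothetical layer in which one instrument's comparisons span a wide range.
layer = {
    "instrument_A": [-2.0, 4.0],
    "instrument_B": [1.0, 8.0, 16.0],
    "instrument_C": [-30.0, 27.0],
}
print(comparison_range(layer))
print(comparison_range(layer, exclude={"instrument_C"}))
```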

The 60-100 hPa layer (Table 2.7, bottom panel of Figure 2.72) includes the largest number (15) of instruments and measurement techniques of the three layers discussed here. The range of both direct and indirect comparisons with HALOE is large, from -15% to +42%, with the mean comparisons (plotted symbols) ranging from -9% to +20%. The mean of the comparisons is +6.5%, the median is +6%, and 11 of the 15 instruments fall in the range -1% to +14%. The two lowest instruments are satellite sensors, and the two highest are aircraft-borne in situ instruments. The balloon-borne in situ and remote sensor measurements cluster near the mean. In this layer, geophysical variability is larger, which makes close coincidences more important for the comparisons. A greater number of measurement techniques are used here; however, the accuracies quoted in Chapter 1 do not suggest that this should be a problem. Although the bulk of the measurements at this level agree to within stated accuracies, the low satellite and high in situ aircraft measurements do not. Similar differences are also present in the derived quantity [H2O]e (see Table 2.4), which should minimise sampling mismatch problems. Thus, it appears that the differences noted here are real: the differences between the instruments at the extremes exceed what would be expected from the uncertainties given in Chapter 1.
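Whether a difference between two instruments exceeds what their stated uncertainties allow can be checked by combining the uncertainties. The Python sketch below assumes the two uncertainties are independent and combines them in quadrature (root-sum-square); this convention, and the 10% accuracy assigned to each instrument, are illustrative assumptions, not the values given in Chapter 1. Only the -9% and +20% extremes of the mean comparisons are taken from the text.

```python
import math

def difference_exceeds_uncertainty(diff_pct, u1_pct, u2_pct, k=1.0):
    """Compare an inter-instrument difference with the combined uncertainty.

    The two stated uncertainties are assumed independent and combined in
    quadrature (root-sum-square); k scales the allowed difference.
    """
    combined = math.sqrt(u1_pct ** 2 + u2_pct ** 2)
    return abs(diff_pct) > k * combined, combined

# Extremes of the 60-100 hPa mean comparisons: -9% and +20% relative to HALOE.
separation = 20.0 - (-9.0)
# Hypothetical stated accuracies of 10% for each of the two instruments.
exceeds, combined = difference_exceeds_uncertainty(separation, 10.0, 10.0)
print(f"separation {separation:.0f}%, combined uncertainty {combined:.1f}%, exceeds: {exceeds}")
```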

Table 2.5. Summary of the relationship for measurements of water vapour between 1 and 10 hPa. Percentages reflect differences from HALOE measurements. The first column shows direct comparisons with HALOE. The second column shows a deduced relationship with HALOE using the instrument in the previous column as a transfer standard.

Direct Comparisons with HALOE
Indirect Comparisons with HALOE using a transfer instrument

HALOE +0%

 

MLS -5%

ATMOS +5%

MAS -2%

FIRS-2 0%

WVMS -3%

SAGE II (SR) +9%

(SS) +3%

(climatological comparisons)

No indirect comparisons considered for climatological comparisons.

POAM III (NH,SR) -5%

(SH,SS) +7%

No indirect comparisons available for this assessment.

ILAS (SH,SS) +8%

(NH,SR) +12%

MkIV +16%

FIRS-2 +17%

(indirect ILAS comparisons valid for NH)

ATMOS +3%

MLS -2%

MAS +6%

MkIV +3%

MAS +5%

MLS -2%

ATMOS +9%

WVMS +8%

MkIV +10%

 

ILAS +14%

ATMOS +10%

FIRS-2 +12%

FIRS-2 +5%

 

MLS -5%

ILAS +10%

MkIV +3%

WVMS +6%

MLS +9%

ATMOS +9%

MAS +3%

WASPAM +12%

10% in 1994, 14% in 1998

absolute range from 9% to 17%

No indirect comparisons available for this assessment.

Table 2.6. Summary of the relationship for measurements of water vapour between 10 and 50 hPa. Percentages reflect differences from HALOE measurements. The first column shows direct comparisons with HALOE. The second column shows a deduced relationship with HALOE using the instrument in the previous column as a transfer standard. MIPAS had no direct comparison with HALOE available for this assessment, and is only included in the second column.

Direct Comparisons with HALOE
Indirect Comparisons with HALOE using a transfer instrument

HALOE +0%

 

MLS -8%

ATMOS +5%

MAS +1%

FIRS-2 -1%

CMDL -11%

LMD balloon (frostpoint) +12%

SAGE II (SR) +6%

(SS) +2%

(climatological comparisons)

No indirect comparisons considered for climatological comparisons.

POAM III (NH,SR) -8%

(SH,SS) +8%

No indirect comparisons available for this assessment.

ILAS (SH,SS) -1%

(NH,SR) +5%

MkIV +3%

FIRS-2 +7%

MIPAS +11%

FISH +15%

LMD balloon (frostpoint) +17%

(indirect ILAS comparisons valid for NH)

ATMOS +6%

MLS -7%

MAS -4%

MkIV +6%

MAS +4%

MLS -5%

ATMOS +14%

MkIV +5%

 

ILAS +10%

ATMOS +5%

FIRS-2 +0%

FIRS-2 +4%

 

MLS -3%

ILAS +9%

MkIV +9%

FISH Lyman-alpha +6%

ILAS +16%

LMD balloon (frostpoint) +1%

NOAA CMDL (frostpoint) -5%

MLS -2%

LMD balloon (frostpoint)

(tropics) +15%

(Sweden) -10%

MLS -30% to -5%

ILAS +2% to +27%

MIPAS -7% to +19%

FISH -5% to +20%

 

Table 2.7. Summary of the relationship for measurements of water vapour between 60 and 100 hPa. Percentages reflect differences from HALOE measurements. The first column shows direct comparisons with HALOE. The second column shows a deduced relationship with HALOE using the instrument in the previous column as a transfer standard. Italicized instruments (JPL TDL and MIPAS) in the second column have no direct comparison with HALOE available for this assessment.

Direct Comparisons with HALOE
Indirect Comparisons with HALOE using a transfer instrument

HALOE +0%

 

MLS -5%

ATMOS +8%

FIRS-2 +10%

HARV +15%

CMDL -2%

LMD balloon (frostpoint) +5%

SAGE II (SR) -4%

(SS) -14%

(climatological comparisons)

No indirect comparisons considered for climatological comparisons.

POAM III (NH,SR) -2%

(SH,SS) +5%

No indirect comparisons available for this assessment.

ILAS (SH,SS) ~0%

(NH,SR) +3%

MkIV +6%

FIRS-2 +5%

MIPAS -1%

FISH -2%

(indirect ILAS comparisons valid for NH)

ATMOS +5%

MLS -8%

MkIV +3%

AL +3%

HARV +18%

MkIV +13%

ILAS +10%

ATMOS +15%

FIRS-2 +13%

HARV +20%

JPL TDL +15%

FIRS-2 +5%

MLS -10%

ILAS +7%

MkIV +5%

HARV +21%

JPL TDL +21%

NOAA AL Lyman-alpha (AL) +8%

ATMOS +10%

HARV +23%

CMDL -12%

JPL TDL +23%

Harvard Lyman-alpha (HARV) +20%

MLS +0%

ATMOS +7%

MkIV +15%

FIRS-2 +4%

AL +5%

CMDL -10%

JPL TDL +21%

FISH Lyman-alpha +8%

ILAS +13%

LMD Balloon (frostpoint) +3%

NOAA CMDL (frostpoint) +12%

MLS +15%

AL +32%

HARV +42%

LMD balloon (frostpoint)

(tropics) +15%

(Sweden) -5%

MLS -15% to +5%

MIPAS -1% to +19%

FISH +0% to +20%

 

Figure 2.72. Summary of the relationship between stratospheric measurements assessed in this report for 3 altitude ranges. The symbols give the direct percentage difference from HALOE, and the horizontal lines show the range of the indirect comparisons presented in Tables 2.5-2.7. Each tick mark is 1%, and the placement for HALOE is indicated by the dotted line. Where no direct comparison was available, the symbols give the average of the indirect comparisons.

At levels above 50 hPa, the overall agreement between the sensors compared is quite good. The range of the direct comparisons with HALOE is less than 20%, nearly all the instruments cluster within a 10% range, and the instruments agree to within their stated levels of uncertainty. In the lowest layer considered, the agreement is still generally good, although the direct comparison extremes cover a range of 30% and most of the instruments compared fall within a 15% range.

From this set of comparisons, it appears that the infrared instruments agree well with each other and with the balloon-borne in situ instruments. The MLS stratospheric water vapour is biased low relative to the other instruments at all levels. In the 10-50 hPa layer, the two frostpoint instruments considered tend to be biased slightly low relative to most of the other measurements. This is not true in the 60-100 hPa layer, where both frostpoint instruments fall in the middle of the cluster of comparisons. The one stratospheric aircraft TDL instrument was not only biased high relative to the other instruments at its cruise flight level, but its difference from the coincident Lyman-α measurements also depended on pressure. Three Lyman-α instruments were considered, and there was no consistent bias associated with the three as a whole. However, one of the instruments was biased high relative to the others in flight, but agreed within stated uncertainties when compared with both a frostpoint and another Lyman-α instrument under controlled laboratory conditions. This indicates that the fundamental in situ techniques are well understood, but that their implementation on airborne platforms is not. The reason for the larger spread under flight conditions is not understood and deserves further attention.

Three of the solar occultation satellite instruments (SAGE II, ILAS and POAM III) show significant sunrise-sunset differences that deserve further examination. SAGE II is biased low in the 60-100 hPa layer, but there are likely aerosol contamination problems with its lower stratosphere retrievals associated with the decay of aerosol from the eruption of El Chichón. Although the SAGE II data set is useful for deducing certain aspects of atmospheric behaviour, users should ensure that the research application fits its capabilities. Filtering to avoid regions of high aerosol is likely needed. A new retrieval has recently been released, but it remains to be seen what improvements result. It should be noted that such aerosol contamination is not unique to the SAGE II data: as described in section 2.4.2, HALOE measurements in the tropics during its first year of operation are also likely affected by Mt. Pinatubo aerosols.

In spite of the differences detailed above, it appears that the agreement between different instruments measuring stratospheric water vapour has improved significantly over the past 15 years. Results presented in Albritton and Zander [1985] (see their Figure C-7) show a range of 50-60% between different balloon-borne instruments in an organised comparison, with no apparent clustering of measurements. In the present set of comparisons, the majority of the instruments cluster within a 10% range at all levels, with the extremes of the indirect comparisons of clustered instruments separated by ~30%. If the small number of instruments at the high and low ends of the comparisons are included, the extremes of the indirect comparisons are separated by ~50%. Still, the bulk of the instruments agree well within their stated errors. This should be considered a vast improvement over the state of stratospheric water vapour measurements in the early 1980s.

Existing measurements should be adequate for describing the seasonality of lower stratospheric water vapour and for deducing certain aspects of stratospheric transport and stratosphere-troposphere exchange. Yet even this seemingly good agreement is not sufficient to allow instrumental records to be combined to estimate long-term changes in stratospheric water vapour. Deduced changes presented in section 2.5.5 are of the order of 1%/year, so agreement at the 10% level is not sufficient to allow time series from different instruments to be combined; the biases are still larger than the signal of interest. For long-term change determinations, a long data set from a single instrument is more valuable than a series of short data sets from different instruments. Continuing to fly instruments with extensive histories is therefore important for monitoring long-term changes in stratospheric water vapour.
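The statement that a 10% inter-instrument bias swamps a 1%/year signal can be checked with a simple synthetic example. The sketch assumes a hypothetical 20-year record with a true trend of exactly 1%/year (expressed, as a linear approximation, in percentage points relative to the start of the record). The first 10 years are measured by one instrument, and the last 10 by a second instrument that reads 10% high. An ordinary least-squares fit to the spliced record then gives a trend of about 1.75%/year.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

years = list(range(20))
true_signal = [1.0 * t for t in years]   # percent change, 1 %/year (hypothetical)
offset = 10.0                            # second instrument reads 10% high
spliced = [v if t < 10 else v + offset for t, v in zip(years, true_signal)]

print(f"true trend:    {ols_slope(years, true_signal):.2f} %/year")
print(f"spliced trend: {ols_slope(years, spliced):.2f} %/year")
```

The spurious 0.75%/year comes entirely from the step at the instrument change, which is comparable in size to the signal itself.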

2.6.2 Upper tropospheric humidity comparisons

Because of the radiative importance of water vapour in the upper troposphere and the potential for long-term change, an assessment of the quality of the data in this region of the atmosphere is an essential part of this chapter. In terms of available data for describing the distribution of water vapour in the upper troposphere and understanding the processes that control it, the data set from the HIRS sensor on TOVS is the most comprehensive in geographic coverage and length of record. The worldwide radiosonde network has provided tropospheric humidity measurements since the 1940s. The length of record and global coverage suggest that radiosondes could provide data for addressing questions about water vapour behaviour, and could be an important tool for evaluating other measurement techniques. However, radiosonde humidity sensors tend to perform poorly at cold temperatures and low pressures (see section 1.1.4), which are precisely the conditions found in the upper troposphere. Additionally, as noted in section 1.1.4, there have been numerous changes in instrumentation over the existing data record, which complicates the analysis of these data for long-term changes. Nevertheless, because of their relatively low cost, operational radiosondes have been, and likely will continue to be, used to validate upper tropospheric humidity measurements. Recent improvements in radiosonde humidity sensors hold some promise for obtaining better data from these operational instruments. Comparisons presented in this chapter (section 2.2.3) between the widely used Vaisala Humicap A sensor and the NOAA-CMDL frostpoint instrument show, for example, that at temperatures of −60°C the humidity reported by the radiosonde is only one half of the frostpoint instrument value. From this type of comparison, a correction algorithm to improve the quality of upper tropospheric radiosonde data was developed.
However, application of this correction to the archived Vaisala radiosonde data set for 1991-1994, for comparison with the MLS upper tropospheric water vapour product, yielded inconsistent results that are not currently understood (see section 2.3.2). Calibrating the radiosonde humidity element appropriately for tropospheric temperature conditions can improve the accuracy of the measurements (section 2.2.3). Such calibrations are being implemented in newer radiosonde models.

In addition to radiosondes, several other non-satellite systems were compared, primarily with a view to their value in evaluating the TOVS and MLS upper tropospheric water vapour products, but also for their potential as new technologies for future measurements. Both DIAL and Raman LIDAR systems were compared with radiosondes as well as with airborne systems using chilled-mirror hygrometers. The LIDARs gave results in the troposphere within about 10% of the correlative measurements, suggesting that such systems can accurately measure water vapour and, if deployed in sufficient numbers, could provide profile data for satellite validation.

Commercial aircraft provide a platform from which upper tropospheric water vapour data can be obtained on a regular basis in heavily used flight corridors. The MOZAIC program currently provides such data using a specially calibrated sensor similar to the Vaisala radiosonde Humicap-H. These data are also potentially useful as satellite correlative data. Comparisons of the MOZAIC sensor with airborne frostpoint and Lyman-α hygrometers, made as part of designed validation experiments, show that the MOZAIC measurements are accurate to about 10% in mixing ratio at flight altitudes in the upper troposphere.

The direct water vapour measurements summarised above were compared with the MLS and TOVS satellite observations. The high variability of water vapour in the upper troposphere leaves significant ambiguity in the conclusions from these comparisons. At present, it must be concluded that the direct measurements of UTH do not provide a strong constraint on the satellite measurements of UTH. At the two highest MLS retrieval levels (147 hPa and 215 hPa), the biases (about 10%) are smaller than those at the two lower levels, where all of the comparisons suggest a dry bias in MLS. Published comparisons of radiosonde humidities with TOVS data show substantially smaller humidity values from the radiosondes. A recent comparison of a fairly small number of properly calibrated radiosonde sensors with TOVS UTH showed only small biases. In the TOVS comparisons with MOZAIC, it was found that at higher latitudes the satellite weighting function peaked at a different altitude from the aircraft observations, resulting in biases between the humidities derived from the two techniques. For comparisons at low humidities, the MOZAIC measurements are smaller than those from TOVS; however, the derived bias is not statistically significant. An important issue is that the variability of the means of the smaller MOZAIC data set differs significantly from that of the much larger TOVS data set. This indicates that the smaller MOZAIC data set is not adequately capturing the true atmospheric variability, and thus statistical comparisons between the disparate data sets may not be meaningful.
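The sampling point above can be illustrated by drawing repeated samples of different sizes from one skewed distribution and comparing the spread of the sample means. In this Python sketch the humidity-like variable is assumed, for illustration only, to follow a lognormal distribution with hypothetical parameters. The sample sizes are not those of the MOZAIC or TOVS data sets.

```python
import random
import statistics

def spread_of_means(n_per_sample, n_trials=200, seed=1):
    """Standard deviation of the means of repeated random samples.

    Values are drawn from a lognormal distribution (a hypothetical stand-in
    for skewed upper tropospheric humidity); the seed makes runs repeatable.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        sample = [rng.lognormvariate(0.0, 0.8) for _ in range(n_per_sample)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

small = spread_of_means(5)      # few coincidences per average
large = spread_of_means(500)    # many coincidences per average
print(f"spread of means: n=5 -> {small:.3f}, n=500 -> {large:.3f}")
```

For independent samples the spread falls roughly as one over the square root of the sample size, so averages built from few coincidences fluctuate far more than those from a large data set even when both sample the same atmosphere.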

Comparisons of SAGE II integrated water vapour between 200 and 500 hPa with integrated TOVS data suggest that SAGE II sampling is insufficient to characterise the full variability of upper tropospheric water vapour. Although TOVS samples much more frequently than MLS, the two instruments have similar sampling volumes, and they therefore provided the best opportunity for comparison of satellite UTH. At the 316 hPa level (near the peak of the TOVS-HIRS channel 12 weighting function), TOVS and MLS give similar results. At very low and very high values of UTH, MLS averages are biased low relative to TOVS, likely because of the small number of coincidences. In the mid-range of UTH there is virtually no bias. Overall, these two systems, which use very different techniques, appear to produce comparable results on monthly averaged time scales.
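A bias that depends on the humidity value can be examined by binning coincident pairs on one instrument's value and reporting the number of pairs alongside the mean difference, so that sparsely populated bins can be flagged. The sketch below bins on the TOVS value and uses hypothetical data, bin edges, and minimum count; none of these choices is taken from the comparison described above.

```python
from statistics import fmean

def binned_bias(reference, other, bin_edges, min_count=10):
    """Mean of (other - reference) in bins of the reference value.

    Returns (low, high, count, mean_difference, well_sampled) for every
    non-empty bin; well_sampled is False when count < min_count.
    """
    bins = [[] for _ in range(len(bin_edges) - 1)]
    for r, o in zip(reference, other):
        for i, (low, high) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
            if low <= r < high:
                bins[i].append(o - r)
                break
    result = []
    for (low, high), diffs in zip(zip(bin_edges[:-1], bin_edges[1:]), bins):
        if diffs:
            result.append((low, high, len(diffs), fmean(diffs), len(diffs) >= min_count))
    return result

# Hypothetical UTH values (%RH) for coincident TOVS and MLS samples.
tovs = [5, 8, 30, 35, 40, 70]
mls = [3, 6, 31, 34, 40, 60]
for row in binned_bias(tovs, mls, bin_edges=[0, 20, 60, 100], min_count=2):
    print(row)
```

Binning on one instrument's values can by itself produce apparent biases in the end bins when both instruments contain random error, which is a further reason to treat sparsely populated extreme bins cautiously.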

2.6.3 Conclusions and recommendations

Over 25 instruments representing several techniques were assessed for the quality of the data they produce. More conclusive results about data quality could be drawn from the stratospheric comparisons than from those in the upper troposphere. In part this reflects the fact that water vapour in the upper troposphere is more dynamic and variable, making comparisons more difficult. It also appears, however, that greater emphasis has been placed on, and attention paid to, developing and validating stratospheric water vapour observations. For both the stratosphere and troposphere, there is no single technique or instrument platform that is recognised as a standard against which other techniques should be compared, and thus comparisons were made relative to one another.

In the stratosphere, a reasonable degree of consistency was found among measurements made from near the tropopause up to 50 km (~1 hPa). The majority of the instruments considered clustered within a 10% range; however, direct comparisons between some individual instruments showed larger differences, some exceeding 30%, and indirect intercomparisons reached 50%. For differences smaller than about 10%, the quality of the comparisons makes it difficult to determine the cause, since such differences may be related to factors such as imperfect coincidence of measurements affected by atmospheric variability. However, even much larger differences that do appear to be real, such as those between MLS and the Harvard Lyman-α instrument, have causes that were not revealed through this assessment process.

There is only one stratospheric water vapour data set of 20 years duration that has a nearly continuous time series available for determination of long-term changes. There are, however, a number of sets that have sampled periodically over a long period and several time series of intermediate length (8-15 years) that can be used for evaluating stratospheric changes. Although not definitive, these observations are consistent in suggesting that water vapour has increased at about 1%/year over the past 50 years. The record also suggests that this increase has not been uniform but has varied over this period.

In the upper troposphere, the TOVS data set is the most extensive in length of record and geographic coverage, and is the only one capable of answering many of the scientific questions about water vapour in this region of the atmosphere. It was therefore the focus of the UT data quality assessment. The tropospheric MLS sensor provided the best data set for comparison with TOVS, and only small biases were found between them. Comparisons of TOVS and MLS with direct water vapour measurements from radiosondes and aircraft did not provide strong constraints on the performance of the satellite sensors. In part this was due to shortcomings in the direct measurements, particularly the radiosondes. However, there are also difficulties in making comparisons in an inhomogeneous atmosphere when instruments have very different spatial coverage and altitude resolution. The assessment of the TOVS data did not reveal any major inconsistencies in this data set that would preclude its use in describing the behaviour of upper tropospheric humidity.

Several recommendations that can be made from this evaluation of data quality include: