
IBUs and the SMPH Model

Introduction
When I first started brewing, the software I used had three options for predicting IBUs: Tinseth, Rager, and Garetz. The word “Tinseth” had a nice sound to it, so I chose that one. I was quite happy with that option until I became interested in flameout additions and hop stands, where the Tinseth formula predicts zero IBUs. Then I found out that I could get IBUs professionally measured at a very reasonable cost. So I did one small experiment to measure IBUs in finished beer with and without a hop stand. And then another experiment, and then another. Just when I thought I could predict IBUs reasonably well, I’d get results that challenged my assumptions. I wrote detailed blog posts about almost all of my experiments, so that anyone can (hopefully) replicate my findings. More than seven years and well over 300 measured IBU values after my first experiment, I put the finishing touches on a new model for predicting IBUs. The purpose of this post is not to go into the gory details, but to give an overview of the model’s higher-level concepts and to address some common misconceptions about the IBU. I also give test results that compare this model with four other IBU models on 18 different beers ranging from 20 to 70 IBUs.

This new model, called SMPH, is available at https://jphosom.github.io/alchemyoverlord/. Even if you don’t use it for recipe planning, I encourage you to play around with it to see how different brewing conditions can yield very different (or sometimes not so different) IBU predictions.

It’s important to note that I’ve been pretty obsessive about measuring or estimating volumes, alpha-acid ratings, weights of hops, hop steep times, wort cooling times, pH, and any other factor that seems relevant. While I’m not saying that you need to be this obsessive in your brewing (it’s supposed to be fun, right?), realize that small measurement or estimation errors might have a large impact on predicted IBUs. If your post-boil volume measurement is off by 10%, then your IBU prediction will also be off by roughly 10%. There is also 10% to 15% variation in alpha-acid content within the same bale of hops [Verzele and De Keukeleire, p. 331], and so the AA rating on your package may not be an accurate indicator of the amount of alpha acids that are in your hops. An error of 20% in the prediction could mean that a beer you expected to have 40 IBUs actually has only 32. If you’re unlucky, all of these measurement errors can add up and make the prediction meaningless; if you’re lucky, they can cancel each other out. Your mileage may vary.
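A rough way to see how these errors interact: in most IBU models the prediction is proportional to the alpha-acid rating times the hop weight divided by the post-boil volume, so relative errors multiply rather than simply add. The sketch below assumes that simple proportionality (it ignores hopping-rate and other nonlinear corrections), and the error values are made up for illustration.

```python
def relative_ibu_error(aa_error: float, weight_error: float, volume_error: float) -> float:
    """Relative error in an IBU prediction, assuming IBU is proportional to AA * weight / volume.

    Each argument is a signed fractional error in the value used for the calculation,
    e.g. +0.10 means the number you used is 10% higher than the true value.
    """
    predicted_over_true = (1 + aa_error) * (1 + weight_error) / (1 + volume_error)
    return predicted_over_true - 1


# Errors that add up: AA rating 15% too high and volume measured 10% too low.
compounding = relative_ibu_error(aa_error=0.15, weight_error=0.0, volume_error=-0.10)
# Errors that cancel: AA rating 10% too high and volume measured 10% too high.
cancelling = relative_ibu_error(aa_error=0.10, weight_error=0.0, volume_error=0.10)

print(f"compounding: {compounding:+.1%}")
print(f"cancelling:  {cancelling:+.1%}")
```

Because the volume sits in the denominator, a volume that is 10% too low inflates the prediction by about 11%, which is why "off by roughly 10%" is a fair rule of thumb rather than an exact equivalence.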

If predicting IBUs is such an imprecise and difficult art form, why bother? Obviously, you don’t need to care about predicting IBUs if you’re happy with the bitterness levels of the beers that you make. Or, if you find that a beer turns out more (or less) bitter than you’d like and you don’t mind brewing it again with a different amount of hops (or adding iso-alpha acid extract, or blending several beers), then you don’t need to worry about it. But, if you find that your first attempt at a beer can sometimes yield a bitterness that isn’t quite right, you may want to get the best prediction of bitterness that you can before brewing. That prediction might still be a bit off, but an in-the-ballpark estimate is still better than no estimate at all. By way of analogy, even though pH test strips aren’t as accurate as a digital pH meter, if you don’t have a pH meter it’s still better to use test strips than to pretend that pH doesn’t matter and ignore it.

IBUs: IAAs and ABCs
The IBU is a measurement of the amount of ultraviolet light absorbed by a sample of processed beer [Thermoscientific; Anon.]. It is often (and incorrectly) reported that one IBU equals one part per million (ppm) of isomerized alpha acids (IAAs). However, as Val Peacock explains, the IBU was developed in the 1950s and 1960s to measure the combination of both IAAs and “auxiliary bittering compounds” (ABCs) [Peacock, pp. 158-161]. The researchers at that time knew that there are bitter substances in beer other than IAAs, and they deliberately included them in the IBU measurement. The IBU combines the concentrations of IAAs and ABCs in beer into a single measure of approximate bitterness. The confusion about one IBU equaling one ppm of IAAs arose because the researchers scaled the IBU measurement so that the two numbers would often be close to each other. However, this rough correspondence only holds under specific brewing circumstances that were common in the 1960s and are less common today. When the IBU was developed, IAAs contributed about 70% of the IBU value, and ABCs contributed the remaining 30%. The proportion of the IBU contributed by IAAs can change greatly depending on brewing techniques and how well the hops have been stored. In the 18 beers used in testing the SMPH model (described in more detail below), I estimate that IAAs contribute between 50% and 75% of the IBU. (A West-Coast IPA with lots of late-hop and dry-hop additions has an IAA contribution of 50%, and a more traditional beer with one early and one late addition has an IAA contribution of 75%.) In the data used for finding SMPH parameter values, the estimated IAA contribution ranges from 0% to over 80% of the IBU.
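To see why one IBU equals one ppm of IAAs only under specific conditions, here is a minimal sketch. It assumes the IBU is a fixed multiple k of the combined IAA and ABC concentrations, and it picks k = 0.7 so that IBUs equal IAA ppm when IAAs make up 70% of the total, as they did when the IBU was developed. That value of k is an illustrative assumption derived from the 70/30 split above, not necessarily the exact factor in Peacock's equation, and the sketch treats all ABCs as if they absorbed light just like IAAs.

```python
# Hypothetical scale factor, chosen so that IBU == IAA ppm when IAAs are 70%
# of the bittering compounds (the 1960s situation described above).
K_SCALE = 0.7


def ibu_from_concentrations(iaa_ppm: float, abc_ppm: float, k: float = K_SCALE) -> float:
    """IBU as a linear function of IAA and ABC concentrations (simplified)."""
    return k * (iaa_ppm + abc_ppm)


def ibu_per_ppm_iaa(iaa_fraction: float, k: float = K_SCALE) -> float:
    """Ratio IBU / [IAA] for a beer in which IAAs are `iaa_fraction` of the total."""
    total_ppm = 100.0  # any total works; the ratio does not depend on it
    iaa_ppm = iaa_fraction * total_ppm
    abc_ppm = total_ppm - iaa_ppm
    return ibu_from_concentrations(iaa_ppm, abc_ppm, k) / iaa_ppm


for fraction in (0.75, 0.70, 0.50):
    print(f"IAA share {fraction:.0%}: {ibu_per_ppm_iaa(fraction):.2f} IBU per ppm IAA")
```

Under this assumption, a beer in which IAAs are only half of the bittering compounds (like the West-Coast IPA mentioned above) reports about 1.4 IBUs per ppm of IAAs, so reading its IBU as "ppm of IAAs" would overstate the IAA concentration by roughly 40%.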

I have found that the largest fraction of ABCs consists of oxidized alpha acids (oAAs), which are produced when hops are added to hot wort [Algazzali, p. 17]. I estimate that about 10% of the available alpha acids oxidize quickly in boiling wort, producing oAAs. In most beers, the second-largest contributors to ABCs are malt and hop polyphenols, followed by oxidized beta acids. In her Master’s thesis, Christina Hahn (advisor: Tom Shellhammer) notes that “individually, iso-alpha acids and [oxidized alpha acid] concentrations are relatively poor predictors of sensory bitterness, while the sum of iso-alpha acids and [oxidized alpha acids] is almost as good a predictor of sensory bitterness as [the IBU]” [Hahn, p. 48]. She found a strong correlation (R² = 0.86) between sensory bitterness and the IBU, and a strong correlation (R² = 0.80) between sensory bitterness and the combination of IAAs, oAAs, and alcohol (ABV) [Hahn, p. 50]. In short, the concentrations of IAAs and oAAs are, together, very good predictors of both sensory bitterness and the IBU. These findings support the claim that oAAs are the largest component of the auxiliary bittering compounds.

SMPH Model: The Big Picture
The SMPH model was developed to have one key advantage over other IBU models: it separates out the contribution of isomerized alpha acids (IAAs) from auxiliary bittering compounds (ABCs). The conversion of alpha acids to IAAs takes place relatively slowly (e.g. over the course of an hour-long boil), but ABCs are quickly produced or dissolved in the wort. These different time scales mean that IAAs and ABCs should be modeled separately. (The mIBU calculator includes some approximations in this regard, but it is inherently limited in its ability to accurately separate the two.) While IAAs contribute the most to the IBU in “typical” beers (if there is such a thing as a typical beer anymore), ABCs can contribute a significant amount, especially when using hops late in the boil, when using a hop stand, and/or when dry hopping (techniques commonly used in brewing IPAs).

The starting point for development of the SMPH model was an understanding of Val Peacock’s explanation that IBUs are a specific proportion of the concentrations of IAAs and ABCs in beer [Peacock, p. 161], and realizing that Mark Malowicki’s model of the production and degradation of IAAs in boiling wort [Malowicki, p. 27] could be combined with rough estimates of the concentration of each ABC and different loss factors to predict IBUs. At that point there was the skeleton of a model but a lot of missing factors and unknown parameter values. These values were determined (or found to be irrelevant) by controlled experiments in which only the factor in question was varied. Data from those experiments were gradually added to the set of model training data. The process of estimating a loss factor and then minimizing the mean-squared error on the remaining parameters was iterated until the error on a cross-validation set was reduced to an acceptable level.
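Malowicki's model treats isomerization and IAA degradation as two sequential first-order reactions (alpha acids to IAAs, and IAAs to degradation products), each with an Arrhenius-style, temperature-dependent rate constant. A minimal sketch follows. The rate-constant values are the ones commonly quoted from Malowicki's thesis; treat them as assumptions to be checked against [Malowicki, p. 27]. The function returns only the IAA yield from the alpha acids, before any of the SMPH loss factors are applied.

```python
import math


def malowicki_rates(temp_c: float) -> tuple[float, float]:
    """Rate constants (per minute) for isomerization (k1) and IAA degradation (k2).

    Assumed Arrhenius-style fits attributed to Malowicki; temperature in Celsius.
    """
    temp_k = temp_c + 273.15
    k1 = 7.9e11 * math.exp(-11858.0 / temp_k)
    k2 = 4.1e12 * math.exp(-12994.0 / temp_k)
    return k1, k2


def iaa_fraction(minutes: float, temp_c: float = 100.0) -> float:
    """Fraction of the initial alpha acids present as IAAs after `minutes` at `temp_c`.

    Closed-form solution of dA/dt = -k1*A and dI/dt = k1*A - k2*I with I(0) = 0.
    """
    k1, k2 = malowicki_rates(temp_c)
    return k1 / (k2 - k1) * (math.exp(-k1 * minutes) - math.exp(-k2 * minutes))


for t in (10, 30, 60, 90):
    print(f"{t:3d} min at 100 C: {iaa_fraction(t):.1%} of alpha acids present as IAAs")
```

At boiling, k1 comes out around 0.012 per minute, which is why even an hour-long boil converts well under half of the alpha acids into IAAs.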

The SMPH model first makes a prediction of the concentration of IAAs in wort using Malowicki’s model of alpha-acid isomerization. It then estimates the concentration of each auxiliary bittering compound in wort. The concentrations of IAAs and ABCs are then modified by various factors (described below) occurring during the boil, fermentation, and conditioning. Finally, it uses an equation proposed by Val Peacock [Peacock, p. 161] to convert from these estimated concentrations in beer to a final IBU value.
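The four steps can be laid out as a simple pipeline. The sketch below is a heavily simplified illustration, not the SMPH calculator's code. The 11% quick-oxidation share comes from the parameter-estimation results reported below; everything else is a hypothetical placeholder: the IAA rate constants are evaluated at 100 °C from the commonly quoted Malowicki fits, `iaa_retained` and `other_losses` stand in for all the boil, fermentation, and conditioning factors, and `ibu_scale = 0.7` stands in for Peacock's conversion. All class and function names are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class HopAddition:
    alpha_acid_ppm: float  # alpha acids added, as ppm of the post-boil volume
    boil_minutes: float    # time in boiling wort (0 for a flameout addition)


def iaa_yield(minutes: float) -> float:
    """Malowicki-style IAA yield at about 100 C (assumed rate constants, per minute)."""
    k1, k2 = 0.01249, 0.003087  # isomerization, IAA degradation
    return k1 / (k2 - k1) * (math.exp(-k1 * minutes) - math.exp(-k2 * minutes))


def predict_ibu(additions: list[HopAddition],
                iaa_retained: float = 0.5,      # hypothetical boil-loss multiplier for IAAs
                quick_oxidation: float = 0.11,  # share of alpha acids that oxidize quickly
                other_losses: float = 0.9,      # hypothetical lumped fermentation/conditioning loss
                ibu_scale: float = 0.7) -> float:  # hypothetical concentration-to-IBU factor
    # Step 1: IAAs in wort from the (slow) isomerization model.
    iaa_wort = sum(a.alpha_acid_ppm * iaa_yield(a.boil_minutes) for a in additions)
    # Step 2: ABCs in wort; here only oAAs, which form almost immediately.
    abc_wort = sum(a.alpha_acid_ppm * quick_oxidation for a in additions)
    # Step 3: losses during the boil, fermentation, and conditioning.
    iaa_beer = iaa_wort * iaa_retained * other_losses
    abc_beer = abc_wort * other_losses
    # Step 4: linear conversion from concentrations in beer to an IBU value.
    return ibu_scale * (iaa_beer + abc_beer)


sixty_minute = predict_ibu([HopAddition(alpha_acid_ppm=50.0, boil_minutes=60.0)])
flameout = predict_ibu([HopAddition(alpha_acid_ppm=50.0, boil_minutes=0.0)])
print(f"60-minute addition: {sixty_minute:.1f} IBUs, flameout addition: {flameout:.1f} IBUs")
```

Even this crude sketch predicts some IBUs from a flameout addition, unlike the Tinseth formula, because the ABC term does not depend on boil time. (It also ignores isomerization while the wort cools, which would add more.)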

Like the Garetz model, the SMPH model can account for a large number of factors that influence IBUs. In the SMPH model, that means accounting for the boiling point of water, wort gravity, wort pH, wort clarity (e.g. a careful vorlauf vs. brew-in-a-bag wort collection), form of the hops (whole cones or pellets), hopping rate, hop freshness, krausen loss, flocculation, finings, filtering, and age of the beer. Basically every step in the brewing process seems to have some influence on IBUs.

The SMPH model uses approximations of all of the known factors that might influence IBUs. (Unknown factors are probably still waiting to be discovered.) The goal has not been to precisely quantify each of the myriad factors (I only have one life to live), but to put all of the approximations together into one imperfect but reasonable model. Where even approximations have been difficult to come by, I used over 300 measured IBU values to find the parameter values that give the best fit to the data.

Figure 1 illustrates the different components of the SMPH model. Measured IBU values from finished beer are shown at 10-minute intervals during a 90-minute boil. The green area shows the contribution from isomerized alpha acids (using the Malowicki model), the blue area shows the contribution from oxidized alpha acids, and the red area shows the contribution from malt and hop polyphenols. The SMPH model output is the sum of these contributions. (In this example, the hops were well preserved and so the contribution from oxidized beta acids is negligible.)

Figure 1. Measured IBUs and the components of the SMPH model.

A much more detailed explanation of the concepts and factors used in the SMPH model is given in a separate blog post, A Summary of Factors Affecting IBUs.

Factors Influencing IBUs
The SMPH model accounts for a number of factors that influence IBUs. These factors can be put into one of three groups for the purposes of discussion: “large-impact”, “medium-impact”, and “small-impact” factors.

Large-Impact Factors
The factors that can have a large impact on IBUs are (a) hops added to hot wort (kettle hops) vs. ambient-temperature or “cold-side” wort (dry hops), (b) form of the hops (whole cones or pellets), (c) hopping rate, (d) wort pH, and (e) wort clarity.

Kettle vs. Dry Hops: Hops added to hot wort in the kettle undergo alpha-acid isomerization, which produces the majority of bitterness in most beers. Dry hopping will produce no IAAs, but in large amounts it can produce significant bitterness from ABCs [Parkin, pp. 33-34; Maye and Smith, p. 135], especially from oxidized alpha acids created during hop storage. Oddly enough, at higher IBUs the use of dry hopping also reduces the concentration of IAAs from kettle additions [Parkin, p. 34; Maye and Smith, p. 135]. The IBUs from a dry-hop addition are difficult to estimate, but the difference between adding hops to the boil kettle or to the fermentation or conditioning vessel will have the largest impact on the IBU value.

Form of Hops: Hop pellets produce more IBUs than whole cones. With pellets, the production of oxidized alpha acids when hops are added to the boiling wort is about double that of whole cones. This factor seems to be variety specific, with some varieties producing very little increase from pellets, and other varieties producing a large increase. The rate of alpha-acid isomerization appears to be the same when using pellets or whole cones.

Hopping Rate: It is well known that doubling or tripling the amount of hops generally won’t produce a doubling or tripling of the IBU. As the concentration of hops increases, the resulting IBU value increases more slowly. An alpha-acid solubility limit is a reasonable explanation for this effect, with all alpha acids dissolving up to about 200 ppm and a reduction in the percent that dissolves as the alpha-acid concentration increases. Mark Garetz incorporated a hopping-rate factor into his model, but I suspect that he underestimated the effect.
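One simple way to express a solubility limit of this kind is a curve that passes alpha acids through unchanged up to a lower threshold and then bends toward a ceiling. The sketch below uses a hypothetical functional form, not the one used in the SMPH model (which isn't described here), and borrows the 200 ppm and 580 ppm limits reported in the parameter-estimation section purely as illustrative numbers.

```python
import math

LOWER_LIMIT_PPM = 200.0  # below this, all alpha acids are assumed to dissolve
UPPER_LIMIT_PPM = 580.0  # assumed ceiling on dissolved alpha acids


def dissolved_alpha_acids(added_ppm: float,
                          lower: float = LOWER_LIMIT_PPM,
                          upper: float = UPPER_LIMIT_PPM) -> float:
    """Hypothetical saturating curve: linear up to `lower`, then approaching `upper`.

    The exponential segment starts with slope 1, so the curve has no kink at `lower`.
    """
    if added_ppm <= lower:
        return added_ppm
    span = upper - lower
    return lower + span * (1.0 - math.exp(-(added_ppm - lower) / span))


for added in (100, 200, 400, 800, 1600):
    dissolved = dissolved_alpha_acids(added)
    print(f"{added:5d} ppm added -> {dissolved:6.1f} ppm dissolved ({dissolved / added:.0%})")
```

In this sketch, doubling the added alpha acids from 400 to 800 ppm increases the dissolved amount by only about 40%, mirroring the diminishing returns described above.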

Wort pH: I’ve found that lowering the pH from 5.75 (the approximate pH of a mash made from untreated low-alkalinity water and two-row malt) to 5.25 (within the recommended range of 5.2 to 5.4) can reduce IBUs by 15% to 35%. Most of the decrease in IBUs appears to come from a loss of ABCs, with only a small loss of IAAs.

Wort Clarity: Much to my surprise, I’ve found that the clarity of the wort can have a significant impact on IBUs. In this case, “clarity” refers to how visually clear or cloudy the wort is when it is transferred to the fermentation vessel (FV), ignoring the effect of hop matter. Cloudy wort yields relatively fewer IBUs. In other words, wort produced using the brew-in-a-bag technique with no filtering of the grain bed can yield a much lower IBU value than clear wort produced with a careful vorlauf and good grain-bed filter. (This is not to say that one method is better than the other, just that they may yield different IBUs.) Likewise, stirring the wort just before transferring into the FV can produce a lower IBU value than letting the wort settle and racking only the clear wort into the FV. I’ve observed very clear wort producing 30% more IAAs than typical wort, and very cloudy wort producing 30% fewer IAAs than typical wort. The reason for IBUs being affected by wort clarity is unknown, but wort protein levels do not seem to be a factor.

Medium-Impact Factors
Factors that often have only a medium impact on IBUs are: (a) how well the hops have been stored (hop freshness), (b) wort specific gravity, (c) the use of a hop stand, (d) losses to krausen deposits, and (e) the age of the beer.

Hop Storage Conditions: The storage conditions of hops can have a large impact on the amount of alpha acids remaining in those hops. As the amount of alpha acids decreases due to poor storage conditions and/or longer storage duration, the amount of oxidized alpha and beta acids increases, somewhat mitigating the reduction in IBU values [Peacock, p. 162]. (Nitrogen-flushed packaging and cold storage are the best ways to preserve hops.) While differences in storage conditions may not have a large effect on the IBU, I think storage conditions do have a large impact on overall beer quality.

Wort Gravity: Wort gravity is one of the factors common to all IBU prediction models. On average, the difference in IBUs between a 1.030 wort and a 1.080 wort is about 15%. The difference between a 1.040 wort and a 1.070 wort is about 10%.

Hop Stands: During a hop stand, alpha acids continue to isomerize in the hot wort, increasing the IBU. The amount of impact from a hop stand depends a lot on the duration of the stand and when hops are added to the wort, so I’ve classified this as a medium-impact factor.

Krausen: Most brewers let krausen deposits accumulate on the sides of the fermentation vessel. If you skim off the krausen as it is produced (which is sometimes recommended to produce a “smoother” beer [e.g. Troester; Hough et al., pp. 652-653]), the resulting IBU value can be about 5% to 10% lower. If you use a blow-off tube and remove a lot of the krausen, the IBU value may be 25% lower. If you mix the krausen back into the beer (or use an anti-foaming agent) during fermentation then the IBU may be about 10% higher. I’ve classified krausen as a medium-impact factor because the loss of lots of krausen through a blow-off tube is quite possible but perhaps not so common.

Age of the Beer: After primary fermentation, IBUs will decrease as the beer conditions. I have noticed a 20% decrease in IBUs as a beer ages from 1 week to 13 weeks at about 60°F (16°C). While a lot of the decrease seems to happen in the first several weeks, most beers aren’t conditioned for months at cellar or room temperature, and if a beer is conditioned or stored at cold temperatures, IBUs are probably much better preserved. Therefore, I’ve put this factor in the “medium-impact” category, but it’s probably a small impact for cold-conditioned lagers.

Small-Impact Factors
The factors that usually have a minor impact on IBUs are (a) the boiling point of water, (b) the rate at which wort is force-cooled after flameout or a hop stand, and (c) flocculation, finings, and filtering.

Boiling Point of Water: The difference in IBUs when brewing at sea level compared with Boulder, Colorado, or Johannesburg, South Africa, is about 20% for typical beers. This would be a large-impact factor, but most cities are at 1000 feet (300 meters) or less, in which case the impact is 4% or less. (In a typical beer, the majority of IBUs come from alpha-acid isomerization, and we can use the Malowicki model of temperature-dependent isomerization to estimate the impact of altitude.)
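The altitude estimate can be approximated with the same first-order Malowicki model by comparing the IAA yield of a 60-minute boil at 100 °C with the yield at a lower boiling point. The sketch assumes the commonly quoted Malowicki rate constants and rough boiling points of 99 °C at about 300 m and 94.5 °C at about 1,700 m (approximations, not measured values). It covers only the IAA part of the IBU.

```python
import math


def iaa_fraction(minutes: float, temp_c: float) -> float:
    """IAA yield from alpha acids (assumed Malowicki rate constants, first-order model)."""
    temp_k = temp_c + 273.15
    k1 = 7.9e11 * math.exp(-11858.0 / temp_k)  # isomerization, per minute
    k2 = 4.1e12 * math.exp(-12994.0 / temp_k)  # IAA degradation, per minute
    return k1 / (k2 - k1) * (math.exp(-k1 * minutes) - math.exp(-k2 * minutes))


SEA_LEVEL_BOIL_C = 100.0
# Approximate boiling points; assumptions for illustration only.
boiling_points = {"about 300 m": 99.0, "about 1,700 m": 94.5}

reference = iaa_fraction(60.0, SEA_LEVEL_BOIL_C)
for place, boil_c in boiling_points.items():
    ratio = iaa_fraction(60.0, boil_c) / reference
    print(f"{place} ({boil_c} C): {1 - ratio:.0%} fewer IAAs than at sea level")
```

ABCs are not produced by this temperature-dependent reaction, so the drop in the total IBU is smaller than the drop in IAAs computed here. With IAAs making up roughly three quarters of the IBU, the results are roughly consistent with the 20% and 4% figures above.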

Rate of Wort Cooling: After flameout or a hop stand, alpha acids continue to isomerize in the hot wort while it is force-cooled, down to about 140°F (60°C). These post-boil IAAs increase the IBU. While there may be a large difference in IBUs when going (for example) from an ice bath to a Hydra wort chiller, smaller differences in cooling technique may have only a small impact on IBUs.
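A rough way to compare cooling methods is to integrate the isomerization rate while the wort temperature falls. The sketch below assumes Newton-style exponential cooling toward the coolant temperature, stops counting at 60 °C as described above, and uses the commonly quoted Malowicki rate constants. The two cooling time constants are made-up values representing a faster and a slower chiller, not measurements of any particular product.

```python
import math


def rate_constants(temp_c: float) -> tuple[float, float]:
    """Assumed Malowicki rate constants (per minute): isomerization k1, IAA loss k2."""
    temp_k = temp_c + 273.15
    k1 = 7.9e11 * math.exp(-11858.0 / temp_k)
    k2 = 4.1e12 * math.exp(-12994.0 / temp_k)
    return k1, k2


def iaa_during_cooling(tau_min: float, start_c: float = 100.0, coolant_c: float = 20.0,
                       stop_c: float = 60.0, dt: float = 0.01) -> float:
    """Fraction of flameout-added alpha acids isomerized while the wort cools to stop_c.

    Assumes exponential (Newton) cooling toward coolant_c with time constant tau_min
    and integrates the two first-order reactions with small forward-Euler steps.
    """
    if coolant_c >= stop_c:
        raise ValueError("coolant must be colder than the stop temperature")
    alpha, iaa, t, temp = 1.0, 0.0, 0.0, start_c
    while temp > stop_c:
        k1, k2 = rate_constants(temp)
        alpha, iaa = alpha - k1 * alpha * dt, iaa + (k1 * alpha - k2 * iaa) * dt
        t += dt
        temp = coolant_c + (start_c - coolant_c) * math.exp(-t / tau_min)
    return iaa


fast = iaa_during_cooling(tau_min=5.0)   # hypothetical fast chiller
slow = iaa_during_cooling(tau_min=20.0)  # hypothetical slower chiller
print(f"fast chiller: {fast:.1%} isomerized, slower chiller: {slow:.1%} isomerized")
```

Both amounts are small compared with a long boil, which is consistent with classifying cooling rate as a small-impact factor unless the cooling methods differ a great deal.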

Flocculation, Filtering, and Finings: These factors are each estimated to influence the IBU by about 5% or less [Garetz, p. 140; Fix and Fix, p. 129].

No-Impact Factors
There is one more group of factors that aren’t in the SMPH model because I don’t believe that they have any meaningful impact on IBUs. Such factors include the kettle size and kettle geometry, containing hops in a mesh bag, and the use of malt extract instead of wort from all-grain brewing. Kettle size or geometry is sometimes claimed to have an impact on IBUs, but one explanation for the correlation between kettle size and IBUs is the time it takes to cool a large volume of wort and the isomerization that happens while the wort is being cooled. My experiments have used a wide range of volumes, and I’ve seen no effect of volume or kettle size on IBUs. However, it is possible that hydrostatic pressure is a factor that may increase IBUs; an experiment by Brülosophy found a significant perceptual difference resulting from a change in hydrostatic pressure.  Further tests of IBUs and hydrostatic pressure may yield interesting results.  Putting hops in a mesh bag is sometimes claimed to reduce IBUs, but experiments conducted by both Brülosophy and me have shown no meaningful difference in measured IBUs. I’ve also heard that brewing with malt extract can yield different IBUs than with all-grain brewing, but my direct comparison of beers brewed with Briess Pilsen Dried Malt Extract and Great Western Premium two-row malt showed no meaningful difference in measured IBU values. (Also, I can think of no plausible mechanism through which the concentration of wort into dried malt extract could affect alpha-acid isomerization or the concentration of ABCs.) Future experiments may show some relationship for some of these factors under different conditions, for example with hops in a fine-mesh bag or specific brands of malt extract, but for now there is no known difference worth modeling.

SMPH Parameter Estimation
For most parameters in the SMPH model, estimated values could be obtained from the literature, direct experimentation, or reasonable assumptions. For a few parameters, though, there was no good estimate: (a) the loss of IAAs to trub during the boil, (b) what percent of the available alpha acids are quickly oxidized when added to hot wort, and (c) what percent of the alpha acids that oxidize during storage are dissolved when added to wort. In addition, I wanted to use all available data to get better estimates of the two parameters used in a hopping-rate correction model. A set of 347 measured IBU and IAA values was used to estimate values for these five parameters. (Four IBU and four IAA values were taken from Val Peacock’s reported numbers [Peacock, p. 162]. The other values came from my experiments.)

While this may seem like a lot of data for estimating five parameters, the estimation was complicated by the fact that I often didn’t have precise estimates of the alpha-acid content on brew day and/or how the hops had degraded during storage. Each measured value was therefore associated with a small parameter search for these experiment-specific values as well as the five common values.

Optimizing the parameter values to fit the data resulted in a root-mean-square (RMS) error of 1.6 IBUs and a maximum difference of 7.1 IBUs (for a condition that had 81 measured IBUs). The estimated loss factor for IAAs during the boil is 0.51. The percent of available alpha acids that quickly oxidize when added to hot wort is estimated at 11%. The percent of storage-generated oxidized alpha acids that dissolve in the wort is estimated at 33%. The solubility of alpha acids (for hopping-rate correction) is estimated to have a minimum limit of 200 ppm (below which all alpha acids are dissolved) and a maximum of 580 ppm.

Test Results
To evaluate and compare different IBU models, I collected an additional set of 18 IBU values that were not used in parameter estimation or for cross-validation of the SMPH model. These values ranged from 20.2 to 70.0 IBUs, including a variety of ale styles (two stouts, one ESB, one Kölsch, an English IPA, a West-Coast IPA, and twelve single-malt-and-single-hops (SMASH) beers with different timings of the hop additions). All IBU values were measured from finished beer.

The table below shows, for five IBU models, the RMS error and maximum difference between a measured and modeled IBU value on this set of 18 data points.

Model      RMS Error (IBUs)   Max. Error (IBUs)
SMPH       2.4                5.2
Tinseth    20.4               70.5
Rager      39.6               137.9
Garetz     12.34              28.14
mIBU       11.4               33.2
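For reference, the two error measures in the table can be computed as follows. The measured and predicted values in the usage example are invented for illustration; they are not the 18 test beers.

```python
import math
from typing import Sequence


def rms_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square difference between predicted and measured IBUs."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - m) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def max_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Largest absolute difference between a predicted and a measured IBU value."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return max(abs(p - m) for m, p in zip(measured, predicted))


measured = [20.0, 35.0, 50.0]   # made-up values
predicted = [22.0, 33.0, 55.0]  # made-up values
print(f"RMS error: {rms_error(measured, predicted):.2f} IBUs")
print(f"Max. error: {max_error(measured, predicted):.1f} IBUs")
```

Because the RMS error squares each difference, a model with a few large misses (such as the high-IBU beers in Figure 2) can have a large RMS error even if many of its other predictions are close.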

Figure 2 compares measured IBUs and predicted IBUs for the five models, with measured IBUs on the horizontal axis and predicted IBUs on the vertical axis. The straight dashed line from lower left to middle right indicates where predicted and measured IBUs are equal. It can be seen that on this set of data, the Tinseth, Rager, and mIBU models all have very large predicted IBUs for the higher-IBU beers. The Garetz model has a good fit with the higher-IBU samples, but predicts values about 50% too low in the range of 20 to 25 IBUs.

Figure 2. A comparison of measured and predicted values for five IBU models.

Other Considerations
Some people are more sensitive to bitterness than others [Reed et al., p. 215]. From what I’ve observed, people who are very sensitive to bitterness find it unpleasant, and therefore they don’t tend to drink high-IBU beers. Also, the perception of bitterness changes with each sip. Therefore, I wouldn’t worry much about minor IBU differences; getting somewhere in the ballpark is probably just fine.

The IBU scales linearly with the concentrations of IAAs and ABCs. Bitterness, like most perceptual phenomena, does not increase linearly with the strength of the stimulus (as noted by Fechner’s law). Therefore, there is a divergence from the linear relationship between IBU values and the perception of bitterness, starting at about 60 IBUs [Hahn, p. 50]. However, as noted earlier, there is a strong correlation between IBUs and perceived bitterness, even at high IBUs. Hahn has developed a quadratic equation to map between IBUs and perceived bitterness, accounting for this non-linearity [Hahn, p. 50]. The SMPH calculator includes Hahn’s perceived bitterness value (or “bitterness intensity”) as an additional output.

Oxidized alpha acids are perceived as being about 34% less bitter than isomerized alpha acids [Algazzali, p. 45]. In the IBU measurement, they absorb about 8.5% less ultraviolet light than IAAs [Maye et al., p. 25, Figure 7], and so their perceptual bitterness is about 28% less than their measured contribution to the IBU (0.66/0.915 = 0.72). This is enough of a difference that if a beer containing only oxidized alpha acids (no IAAs) has 40 measured IBUs, it might be perceived as having the bitterness of a beer with only 29 IBUs. This difference of 11 IBUs is above the perceptual threshold of 5 IBUs [Daniels, p. 76].
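The arithmetic in this paragraph, written out as a small sketch. It simply encodes the two ratios quoted above and, as in the text, assumes they combine by division; the function name is invented for this example.

```python
# Relative to IAAs, from the sources cited above:
OAA_RELATIVE_BITTERNESS = 0.66   # oAAs taste about 34% less bitter
OAA_RELATIVE_ABSORBANCE = 0.915  # oAAs absorb about 8.5% less light in the IBU measurement


def perceived_ibu_from_oaa(measured_ibu: float) -> float:
    """IAA-equivalent bitterness of a beer whose measured IBUs come only from oAAs."""
    bitterness_per_ibu = OAA_RELATIVE_BITTERNESS / OAA_RELATIVE_ABSORBANCE
    return measured_ibu * bitterness_per_ibu


perceived = perceived_ibu_from_oaa(40.0)
print(f"ratio {OAA_RELATIVE_BITTERNESS / OAA_RELATIVE_ABSORBANCE:.2f}; "
      f"40 measured IBUs taste like about {perceived:.0f} IBUs")
```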

If the concentration of residual sugars in a beer is low and the IBU is high, the resulting beer may be perceived as overly bitter. Likewise, if there are a lot of residual sugars and a low IBU, the beer may be considered too sweet. Hahn’s perceptual study did not control for residual sugars, and yet panelists were able to judge a beer’s bitterness fairly consistently. Bitterness and sweetness are perceived differently, but we prefer some balance between them in our beers. The ratio of IBUs to original-gravity points can be a useful (if imprecise) way to estimate this bitter/sweet balance and design a pleasing beer. Personally, I find that an IBU/OG ratio of about 0.5 creates a “balanced” beer a bit on the sweeter side, and an IBU/OG ratio of about 1.0 creates a pleasantly bitter (e.g. West-Coast) IPA.
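As a quick illustration of the IBU/OG ratio (often called the BU:GU ratio): gravity points are the digits after the decimal point, so an original gravity of 1.050 is 50 points. The example beers below are hypothetical.

```python
def ibu_gravity_ratio(ibu: float, original_gravity: float) -> float:
    """IBUs divided by original-gravity points (1.050 -> 50 points)."""
    points = (original_gravity - 1.0) * 1000.0
    if points <= 0:
        raise ValueError("original gravity must be greater than 1.000")
    return ibu / points


def target_ibu(ratio: float, original_gravity: float) -> float:
    """IBUs needed to reach a chosen IBU/OG ratio."""
    return ratio * (original_gravity - 1.0) * 1000.0


print(f"{ibu_gravity_ratio(25.0, 1.050):.2f}")  # hypothetical beer
print(f"{target_ibu(1.0, 1.065):.0f} IBUs for a ratio of 1.0 at 1.065")
```

Working backward from a target ratio (the second function) is usually the more useful direction when designing a recipe.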

One of the advantages of the Tinseth, Rager, and Garetz models is that no computer is needed to estimate IBUs. You just need to look up some values in tables and do basic math. These models are also easy to program, which has contributed to their popularity in brewing software. Unfortunately, the SMPH calculator is quite complex, using thousands of lines of code to compute concentrations and loss factors. This calculator is, however, available online to anyone who wants to use it.

Summary
An IBU value is determined by measuring the amount of ultraviolet light absorbed by (acidified) beer. The IBU deliberately includes the effects of both isomerized alpha acids and auxiliary bittering compounds. Even at higher IBUs, there is a strong correlation between IBUs and the perception of bitterness. IBU prediction usually doesn’t need to be very precise, because many people aren’t really all that good at detecting minor (or sometimes even moderate) differences in bitterness.

The SMPH model is a new method for estimating IBUs, which may be useful when trying to predict a beer’s bitterness before brewing. A key difference between the SMPH model and other IBU models is that it accounts separately for the contribution of IAAs and ABCs. Predicting IBUs is a bit of a “black art”, because there are so many variables and there is so much variability. The only way to really know the IBU level of a beer is to have it professionally tested, which is something I highly recommend.

Acknowledgments
I’d like to give a big shout-out to Dana Garves at Oregon BrewLab for the IBU measurements (as well as protein, polyphenol, and other measurements) used in developing the SMPH model.  I can always rely on the accuracy of the measured values and Dana’s cheerfulness. Scott Bruslind at Analysis Laboratory was also hugely supportive, helpful, and encouraging with my initial experiments. Zach Lilla at AAR Lab has been a friendly and reliable source for measuring alpha and beta acids (and the hop storage index) in my hops. I’d also like to thank Glenn Tinseth and Randy Mosher for prompt and encouraging answers to my out-of-the-blue questions. I greatly appreciate the spirit of cooperation and support that is a critical part of the homebrewing culture.

The SMPH model would not have been possible without the excellent research and publications by Tom Shellhammer (and his graduate students) at Oregon State University, Mark Malowicki (in particular), and Val Peacock at Hop Solutions, Inc.  While the model would not have been possible without their previous work, they had no input on its development, and so the name “SMPH” is simply a sequence of four letters, not an acronym.

References


Four Pilot Studies for Maximizing Hop Flavor with Late-Hop Additions

Abstract
The purpose of the experiments described here was to estimate at what point in the boil, and at what temperature, hops should be added in order to maximize hop flavor. The first two perceptual tests were conducted using beers with the same amount of hops added at different times before flameout (from 1 to 20 minutes). The third test was conducted with the same amount of hops added at 10 minutes before flameout and the kettle covered or not covered. The fourth test was conducted with hops added at 10 minutes before flameout to boiling wort or to wort held at 170°F (77°C). The bitterness of the beers within each perceptual test was kept constant by adjusting the amount of a 40- or 45-minute hop addition. These experiments were pilot studies due to the small number of test comparisons, the use of a single test subject, and the use of a single variety of hops. The results indicate that hop flavor may be most pronounced with a 1-minute steep time, that evaporation has a gradual effect on hop flavor (with 10 minutes probably corresponding to a just-noticeable difference), and that the difference between a 1-minute and 20-minute steep time with an uncovered kettle was the most easily perceived of the conditions tested. The 10-minute hop stand at 170°F (77°C) showed no perceptual difference from a 10-minute boil. The results suggest that a “best practice” for maximizing hop flavor may be to add the hops very close to flameout, but that other late-hopping techniques may produce results that are perceptually very similar.

1. Introduction
The purpose of the experiments described here was to estimate at what point in the boil, and at what temperature, hops should be added for maximum hop flavor.  The term “hop flavor” can mean different things to different people.  For example, George Fix says that it has been traditionally (and not quite correctly) believed that the hop resins (which are responsible for bitterness) contribute to hop flavor, while the hop oils (including flavor compounds) contribute to hop aroma [Fix and Fix, p. 33 (emphasis mine)].  In this case, because the resins are responsible for bitterness, the term “hop flavor” is associated with the taste of bitterness.  Somewhat more recently, it has been recognized that hop oils contribute to hop “flavor and aroma” [Oliver, p. 539] and that “late-hopping [is] a well-accepted technique for adding hop flavor and aroma” [Oliver, p. 539], and so “hop flavor” can refer not to a bitter taste, but to a distinct non-bitter flavor.  Mark Garetz uses the term “character” to define this non-bitter flavor [Garetz, p. 14].  In this post, I use the term “flavor” for the non-bitter hop flavor that comes from the hop oils, with typical descriptions such as “floral,” “citrus,” “spicy,” “grapefruit,” or “earthy.”  These oils are also responsible for hop aroma [Oliver, p. 539], and so the terms “flavor” and “aroma” are often used together to describe their sensory impact.  I will use the term “flavor” with the understanding that flavor and aroma are intertwined.

It is usually said that hops should be added earlier in the boil for bitterness and later in the boil for flavor and/or aroma [e.g. Fix and Fix, p. 33; Garetz, pp. 10-11; Noonan, p. 160; Oliver, p. 539]. Therefore, the experiments in this blog post focus on late-hop additions ranging from 1 to 20 minutes before flameout and forced cooling.  (The distinction between “early” and “late” hopping is at around 30 minutes before flameout [Oliver, p. 539].)

While the belief in late hopping for flavor is nearly universal, it is difficult to find in the literature a “best” time for maximizing flavor or a quantified relationship between hop steep time and flavor.   Greg Noonan says that “flavoring hops are commonly added ten or fifteen minutes before the end of the boil for lager beer” [Noonan, p. 159].  Charlie Papazian is the only source I know of who provides a graph of the relationship between steep time and hop flavor, with a peak at 10 minutes before flameout (and a separate peak at 0 minutes for aroma) [Papazian, p. 68], but it’s unclear what set of data was used to produce this graph.  It is possible that chemical reactions between boiling wort and hop oils require some amount of time to produce the most hop flavor in finished beer.  Because flavor and aroma are intertwined, and the oils responsible for hop aroma are lost with evaporating steam [e.g. Lewis and Young, p. 271], it’s also possible that peak hop flavor comes from flameout additions.  Hop stands, with hops steeped at below-boiling temperatures, are common in hop-forward ales and might also contribute to increased hop flavor.

Attempting to answer the question of when to add hops for maximum flavor presents two logistical challenges.  The first challenge is that the bitterness of beer increases with hop steep time and temperature, and so simply adding the same amount of hops at different times or temperatures will change the bitterness level in addition to any flavor changes.  This topic is discussed more in Section 2.  The second challenge is how to measure hop flavor in order to know when it has been maximized.  The perceptual-testing approach used here is discussed in more detail in Section 3.

I’ve created a separate web page as an interactive tutorial for the mathematics behind perceptual difference testing, including significance testing, the power of a test, likelihood ratios, estimating the effect size (d’), and confidence intervals.  These different analysis methods can be used to obtain a detailed interpretation of the results, which can be especially useful when the number of samples per trial is small and/or the statistical power of the test is low.

The perceptual experiments described below used only a single test subject and a single hop variety (Amarillo).  In addition, the number of test samples used in these experiments was too small to reliably detect minor perceptual differences. These experiments are therefore pilot studies; the results are tentative and may or may not be supported by future studies.  Having tentative results is at least a first step toward having more conclusive results.

2. Controlling for Bitterness
In order to control the bitterness level of the beers in these experiments, I used up to two hop additions in each condition.  One addition was the same weight of hops added at different times or temperatures before flameout.  Another addition (if used) was always made at 40 or 45 minutes before flameout (40 minutes for the first two experiments; 45 minutes for the second two), and the weight of this other addition was varied in order to target the same IBU value across all conditions within a test.  Because additions at 40 or 45 minutes are considered to be primarily for bittering and not for flavor, the goal was to change the flavor with the timing of the late-hop addition but to keep total bitterness of each condition the same with the smaller but earlier addition.

To predict IBU values for each condition, I used the technique described in Estimating Isomerized Alpha Acids and nonIAA from Multiple IBU Measurements.  This technique is used, with Mark Malowicki’s model of alpha-acid isomerization [Malowicki], to estimate two parameters for modeling IBUs: scaling_IAA and scaling_nonIAAhops.  The scaling_IAA parameter indicates how much of the isomerized alpha acids (IAA) are lost during the boil and fermentation, and the scaling_nonIAAhops parameter indicates (a) what percent of the weight of the hops becomes auxiliary bittering compounds during the boil and (b) to what degree these compounds are lost during the boil and fermentation.  I obtained initial estimates of these two parameters from a preliminary study.   I used these values, along with wort volume, weight of the hops, AA rating, pH, and original gravity to predict IBUs.  The preliminary study and all experiments described here used hops from the same one-pound (0.45 kg) bag, to keep the alpha-acid (AA) rating and alpha-acid decay factor [e.g. Garetz, pp. 103-118] as equal as possible across conditions.

For the late-hop addition, I targeted an initial alpha-acid concentration close to the estimated alpha-acid solubility limit of about 200 ppm.  The IBU prediction technique estimates a certain IBU value from this amount of hops, wort, temperature, and steep time (ranging from 1 up to 20 minutes).  I then adjusted the weight of another hop addition, always added at 40 or 45 minutes before flameout, so that the model predicted the same total IBU value across all conditions within an experiment.  The goal was to have all of the conditions in a perceptual comparison within 5 measured IBUs of each other, as 5 IBUs has been reported to be the perceptual threshold [Daniels, p. 76].  Up to about 50 or 60 IBUs there is a strong linear relationship between IBUs and perceived bitterness [Hahn, p. 50], and so for beers in this range the IBU is a good (and linear) metric for perceived bitterness.
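The alpha-acid target above comes down to simple mass-balance arithmetic. The following is a minimal sketch that treats concentration as milligrams of alpha acids per liter of wort and ignores the thermal expansion of hot wort and any wort absorbed by the hops; the function name `initial_aa_ppm` is hypothetical. The wort volume at the moment of the Experiment #1 late addition (24.1 g at 8.8% AA) is not listed. The sketch therefore brackets it between Condition A's volume at the first addition and its post-boil volume, which gives roughly 170 to 185 ppm.

```python
def initial_aa_ppm(hop_grams: float, aa_percent: float, wort_liters: float) -> float:
    """Alpha acids added per liter of wort, in mg/L (ppm), before any losses.

    Simple mass balance; ignores thermal expansion of hot wort and wort
    absorbed by the hops (both are assumptions of this sketch).
    """
    aa_mg = hop_grams * (aa_percent / 100.0) * 1000.0  # grams of AA -> mg
    return aa_mg / wort_liters


# Experiment #1 late addition: 24.1 g of 8.8% AA Amarillo.  The volume at the
# time of the addition lies between Condition A's volume at the first
# addition (12.50 L) and its post-boil volume (11.46 L).
for liters in (12.50, 11.46):
    print(f"{liters:5.2f} L -> {initial_aa_ppm(24.1, 8.8, liters):5.1f} ppm")
```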

3. Flavor Testing Methodology
3.1 Overview
To measure hop flavor, I used the triangle test (also used at Brülosophy) in order to judge whether two conditions can be distinguished from each other [e.g. Angevaare; Society of Sensory Professionals].  In the triangle test, a test subject tastes three samples of beer where two of the samples are from the same condition and one is from a different condition.  The subject is asked which one of the three beers is different.  This test is repeated a number of times.  If the number of correct answers is above a threshold, then the two conditions can be considered perceptually different.  It is important to note that if the number of correct answers is below the threshold, nothing can be concluded from a standard significance test; standard significance testing cannot accept the hypothesis that there is no perceptual difference between two conditions.  However, likelihood ratios can be used to estimate the relative strength of the evidence for whether two beers are perceptually the same or different.  We can also estimate the effect size (d’), which indicates the amount of difference between the two conditions.  A d’ of 0 indicates identical conditions, a d’ of 1.0 corresponds to a just-noticeable difference, and larger values of d’ indicate greater perceptual differences.

In this test, the beer judged as different was also rated by the subject as having either “more hop flavor” or “less hop flavor” than the others.  By comparing beers at a range of steep times, one can first determine which steep times can be distinguished from each other.  Then, for those samples that are correctly identified as different, one can look at how often one steep time is judged more flavorful than the other.

3.2 Testing Details
These experiments used a single subject or taster (this author).  This single-subject design has advantages and disadvantages.  One significant disadvantage of using a single subject is that the results from these experiments may or may not generalize to the larger population.  One significant advantage of using a single subject is that there is probably a lower threshold for detecting perceptual differences, compared with a larger group of subjects.  (Even if the one subject has a high threshold compared with the average population, the variance in the responses will be less for one subject than for many subjects due to individual threshold differences.  This variance in responses negatively affects the effect size (lowering the value of d’), making it more difficult to distinguish between conditions in a study with many subjects.)

In the first two perceptual studies, both experiments had four conditions for different hop steep times, labeled A, B, C, and D in Experiment #1 and E, F, G, and H in Experiment #2. This resulted in six comparisons between conditions (in Experiment #1, Condition A vs. B, A vs. C, A vs. D, B vs. C, B vs. D, and C vs. D).  Each comparison was tested eight times, for a total of 48 tests per experiment.  The third experiment had two conditions: (J) kettle covered or (K) uncovered during the 10-minute late-hop addition.  The fourth experiment also had two conditions: hops added to (L) boiling or (M) 170°F (77°C) wort for 10 minutes.  Each of the comparisons in the third and fourth experiments was tested 24 times.  The final two perceptual studies were conducted simultaneously, for a total of 48 tests.

A computer program was written to arrange the tests in random order with random ordering of conditions within a test.  Tests were conducted up to four times per day with at least an hour between tests (to reduce order effects), and so each experiment took about two weeks to test.  A second person poured samples for two to four tests every morning according to an instruction sheet with the randomized order of conditions.  Each test sample was 1.5 oz (44 ml), and so more than 74 oz (2.2 liters) of each condition were required for testing.  While the beers were stored close to freezing to preserve flavor, each sample of beer came up to room temperature before tasting.
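The randomization program itself is not described in detail. The following is a hypothetical sketch of one way to build such a schedule; the function name and the dictionary layout are illustrative assumptions, not the program that was actually used. It creates a fixed number of tests for every pair of conditions and flips a coin to decide which condition is served twice. It then shuffles the three cups within each test and finally shuffles the whole list.

```python
import itertools
import random


def triangle_schedule(conditions, tests_per_pair, seed=None):
    """Hypothetical sketch: build a randomized list of triangle tests.

    Every pair of conditions gets exactly `tests_per_pair` tests.  Within a
    test, a coin flip decides which condition is the odd one out (served
    once) and which is served twice, and the three cups are shuffled.
    """
    rng = random.Random(seed)
    tests = []
    for a, b in itertools.combinations(conditions, 2):
        for _ in range(tests_per_pair):
            odd, common = (a, b) if rng.random() < 0.5 else (b, a)
            cups = [common, common, odd]
            rng.shuffle(cups)
            tests.append({"pair": (a, b), "cups": cups, "odd_one": odd})
    rng.shuffle(tests)  # randomize the order in which comparisons are tested
    return tests


schedule = triangle_schedule("ABCD", tests_per_pair=8, seed=1)
print(len(schedule), "tests")
for test in schedule[:3]:
    print(" ".join(test["cups"]), "-> odd one:", test["odd_one"])
```

With four conditions and eight tests per pair, this yields the 48 tests per experiment described above. The per-pair count is fixed by construction, so every comparison gets exactly the requested number of tests.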

The subject marked their responses (i.e. indicated the beer that was judged different, and if they thought this beer was more or less flavorful than the others) on a separate sheet.  Testing was conducted in a quiet room with as much time as needed for making a decision.  The subject did not know the correct answers until the end of the experiment.

3.3 Evaluating Results
With eight tests of a comparison and a significance level of 0.05, at least six tests need to be correctly identified in order to reach statistical significance and reject the null hypothesis of “no perceptual difference.”  At the same significance level, at least seven of the eight tests need to be correctly identified in order to reach statistical significance rejecting the null hypothesis of a just-noticeable difference (JND) in favor of a larger difference.  Unfortunately, with only eight results per trial, the power of a significance test comparing no perceptual difference against the JND is an abysmal 6%. This means that 94% of the time that there really is a just-noticeable perceptual difference, a statistically-significant result will not be obtained.  (This is one reason why a test result that does not show significance should not be used to conclude that the conditions are perceptually equal.  These experiments were conducted with the expectation that there would be more than a just-noticeable difference in at least one comparison.)

With 24 tests of a comparison and the same significance level, 13 tests need to be correctly identified to reach statistical significance and reject the null hypothesis of “no perceptual difference”, and 16 tests need to be correctly identified in order to reject the null hypothesis of a just-noticeable difference.  The power of a test comparing no perceptual difference against the JND is still a miserable 15%, meaning that 85% of the time that there really is a just-noticeable perceptual difference, a statistically-significant result will not be obtained.
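The thresholds above follow from the binomial distribution: a taster who is guessing has a 1-in-3 chance of picking the odd sample in each test. The following is a minimal sketch (the function names are illustrative) that computes the one-sided p value for a given number of correct responses. It also finds the smallest number of correct responses that reaches significance.

```python
from fractions import Fraction
from math import comb


def triangle_p_value(correct: int, trials: int) -> float:
    """One-sided p value: probability of at least `correct` right answers
    out of `trials` if the taster is guessing (1 chance in 3 per test)."""
    p = Fraction(1, 3)
    tail = sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(correct, trials + 1))
    return float(tail)


def min_correct_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest number of correct answers whose p value is at most alpha."""
    for k in range(trials + 1):
        if triangle_p_value(k, trials) <= alpha:
            return k
    return trials + 1  # significance is unreachable with this few tests


print(f"6 of 8 correct: p = {triangle_p_value(6, 8):.3f}")
print("needed for significance with 8 tests:", min_correct_for_significance(8))
print("needed for significance with 24 tests:", min_correct_for_significance(24))
```

This gives p = 0.020 for 6 of 8 correct and thresholds of 6 (out of 8) and 13 (out of 24) correct responses, matching the values quoted above. Exact fractions are used so that the tail sums are not affected by rounding.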

In order to obtain more information from the test results, the likelihood ratios and maximum-likelihood estimates of the effect size (d’) with a 95% confidence interval were computed, in addition to significance testing.  For those less familiar with these concepts, there is an interactive tutorial on the terminology and mathematics of perceptual testing.
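The likelihood ratios, power figures, and d’ estimates all depend on the triangle-test psychometric function, which maps d’ to the probability of a correct response (Pc). The following is a sketch under stated assumptions. It uses the standard Thurstonian model for the triangle test (commonly attributed to Ennis) and evaluates the integral numerically with Simpson's rule. It also takes d’ = 1 as the JND. The names are illustrative, and the interactive tutorial may compute these values differently. Confidence-interval bounds are not reproduced here.

```python
import math

SQRT3 = math.sqrt(3.0)
SQRT_2_3 = math.sqrt(2.0 / 3.0)


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def triangle_pc(d_prime: float, upper: float = 8.0, steps: int = 800) -> float:
    """Probability of a correct triangle-test response for a given d'.

    Pc = 2 * integral_0^inf [Phi(-z*sqrt(3) + d'*sqrt(2/3))
                             + Phi(-z*sqrt(3) - d'*sqrt(2/3))] * phi(z) dz,
    evaluated with Simpson's rule on [0, upper] (steps must be even).
    """
    shift = d_prime * SQRT_2_3

    def integrand(z: float) -> float:
        return (norm_cdf(-z * SQRT3 + shift) + norm_cdf(-z * SQRT3 - shift)) * norm_pdf(z)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return 2.0 * total * h / 3.0


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def likelihood_ratio(correct: int, trials: int, d_alt: float = 1.0) -> float:
    """Likelihood of the data under d' = d_alt divided by that under d' = 0."""
    return binom_pmf(correct, trials, triangle_pc(d_alt)) / binom_pmf(correct, trials, 1.0 / 3.0)


def power(trials: int, d_alt: float = 1.0, alpha: float = 0.05) -> float:
    """Chance of a significant result (against guessing) if the true d' is d_alt."""
    def tail(k: int, p: float) -> float:
        return sum(binom_pmf(j, trials, p) for j in range(k, trials + 1))

    k_crit = next(k for k in range(trials + 2) if tail(k, 1.0 / 3.0) <= alpha)
    return tail(k_crit, triangle_pc(d_alt))


def ml_d_prime(correct: int, trials: int, d_max: float = 10.0) -> float:
    """Maximum-likelihood d': solve Pc(d') = correct / trials by bisection."""
    target = correct / trials
    if target <= 1.0 / 3.0:
        return 0.0  # at or below chance, the likelihood peaks at d' = 0
    if triangle_pc(d_max) <= target:
        return d_max  # e.g. every answer correct: the estimate is unbounded
    lo, hi = 0.0, d_max
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if triangle_pc(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(f"Pc at d' = 1: {triangle_pc(1.0):.3f}")
print(f"LR for 6 of 8 correct: {likelihood_ratio(6, 8):.2f}")
print(f"power with 8 tests: {power(8):.2f}, with 24 tests: {power(24):.2f}")
print(f"ML d' for 6 of 8 correct: {ml_d_prime(6, 8):.2f}")
```

With this model, Pc at d’ = 1 is about 0.418. The likelihood ratio for 6 of 8 correct comes out close to the 2.98 reported in the results, and the power for 8 and 24 tests lands near the 6% and 15% quoted above.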

4. Experiment #1: Varying Steep Times with an Uncovered Kettle
4.1 Experiment #1: Experimental Overview
In this experiment, a late-hop addition was made at 1, 5, 10, or 20 minutes before flameout.  The kettle was uncovered during the final 20 minutes of the boil, allowing volatile hop oils to evaporate.

4.2 Experiment #1: Experimental Methods
All conditions used 2.55 lbs (1.16 kg) of Briess Pilsen Dried Malt Extract with 3.37 G (12.75 liters) of 120°F (49°C) water to create 3.50 G (13.25 liters) of room-temperature wort with specific gravity 1.031.  The wort sat for about 90 minutes to let the pH stabilize, at which point the pH was adjusted with phosphoric acid to 5.30.  The wort was boiled (uncovered) for 5 minutes to reduce the foam associated with the start of the boil.  A 12-oz (0.35 liter) sample was taken for measuring specific gravity and a 40-minute timer was started.  The first addition of Amarillo hops (AA rating 8.8%) was made with the weight listed in Table 1 (using a weighted coarse-mesh bag).  The kettle was covered for the first 20 minutes of the boil to reduce evaporation, after which time the cover was removed to allow evaporation.  At each target time, the second addition of 0.850 oz (24.1 g) of the same Amarillo hops (with the steep time listed in Table 1) was added in a weighted coarse-mesh bag.  At flameout the wort was quickly cooled with an immersion chiller to 75°F (24°C) and the hops were removed.  Sterilized, room-temperature water was added to bring the volume up to about 3.0 G (11.36 liters).  The wort was stirred and then sat for about 15 minutes, covered, to settle the heavier trub.  Then, 0.813 G (3.08 liters) of wort was transferred to a sanitized fermentation vessel.  This wort was aerated for 1 minute by vigorous shaking, and 0.08 oz (2.20 g) of Safale US-05 yeast was added.  A final sample was taken from the kettle for measuring specific gravity.

The wort fermented for one week, after which time 92 oz (2.72 liters) were decanted, leaving the trub behind.  From that, a 4-oz (0.12 liter) sample was taken for IBU measurement by Oregon BrewLab.  The remainder was stored close to freezing with minimal exposure to oxygen until the results from Oregon BrewLab confirmed that the samples were all within 5 IBUs of each other.  The beers were kept that way throughout testing, except when samples were brought up to room temperature for tasting.

The perceptual experiment was conducted as described in Section 3.2.  Conducting up to four tests per day took 17 days.  Due to the difficulty in detecting clear differences between samples, tasting of each sample was spaced out by about 30 seconds and small sips of water or a tiny amount of dry bread were taken between tastings to reset the palate.

| Condition | A | B | C | D |
|---|---|---|---|---|
| weight of 1st addition | 0.379 oz / 10.75 g | 0.289 oz / 8.20 g | 0.185 oz / 5.25 g | 0 oz / 0 g |
| steep time of 2nd addition | 1 min. | 5 min. | 10 min. | 20 min. |
| pre-boil specific gravity (SG) | 1.031 | 1.031 | 1.031 | 1.031 |
| pre-boil volume (measured, room temp.) | 3.51 G / 13.30 liters | 3.49 G / 13.22 liters | 3.50 G / 13.25 liters | 3.50 G / 13.26 liters |
| SG at 1st addition | 1.033 | 1.033 | 1.034 | 1.033 |
| volume at 1st addition (estimated from SG) | 3.30 G / 12.50 liters | 3.27 G / 12.38 liters | 3.26 G / 12.34 liters | 3.29 G / 12.46 liters |
| post-boil SG (after volume correction) | 1.036 | 1.036 | 1.036 | 1.036 |
| post-boil volume (estimated from SG) | 3.03 G / 11.46 liters | 2.99 G / 11.32 liters | 3.02 G / 11.43 liters | 3.02 G / 11.44 liters |
| measured IBUs | 23.6 | 24.5 | 23.0 | 22.8 |

Table 1.  Measured and estimated (where indicated) values for the four conditions with an uncovered kettle.
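Table 1 marks several volumes as estimated from SG but does not give the method. One common approach, assumed here, is that the total gravity points stay constant as water boils off. The total is volume × (SG − 1) × 1000, so V2 = V1 × (SG1 − 1) / (SG2 − 1). The sketch also ignores the small gravity sample drawn before the first addition, and the function name is hypothetical.

```python
def volume_from_sg(known_liters: float, known_sg: float, new_sg: float) -> float:
    """Estimate wort volume at a new specific gravity.

    Assumes the total gravity points, volume * (SG - 1) * 1000, stay constant
    as water evaporates (no sugar is added or removed).
    """
    gravity_points = known_liters * (known_sg - 1.0) * 1000.0
    return gravity_points / ((new_sg - 1.0) * 1000.0)


# Condition A: 13.30 liters measured pre-boil at SG 1.031.
at_first_addition = volume_from_sg(13.30, 1.031, 1.033)
post_boil = volume_from_sg(13.30, 1.031, 1.036)
print(f"at 1st addition: {at_first_addition:.2f} L, post-boil: {post_boil:.2f} L")
```

For Condition A, this gives about 12.49 liters at the first addition and 11.45 liters post-boil, within about 0.01 liter of the tabulated values. The author's exact procedure may differ.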

4.3 Experiment #1: Results and Analysis
The IBU levels from the four conditions were well within the perceptual threshold of 5 IBUs.  The average was 23.5 IBUs, with a standard deviation of 0.66 IBUs.  The maximum difference between two conditions was 1.7 IBUs.  These results indicate that the beers were not perceptually different in terms of bitterness.
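The summary figures can be reproduced from the measured IBUs in Table 1. They appear to use the population standard deviation (dividing by n, as `statistics.pstdev` does). The sample standard deviation (`statistics.stdev`) of the same four values would be about 0.76 IBUs.

```python
import statistics

measured_ibus = {"A": 23.6, "B": 24.5, "C": 23.0, "D": 22.8}
values = list(measured_ibus.values())

mean = statistics.mean(values)
spread = statistics.pstdev(values)  # population standard deviation
max_diff = max(values) - min(values)

print(f"mean {mean:.1f}, std dev {spread:.2f}, max difference {max_diff:.1f}")
print("all within 5 IBUs:", max_diff <= 5.0)
```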

The results of the perceptual test are shown in Table 2.  The top-right corner of the table provides the number of correct responses, the p value associated with this response rate (results with p < 0.05 reached significance), the likelihood ratio for a just-noticeable difference relative to no perceptual difference, and the low, maximum-likelihood, and high estimates of d’ (using a 95% confidence interval; a d’ of 0 corresponds with no perceptual difference, and a d’ of 1 corresponds with a just-noticeable difference).  The bottom-left corner of the table shows the identity of the preferred sample for each correct response.

The expected amount of variability in the results is quite large, given only 8 samples per trial (a standard deviation of about 1.4 correct responses).  Two trends in the correct-response rate are visible, however: (1) Condition A is more likely to be distinguished from the other conditions, and (2) other comparisons indicate that no perceptual difference is approximately as likely as a just-noticeable difference.

One unusual result is that the A vs. B comparison demonstrates a significant difference, and so does A vs. D, but A vs. C does not.  Jumping ahead a little in the story to explain these results, the experiment described in Section 6 (which tests the impact of evaporation on a 10-minute steep time) indicates that the true underlying trend is probably different. A and B probably have the least perceptual difference, A and C probably have a just-noticeable (but not statistically significant) difference, and A and D have the largest perceptual difference.  In other words, evaporation and steep time probably affect perception, but the effect is more likely to be a gradual change over a period of about 10 or 20 minutes.

For the preferences, all of the correct responses involving Condition A were associated with a preference for Condition A.  For comparison B vs. C, the preference was equally split.  For B vs. D, the single correct response favored D.  For C vs. D, four out of the five favored Condition C.  Shorter steep times appear to be somewhat preferred over longer steep times, but the only universal preference was for the shortest steep time of 1 minute.

These results, together with those from Section 6, suggest that the shortest hop steep time yields the most perceived hop flavor, and that evaporation probably affects hop flavor gradually over a 10- to 20-minute period.  Based on these results, one should keep hops in the wort for the shortest time possible in order to maximize flavor.

| Comparison | A | B | C | D |
|---|---|---|---|---|
| A | | 6 / 8 correct; p = 0.020; LR: d’=1/d’=0 = 2.98; d’ (low, ML, high) = 0.68, 2.79, 4.68 | 2 / 8 correct; p = 0.805; LR: d’=1/d’=0 = 0.69; d’ (low, ML, high) = 0, 0, 2.10 | 7 / 8 correct; p = 0.003; LR: d’=1/d’=0 = 4.28; d’ (low, ML, high) = 2.10, 3.75, 4.68 |
| B | more flavor: AAAAAA | | 4 / 8 correct; p = 0.259; LR: d’=1/d’=0 = 1.44; d’ (low, ML, high) = 0, 1.46, 3.75 | 1 / 8 correct; p = 0.961; LR: d’=1/d’=0 = 0.48; d’ (low, ML, high) = 0, 0, 0.68 |
| C | more flavor: AA | more flavor: BB CC | | 5 / 8 correct; p = 0.088; LR: d’=1/d’=0 = 2.07; d’ (low, ML, high) = 0, 2.10, 3.75 |
| D | more flavor: AAAAAAA | more flavor: D | more flavor: CCCC D | |

Table 2.  Results from perceptual testing with an uncovered kettle.  The top-right corner shows analysis of the number of correct responses.  The bottom-left corner shows, for those samples correctly identified as different, which sample was considered to have more hop flavor.

5. Experiment #2: Varying Steep Times with a Covered Kettle
5.1 Experiment #2: Experimental Overview
The experiment with an uncovered kettle showed that hop flavor is probably maximized with the shortest possible steep time.  There are two likely explanations for this: (1) the hop oils degrade when they’re in boiling wort, and/or (2) the hop oils are removed from the wort through evaporation.  If the first explanation is true, then one may be able to vary the temperature of the wort in order to minimize degradation and maximize flavor.  If the second explanation is true, then one only needs to cover the kettle in order to prevent the loss of hop oils.  The experiment described here tested the second explanation by covering the kettle during the boil.  If there is no perceptual difference between any of the conditions, that would suggest that the oils are lost primarily through evaporation.  If results are similar to the experiment with the uncovered kettle, that would suggest that oils are mostly degraded in boiling wort.

5.2 Experiment #2: Experimental Methods
This experiment was conducted using the same general methods as the first experiment.  The first addition of Amarillo hops was made with the weight listed in Table 3 (using a weighted coarse-mesh bag).  The kettle was covered during the entire 40-minute steep time, except when it was briefly opened for stirring and for the second hop addition.  At each target time, the second addition of 0.765 oz (21.7 g) of Amarillo hops (with the steep time listed in Table 3) was added in a weighted coarse-mesh bag.

The perceptual experiment was conducted as described in Section 3.2.  Unfortunately, a bug in the randomization yielded between 7 and 12 samples per trial, instead of always 8 samples per trial.  Conducting up to four tests per day took 16 days.  Due to the difficulty in detecting clear differences between samples, tasting of each sample was spaced out by about 30 seconds and small sips of water or a tiny amount of dry bread were taken between tastings to reset the palate.

| Condition | E | F | G | H |
|---|---|---|---|---|
| weight of 1st addition | 0.363 oz / 10.30 g | 0.274 oz / 7.77 g | 0.181 oz / 5.14 g | 0.096 oz / 2.71 g |
| steep time of 2nd addition | 1 min. | 5 min. | 10 min. | 15 min. |
| pre-boil specific gravity (SG) | 1.031 | 1.032 | 1.032 | 1.031 |
| pre-boil volume (measured, room temp.) | 3.48 G / 13.18 liters | 3.48 G / 13.18 liters | 3.48 G / 13.19 liters | 3.50 G / 13.25 liters |
| SG at 1st addition | 1.033 | 1.033 | 1.033 | 1.034 |
| volume at 1st addition (estimated from SG) | 3.31 G / 12.54 liters | 3.33 G / 12.62 liters | 3.18 G / 12.04 liters | 3.28 G / 12.42 liters |
| post-boil SG | 1.034 | 1.0345 | 1.036 | 1.035 |
| post-boil volume (estimated from SG) | 3.18 G / 12.03 liters | 3.18 G / 12.04 liters | 3.01 G / 11.41 liters | 3.14 G / 11.88 liters |
| measured IBUs | 20.2 | 21.4 | 21.2 | 18.7 |

Table 3.  Measured and estimated (where indicated) values for the four conditions with a covered kettle.

5.3 Experiment #2: Results and Analysis
The IBU levels from the four conditions were well within the perceptual threshold of 5 IBUs.  The average was 20.4 IBUs with a standard deviation of 1.07 IBUs.  The maximum difference between two conditions was 2.7 IBUs.  These results indicate that the beers were not perceptually different in terms of bitterness.

The results of the perceptual test are shown in Table 4.  The top-right corner of the table provides the number of correct responses, the p value associated with this response rate (none of the results reached significance), the likelihood ratio for a just-noticeable difference relative to no perceptual difference, and the low, maximum-likelihood, and high estimates of d’.  The bottom-left corner of the table shows the identity of the preferred sample for each correct response.

In this experiment, Condition E (the shortest steep time) does not demonstrate any significant differences against the other conditions.  Overall, the likelihood ratios show no clear trend. For example, conditions with a greater difference in steep time are not more likely to have a just-noticeable difference than conditions with a small difference in steep time.  Unlike the first experiment, all of the 95% confidence intervals include a d’ of 0, or no perceptual difference.

For the preferences, there is also no clear preference for any one steep time.  The number of correct responses is quite small in most comparisons, and the only comparison with more than four correct responses was evenly split in preference between the two conditions.

While it’s not possible to demonstrate that two conditions are perceptually the same using standard significance testing, the set of results here suggests that all conditions in this experiment have at most a just-noticeable difference and quite likely no perceptual difference. In the previous experiment, Condition A had greater perceptual differences from other conditions and was universally preferred over other conditions; those patterns were not observed in this experiment.  These results suggest that hop oils lost through evaporation are an important component of hop flavor.

| Comparison | E | F | G | H |
|---|---|---|---|---|
| E | | 3 / 8 correct; p = 0.532; LR: d’=1/d’=0 = 1.00; d’ (low, ML, high) = 0.0, 0.68, 2.79 | 2 / 7 correct; p = 0.737; LR: d’=1/d’=0 = 0.79; d’ (low, ML, high) = 0, 0, 2.58 | 3 / 8 correct; p = 0.532; LR: d’=1/d’=0 = 1.00; d’ (low, ML, high) = 0.0, 0.68, 2.79 |
| F | more flavor: EE F | | 1 / 9 correct; p = 0.974; LR: d’=1/d’=0 = 0.42; d’ (low, ML, high) = 0, 0, 0 | 4 / 8 correct; p = 0.259; LR: d’=1/d’=0 = 1.44; d’ (low, ML, high) = 0, 1.46, 3.75 |
| G | more flavor: GG | more flavor: G | | 7 / 12 correct; p = 0.066; LR: d’=1/d’=0 = 2.48; d’ (low, ML, high) = 0, 1.89, 3.38 |
| H | more flavor: E HH | more flavor: FF HH | more flavor: GGG HHHH | |

Table 4.  Results from perceptual testing with a covered kettle.  The top-right corner shows analysis of the number of correct responses.  The bottom-left corner shows, for those samples correctly identified as different, which sample was considered to have more hop flavor.

6. Experiment #3: Covered vs. Uncovered Kettle with 10-Minute Addition
6.1 Experiment #3: Experimental Overview
The first experiment demonstrated an unexpected result: a significant difference between 1 and 5 minutes (A vs. B comparison with 6 correct responses out of 8 tests), no significant difference between 1 and 10 minutes (A vs. C with 2 out of 8 correct), and a significant difference between 1 and 20 minutes (A vs. D with 7 out of 8 correct).  It is mathematically more likely that the lack of perceptual difference in the A vs. C comparison is an incorrect conclusion, which implies that hop oils quickly evaporate with steam.  However, the number of data points in this experiment was small and therefore the uncertainty is large.  A third experiment was conducted to test this hypothesis with more data.  This experiment had two conditions, J and K, both with a 10-minute late-hop addition.  The primary difference between the two conditions was that in Condition J the kettle was covered during the final 10 minutes and in Condition K the kettle was uncovered (allowing steam to escape).  If the tentative conclusion from the first experiment is correct and hop oils are quickly lost with evaporating steam, then there should be a perceptual and significant difference between Conditions J and K.  (With an estimated d’ of 2.79 in the A vs. B comparison and 3.75 in the A vs. D comparison, an estimate of d’ for a 10-minute steep time is about 3.  With 24 tests and a d’ of 3.0, the power of the test is close to 1.0.)

6.2 Experiment #3: Experimental Methods
This experiment was conducted using the same general methods as the first and second experiments.  Wort for each condition was created using 2.47 lbs (1.12 kg) of DME and 3.27 G (12.38 liters) of water, yielding 3.43 G (13.0 liters) of wort with specific gravity 1.031.  The first addition of 0.176 oz (5.0 g) of Amarillo hops (AA rating 8.8%) was made at 45 minutes before flameout (in a weighted coarse-mesh bag).  Both conditions had 0.811 oz (23.0 g) of Amarillo hops added in a weighted coarse-mesh bag at 10 minutes before flameout.  Safale S-04 yeast was used for fermentation.

For Condition J, the kettle was uncovered for the first 10 minutes after the initial hop addition, and then covered for the remaining 35 minutes of the boil (with the brief exception of adding the 10-minute hop addition).  For Condition K, the kettle was uncovered during the first 10 minutes, covered during the next 25 minutes, and uncovered during the final 10 minutes (after the second hop addition was made).

The perceptual experiment was conducted as described in Section 3.2.  Conducting 24 tests with up to four tests per day, along with the 24 tests in the fourth experiment, took 17 days.  With the expectation of less difficulty in detecting a clear difference between samples and a desire to balance memory effects with adaptation effects, tasting of each sample was spaced out by about 10 seconds and only small sips of water were taken between tastings to reset the palate.

6.3 Experiment #3: Results and Analysis
The measured IBUs were 24.7 for Condition J and 28.9 for Condition K.  The difference between these IBU levels, 4.2, is within the perceptual threshold of 5 IBUs.  These results indicate that the beers were not perceptually different in terms of bitterness.

The results of the perceptual test were that 11 out of the 24 tests were correctly identified, and of those correct responses, 3 times Condition J was preferred and 8 times Condition K was preferred.  The p value associated with this response rate is 0.14 (not significant at a threshold of 0.05), and the likelihood ratio for a just-noticeable difference relative to no perceptual difference is 2.07. The low, maximum-likelihood, and high estimates of d’ are 0.0, 1.24, and 2.32, respectively.

These results were very much unexpected in three respects: the low estimate of d’, the lack of significance, and the general preference for the uncovered late-hop addition over the covered late-hop addition.  These results imply that in the first experiment the A vs. B comparison (1 min. vs. 5 min.) yielded an incorrect result that supported a perceptual difference, and that the A vs. C comparison (1 min. vs. 10 min.) was actually correct in not demonstrating significance.  Given the strength of the A vs. D comparison (1 min. vs. 20 min., with 7 out of 8 correct and consistent responses), it seems prudent to continue to assume that the result of that comparison was correct.

The preference for Condition K over Condition J might have several causes:

(a) difficulty in distinguishing these two conditions (with a fairly low d’),
(b) small differences in the perceptual testing methodology that may have had an unexpectedly large effect,
(c) the use of a different strain of yeast, and/or
(d) flavor changes over time due to the transformation of hop oils in the hot wort, in addition to the loss of oils through evaporation.

The simple explanation that hop oils are lost through evaporation may or may not be the complete explanation.

Considering the results of the first three experiments together, it appears that hop flavor does decrease with longer steep times, but only relatively slowly.  We can estimate the perceptual change over time (with an uncovered kettle) as a d’ of roughly 1.0 after 10 minutes (a just-noticeable difference) and a d’ of roughly 3.0 (with a maximum-likelihood estimate of 3.75) at 20 minutes.  The preference for the shortest steep time in the first experiment is not consistent with the preference for the uncovered kettle in the third experiment. It is therefore unclear whether flavor changes occur only through evaporation or through additional mechanisms, or whether testing differences or statistical variation in the third experiment caused a different result.  The universal preference for the shortest steep time in the first experiment leads to the tentative conclusion that flavor is maximized with the shortest steep time.

7. Experiment #4: Boiling vs. Sub-Boiling Hop Addition
7.1 Experiment #4: Experimental Overview
A comparison of the results from the first and second experiments indicates that covering or not covering the kettle can be responsible for a noticeable change (or lack of change) in hop flavor.  The results of the third experiment suggest that the effect of covering the kettle is only a just-noticeable difference at a 10-minute steep time.  Besides volatile hop oils evaporating with steam, another likely explanation for a change in hop flavor is a transformation of hop oils in contact with boiling wort.  The fourth experiment tested the effect of wort temperature on hop flavor, comparing a 10-minute steep time at boiling (Condition L) with a 10-minute steep time at 170°F (77°C) (Condition M).

7.2 Experiment #4: Experimental Methods
This experiment was conducted using the same general methods as the previous three experiments.  Dried malt extract was used to create 3.43 G (13.0 liters) of wort with pre-boil specific gravity 1.031.  The first addition of Amarillo hops (AA rating 8.8%) was made at 45 minutes before flameout (in a weighted coarse-mesh bag).  Condition L used 0.176 oz (5.0 g) of hops in the first addition and was identical to Condition J in Experiment #3.  Condition M used 0.388 oz (11.0 g) of hops in the first addition.  Both conditions had 0.811 oz (23.0 g) of Amarillo hops added in a weighted coarse-mesh bag at 10 minutes before flameout, and the kettle was covered during the final 10 minutes.  In Condition L the wort was kept at boiling; in Condition M, the wort was cooled from boiling to 170°F (77°C) during the 11th minute before flameout using an immersion chiller, and the target temperature was maintained (to within a few degrees) during the final 10 minutes before flameout.  Safale S-04 yeast was used for fermentation.

The perceptual experiment was conducted as described in Section 3.2.  Conducting 24 tests with up to four tests per day, along with the 24 tests in the third experiment, took 17 days.  As in the third experiment, tasting of each sample was spaced out by about 10 seconds and only small sips of water were taken between tastings to reset the palate.

7.3 Experiment #4: Results and Analysis
The measured IBUs were 25.1 for Condition L and 29.6 for Condition M.  The difference between these IBU levels, 4.5, is within the perceptual threshold of 5 IBUs.  These results indicate that the beers were not perceptually different in terms of bitterness.

The results of the perceptual test were that 5 out of the 24 tests were correctly identified, and of those correct responses, 3 times Condition L was preferred and 2 times Condition M was preferred.  The p value associated with this response rate is 0.94 (not significant at a threshold of 0.05), and the likelihood ratio for no perceptual difference relative to a just-noticeable difference is 4.29. The low, maximum-likelihood, and high estimates of d’ are 0.0, 0.0, and 0.8, respectively.
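The p value above can be checked with a one-sided binomial test, assuming (as is usual for triangle tests) that a guessing taster has a 1-in-3 chance of picking the odd sample.  This is a minimal sketch that reproduces only the p value; the likelihood ratio and the d’ estimates come from a psychometric model (such as the one behind the Angevaare calculator) that is not reproduced here.  The function name is hypothetical.

```python
from math import comb

def triangle_test_p_value(n_correct: int, n_trials: int, p_chance: float = 1 / 3) -> float:
    """One-sided binomial p value: probability of n_correct or more correct
    identifications out of n_trials if every response were a random guess."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# Experiment #4: 5 correct identifications out of 24 tests
print(f"p = {triangle_test_p_value(5, 24):.2f}")  # p = 0.94
print(f"expected correct by chance: {24 / 3:.0f} of 24")
```

Because 5 correct responses fall below the chance expectation of 8, almost the entire binomial distribution lies at or above the observed count, which is why the p value is so close to 1.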

While it is not possible to conclude that two conditions are perceptually identical using significance testing with a null hypothesis of no difference, it would be difficult to get results that more clearly indicate no perceptual difference between the two conditions.  Even random guessing would result in, on average, 8 of the 24 tests being correctly identified.  The result of 5 correct responses is not so low that one should be concerned about experimental error, but low enough that the likelihood of there being no perceptual difference is more than four times greater than there being a just-noticeable difference.  The maximum-likelihood estimate of d’ is 0, indicating no perceptual difference.  In short, there is no evidence that there is a perceptual difference between hops boiled for 10 minutes and hops kept at 170°F (77°C)  for 10 minutes.  I will abuse the mathematics a bit and conclude that a sub-boiling hop stand produces no noticeable difference in hop flavor, at least for a 10-minute steep time and these experimental conditions.

8. Conclusions
8.1 Summary of Results
The results from these experiments indicate that hop flavor is lost primarily through evaporating steam while the hops are steeped in hot wort.  After about 10 minutes of steeping there may be a just-noticeable difference in hop flavor; after about 20 minutes the difference may be more easily perceived.  Flavor appears to be lost through the evaporation of hop oils, but it is also possible that other factors affect the flavor compounds over time.

The best-practice recommendation resulting from these experiments is to keep hops in boiling wort for as short a time as possible in order to preserve hop flavor, but a difference of 10 minutes or a decrease in wort temperature may not have a perceptible impact, especially with a covered kettle.  This recommendation might be paraphrased as: minimize the time that the hops are in hot wort, but (in the words of Charlie Papazian) relax, don’t worry, and maybe have a homebrew.

One potential concern with a covered kettle is the production of dimethyl sulfide (DMS), which then cannot be removed by evaporation.  Most ales, however, “have DMS levels well below threshold” [Fix and Fix, p. 50].  Because the precursor S-methylmethionine (SMM) and DMS are reduced more at ale fermentation temperatures than at lager fermentation temperatures, “any hint of DMS in ales is likely from technical brewing errors, most notably contamination” [Fix, p. 75].  In lagers, the increase in DMS caused by a covered kettle can be counteracted with a longer (uncovered) boil time and/or faster wort cooling [Fix and Fix, pp. 50-51].  (The other option is to not worry about DMS and brew lager in the style of Rolling Rock [Bamforth, p. 18].)

8.2 Comments on Perceptual Testing
In general it was very difficult to tell the beers in these conditions apart, despite the nearly ideal testing conditions.  This difficulty was compounded (or caused) by the first taste of a beer being the most perceptually distinctive, with subsequent tastes of other samples having less sensory impact.  There was therefore a balance to strike between waiting long enough to reset the palate and not waiting so long that the specifics of the flavor were forgotten.  Taking small sips of water or eating a tiny amount of dry bread to reset the palate in between tastings seemed to help, but in most cases the differences between conditions were very subtle (or nonexistent).

My general preference for the flavor obtained from a 1-minute steep time with Amarillo hops may or may not be shared by others.  As a counterexample, my wife thinks that every IPA she has ever encountered tastes and smells disgusting.  Another hop variety might yield different results.  In short, your perceptions and preferences may be different from the results of these experiments.

9. Acknowledgment
I would like to sincerely thank Dana Garves at Oregon BrewLab for the IBU measurements in these experiments.  Oregon BrewLab has been a pleasure to work with, and I can always rely on the accuracy of the measured values.

References

  • J. Angevaare, A New Triangle Test Calculator, https://onbrewing.com/a-new-triangle-test-calculator/. Accessed Apr. 21, 2021.
  • C. Bamforth, Beer is Proof God Loves Us. FT Press, 1st edition, 2011.
  • R. Daniels, Designing Great Beers: The Ultimate Guide to Brewing Classic Beer Styles. Brewers Publications, 2000.
  • G. Fix, Principles of Brewing Science. Brewers Publications, 2nd edition, 1999.
  • G. J. Fix and L. A. Fix, An Analysis of Brewing Techniques. Brewers Publications, 1997.
  • M. Garetz, Using Hops: The Complete Guide to Hops for the Craft Brewer. HopTech, 1st edition, 1994.
  • C. D. Hahn, A Comprehensive Evaluation of the Nonvolatile Chemistry Affecting the Sensory Bitterness Intensity of Highly Hopped Beers. Master of Science thesis (advisor: T. H. Shellhammer), Oregon State University, 2017.
  • M. G. Malowicki, Hop Bitter Acid Isomerization and Degradation Kinetics in a Model Wort-Boiling System. Master of Science thesis (advisor: T. H. Shellhammer), Oregon State University, 2005.
  • G. J. Noonan, New Brewing Lager Beer. Brewers Publications, 1996.
  • G. Oliver, The Oxford Companion to Beer. Oxford University Press, 2011.
  • C. Papazian, The Home Brewer’s Companion. William Morrow / HarperCollins, 1st edition, 1994/2002.
  • Society of Sensory Professionals, Triangle Test, https://www.sensorysociety.org/knowledge/sspwiki/Pages/Triangle%20Test.aspx. Accessed Apr. 21, 2021.
  • Wikipedia, Hops, https://en.wikipedia.org/wiki/Hops. Accessed Apr. 21, 2021.

Predicting Wort Temperature After Flameout

Abstract
In a previous post, I described a method for estimating IBUs that are produced in hot wort after flameout.  This method relies on both (a) relative utilization as a function of temperature, described elsewhere, and (b) a function that describes the decrease in wort temperature after flameout (but before “forced cooling” with a wort chiller).  In this blog post, I describe temperature data collected under a variety of conditions and the resulting formula for predicting the temperature of wort as it naturally cools after flameout.  The data suggest that this rate of natural cooling is primarily influenced by (a) the release of steam, which is in turn influenced by the wort volume, surface area of wort exposed to air, and size of the opening in the kettle through which steam can escape, and (b) radiation of heat from the kettle.  Other factors, such as ambient temperature, are of much lesser significance.  The resulting formula, for homebrew-scale batch sizes, is T = 53.70 × exp(-b × t) + 319.55, where b = (0.0002925 × effectiveArea / volume) + 0.00538 and effectiveArea = (surfaceArea × openingArea)^0.5.  The parameter T is temperature (in kelvins), t is time after flameout (in minutes), b is the rate constant that describes how quickly the temperature decreases, effectiveArea is the “effective” area through which steam ventilates, surfaceArea is the surface area of wort exposed to air (in square centimeters), openingArea is the area of the opening in the kettle (in square centimeters), and volume is the wort volume (in litres).

1. Motivation
The motivation for the work described here was to predict the temperature decrease of wort after flameout, in order to facilitate computation of the mIBU method of predicting IBUs for homebrew-scale batch sizes.

If one thinks about the various factors that might influence this temperature decrease, many things may come to mind:

  1. The wort volume, with larger volumes potentially cooling more slowly,
  2. The size or surface area of the kettle (which may be much larger than the wort volume), with larger kettles potentially radiating more heat than smaller kettles,
  3. The size of the opening in the kettle (with potentially slower cooling for a smaller opening that traps more heat),
  4. The ambient or room temperature (with wort potentially cooling faster if the room temperature is 10°C (50°F) as opposed to 30°C (86°F)),
  5. The relative humidity (with wort potentially cooling faster in drier conditions),
  6. The specific gravity of the wort (with higher specific gravities potentially cooling differently from water),
  7. The removal of the kettle from the heat source (with potentially slower cooling if the kettle remains on a still-hot burner),
  8. The kettle material (with materials such as aluminum potentially cooling faster than materials such as stainless steel), and
  9. Whether the kettle is insulated or not (with potentially slower cooling for an insulated kettle).

In order to investigate these possibilities, I tested these factors with either wort or (for simplicity) water, plotted the results, and determined which factors have the greatest impact on the rate of temperature change.  With this information, I then constructed a formula for predicting wort temperature after flameout as a function of time.  This function can be used directly in the mIBU method.

2. Data
I measured the decrease in temperature after boiling for 33 conditions in order to test the various factors listed above; these conditions are listed in Table 1 at the very bottom of this post.  (I did not control for ambient temperature and relative humidity separately; generally, a lower ambient temperature was correlated with a higher relative humidity.)  I measured the temperature of wort or water after flameout in 22 conditions with the kettle uncovered, and in an additional 11 conditions with the kettle partially or fully covered.  I used wort in 5 cases and water in 28 cases.  I used a Thermapen Mk4 for measuring temperature in all cases except condition AG, in which I used a TelTru analog thermometer with a 30 cm (12″) probe.  I took measurements at 1-minute intervals for the first 15 or 20 minutes after flameout.  (Measurements were taken for only 15 minutes for three conditions: T, U, and V.)  For the conditions using water, I measured volume to the nearest 30 ml (1 ounce) using a “Legacy Pro” 4000-ml (128-oz) graduated pitcher (which looks identical to the US Plastics Corp. Accu-Pour™ PP Measuring Pitcher), recorded the temperature of each addition, and normalized from this volume and temperature to the volume at boiling using Equation 3 in “ITS-90 Density of Water Formulation for Volumetric Standards Calibration” (Jones and Harris, Journal of Research of the National Institute of Standards and Technology, vol 97, no. 3, pp. 335-340 (1992)).  For the conditions using wort, I estimated the volume at close to boiling using a measuring stick or the difference between pre- and post-boil specific gravity.  Twelve of the more interesting conditions are plotted in Figure 1, with time on the horizontal axis and temperature (in degrees Celsius) on the vertical axis.

I will mostly use metric units throughout this blog post in order to simplify the presentation, with apologies to readers in the United States.  The final formula uses kelvins.


Figure 1. Temperature (in degrees Celsius) as a function of time (in minutes) for twelve of the 33 conditions.  The legend for each condition specifies the volume of liquid (water or wort, in litres), the amount by which the kettle was covered (in percent; 0% = uncovered and 100% = completely covered), the size of the kettle (in litres), and any other details about the condition, such as ambient temperature or insulation.  Only one of the conditions in this plot used wort (specific gravity 1.052); the other cases here used water.

3. Parameter Estimation
3.1 Exponential Decay

All of the data in Figure 1 are fit well by an exponential decay function.  The conditions not plotted in Figure 1 show a similar goodness of fit to an exponential decay function.  (For many of the cases, a straight line also seems to be a good fit, but an exponential decay function can model nearly straight lines as well as curved lines of the type seen here.)  An exponential decay function has the general form a × exp(-b×t) + c, where t is (in this case) time, a, b, and c are parameters that describe the shape of the function, and exp(x) denotes the constant “e” raised to the power x, or 2.71828^x.  In this case, the parameter b is called the rate constant, and it describes how quickly the function (or temperature) decreases.  (I like using fooplot to visualize different functions and parameter values; one can enter something like “54*exp(-0.03x)+46” on this page to see a representative exponential decay function, setting the graph lower limits to 0, the x-axis upper limit to 50, and the y-axis upper limit to 100.)  The liquid was at boiling in all cases at time 0, with an average measured temperature over all conditions of 100.1°C.  (The expected boiling point of water at my elevation (76 meters above sea level) is 99.7°C.  However, the boiling point of wort is higher than that of water, so the average boiling point over all conditions (with 5 of the 33 cases using wort) was higher than 99.7°C.  The difference of less than 0.4°C is within the specified accuracy of my Thermapen, which is ±0.4°C.)  If t = 0, then exp(-b×t) is 1 for any value of b, and so a + c must equal 100.1.

In order to simplify the parameter estimation, I searched over conditions A through V (those conditions in which the kettle is uncovered) minimizing the root-mean-square (RMS) error to find the best value of b in each case and the values of a and c that were the best over all conditions.  (In other words, a and c were optimized to have the same value over all conditions, whereas b was optimized per condition.)  If the total RMS error over all conditions is small with constant values of a and c, then the different shapes of each curve can be described well with a single parameter, b.

The search for a and c yielded a=53.70°C and c=46.40°C with an overall RMS error of 0.31°C.  The maximum RMS error was 0.64°C, for condition O.  The small RMS error over all conditions indicates that we can, in fact, describe the different rates of temperature decay of these conditions with a single parameter, the rate constant b.  The question then becomes whether we can predict b from the various factors in each condition and, if so, whether certain factors are more important than others in predicting b.  (It’s also worth noting that the optimal value of c in this case is not room temperature.  Presumably, if time were measured in hours instead of minutes, the values of a and c would have turned out differently, with c at around room temperature.  Alternatively, there may be other factors involved at longer time scales that don’t fit well to a simple exponential decay function.)
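The fitting procedure described above can be sketched as follows.  This is a minimal sketch under stated assumptions: the post does not say which search method was used, so a golden-section search for each condition’s b is assumed here, together with the constraint a + c = 100.1°C from the boiling-point measurements.  All function names are hypothetical, and the usage example fits synthetic readings generated from the model itself, not the measured data.

```python
import math

A_FIT = 53.70  # °C, shared amplitude reported in the post
C_FIT = 46.40  # °C, shared offset reported in the post (A_FIT + C_FIT = 100.1)

def model_temp(t_min, b, a=A_FIT, c=C_FIT):
    """Exponential decay T(t) = a*exp(-b*t) + c, in °C."""
    return a * math.exp(-b * t_min) + c

def rms_error(times, temps, b, a=A_FIT, c=C_FIT):
    """Root-mean-square difference between modeled and measured temperatures."""
    sq = [(model_temp(t, b, a, c) - T) ** 2 for t, T in zip(times, temps)]
    return math.sqrt(sum(sq) / len(sq))

def best_rate_constant(times, temps, a=A_FIT, c=C_FIT, lo=0.0, hi=0.1, iters=80):
    """Per-condition b minimizing RMS error, by golden-section search on [lo, hi].
    Assumes the error has a single minimum in that interval."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = rms_error(times, temps, x1, a, c), rms_error(times, temps, x2, a, c)
    for _ in range(iters):
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = rms_error(times, temps, x1, a, c)
        else:        # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = rms_error(times, temps, x2, a, c)
    return (lo + hi) / 2

def total_rms(conditions, a, boil_temp=100.1):
    """Overall RMS error when a is shared, c = boil_temp - a, and b is refit per condition."""
    c = boil_temp - a
    sq = []
    for times, temps in conditions:
        b = best_rate_constant(times, temps, a, c)
        sq += [(model_temp(t, b, a, c) - T) ** 2 for t, T in zip(times, temps)]
    return math.sqrt(sum(sq) / len(sq))

# Self-check on synthetic readings generated from the model (not measured data)
times = list(range(16))
synthetic = [model_temp(t, 0.019) for t in times]
print(f"{best_rate_constant(times, synthetic):.4f}")  # 0.0190
```

An outer search over candidate values of a (for example, a simple grid that keeps the value with the smallest `total_rms`) would complete the procedure; b is then whatever the inner search returns for each condition.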

3.2 Predicting the Rate Constant for Uncovered Kettles
Figure 2 plots the values of b that minimize the RMS error in each uncovered-kettle condition with a=53.70°C and c=46.40°C.  The horizontal axis is the volume of wort or water, and the vertical axis is the value of b.  A few clear patterns emerge: the data obtained from a single kettle are grouped in a curved line with negative slope (for the two cases where there are multiple data points per kettle), and these curves (representing different kettles) are separated from each other by possibly constant scaling factors.  The curved line with negative slope for the 38-litre kettle looks like a function of the form 1/x, where x in this case is volume.  This suggests that b can be approximated as a function of scaling/volume, where scaling is some (still unknown) property of the kettle and volume is the wort volume (in litres).


Figure 2. Temperature-decay rate constants for all conditions with uncovered kettles, plotted as a function of wort volume.  Each group (e.g. black squares or red diamonds) is for a different kettle (and kettle diameter).

After considering various possibilities for the factor called scaling, the area of the kettle opening (πr²), which equals the surface area of wort exposed to the air, shows a good fit to this set of data.  The black “×” marks in Figure 3 plot the values of area/volume on the horizontal axis for the uncovered kettle (where area is the area of the kettle opening (or πr², where r is the radius of the kettle) in square centimeters, and volume is the volume of liquid, in litres) and values of the rate constant b on the vertical axis.  The approximately straight line of black × marks in Figure 3 is interesting.  It implies that the rate of temperature decay, represented by the parameter b, can be predicted quite well from only the area of the kettle opening and the volume of liquid.  The value of b when the area is zero implies a rate of cooling caused by heat radiated from the kettle (with an entirely closed system), and the slope of the line implies faster cooling as more steam escapes the kettle with greater wort surface area.  In other words, if b is modeled as a straight line of the form b = slope × (area / volume) + offset, where slope is the slope of the line and offset is the value when area = 0, then offset represents the temperature decay due to heat radiated from the kettle, and slope represents the temperature decay caused by the loss of heat in the steam.  In this case, a good fit can be seen for the line b = 0.0002925 × (area / volume) + 0.00538.
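The straight-line relationship can be sketched as below.  The post does not state how the line was fitted, so ordinary least squares is assumed in `fit_line`; `rate_constant_uncovered` simply evaluates the reported line, with area in cm² and volume in litres.  Both names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + offset; returns (slope, offset)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rate_constant_uncovered(area_cm2, volume_l, slope=0.0002925, offset=0.00538):
    """Rate constant b (1/min) for an uncovered kettle: offset ~ heat radiated
    from the kettle, slope * (area / volume) ~ heat carried away by steam."""
    return slope * (area_cm2 / volume_l) + offset

b_a = rate_constant_uncovered(710.33, 7.8)  # Condition A in the appendix table
print(f"{b_a:.4f}")  # 0.0320
```

Splitting b into an offset and a slope term mirrors the physical interpretation in the text: with no exposed area the offset alone remains, which is the radiative cooling of a closed kettle.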


Figure 3. Temperature-decay rate constant as a function of (kettle opening area) divided by volume, for uncovered kettles. In this case, the kettle opening area equals the area of wort exposed to air.

3.3 The Rate Constant for (Partially) Covered Kettles
I then plotted the values of b for those cases in which the kettle is partially or completely covered, as shown in Figure 4 (with much lower limits on the X and Y axes of this graph).  The cases in which the kettle is completely covered cluster somewhat around the predicted value of b when the area is zero.  Larger kettles and volumes have smaller values of b, implying less radiated heat loss from larger kettles and/or volumes.  Conditions Z and AG are nearly identical except for the size of the kettle; Z used 15.4 litres of water in a covered 18.9 litre kettle, and AG used 15.6 litres of water in a covered 37.9 litre kettle.  The temperature after 20 minutes was very close in both conditions, and the estimated value of b is nearly the same in both cases (0.00371 vs 0.00373).  Therefore, it seems that the size of the kettle has very little impact on the rate of temperature decay through radiated heat, but the volume of liquid does have an impact on radiated heat.

Again looking at Figure 4, the conditions in which the kettle is only partially covered deviate from the predicted line, regardless of whether area (the horizontal axis) is (a) the exposed wort surface area (blue circles) or (b) the kettle opening area (green squares).  The predicted line lies somewhere between these two extremes.  This suggests that if the kettle is partially covered, the amount of steam produced is (still) roughly proportional to the surface area of the wort exposed to air (i.e. the area of the fully-open kettle), but that this steam is not able to escape quite as quickly, leaving more heat trapped in the kettle.  (For an uncovered kettle, the area of the opening in the kettle and the surface area of wort exposed to air are the same.)

One possibility is that the rate of heat loss is proportional to the geometric average of the wort surface area and the opening area.  We can call this the “effective area,” i.e. effectiveArea = (surfaceArea × openingArea)^0.5, where surfaceArea is the wort surface area, openingArea is the area of the kettle opening, and (x)^0.5 indicates the square root of x.  In this case, when the area of the opening is zero (for a covered kettle), the effective area is also zero.  When the area of the opening equals the surface area of the wort, the effective area is the same as the surface area of the wort.  When we plot effectiveArea / volume on the horizontal axis and b on the vertical axis in Figure 5, we observe that the data from the partially-covered conditions are much closer to the straight line, allowing us to predict temperature decay fairly well with a small number of parameters.
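The effective area can be illustrated with a short sketch.  It assumes a circular kettle, and, following the cardboard-and-foil lids described in the appendix, an opening whose area is a fixed fraction of the open-kettle area.  The function names and the 30 cm kettle are hypothetical.

```python
import math

def circle_area(diameter_cm: float) -> float:
    """Area (cm²) of a circular kettle or opening with the given diameter."""
    return math.pi * (diameter_cm / 2) ** 2

def effective_area(surface_area_cm2: float, opening_area_cm2: float) -> float:
    """Geometric mean of exposed wort surface area and kettle-opening area."""
    return math.sqrt(surface_area_cm2 * opening_area_cm2)

surface = circle_area(30.0)              # hypothetical 30 cm kettle
for covered in (0.0, 0.25, 0.5, 0.75, 1.0):
    opening = (1.0 - covered) * surface  # lid leaves this fraction of the kettle open
    ratio = effective_area(surface, opening) / surface
    print(f"{covered:.0%} covered -> effective area = {ratio:.3f} x surface area")
```

Under this assumption the effective area scales as √(1 − fraction covered): about 0.866, 0.707, 0.500, and 0 times the wort surface area for lids covering 25%, 50%, 75%, and 100% of the kettle.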


Figure 4. Rate constants for uncovered kettles (black “×” marks), fully-covered kettles with area = 0 (red triangles), partially-covered kettles with area = area of kettle opening (green squares), and (the same) partially-covered kettles with area = wort surface area (blue circles). The line with the best fit to uncovered-kettle data is also plotted.


Figure 5. Rate constants for uncovered kettles (black “×” marks), partially covered kettles (dark red diamonds), and fully covered kettles (light red triangles), plotted as a function of “effective area”. Effective area is the geometric mean of the exposed wort surface area and kettle opening area.

4. Model Accuracy
4.1 Looking at Factors Potentially Influencing Temperature
In the model we have developed, we can predict temperature after flameout using three parameters: wort volume, kettle diameter (to compute exposed wort surface area), and kettle opening diameter (to compute the area of the opening).  Other factors, such as ambient temperature and specific gravity, produce only a fairly small deviation from the predicted line, indicating that these factors have only a minor impact on the decrease in temperature.  For example, in Figures 3, 4, and 5 there are two rate constants that have the same area/volume value of 46.9 cm²/L.  (This is most easily seen in the two “×” marks at area/volume=46.9 on the right-hand side of Figure 4.)  The one just below the predicted line, with a value of 0.0190, was from Condition C with water from an uncovered 19-litre kettle and an ambient temperature of 12°C (53°F).  This case has a predicted temperature of 86.9°C after 15 minutes, which is very close to the measured temperature of 86.8°C.  The one even lower than the predicted line, with a value of 0.0176, was obtained under the same conditions except with an ambient temperature of 33°C (91°F) (Condition J).  This case has the same predicted temperature of 86.9°C, but a measured temperature of 87.8°C after 15 minutes.  From this, we can conclude that ambient temperature does have an effect on the rate of temperature decrease, with warmer ambient temperatures yielding a slower decrease in temperature.  However, this effect is minor, with a large difference in ambient temperatures (21°C (38°F)) yielding a small difference of 1.0°C (1.8°F) after 15 minutes.  (I also learned that brewing in very hot climates would not be very pleasant for me.  I respect anyone with the dedication to brew when the temperature is above 30°C (86°F).)

Over all conditions, the average absolute difference in temperature at 15 minutes between measured and modeled temperatures is 0.8°C.  The largest difference at 15 minutes, 1.9°C, is for Condition AE, which has a large volume in an entirely closed kettle.

4.2 Factors Potentially Influencing Temperature
In general, we can look at the difference in measured temperatures at 15 minutes between two conditions when only one factor is different between the conditions.  A factor with a larger difference can be considered more important in influencing temperature decay than a factor with a smaller difference.  (The value of 15 minutes is somewhat arbitrary but I think not unreasonable.  It is the largest time point for which I have measured data in all conditions.)  One issue with this metric is that smaller volumes will generally have greater temperature differences over time than larger volumes.  In addition, the diameter of the kettle and area of the kettle opening will have an impact on the magnitude of the measured temperatures.  In order to normalize for these factors, one could look at the difference divided by the temperature of one of the conditions, but this relative error is less intuitive.  I’m not aware of an intuitive error metric that addresses the dependence on volume and kettle characteristics, so I’ll simply report the measurement difference as well as the volume.  Unless otherwise indicated, the kettle diameter and area of the kettle opening are the same within each comparison.

As discussed in the previous section, a high ambient temperature can produce a measured temperature difference after 15 minutes of 1.0°C at 15.6 litres.  Removing the kettle from the hot metal burner yields a measured difference of -0.8°C at 25 litres.  A stainless steel kettle (instead of an aluminum kettle) yields a measured difference of 1.0°C at 7.8 litres.  The enamel kettle yields a measured difference of -3.3°C at 11.7 litres, but the two kettles have different exposed surface areas (75.2 cm² for enamel, 60.6 cm² for aluminum), and so this difference may appear larger than it is, even after accounting for volume.  (The predicted difference in temperature for the enamel kettle is -1.4°C.)  An insulated kettle yields a measured difference of 1.0°C at 15.6 litres.  (The insulation in this case was a combination of closed-cell foam insulation and mylar wrap, around and over the aluminum kettle.)  As noted earlier for covered kettles, larger volumes have slower temperature decay than smaller volumes, with a measured difference of 0.7°C for 31.2 litres compared with 15.6 litres.

To compare the decrease in temperature of wort with that of water, we can compare (a) the temperature of the 24.6-litre wort case (R) with the 27.3-litre water case (M), with a difference in measured temperatures of -0.3°C; (b) the temperature of the same 24.6-litre wort case (R) with the 23.4-litre water case (D), with a difference of 1.1°C; (c) the temperature of the 29.1-litre wort case (T) with the 31.2-litre water case (E), with a difference of 0.2°C; and (d) the temperature of the 28.9-litre wort case (U) with the 31.2-litre water case (E), with a difference of 0.8°C.  In short, Condition R shows no real difference from the temperature of water, while conditions T and U have a small positive difference, contrary to the small negative difference expected from their smaller volumes.  The differences between measured and predicted temperatures for conditions R, T, U, and V are 1.5°C, 0.6°C, 1.3°C, and 0.0°C, respectively.  Overall there does not seem to be a large difference between the temperature decrease of wort and of water, although the model may predict slightly lower temperatures than are observed.

4.3 Incorporating Additional Factors into the Model
Because the factors described above seem to have at least some impact on the rate of temperature decrease, should we be modeling them in the temperature-decrease formula?  The answer to that question depends on our purpose (predicting IBUs) and our tolerance for error.  If we have a scenario with fairly typical homebrewing conditions, we can look at how a temperature difference of 3°C after 15 minutes impacts IBUs predicted with the mIBU method.  A difference of 3°C at 15 minutes is somewhat arbitrary, and is 1.5 times larger than the largest observed difference in these 33 conditions, but it might be observed with a combination of factors different from those used to develop the formula.  Given a post-boil volume of 19.9 litres (5.25 gallons) in an uncovered kettle with diameter 36.8 cm (14.5 inches), a single addition of 28.35 g (1.0 oz) of 10% AA hops at flameout, and a 15-minute hop stand, the predicted temperature after 15 minutes with the formula developed in this blog post is 85.55°C (186.0°F), and we predict 9.92 IBUs using the mIBU method.  If we change the rate constant from 0.02106 to 0.01614 so that the temperature after 15 minutes is 88.55°C (191.4°F), or 3°C warmer, we then predict 10.87 IBUs, a difference of 0.95 IBUs.  If we instead use a rate constant of 0.02638, so that the temperature after 15 minutes is 82.55°C, or 3°C cooler, we predict 9.06 IBUs, a difference of -0.86 IBUs.  If, instead of a 15-minute whirlpool, we use the same rate constants with a 45-minute whirlpool, we predict 12.48 IBUs when the temperature is 85.55°C after 15 minutes, 14.31 IBUs when the temperature is 88.55°C after 15 minutes, and 10.97 IBUs when the temperature is 82.55°C after 15 minutes, or IBU differences of 1.83 and -1.51 in a 45-minute whirlpool.
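The rate constants in this scenario can be checked by inverting the decay formula: for a target temperature T15 at 15 minutes, b = −ln((T15 − c) / a) / 15, with a = 53.70°C and c = 46.40°C.  This minimal sketch covers only the temperature arithmetic; the mIBU calculation that turns these temperatures into IBUs is not reproduced, and the function names are hypothetical.

```python
import math

A, C = 53.70, 46.40  # °C, fitted amplitude and offset of the decay curve

def temp_at(t_min: float, b: float) -> float:
    """Modeled wort temperature (°C) t_min minutes after flameout."""
    return A * math.exp(-b * t_min) + C

def rate_for_target(temp_c: float, t_min: float = 15.0) -> float:
    """Rate constant (1/min) that makes the model reach temp_c at t_min minutes."""
    return -math.log((temp_c - C) / A) / t_min

for b in (0.02106, 0.01614, 0.02638):
    print(f"b = {b:.5f} -> {temp_at(15, b):.2f} °C at 15 minutes")
# prints 85.55, 88.55, and 82.55 °C, respectively
```

Each rate constant reproduces its stated 15-minute temperature to within 0.01°C, and the two alternative constants bracket the baseline by 3°C in each direction.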

Can we tolerate a difference of about 1 to 2 IBUs if our temperature decay model is off by 3°C after 15 minutes?  The short answer to that question is “yes,” for two reasons.  First, it has been reported that people can’t detect a difference of less than 5 IBUs (e.g. J. Palmer, How to Brew, p. 56).  So a prediction error of even 2 IBUs is well below our ability to detect with our taste buds.  Second, there is a wide variety of other factors that make IBU prediction so inexact that getting anywhere close to a measured IBU value is cause for celebration.  For example, things that are not accounted for in the Tinseth or basic mIBU formulas are: (a) the inherent variability (up to 15 to 20%) in alpha acid levels within a single bale of hops (M. Verzele and D. De Keukeleire, Chemistry and Analysis of Hop and Beer Bitter Acids, p. 331), (b) the hopping rate, which can have a significant impact on IBUs, (c) wort pH, which can affect IBU losses, (d) wort clarity, (e) krausen deposits or loss, (f) age of the beer, (g) the effect of pellets instead of hop cones, and (h) the age and storage conditions of the hops.  Any of these factors alone can yield a difference greater than 2 IBUs, and in combination the net effect is a high degree of uncertainty in predicted IBU values.

In summary, factors such as ambient temperature, kettle size, insulation, kettle material, etc. do have an impact on the rate of temperature decay.  However, for our purposes, it does not seem necessary to extend the formula to specifically account for these factors.

5. Summary and Conclusion
The final formula for predicting wort temperature as a function of time after flameout, for homebrew-scale batch sizes, is

$$T = 53.7\,\exp(-b\,t) + 319.55$$

$$b = 0.0002925\,\frac{\mathit{effectiveArea}}{\mathit{volume}} + 0.00538$$

$$\mathit{effectiveArea} = (\mathit{surfaceArea} \times \mathit{openingArea})^{0.5}$$

where T is temperature (in kelvins), t is time after flameout (in minutes), b is the rate constant that describes how quickly the temperature decays, effectiveArea is the “effective” area through which steam ventilates, surfaceArea is the surface area of wort exposed to air (in square centimeters), openingArea is the area of the opening in the kettle (in square centimeters), and volume is the wort volume (in litres).  The area values can be easily determined from the diameters of the kettle and the kettle opening.
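For anyone who wants to try the formula on their own kettle, here is a minimal Python sketch. The function names are illustrative, and it assumes 273.15 for converting kelvins to °C. For a partially covered kettle it assumes openingArea = surfaceArea × (fraction of the kettle left open); that assumption reproduces the predicted temperatures listed in the appendix table for the covered conditions. For the uncovered 19.9-litre, 36.8 cm example above it gives roughly 85.6°C, close to the quoted 85.55°C (the small gap is presumably rounding in the diameter).

```python
import math

KELVIN_OFFSET = 273.15  # assumed conversion between kelvins and deg C


def circle_area_cm2(diameter_cm: float) -> float:
    """Area of a circular wort surface or kettle opening, in square centimeters."""
    return math.pi * (diameter_cm / 2.0) ** 2


def predicted_temp_c(minutes: float, volume_l: float,
                     surface_area_cm2: float, opening_area_cm2: float) -> float:
    """Post-flameout wort temperature (deg C) from the decay formula above."""
    # Geometric mean of wort surface area and kettle opening area.
    effective_area = math.sqrt(surface_area_cm2 * opening_area_cm2)
    # Rate constant: larger effective area per litre means faster cooling.
    b = 0.0002925 * effective_area / volume_l + 0.00538
    kelvin = 53.7 * math.exp(-b * minutes) + 319.55
    return kelvin - KELVIN_OFFSET


# Uncovered 36.8 cm kettle holding 19.9 litres.
area = circle_area_cm2(36.8)
print(round(predicted_temp_c(15, 19.9, area, area), 1))

# A 15.4-litre batch with 710.33 cm^2 of wort surface and half the kettle covered.
print(round(predicted_temp_c(15, 15.4, 710.33, 710.33 * 0.5), 1))
```

A fully covered kettle (openingArea = 0) reduces b to its floor of 0.00538, which is why all of the 100%-covered conditions share the same predicted temperature.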

It is not clear how well this formula will scale up to commercial-size batches.  If anyone who has such a system can provide me with the necessary parameter values and temperature measurements, I’ll be happy to evaluate the formulas and adjust as necessary.  To contact me for this or any other reason, send an e-mail to the name associated with this blog (no spaces or other punctuation) at yahοο.

Appendix: Specifics of Each Condition
This section lists some details about each condition in table form.  The volume is either of water or wort; if specific gravity is not specified, water was used.  For partially-covered kettles, I constructed cardboard and aluminum-foil “lids” that had openings of 25%, 50%, or 75% of the area of the open kettle.  The kettle size is noted using approximate capacity, in litres.  Unless otherwise noted, the ambient temperature was approximately 13°C (55°F), and the kettle material was aluminum.

| Condition | Volume (litres) | Kettle size (litres) | Wort surface area (cm²) | Percent of kettle covered | Other notes | Measured temp. at 15 minutes (°C) | Predicted temp. at 15 minutes (°C) |
|---|---|---|---|---|---|---|---|
| A | 7.8 | 18.9 | 710.33 | 0% | | 78.4 | 79.6 |
| B | 15.6 | 37.9 | 1083.80 | 0% | | 81.8 | 82.9 |
| C | 15.6 | 18.9 | 710.33 | 0% | | 86.8 | 86.9 |
| D | 23.4 | 37.9 | 1083.80 | 0% | | 87.6 | 86.8 |
| E | 31.2 | 37.9 | 1083.80 | 0% | | 88.9 | 88.9 |
| F | 7.8 | 11.4 | 457.30 | 0% | stainless steel kettle | 84.2 | 84.7 |
| G | 7.8 | 11.4 | 500.39 | 0% | | 83.2 | 83.7 |
| H | 7.8 | 37.9 | 1083.80 | 0% | | 73.0 | 73.3 |
| I | 15.6 | 18.9 | 710.33 | 0% | ambient temp. 27°C | 87.8 | 86.9 |
| J | 15.6 | 18.9 | 710.33 | 0% | ambient temp. 33°C | 87.8 | 86.9 |
| K | 3.9 | 37.9 | 1083.80 | 0% | | 61.7 | 61.0 |
| L | 11.7 | 37.9 | 1083.80 | 0% | | 78.6 | 79.4 |
| M | 27.3 | 37.9 | 1083.80 | 0% | | 89.0 | 88.0 |
| N | 11.7 | 18.9 | 710.33 | 0% | | 83.9 | 84.4 |
| O | 11.7 | 18.9 | 881.21 | 0% | enamel kettle | 80.6 | 82.0 |
| P | 19.5 | 37.9 | 1083.80 | 0% | | 85.2 | 85.2 |
| Q | 15.6 | 37.9 | 1083.80 | 0% | insulated kettle | 82.8 | 82.9 |
| R | 24.6 | 37.9 | 1083.80 | 0% | wort (SG=1.052) | 88.7 | 87.2 |
| S | 25.4 | 37.9 | 1083.80 | 0% | wort (SG=1.052), kettle removed from heat source | 87.8 | 87.5 |
| T | 29.1 | 37.9 | 1083.80 | 0% | wort (SG=1.042), loose cones | 89.1 | 88.5 |
| U | 28.9 | 37.9 | 1083.80 | 0% | wort (SG=1.042), pellets | 89.7 | 88.4 |
| V | 4.6 | 18.9 | 710.33 | 0% | wort (SG=1.065), mIBU Exp.#3 | 71.6 | 71.6 |
| W | 15.4 | 18.9 | 710.33 | 25% | | 87.3 | 88.0 |
| X | 15.4 | 18.9 | 710.33 | 50% | | 88.2 | 89.3 |
| Y | 15.5 | 18.9 | 710.33 | 75% | | 89.7 | 91.2 |
| Z | 15.4 | 18.9 | 710.33 | 100% | | 97.2 | 95.9 |
| AA | 23.4 | 37.9 | 1083.80 | 50% | | 87.8 | 89.3 |
| AB | 23.4 | 37.9 | 1083.80 | 75% | | 90.9 | 91.1 |
| AC | 31.2 | 37.9 | 1083.80 | 50% | | 90.8 | 90.9 |
| AD | 31.2 | 37.9 | 1083.80 | 75% | | 92.4 | 92.3 |
| AE | 31.2 | 37.9 | 1083.80 | 100% | | 97.8 | 95.9 |
| AF | 7.8 | 11.4 | 500.39 | 100% | | 94.8 | 95.9 |
| AG | 15.6 | 37.9 | 1083.80 | 100% | | 97.2 | 95.9 |

Table 1. Details about each condition in this blog post.
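The claim earlier in this post that a 3°C error is about 1.5 times the largest observed difference can be checked against this table. A minimal Python sketch, with the measured and predicted values copied from Table 1:

```python
# (condition, measured deg C, predicted deg C) at 15 minutes, copied from Table 1.
ROWS = [
    ("A", 78.4, 79.6), ("B", 81.8, 82.9), ("C", 86.8, 86.9), ("D", 87.6, 86.8),
    ("E", 88.9, 88.9), ("F", 84.2, 84.7), ("G", 83.2, 83.7), ("H", 73.0, 73.3),
    ("I", 87.8, 86.9), ("J", 87.8, 86.9), ("K", 61.7, 61.0), ("L", 78.6, 79.4),
    ("M", 89.0, 88.0), ("N", 83.9, 84.4), ("O", 80.6, 82.0), ("P", 85.2, 85.2),
    ("Q", 82.8, 82.9), ("R", 88.7, 87.2), ("S", 87.8, 87.5), ("T", 89.1, 88.5),
    ("U", 89.7, 88.4), ("V", 71.6, 71.6), ("W", 87.3, 88.0), ("X", 88.2, 89.3),
    ("Y", 89.7, 91.2), ("Z", 97.2, 95.9), ("AA", 87.8, 89.3), ("AB", 90.9, 91.1),
    ("AC", 90.8, 90.9), ("AD", 92.4, 92.3), ("AE", 97.8, 95.9), ("AF", 94.8, 95.9),
    ("AG", 97.2, 95.9),
]

# Residual = measured minus predicted, per condition.
residuals = {name: measured - predicted for name, measured, predicted in ROWS}
worst = max(residuals, key=lambda name: abs(residuals[name]))
mean_abs = sum(abs(r) for r in residuals.values()) / len(residuals)

print(f"{len(ROWS)} conditions, mean |error| = {mean_abs:.2f} C, "
      f"largest |error| = {abs(residuals[worst]):.1f} C (condition {worst})")
```

The largest absolute difference is 1.9°C (condition AE), so a 3°C error is a little over 1.5 times the worst case observed here.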

Late Hop Experiment #1 (a.k.a. Hop-Stand Experiment #3)

Abstract
In my quest for lots of hop flavor, I previously found that a hop stand did not provide the increase in flavor I expected.  The current experiment looks at several aspects of the brewing process that might provide an increase in hop flavor: covering the pot during the last minutes of the boil, varying the time of late-hop additions, and hop stands with a somewhat different technique than I used previously.

I found that covering the pot to prevent oils escaping with the steam may provide some improvement, but this result was not definitive.  A late-hop addition at flameout (followed by 10 minutes of natural cooling with the lid on) contributed much more hop flavor than additions at 5 or 10 minutes.  Holding the wort (and hops) at 170°F (77°C) for an additional 45 minutes may have contributed something, but not an increase in hop flavor.  It seems that hop flavor is lost with extended contact time with boiling wort, and not increased with below-boiling temperatures.

I’ve also created a summary blog post that describes the techniques I’ve found to be useful at maximizing hop flavor and aroma.

Background
Flameout Hops Additions and Hop Stands

When I first started brewing, I would immediately cool the wort when the 60-minute boil time was up.  That was fine, until I started reading about hops additions at zero minutes/flameout.  Why add a whole bunch of hops, only to immediately cool down the wort and remove them?  I came across a discussion on BeerSmith about adding hops at flameout and then letting the wort sit for a while.  There’s another interesting discussion at BeerAdvocate about how long to let the wort sit before cooling.  There’s also an article in BYO on hop stands, in which it’s explained that “pro brewers [give] their flameout hops extended contact time with the wort”.  Last but not least, there’s an interesting discussion on ProBrewer about how long professional brewers whirlpool their hops after flameout.  In short, the wort is often not cooled immediately, and this creates a hop stand even when no hops are added at flameout, because any hops already in the wort have not yet reached maximum utilization.  This extended contact gives flameout hops time to contribute something to beer flavor (and bitterness) at below-boiling temperatures.  In my previous hop-stand experiments, I added post-flameout hops only after the target temperature (e.g. 170°F (77°C)) had been reached, and steeped for a relatively long period of time (60 minutes).  Since those experiments didn’t demonstrate an increase in hop flavor, maybe higher temperatures or shorter steep times are critical for hop flavor.  In the current experiment, I let all batches sit for 10 minutes after flameout, with the lid on.  (I chose 10 minutes pretty much by chance; now I think that shorter times are better.)

Balancing Bitterness Across Conditions
The goal of the current experiment was to look at hop flavor, but I wanted to examine hop flavor independently of bitterness.  In other words, I wanted to vary the timing of late-hop additions and keep the wort at high temperatures after flameout, but hold the bitterness level of all conditions relatively constant.   If one uses a standard formula for computing IBUs (e.g. Tinseth’s formula), hops additions at 0 minutes contribute no bitterness to the beer.  This is true if one immediately force-cools the wort at flameout, but since I allowed the wort in this experiment to sit for 10 minutes after flameout at high temperatures, there was bitterness that was not accounted for by this formula.  In order to keep the conditions in this experiment at roughly the same bitterness level, I developed a modified version of Tinseth’s IBU formula that predicts bitterness contributions after flameout.  I used this formula to vary the timing and amount of hops added to each condition, in an attempt to equalize bitterness levels. There was a bug in my code at the time I used it for this experiment, and I didn’t have the finished beer tested for IBUs, so despite my good intentions I have no idea how well bitterness was kept constant.  (I’ve since corrected those errors in a different set of experiments.)
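To see why a standard formula gives zero bitterness for a flameout addition, here is a minimal Python sketch of Tinseth’s formula as commonly published: utilization = 1.65 × 0.000125^(G - 1) × (1 - e^(-0.04 t)) / 4.15, multiplied by the milligrams per litre of alpha acids added. This is not my modified (mIBU) formula. The function name is illustrative, and the 1.060 boil gravity in the example is an assumption; the hop amount and volume echo the late-hop Cascade addition in this experiment.

```python
import math


def tinseth_ibu(alpha_acid: float, grams: float, liters: float,
                boil_gravity: float, minutes: float) -> float:
    """Tinseth's IBU estimate for one hop addition boiled for `minutes`."""
    # Higher gravity reduces utilization ("bigness factor").
    bigness = 1.65 * 0.000125 ** (boil_gravity - 1.0)
    # Utilization grows with boil time and is exactly zero at t = 0.
    boil_time_factor = (1.0 - math.exp(-0.04 * minutes)) / 4.15
    utilization = bigness * boil_time_factor
    # mg/L of alpha acids added: decimal AA x grams x 1000 mg/g / litres of wort.
    aa_mg_per_l = alpha_acid * grams * 1000.0 / liters
    return utilization * aa_mg_per_l


# 11 g of 8.9% AA Cascade in 4.9 litres at an assumed boil gravity of 1.060.
for minutes in (10, 5, 0):
    print(minutes, round(tinseth_ibu(0.089, 11, 4.9, 1.060, minutes), 1))
```

Because the time factor depends only on boil time, the formula has no term for the minutes the wort spends hot after flameout, which is the gap the modified formula was meant to fill.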

Introduction
This experiment looked at three techniques that may contribute to hop flavor: (1) covering the pot during the last minutes of the boil, (2) varying the time of late-hops additions, and (3) a 45-minute hop stand held at 170°F (77°C), with hops added at flameout instead of when the target temperature is reached.  In all cases, the wort was left to stand for 10 minutes after flameout, which may be a critical detail.

(1) Covering the Pot
It’s well known that volatile oils from the hops escape with the steam during the boil (e.g. Daniels, Designing Great Beers, p. 101; Fix and Fix, An Analysis of Brewing Techniques, p. 33; Lewis and Young, Brewing, 2nd ed., p. 271; Papazian, The Homebrewer’s Companion, p. 63). However, an uncovered boil is essential to drive off the precursors of DMS (e.g. Palmer, How to Brew, p. 82; Fix and Fix, An Analysis of Brewing Techniques, p. 50).  To minimize the risk of DMS, I usually leave my pot uncovered during the entire brewing process, in accordance with Papazian’s instructions to “never cover a boiling wort with a lid” (Papazian, p. 138).  Most ales, however, “have DMS levels well below threshold” (Fix and Fix, p. 50).  Because SMM and DMS are reduced more at ale fermentation temperatures than at lager fermentation temperatures, “any hint of DMS in ales is likely from technical brewing errors, most notably contamination” (Fix, p. 75).  This then brings up the question:  will covering the pot during the last additions of hops yield more (good) hop flavor in the (hop-forward) beer than (bad) DMS?  There’s only one way to find out:  brew one condition with the pot uncovered during the entire boil, then brew a nearly identical batch with the pot covered after the last addition of hops.

(2) Varying the Time of Late Hops Additions
Late hop additions are also well known to provide more hop flavor than early additions.  I’ve seen many general statements to the effect of “Thirty minutes is a traditional cut-off point for flavor hops” (Daniels, p. 101) or “Flavor hops additions are considered to be in the last 10 to 20 minutes of the boil” (Strong, p. 65).  Papazian provides an informative graph, showing an increase in flavor starting at 0 minutes, peaking at 10 minutes, and decreasing to zero at 45 minutes (Papazian, The Homebrewer’s Companion, p. 68).  This graph is a “general guide,” though, and I wanted to examine the effect of hops additions in the final 10 minutes, and include a 10-minute stand after flameout.  Therefore, the current experiment looks at the effect on flavor when adding hops at 10 minutes, 5 minutes, and 0 minutes before flameout.  In all cases, I let the wort cool for 10 minutes after flameout.  This post-flameout wait provided at least a brief hop stand for all batches, but it means that my results will be different from someone who does late hopping and then cools their wort at flameout.

(3) Hop Stand
In my previous attempts at a hop stand, I found that the hops added during the stand contributed very little hop flavor, and that the resulting fuller-bodied beer was most likely the result of non-enzymatic browning of the wort.  Not what I was looking for.  But I added the hops only after the wort had reached the target temperature.  Some (or most?  nearly all?) people conduct a hop stand by adding the hops at flameout, bringing the temperature down (either naturally or by forced cooling), and then (possibly) holding the wort at a target temperature.  In the current experiment, there is an additional condition in which I added the hops at flameout, let the wort cool naturally (while covered) for 10 minutes, force-cooled the wort to the hop-stand target temperature, and then held that temperature for 45 minutes. This allows a direct comparison of how effective a hop stand is for longer time periods at lower temperatures.

Methods
This experiment used five conditions:
(A) The baseline: a beer with a late-hop addition at 10 minutes before flameout and no covering of the pot.  This was a pretty generic beer.  The “bittering” hop addition of 0.25 oz in 1.3 G of wort (7 g in 4.9 liters) was made at around the 20-minute mark (instead of the normal 60-minute mark), under the assumption that at 20 minutes and more, the contribution to hop flavor is minimal.
(B) A late-hop addition at 10 minutes, with the pot covered during the final 10 minutes.  The bittering hop addition of 0.25 oz (7 g) was also around the 20-minute mark.
(C) A late-hop addition at 5 minutes, with the pot covered during the final 5 minutes.  The bittering hop addition was slightly more hops (0.30 oz or 8.5 g) at around the 30-minute mark, to attempt to keep the bitterness level about the same as in other conditions.
(D) A late-hop addition at flameout (0 minutes).  The bittering hop addition was even more hops (0.35 oz or 10 g) at the 45-minute mark, to try to keep the bitterness level about the same as in other conditions.
(E) A late-hop addition at flameout.  The hops additions (amount and timing) were the same as in Condition D.  This condition was different from Condition D in that it was followed by holding the wort at 170°F (77°C) for 45 minutes after the 10-minute natural cooling period.

For all conditions, the wort was left to cool for 10 minutes after flameout with the pot covered. The target OG of all conditions was 1.060.  More details are provided below in Table 1.

Comparisons
Condition A can be compared with B, to determine if covering the pot during the last hop addition (at 10 minutes, in this case) improves hop flavor.  Conditions B, C, and D can be compared with each other to determine which late-hop time (10 minutes, 5 minutes, 0 minutes) yields the most hop flavor (given the subsequent 10-minute hop stand).  Conditions D and E can be compared to determine if a 45-minute hop stand at 170°F (77°C) contributes to increased hop flavor.

I originally intended to compare the bitterness levels across all conditions, as a test of a modification to Tinseth’s IBU formula.  However, due to a bug in my initial calculations, the bitterness level will probably be somewhat different across the batches.  I report on the perceived bitterness levels in the Results: Comparisons section, below.

Recipes
As usual in these experiments, a very simple recipe of Briess liquid malt extract, Cascade hops (8.9% AA), Citra hops (13.9% AA), and Safale US-05 yeast was used.  Rather than brewing the best beer possible, the idea was to keep things as simple and as replicable as possible.  The target volume of the wort at the end of each boil was 1.3 G (4.9 liters).  The goal was to end up with more than 1 G (3.8 liters) per condition, and to ferment only 3½ quarts (3.3 liters), as it’s better to throw wort away (including wort used in SG readings and settled trub) than to not have enough.  The 3½ quarts (3.3 liters) leaves (just) sufficient head room for fermentation.

| | Condition A | Condition B | Condition C | Condition D | Condition E |
|---|---|---|---|---|---|
| Extract | 2½ lbs (1.13 kg) Briess light LME | 2½ lbs (1.13 kg) Briess light LME | 2½ lbs (1.13 kg) Briess light LME | 2½ lbs (1.13 kg) Briess light LME | 2½ lbs (1.13 kg) Briess light LME |
| Initial water | 1.80 G (6.8 liters) | 1.68 G (6.3 liters) | 1.80 G (6.8 liters) | 1.95 G (7.4 liters) | 2.0 G (7.6 liters) |
| Boil time | 30 min | 30 min | 35 min | 45 min | 45 min |
| Bittering hops addition | 0.25 oz (7 g) Cascade (8.9% AA) at 19 min | 0.25 oz (7 g) Cascade (8.9% AA) at 21.3 min | 0.30 oz (8.5 g) Cascade (8.9% AA) at 30.5 min | 0.35 oz (10 g) Cascade (8.9% AA) at 45 min | 0.35 oz (10 g) Cascade (8.9% AA) at 45 min |
| Aroma/flavor hops addition | 0.4 oz (11 g) Cascade (8.9% AA) and 0.4 oz (11 g) Citra (13.9% AA) at 9.3 min, not covered | 0.4 oz (11 g) Cascade (8.9% AA) and 0.4 oz (11 g) Citra (13.9% AA) at 9.3 min, covered | 0.4 oz (11 g) Cascade (8.9% AA) and 0.4 oz (11 g) Citra (13.9% AA) at 5.0 min, covered | 0.4 oz (11 g) Cascade (8.9% AA) and 0.4 oz (11 g) Citra (13.9% AA) at 0 min, covered | 0.4 oz (11 g) Cascade (8.9% AA) and 0.4 oz (11 g) Citra (13.9% AA) at 0 min, covered |
| Hop stand | no | no | no | no | 45 minutes at 170°F (77°C) |
| Final target volume | 1.3 G (4.9 liters) | 1.3 G (4.9 liters) | 1.3 G (4.9 liters) | 1.3 G (4.9 liters) | 1.3 G (4.9 liters) |
| Yeast | ~3.4 g Safale US-05 in 1.6 oz water added to 3½ quarts (3.3 liters) | ~3.4 g Safale US-05 in 1.6 oz water added to 3½ quarts (3.3 liters) | ~3.4 g Safale US-05 in 1.6 oz water added to 3½ quarts (3.3 liters) | ~3.4 g Safale US-05 in 1.6 oz water added to 3½ quarts (3.3 liters) | ~3.4 g Safale US-05 in 1.6 oz water added to 3½ quarts (3.3 liters) |
| Priming sugar | 0.5 oz (14 g) corn sugar | 0.5 oz (14 g) corn sugar | 0.5 oz (14 g) corn sugar | 0.5 oz (14 g) corn sugar | 0.5 oz (14 g) corn sugar |
| Target OG | 1.060 | 1.061 | 1.061 | 1.060 | 1.061 |

Table 1. Recipes and predicted values for the five conditions.

These recipes assumed an evaporation rate of 0.90 G/hr (3.4 liter/hr) during the uncovered boil, 0.35 G/hr (1.3 liter/hr) at temperatures less than boiling (uncovered), and 0.10 G/hr (0.38 liter/hr) for a covered boil or stand.  (The value for the covered boil was a guess, and assumed some small amount of loss due to various factors.)  The amount of water, the weight of bittering hops, and the timing of all hops additions were varied to attempt to achieve about the same OG, the same post-boil volume, and the same bitterness levels.
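The volume bookkeeping behind these rates is simple addition: each phase of the boil or stand contributes its rate times its duration. Below is a minimal Python sketch of that arithmetic using the three assumed rates. The phase names and the example schedule are hypothetical, and the sketch ignores the volume contributed by the extract and any losses to hops and trub, so it will not reproduce the recipe volumes exactly (the Results section also notes that these rates turned out to be overestimates).

```python
# Evaporation rates assumed in the recipes, in US gallons per hour.
RATES_GAL_PER_HR = {
    "boil_uncovered": 0.90,
    "below_boil_uncovered": 0.35,
    "covered": 0.10,
}
LITRES_PER_GALLON = 3.785


def evaporation_gal(segments):
    """Total evaporation (gallons) for a list of (phase, minutes) segments."""
    return sum(RATES_GAL_PER_HR[phase] * minutes / 60.0 for phase, minutes in segments)


# Hypothetical schedule: 20 minutes uncovered boil, 10 minutes covered boil,
# then 10 minutes covered after flameout.
schedule = [("boil_uncovered", 20), ("covered", 10), ("covered", 10)]
loss = evaporation_gal(schedule)
print(f"{loss:.3f} gal ({loss * LITRES_PER_GALLON:.2f} L)")
```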

At 10 minutes after flameout (or, for Condition E, after the 45-minute hop stand), each condition was cooled to 75°F (24°C) using a wort chiller and allowed to sit for an additional 10 minutes.  After transferring 3½ quarts (3.3 liters) into a sterile 1 G (4 liter) container (a.k.a. milk jug), the jug was shaken vigorously for 90 seconds, the yeast was pitched, and an airlock was applied.  Fermentation and conditioning proceeded for 3 weeks at around 64°F (18°C), followed by bottling and bottle conditioning for an additional 3 weeks (also around 64°F (18°C)).  Priming used 0.50 oz (14 g) of glucose (corn sugar) per condition to yield 2.11 volumes CO2.  The yield was eight 12-oz bottles per condition.

I don’t think that the level of precision indicated in these recipes is required in order to obtain perceptually identical beers; a point or two of OG difference or a variation of 5 IBUs (Daniels, p. 76) probably won’t be perceptible.  I tried my best to obtain the target numbers indicated, however, and hoped that any measurement errors would, on average, cancel each other out.

Results
Results: (In)Ability to Follow the Recipes (a.k.a. Mistakes)
If I had been able to follow the recipes above to the letter and not had any bugs in my software, then this sub-section wouldn’t be necessary.  But nothing new ever goes completely according to plan, and so there were some unintended deviations from the recipes.  This part discusses what went differently and if I think there may be an impact on results.

(1) Evaporation Rates: Apparently, the 0.90 G/hr (3.4 liter/hr) evaporation rate that I’ve measured in the past (when making 5-gallon batches) was larger than my observed evaporation rate in this experiment.  This may have been because I used a smaller pot (which had a smaller opening), or because I’ve been so worried about too much evaporation that I applied less heat overall.  Likewise, the below-boiling evaporation rate seems to have been slightly overestimated.  Finally, the evaporation rate when the pot was covered was probably much closer to zero.  I realized something was off when Condition A was finished with the boil.  My solution for conditions B, C, D, and E was to wait an additional 5 to 10 minutes during the boil before adding any hops.  Even so, my measured OG values were 1.059 to 1.060 instead of 1.060 to 1.061.  I don’t believe that I can detect the difference of a few points of OG, and the over-estimation of evaporation rate was roughly the same for all conditions, so I don’t think that this will affect results.

(2) Condition A: I mistakenly used 1.85 G (7.0 liters) of water instead of 1.80 G (6.8 liters).  In addition, because the assumed evaporation rates were incorrect, I ended up with 1.75 G (6.6 liters) of wort after the boil instead of 1.60 G (6.0 liters).  My solution was to use a hop-less stand after the boil (at 170°F (77°C)) for 30 minutes in order to evaporate the extra 0.15 G (0.57 liters).  This meant that Condition A probably had a little bit more body than Conditions B, C, and D due to non-enzymatic browning, but body is not one of the factors I’m intending to evaluate in this experiment.

(3) Condition E: By the time I got to Condition E, apparently I was starting to really increase the heat in order to increase evaporation.  I ended up with an OG of 1.061.  Since the other conditions ended up with OGs around 1.059, I added ¼ cup (60 ml) of water to the final 3½ quarts (3.3 liters), which resulted in an OG of 1.060.

(4) Post-Flameout Temperature Decrease: Before brew day, I did a quick experiment in my kitchen to measure how quickly temperatures decrease after flameout.  This test showed that for 1.6 G (6.0 liters) in an uncovered pot, the temperature after 10 minutes was 182°F (83°C), and for 1.6 G (6.0 liters) in a covered pot, the temperature after 10 minutes was 201.5°F (94°C).  Since I planned to keep the lid closed after flameout, I used a line based on the second measurement to predict post-flameout bitterness.  What I forgot to take into account was the minute or so immediately after flameout, when I stirred the wort one last time and took a sample for SG reading.  In this brief time, the temperature quickly dropped while the pot was uncovered.  Also, the temperature in my kitchen (68°F (20°C)) was much greater than in my garage where I brew (around 60°F (15.5°C)).  As a result, I ended up with temperatures between 190°F (88°C) and 195°F (90.5°C) at 10 minutes after flameout.  Because of lower observed temperatures, I achieved less hop utilization during the 10 minutes after flameout than I had predicted.

(5) Bug in the Calculations: While this batch was fermenting, I worked on a blog post to explain a modification to the prediction of IBU values that takes into account post-flameout bitterness.  In the course of this writeup, I found a bug in my code.  As a result of this bug, I was computing less post-flameout utilization than I should have been for earlier hops additions, and so the (hopefully) more correct bitterness levels (mIBU values) decrease with later hops additions instead of being constant.
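The top-up of Condition E described in item (3) can be checked with the usual approximation that gravity points (the digits after “1.”) multiplied by volume stay constant when plain water is added. A minimal Python sketch; the function name is illustrative, and 60 ml is the ¼ cup from the text:

```python
def diluted_gravity(sg: float, volume_l: float, water_l: float) -> float:
    """Specific gravity after topping up with water, conserving gravity points x volume."""
    points = (sg - 1.0) * 1000.0
    return 1.0 + points * volume_l / (volume_l + water_l) / 1000.0


# 3.3 litres at 1.061 plus 60 ml of water.
print(round(diluted_gravity(1.061, 3.3, 0.060), 4))
```

The result is just under 1.060, consistent with the measured OG after dilution.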

After all those mistakes, here is a table summarizing the observed original gravity and final gravity for each batch:

| | Condition A | Condition B | Condition C | Condition D | Condition E |
|---|---|---|---|---|---|
| Original gravity | 1.059 | 1.060 | 1.059 | 1.059 | 1.060 |
| Final gravity | 1.013 | 1.014 | 1.013 | 1.013 | 1.013 |

Table 2. Measurements of Each Condition

Results: Comparisons
The following table summarizes the results of the comparisons.  The upper-right half of the table (above the diagonal) is for the “hops flavor” comparison, and the lower-left half (below the diagonal) is for a “relative bitterness” comparison.  The letter in each cell indicates which of the two conditions had more hop flavor (upper right) or more bitterness (lower left); a question mark indicates that no difference could be reliably detected, and a dash indicates that the two conditions were not compared.  Multiple values indicate multiple comparisons of the two conditions, which I did to detect possible random variation.

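| | Condition A | Condition B | Condition C | Condition D | Condition E |
|---|---|---|---|---|---|
| Condition A | | ?,B,? | – | – | – |
| Condition B | ?,?,A | | ?,C,C | D,D | – |
| Condition C | – | ?,?,? | | D,D,D | – |
| Condition D | – | ?,? | ?,D,C | | ?,?,? |
| Condition E | – | – | – | ?,?,? | |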

A/B Comparison Notes. First tasting: condition A had slightly more body, as expected from the non-enzymatic browning caused by the extra time for evaporation. Condition B had very slightly more hops/citrus flavor, but not enough to be a reliable difference.  It seemed that covering the pot during the last 10 minutes had a negligible effect on flavor.  Second tasting:  Condition B had a very slightly crisper, more citrus flavor than A, as one would expect from Cascade and Citra hops.  The beers were very, very similar, but there was a reliable, detectable difference.  Bitterness levels were the same.  Third tasting:  This time, A seemed slightly more bitter; B more “mellow.”  (In hindsight, it’s likely that I was hallucinating the difference in bitterness; I’m also not sure what would make B more “mellow”.)  I could detect only a very slight difference in hops flavor, with B having slightly more but not enough for me to consider it a reliable difference. In short: B was preferred for hops flavor all three times, but only once did I think it noticeable enough to be considered a “reliable” difference.

B/C Comparison Notes. First tasting: these beers had nearly identical taste.  At first I decided that C was ever so slightly more bitter than B; a half glass later, I decided that B was just slightly more bitter than C.  So I marked it as “no detectable difference” in terms of bitterness.  At first, I could detect no difference in hops flavor.  By the end of the first tasting, I thought that C had slightly more hops flavor than B, but not much.  Second tasting: this time, I could reliably detect slightly more hops flavor in C, even from the first sips.  Bitterness levels were about the same, although C seemed maybe just a little more bitter than B.  Third tasting: C had distinctly more hops flavor than B, although not dramatically more.  The difference was small but noticeable.  I thought B might have been a little bit more bitter than C (the opposite of my second tasting result), but not enough to make it a reliable difference.  In short: bitterness levels were about the same, and C had consistently more hop flavor than B.

C/D Comparison Notes. First tasting: OK, this was the first clear and compelling difference!  D definitely had more hops flavor.  This was a real plus.  On the other hand, it also had more of a tannin flavor.  I had a hard time deciding which was more bitter.  D might have been a tad sweeter, but it also seemed like it might have had more of a tannin or “astringent” bitterness, in contrast with the “clean” bitterness of C.  So in the end the bitterness level seemed about the same.  One unanswered question is whether the astringent bitterness was caused by the longer boil time of the “bittering” hops or the later addition of the “flavor” hops.  Second tasting: D had much more hops flavor, by a wide margin.  C had a definite citrus-hop character, but D brought it out much more.  I thought that D was more bitter, in contrast with the predicted bitterness levels.  Third tasting: Again, and without question, D had more hops flavor.  C seemed to be slightly more bitter, but the bitterness was a “cleaner” bitterness rather than an “astringent” or “grassy” bitterness.   Since these tastings, I’ve found a relevant comment by Greg Noonan: “the bitterness derived from long boiling is coarser than that from a more moderate period” (Noonan, New Brewing Lager Beer, p. 154).  Condition D had a larger amount of bittering hops in the boil for a longer time, and so the difference in bitterness quality probably came from the bittering hop addition rather than the late hop addition.  In short: D clearly had more hop flavor than C; bitterness levels were difficult to judge but about the same.

D/E Comparison Notes. First tasting:  There was almost no difference between these beers.  There was a very slight and subtle difference, but I couldn’t figure out if E was slightly more astringent, or had more body, or what.  In short, there was no difference between D and E that I could label with any category.  Second tasting:  Same results as the first.  I thought there might be some difference between the two conditions, but I couldn’t quite place what it was.  More bitter?  Fuller?  More sweet?  Crisper?  I really didn’t know.  They were not identical, but not reliably different in either hops flavor or bitterness.  Third tasting:  This time I was able to pin a label on the difference: E was slightly smoother than D.  Once I had decided on that label, I could distinguish them.  Since “smooth” is neither hop flavor nor bitterness, I marked this comparison as “?” in both categories.  The “smoothness” description fits in well with the flavor effects of a hop stand that I observed in Hop Stand Experiment #1.  In short: bitterness and hop flavor levels were about the same for D and E; E was slightly “smoother”.

B/D Comparison Notes. After the main comparisons (A/B, B/C, C/D, D/E), I had enough bottles left to compare B and D twice, so I did.  First tasting: As expected, D had more hops flavor than B, but I couldn’t detect a difference in bitterness… if anything, D seemed slightly more bitter.  Second tasting: Once again, D had more hops flavor than B.  At first I thought that B was more bitter, then I decided that I really couldn’t tell.  In short: D had more hops flavor than B.

Summary
Covering the pot during the final 10 minutes of the boil (immediately after the last hops addition) had a small impact.  There might be some benefit to covering the pot, resulting in a barely detectable increase in hops flavor.  Certainly there was no downside, and no extra effort.

A hops addition at flameout, with a 10-minute stand, contributed much more hops flavor than otherwise identical additions at 5 and 10 minutes.  A hop addition at 5 minutes contributed more hops flavor than a 10-minute addition, but much less than the flameout addition.  This may be compatible with Papazian’s graph showing a peak in hops flavor at 10 minutes, since his graph may assume cooling at flameout, whereas my batches were kept hot for 10 minutes after flameout.

Holding the hops in the wort at 170°F (77°C) for 45 minutes yielded no reliably-quantifiable effect on hops flavor or bitterness, except for the possibility that the wort held at 170°F (77°C) was slightly smoother.  Unless you’re really trying to squeeze every last possible iota of goodness from your process, when the wort cools to ~180°F (82°C), you might as well force-cool to pitching temperature and get on with the day.

Conclusion and Future Work
Within the constraints of this experimental setup, the best way to maximize hop flavor is to add hops at flameout, cover the pot, and let the wort cool naturally for 10 minutes.  Longer hops additions are not as effective as flameout additions.  Covering the pot provides a very small increase in flavor.  Holding the wort at 170°F (77°C) may provide some benefit, but is probably not worth the effort.

Hop Stand Experiment #2

Abstract
In a previous episode known as Hop Stand Experiment #1, four hop stands were held at various temperatures for an hour each to determine an “optimal” temperature for a hop stand.  (The hop additions were made when target temperature had been reached.)  That experiment yielded a surprising result: the hop-stand batches did not have more of what I consider a standard hops flavor, but were fuller-bodied and/or somewhat “sweeter” than the non-hop-stand (control) batch.  I had a consistent preference for a hop stand at 170°F (76.7°C), but the flavor effects were unexpected.

The current experiment tried to figure out what is causing the fuller-bodied flavor resulting from a hop stand.  These results were more difficult to interpret, compared with the earlier experiment.  That being said, I now believe that adding hops at below-boiling temperatures in a hop stand for one hour has a small impact on hop flavor (a sweet/crisp flavor when using Cascade and Citra hops), and that non-enzymatic browning during a hop stand has a small but noticeable impact on body/fullness.  The combination of hops and non-enzymatic browning is probably what results in the small but distinctive change in flavor and body that I described in Experiment #1 as “sweeter” and “more full”.  The hop-flavor benefits of a hop stand may also be masked by the contribution of late-addition hops.

In short, if you want to maximize hops flavor, it may be better to skip the hop stand and use those hops instead as late additions.  If you want more fullness/body, then either a hop stand or a “hopless” stand at 170°F (76.7°C) for 1 hour should provide that.  A follow-up experiment shows that it’s beneficial to add hops at flameout and then cool to the target temperature for shorter periods of time (e.g. 10 minutes) in order to get hop flavor.  I’ve also created a summary blog post that describes the techniques I’ve found to be useful at maximizing hop flavor and aroma.

Approach
This experiment used five conditions to determine what causes fuller flavor in beer made using a hop stand.  The baseline condition (condition A) was a hop stand with ~0.50 oz of hops per gallon (14 g in 3.78 liters), held at 170°F (76.7°C) for 1 hour, covered.

To determine if higher gravity (caused by more evaporation at higher temperatures in an uncovered stand) is the cause of fuller flavor, condition B was a hop stand with the same amount of hops, held at 170°F (76.7°C) for 1 hour, covered for the first 30 minutes and uncovered for the second 30 minutes.  If condition B has fuller-bodied flavor than condition A, then increased gravity can be considered at least part of the contribution.
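Evaporation removes water but no sugar, so gravity points (the digits after the "1.") multiplied by volume stay roughly constant. Here is a minimal Python sketch of that arithmetic. It assumes that evaporation is the only change in volume and uses the usual gravity-points approximation; the function names are illustrative only. Suppose condition B started near 1.061 and ended near 1.063 (the values estimated from Table 1). Then roughly 0.04 G (about 156 ml) of the 1.3 G would have had to evaporate during the uncovered half hour.

```python
GAL_TO_ML = 3785.411784  # milliliters per US gallon


def gravity_points(sg: float) -> float:
    """Gravity points, e.g. 1.061 -> 61."""
    return (sg - 1.0) * 1000.0


def gravity_after_evaporation(sg: float, start_gal: float, end_gal: float) -> float:
    """Specific gravity after the volume drops from start_gal to end_gal.

    Assumes evaporation removes only water, so points x volume is conserved.
    """
    points = gravity_points(sg) * start_gal / end_gal
    return 1.0 + points / 1000.0


def volume_evaporated(sg_before: float, sg_after: float, start_gal: float) -> float:
    """Gallons of water that must evaporate to raise sg_before to sg_after."""
    end_gal = start_gal * gravity_points(sg_before) / gravity_points(sg_after)
    return start_gal - end_gal


if __name__ == "__main__":
    # Condition B: 1.3 G pot, ~1.061 before the stand, ~1.063 after it
    lost_gal = volume_evaporated(1.061, 1.063, 1.3)
    print(f"about {lost_gal:.3f} G ({lost_gal * GAL_TO_ML:.0f} ml) evaporated")
```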

To determine if the fuller flavor is caused not by the hops at all, but by non-enzymatic browning of the wort during the stand, condition C was a “hop” stand held at 170°F (76.7°C) for 1 hour, covered, but with no hops actually added during the stand.  In other words, a hopless stand.  If condition C is nearly equivalent to condition A, then non-enzymatic browning can be considered a significant factor.  One can also determine how much of an impact the hops in a hop stand are having on hop flavor… in Experiment #1, I found none of the hops/citrus flavor I was expecting, but if A has more hops flavor than C, that would indicate that the hops are contributing something.  And if A doesn’t have more hops flavor than C, it means that a “hopless” stand is just as effective as a regular hop stand.

To determine if the hop stand simply accentuates other flavors present in the wort, condition D had ~0.1 oz (2.8 g) hops per condition added at the last five minutes of the boil, followed by a hop stand with ~0.5 oz hops per gallon (14 g in 3.78 liters), held at 170°F (76.7°C) for 1 hour, covered.  This was then compared with condition E, which also had ~0.1 oz (2.8 g) hops per condition added at the last five minutes of the boil, but no hop stand.  If the hop stand accentuates other flavors (such as the late hopping), then condition D will have much more of the expected hops flavor than condition E, in addition to having a fuller-bodied flavor.

Finally, as a set of “sanity checks”, condition D should also have more hops flavor than condition A due to the late hopping, but the same full-bodied flavor.  Condition E should have more hops flavor than condition A, but less full-bodied flavor than condition A.  E may or may not have less body than C, depending on whether the full-bodied flavor is caused by non-enzymatic browning and/or the hops in a hop stand.  E should have more hops flavor than C, due to the addition of hops late in the boil.

Methods
Preparing the Batches
The general methods used were similar to the previous experiment.  A very simple recipe of Briess liquid malt extract, Cascade hops (7.37% AA) for 60 minutes, and Safale US-05 yeast was used.  (See detailed recipe below.)  The idea was to keep things as simple and as replicable as possible, with the focus on the hop stand.  The volume of the wort at the end of the boil was 6.5 G (24.6 liters).  The goal was to end up with more than 1 G (3.78 liters) per condition, and to ferment only 3¼ quarts (3.0 liters), as it’s better to throw wort away (including settled trub) than to not have enough.  The 3¼ quarts (3.0 liters) left plenty of head room for fermentation; next time I might increase it to 3½ quarts (3.3 liters) per condition.

The initial hops were separated into two mesh bags of 1.65 oz (46.8 g) (3/5 of total hops weight) and 1.10 oz (31.2 g) (2/5 of total hops weight) each, with both added at the beginning of the boil.  After somewhat less than 55 minutes, both bags were removed, the wort was stirred, and 2/5 of the wort (2.6 G, 9.8 liters) was put in a separate pot.  The 1.65 oz (46.8 g) was added back to the remaining 3/5 wort (3.9 G, 14.8 liters) and brought back to a boil for 5 minutes, followed by rapid cooling to 75°F (24°C).  The 1.10 oz (31.2 g) was then added to the 2/5 wort, along with a total of 0.2 oz (5.67 g) of Cascade (0.1 oz (2.83 g) per condition, or about 0.077 oz per gallon (0.577 g/liter)), and this was brought to a boil for 5 minutes, followed by rapid cooling to 75°F (24°C).  The 2/5 wort will yield a somewhat more bitter beer, but the expected difference of 2 IBUs should be below the just-noticeable difference (JND) of 5 IBUs (Palmer, p. 56).
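The bittering hops were divided in the same 3/5 : 2/5 proportion as the wort, so both portions should have the same concentration of bittering hops. The slightly higher bitterness expected in the 2/5 portion then presumably comes from the extra 5-minute Cascade addition. Below is a quick Python check of those ratios. It assumes US gallons and avoirdupois ounces, and the `hop_rate` helper is just for illustration.

```python
OZ_TO_G = 28.349523125       # grams per avoirdupois ounce
GAL_TO_L = 3.785411784       # liters per US gallon

TOTAL_WORT_GAL = 6.5         # post-boil wort volume
TOTAL_BITTERING_OZ = 2.75    # 1.65 oz + 1.10 oz of Cascade

large_wort, small_wort = TOTAL_WORT_GAL * 3 / 5, TOTAL_WORT_GAL * 2 / 5
large_hops, small_hops = TOTAL_BITTERING_OZ * 3 / 5, TOTAL_BITTERING_OZ * 2 / 5


def hop_rate(oz: float, gal: float) -> float:
    """Hop concentration in ounces per gallon."""
    return oz / gal


if __name__ == "__main__":
    print(f"3/5 portion: {large_hops:.2f} oz in {large_wort:.2f} G "
          f"-> {hop_rate(large_hops, large_wort):.3f} oz/G")
    print(f"2/5 portion: {small_hops:.2f} oz in {small_wort:.2f} G "
          f"-> {hop_rate(small_hops, small_wort):.3f} oz/G")
    # Late addition: 0.10 oz of Cascade for each 1.3 G condition (D and E)
    late = hop_rate(0.10, 1.3)
    print(f"late hops: {late:.3f} oz/G = {late * OZ_TO_G / GAL_TO_L:.3f} g/L")
```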

At this point, condition E was finished (a late hop addition followed by no hop stand), and so the wort was stirred and 1.3 G (4.92 liters) of the 2/5 wort was set aside in a covered, sanitized container.

1.3 G (4.92 liters) of the 3/5 wort was put in one pot for condition A, and 1.3 G (4.92 liters) of the same 3/5 wort was put in another pot for condition B.  Both pots were brought to 170°F (76.7°C), and 0.30 oz of Cascade and 0.30 oz of Citra (8.5 grams each) were added to each pot.  (That would correspond to 3 oz in a full 5-gallon batch, or 85 grams in 19 liters.)   The temperature of 170°F (76.7°C) was maintained as closely as possible.  This ended up being a range from ~166°F (74.4°C) to ~175°F (79.4°C) for all conditions, but most of the time temperatures were within a few degrees of the target.  The first pot remained covered for the duration of the 1-hour stand.  The second pot was covered for the first 30 minutes, and then uncovered for the remainder.  At the end of the hour, both were cooled to 75°F (24°C) and stored in sanitized 1-G (4-liter) containers (a.k.a. plastic milk jugs).
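The metric values in parentheses throughout this post follow the standard conversions. The small Python helper below is useful if you want to rescale these numbers, and it reproduces the figures quoted for this stand. The function names are illustrative only.

```python
OZ_TO_G = 28.349523125  # grams per avoirdupois ounce


def f_to_c(deg_f: float) -> float:
    """Fahrenheit to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def oz_to_g(oz: float) -> float:
    """Avoirdupois ounces to grams."""
    return oz * OZ_TO_G


if __name__ == "__main__":
    for deg_f in (166, 170, 175):
        print(f"{deg_f}°F = {f_to_c(deg_f):.1f}°C")
    print(f"0.30 oz = {oz_to_g(0.30):.1f} g")
```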

The final 1.3 G (4.92 liters) of the 3/5 wort was then added to one pot for condition C, and the remaining 1.3 G (4.92 liters) of the 2/5 wort was added to a second pot for condition D.  Both pots were brought to 170°F (76.7°C), and 0.30 oz of Cascade and 0.30 oz of Citra (17 g total) were added to the pot for condition D only.  The target temperature was maintained as closely as possible, and both pots remained covered for the duration of the 1-hour stand.  At the end of the hour, both were cooled to 75°F (24°C) and stored in sanitized 1-G (4-liter) containers.

Approximately 0.53 oz (15 g) of Safale US-05 yeast (package age 7 months) was added to 0.6 cups (142 ml) of 80°F (27°C) water and left to sit for 15 to 20 minutes.  The containers with the five conditions were each aerated by vigorous shaking.  The yeast slurry was then divided equally among the 5 conditions, mixed further, and airlocks were applied.  Fermentation and conditioning proceeded for 3 weeks at around 65°F (18°C).  After priming with 2.40 oz (68 g) of sucrose (0.48 oz (13.6 g) per condition, to yield 2.12 volumes CO2) and bottling, bottle conditioning took another 3 weeks at around 65°F (18°C).  The yield was eight 12-oz bottles per condition.

Condition | A | B | C | D | E
Original Gravity | 1.060 | 1.063 | 1.062 | 1.061 | 1.060
Minimum Stand Temp | 166°F (74.4°C) | 166°F (74.4°C) | 166°F (74.4°C) | 168°F (75.6°C) | N/A
Maximum Stand Temp | 174°F (78.9°C) | 175°F (79.4°C) | 174°F (78.9°C) | 175°F (79.4°C) | N/A
Final Gravity | 1.014 | 1.016 | 1.015 | 1.015 | 1.014

Table 1. Measurements of Each Condition

From the measured original gravity of each condition, I suspect that the OG was about 1.061 for the four conditions without uncovered hop-stand evaporation (A, C, D, and E), and ~1.063 for the batch with evaporation (B).  This would translate to an error of ±0.001 in my specific gravity measurements, which is in line with the first experiment and, in my opinion, a reasonable degree of accuracy.  (I’ve since bought a hydrometer with a longer stem that’s easier to read, and I’ve been much happier with my more recent readings.)  The final gravity was around 1.014 or 1.015 for all batches except B, which as expected had a slightly higher FG (1.016).  (All gravity readings have been corrected to the hydrometer reference of 68°F/20°C.)  This final gravity was higher than that of the previous experiment.  At this point, I think that the high FG was due to insufficient aeration prior to pitching the yeast.  (Next time, I’ll be more diligent with shaking and aerating the containers.)  At any rate, the beer tasted fine, the carbonation level after bottle conditioning was as expected, and all conditions were treated equally, so I think the higher FG is acceptable.
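The "about 1.061" estimate is just the average of the four OG readings from the conditions without evaporation. Those same readings can also be turned into apparent attenuation, (OG − FG) / (OG − 1). That is a standard brewing measure the post doesn't otherwise use, but it is another way of looking at the final gravities. A sketch in Python follows. Treat the attenuation figures as a rough comparison only, since each reading carries the ±0.001 uncertainty discussed above.

```python
# Table 1 readings, corrected to 68°F/20°C
OG = {"A": 1.060, "B": 1.063, "C": 1.062, "D": 1.061, "E": 1.060}
FG = {"A": 1.014, "B": 1.016, "C": 1.015, "D": 1.015, "E": 1.014}

NO_EVAPORATION = ["A", "C", "D", "E"]  # B was uncovered for the last 30 minutes


def apparent_attenuation(og: float, fg: float) -> float:
    """Fraction of the original gravity points consumed by fermentation."""
    return (og - fg) / (og - 1.0)


mean_og = sum(OG[c] for c in NO_EVAPORATION) / len(NO_EVAPORATION)
max_deviation = max(abs(OG[c] - 1.061) for c in NO_EVAPORATION)

if __name__ == "__main__":
    print(f"mean OG without evaporation: {mean_og:.4f}, "
          f"max deviation from 1.061: {max_deviation:.3f}")
    for cond in OG:
        print(f"{cond}: apparent attenuation {apparent_attenuation(OG[cond], FG[cond]):.1%}")
```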

Recipe
The following table is for the full batch.  Each item ended up separated into 5 equal portions.

Amount Ingredient Notes
10¼ lbs (4.65 kg) Briess Light LME Added to 6.7 G (25 l) of water to yield 7.6 G (29 l) of wort.  After boil, volume was 6.5 G (25 l), with OG 1.061.
2¾ oz (78 g) Cascade whole hops, 7.37% AA added at 60 min. to yield 68 IBU according to the Tinseth formula, or a bitterness ratio of 1.12.
0.53 oz (15 g) Safale US-05 dry yeast age 7 months, yielding ~0.75 million cells per ml and °P.
2.40 oz (68 g) sucrose to yield 2.12 volumes CO2.
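The 68 IBU figure comes from Glenn Tinseth's published formula. It multiplies a "bigness" factor for wort gravity by a boil-time factor, and applies the result to the concentration of alpha acids added. Below is a minimal Python sketch using the commonly published constants (1.65, 0.000125, 0.04, 4.15). The answer depends heavily on which volume and gravity you feed it (post-boil vs. fermenter volume, boil vs. original gravity), so these inputs may not reproduce the 68 IBU figure exactly. A zero-minute (flameout) addition always gives zero, which is why this formula can't account for hop stands. The BU:GU helper shows where the 1.12 bitterness ratio comes from.

```python
import math


def tinseth_ibu(aa_fraction: float, hop_grams: float, volume_liters: float,
                gravity: float, minutes: float) -> float:
    """Tinseth IBU estimate for a single hop addition.

    aa_fraction:   alpha-acid rating as a fraction (7.37% -> 0.0737)
    hop_grams:     weight of hops added
    volume_liters: volume of wort the alpha acids end up in
    gravity:       wort specific gravity, e.g. 1.061
    minutes:       boil time; 0 (flameout) gives 0 IBU
    """
    bigness = 1.65 * 0.000125 ** (gravity - 1.0)
    boil_time_factor = (1.0 - math.exp(-0.04 * minutes)) / 4.15
    utilization = bigness * boil_time_factor
    mg_per_liter = aa_fraction * hop_grams * 1000.0 / volume_liters
    return utilization * mg_per_liter


def bitterness_ratio(ibu: float, og: float) -> float:
    """BU:GU ratio: IBUs divided by gravity points."""
    return ibu / ((og - 1.0) * 1000.0)


if __name__ == "__main__":
    # 78 g of 7.37% Cascade in 24.6 l of 1.061 wort, at several boil times
    for minutes in (60, 30, 10, 0):
        ibu = tinseth_ibu(0.0737, 78.0, 24.6, 1.061, minutes)
        print(f"{minutes:>2} min: {ibu:.1f} IBU")
    print(f"BU:GU for 68 IBU at 1.061: {bitterness_ratio(68.0, 1.061):.3f}")
```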

This next table is for the late-hopping and hop-stand specifics of each condition.  Again, the stand was for 1 hour at 170°F (76.7°C) for conditions A, B, C, and D.  The late hopping was added in the last 5 minutes of the boil for conditions D and E.  Each condition had a total volume of 1.3 G (4.92 liters), and after settling out trub, 3¼ quarts (3.0 liters) was retained for fermentation.

Condition | A | B | C | D | E
Late Hopping | N/A | N/A | N/A | 0.10 oz (2.8 g) Cascade | 0.10 oz (2.8 g) Cascade
Hop Stand | 0.30 oz (8.5 g) Cascade, 0.30 oz (8.5 g) Citra | 0.30 oz (8.5 g) Cascade, 0.30 oz (8.5 g) Citra | no hops | 0.30 oz (8.5 g) Cascade, 0.30 oz (8.5 g) Citra | N/A

Comparing the Batches
Over the period of about one month (beginning three weeks after bottling), I did a series of pairwise comparisons of different batches.  Each comparison tried to answer the two questions “which has a fuller-bodied flavor?” and “which has more hops flavor?”  I had enough bottles to test each of the comparisons described above twice; they are summarized below.  (I was able to test the D/E comparison four times.)  At the start of each comparison, I knew which condition was in each glass.  As I continued drinking both, I usually then tried a semi-blind tasting (shuffling the labeled glasses around and waiting a few minutes until I forgot which was which).  In the case of subtle differences in flavor (which happened often in this experiment), I waited for the beer to become warmer and flatter.  I often took a sip of water between tastes to clear my palate.

Pretty Pictures
Everything is better with illustrations.  Here are some pictures of the process:

The scale I have is accurate to 0.05 oz. I tried to get to the midpoint of the 0.10 measurement.

Here is 0.10 oz (2.8 g) of Cascade hops.  It’s a small amount!  This ended up being 0.20 oz (5.7 g) for the combination of conditions D and E.

Here’s Condition A, pretty close to 170°F (76.7°C).

And here’s condition B during the final 30 minutes. Yes, that’s a propane grill keeping it warm.

And here are the five conditions, ready to begin fermentation!

Results
The following table summarizes the results of the comparisons.  The upper-right half of the table (above the diagonal) is for the “fullness” comparison, and the lower-left half (below the diagonal) is for the “hops flavor” comparison.  The letter in each box indicates which of the two conditions was preferred; a question mark indicates that no difference could be reliably detected, and a dash indicates that the two conditions were not compared.  Multiple values indicate multiple comparisons of the two conditions, which I did to detect possible random variation.

Condition | Condition A | Condition B | Condition C | Condition D | Condition E
Condition A |  | ?,? | ?,? | ?,? | A,A
Condition B | ?,? |  | – | – | –
Condition C | ?,A | – |  | – | ?,C
Condition D | D,D | – | – |  | D,D,?,D
Condition E | E,E | – | E,E | ?,D,?,D | 
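The two halves of the matrix answer different questions, so it can help to unpack it into one record per comparison. Below is a hypothetical Python sketch that applies the reading rule: cells above the diagonal are fullness ratings, cells below are hops-flavor ratings, and a dash means the pair wasn't compared. The `MATRIX` layout and the `read_matrix` name are my own, not part of any tool used in the experiment.

```python
from collections import Counter

CONDITIONS = ["A", "B", "C", "D", "E"]
NOT_COMPARED = "–"  # the dash used in the table

# Cells copied row by row from the results table; "" marks the diagonal.
MATRIX = [
    ["",    "?,?",        "?,?",        "?,?",        "A,A"],
    ["?,?", "",           NOT_COMPARED, NOT_COMPARED, NOT_COMPARED],
    ["?,A", NOT_COMPARED, "",           NOT_COMPARED, "?,C"],
    ["D,D", NOT_COMPARED, NOT_COMPARED, "",           "D,D,?,D"],
    ["E,E", NOT_COMPARED, "E,E",        "?,D,?,D",    ""],
]


def read_matrix(matrix, labels):
    """Yield (pair, attribute, ratings) for every comparison that was made.

    Cells above the diagonal hold fullness ratings; cells below hold hops-flavor ratings.
    """
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            if i == j or cell in ("", NOT_COMPARED):
                continue
            attribute = "fullness" if i < j else "hops flavor"
            pair = tuple(sorted((labels[i], labels[j])))
            yield pair, attribute, cell.split(",")


if __name__ == "__main__":
    for pair, attribute, ratings in read_matrix(MATRIX, CONDITIONS):
        counts = Counter(ratings)
        print(f"{pair[0]}/{pair[1]} {attribute}: {dict(counts)}")
```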
 

A/B comparison notes: I couldn’t detect a clear difference… or any difference, really.  Apparently, the difference of a few OG points is below my taste threshold.  At any rate, there definitely was not the difference in “fullness” or “sweetness” I was looking for.  The second comparison confirmed the results of the first; I could tell absolutely no difference between the two.  This was the one comparison that truly yielded no taste difference; other comparisons often had slight differences, even if they ended up with a ‘?’ rating because the difference wasn’t clear.

A/C comparison notes:  In the first taste comparison, I thought that A was just an eentsy bitty tiny bit more full-bodied.  The more I thought about it and tasted A and C, the less certain I was of this.  I got my wife involved for a second opinion.  She thought that C was “richer”, which was about the opposite of my initial opinion. We agreed that the difference between the two was very, very subtle. As a result, the A/C comparison got a ‘?’ result in both the fullness and hops flavor comparisons.  In the second comparison, I thought that A was more “crisp” and C was more “mellow”, which I attributed to a difference in hops flavor.  (The “crisp” flavor was reminiscent of a Granny Smith apple, if that helps.)  I may have described the “crisp” flavor as “sweetness” in Experiment #1.  The amount of body was the same.  In summary, I think that the hops in the hop stand added a small amount of extra hop flavor, but the body that I noticed in Experiment #1 was not due to the hops.

A/D comparison notes: The body was about the same.  D had somewhat more citrus/hops flavor than A.  The difference was large enough that I could reliably tell them apart without knowing which was in each glass.  The flavor difference was definitely a citrus/hops flavor, and upon reflection, “crisp” vs. “mellow” was not a bad way of describing the difference.  This flavor difference was much larger than in the A/C comparison.

A/E comparison notes: The first time, it took me almost half a glass each to reliably detect the difference.  It helped when the beers were warmer and flatter.  But in the end I felt that A was smoother and slightly fuller-bodied, and E had more citrus/hops flavor.  The differences were very subtle; I may have been biased by my expectations.  When I did the second comparison, I thought that E was definitely more “crisp”, maybe more bitter, and a tad “thinner”.  A was slightly fuller-bodied and a little more mellow.  I did the comparison when the beers were warm and flat, and found the differences easy to detect.  In short, A had slightly more body than E, and E definitely had more citrus/crisp/hops flavor.

C/E comparison notes: In both of the taste comparisons I did, E definitely had more of a hops/citrus/crisp flavor, which would be a result of the late hopping.  It was difficult to tell if C had more body than E, or if it was just smoother because it was less hoppy/crisp. In the first taste comparison, I could not be sure, and so I marked that difference as ‘?’.  In the second comparison, I thought that C did have more body, although I may have been biased by my expectations.

D/E comparison notes: Half the time (the first and fourth times), D and E had the same difference in full-bodied flavor that I remembered from Experiment #1, comparing the hop stand at 170°F (76.7°C) with the non-hop-stand control.  At other times (the second and third times), the difference was less clear.  The second time, I finally concluded that D was definitely more full-bodied.  The third time, I concluded that D was more full-bodied, but because this decision took a long time to reach and I wasn’t sure of the result, I rated it as “no reliable difference”.  The first and third times I tested D and E, I couldn’t tell a difference in the level of hop flavor.  The second and fourth times, I thought that the hop flavor was slightly greater in D.  In short, D was rated overall as having fuller body, but the amount of hops flavor was not dramatically increased.

Summary
The results indicate that a hop stand adds more body as well as a small amount of a “sweeter” or “crisper” flavor due to a combination of hops and non-enzymatic browning.  However, the taste differences were often more subtle in this experiment than in the previous experiment.  Separating these two taste components from each other led to differences that were much closer to (and sometimes below) my taste threshold.

Looking at the specific comparisons and results from those comparisons, the increased body of a hop stand is not due to a slightly higher specific gravity from evaporation, as shown by the comparison of A and higher-gravity B. Non-enzymatic browning does contribute to the body, as evidenced by the comparison of A with the hopless stand C.  The hops in a hop stand may contribute to a sweet/crisp character, from the comparison of A and C.  The hop stand probably does not simply accentuate other flavors, since late-hopped and hop-standed (hop-stood?) D was not consistently rated as more hoppy/crisp/sweet than late-hopped but hop-stand-less E.  (D was rated as (slightly) more hoppy than E half the time, which is probably a small effect from the hop stand, but the additional hops did not produce a large change in hop flavor, which would have happened if the hop stand accentuated other flavors.)

From the comparison of A and E, late hopping seems to overwhelm the subtle hops taste of a hop stand, but a hop stand does still contribute body.

Conclusion & Future Work
In short, if you want more hops flavor, it may be better to skip the hop stand and use those hops as late additions.  If you want more fullness/body, then either a hop stand or a hopless stand at 170°F (76.7°C) for 1 hour will provide that.  These results may be specific to the way I did the hop stand; other hop-stand techniques (e.g. adding the hops at flameout before cooling) may yield different results.

I have one more experiment that incorporates a hop stand, adding hops at flameout instead of after the target temperature has been reached.

Hop Stand Experiment #1

Abstract
One reason I started brewing was to be able to drink a beer I really, really like.  Something that’s just the way I want it to be.  And to me (and many others) that means lots of hops flavor and aroma in an IPA.  This experiment tried to determine the best temperature of a 60-minute hop stand in order to maximize flavor.

The main result was that a 60-minute hop stand with a temperature of 170°F (76.7°C) tasted the best (out of a set of temperatures ranging from 150°F to 180°F (65.6°C to 82.2°C)).  However, none of the hop stands had the increase in hop flavor that I was expecting; instead, the hop-stand beers had fuller-bodied flavor.

A follow-up experiment (Hop Stand Experiment #2) looks at the causes of flavor and body in a hop stand.  Still another follow-up experiment looks at other techniques that might actually increase hop flavor.  I’ve come to the conclusion that 60 minutes is too long of a steep time to get any hop flavor; shorter times (e.g. 10 minutes) are much more effective.  I’ve created a summary blog post that describes the techniques I’ve found to be useful at maximizing hop flavor and aroma.

Introduction
One technique for maximizing hop flavor and aroma is the “hop stand”, where the hops sit for an extended period (up to 80 minutes) at below-boiling temperatures.  (One question is whether the hops should be added at flameout or after the target temperature has been reached.  I decided in this experiment to add the hops only once the target temperature had been reached.)  I was intrigued by an article by Van Havig (now Master Brewer at Gigantic), “Maximizing Hop Aroma and Flavor Through Process Variables”, in which the beer with the most flavor and aroma was the result of both a hop stand for 80 minutes and dry hopping. I tried a 5-gallon (19-liter) batch with 2½ oz (71 g) in a 180°F (82.2°C) hop stand, and I thought the results were quite successful.  But I couldn’t find sufficient information on what temperature the hop stand should be conducted at. In a large brewery, I believe that the temperature will very slowly decrease after flameout until the wort passes through the wort chiller, but with small-scale homebrewing we should be able to easily and quickly cool to the “best” temperature, maintain that temperature for the full time period, and then cool to pitching temperature.  Whatever “best” is; that is what this experiment tried to find out.  Lee Morgan at Hot Water Magic conducted an experiment, but at only two temperatures: 200°F (93.3°C) and 175°F (79.4°C).  Since his results showed a clear preference for the hop stand at 175°F (79.4°C), I wanted to try a number of different temperatures all below 200°F (93.3°C).

Related Work
Here are some interesting hop-stand descriptions from around the web:

Methods
The methods of this experiment were simple: Create four 1-gallon (4-liter) batches at different hop-stand temperatures (with a 60-minute stand in each case), and another 1-gallon batch with no hop stand, which I called the “control” or “baseline”.  See which batch among the first four yields the most flavor, through a series of pairwise comparisons.  See how much effect the hop stand is having, relative to the control, again with pairwise comparisons.

Because I restricted myself to four hop-stand batches and one control batch, that limited the temperature range I could try.  I decided to go with 150°F, 160°F, 170°F, and 180°F (65.6°C, 71.1°C, 76.7°C, and 82.2°C).  All batches were made from the same 1.065 OG wort and 2¼ oz (64 g) of Centennial hops added at 30 minutes, so the only (intentional) difference between batches was in the hop-stand temperature.  No crystal malt, no late hopping, and no dry hopping, to keep the focus on the effects of the hop stand.  I used ¼ oz of Cascade and ¼ oz of Citra (7 g of each) in each of the four 1-gallon batches, which would translate to 2½ oz in a 5-gallon batch (71 g in 19 liters).

Some might worry about the risk of infection with steeping for a long time at below-boiling temperatures.  Dave Miller, in The Complete Handbook of Home Brewing, says that “most of the bacteria that are classified as wort spoilers grow best at temperatures from 80°F to 120°F” (27°C to 49°C) (p. 148).  According to the Thermoworks food temperature FAQ, “Bacteria thrive between the temperatures of 40°F (4.5°C) and 140°F (60°C).  Food should not be stored between these temperatures for extended periods of time.”  The minimum temperature used here was above 140°F (60°C), and none of the batches I’ve used with a hop stand (this experiment plus several other batches) have shown any signs of an infection.  In short, I think it is never unreasonable to worry about infection, but for a hop stand the risk is negligible.

Recipe, per 1-gallon (4 liter) batch:

Amount Ingredient Notes
1.9 lbs (0.86 kg) Briess Light LME to yield target OG of 1.065.  This was actually 9½ lbs (4.3 kg) in what eventually became one 5-gallon (20-liter) batch, which was then divided into 5 one-gallon (4-liter) batches.
0.45 oz (12.75 g) Centennial 10.9% AA added at 30 min. to yield 77 IBU according to the Tinseth equation.  This was actually 2¼ oz (64 g) of Centennial in the full 5-gallon (20-liter) batch.
¼ oz (7 g) Cascade added when wort reached target hop-stand temperature
¼ oz (7 g) Citra added when wort reached target hop-stand temperature
0.092 oz (2.6 g) Safale US-05 dry yeast age 8 months, yielding ~0.75 million cells per ml and °P.  This was actually 13 grams divided into 5 equal parts.  Fix & Fix (An Analysis of Brewing Techniques, p. 68) recommend 0.75M cells per ml and °P.
0.48 oz (14 g) sucrose to yield 2.2 volumes CO2.  This was actually 2.40 oz (68 g), divided into 5 equal parts.
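The Fix & Fix pitching rate of 0.75 million cells per ml per °P becomes a target cell count once the gravity is expressed in degrees Plato. The Python sketch below assumes the common approximation °P ≈ 259 − 259/SG and a full US gallon of wort. It says nothing about how many viable cells a given weight of 8-month-old dry yeast actually contains, since that varies and wasn't measured here.

```python
GAL_TO_ML = 3785.411784  # milliliters per US gallon


def sg_to_plato(sg: float) -> float:
    """Approximate degrees Plato from specific gravity (259 - 259/SG)."""
    return 259.0 - 259.0 / sg


def target_cells(volume_ml: float, sg: float,
                 cells_per_ml_per_plato: float = 0.75e6) -> float:
    """Total cells needed for a pitch rate given in cells per ml per degree Plato."""
    return cells_per_ml_per_plato * volume_ml * sg_to_plato(sg)


if __name__ == "__main__":
    plato = sg_to_plato(1.065)
    cells = target_cells(1.0 * GAL_TO_ML, 1.065)
    print(f"1.065 is about {plato:.1f} °P; "
          f"target is about {cells / 1e9:.0f} billion cells per gallon")
```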

Procedure
Preparing the Batches
The 9½ lbs (4.3 kg) of LME was added to 5¾ G (22 liters) of water, yielding 6½ G (25 liters) of 1.053 wort.  At the end of the 30-minute boil, 5¾ G (22 liters) of wort remained.  (I used a more vigorous boil than in my “numbers” post.)  The Centennial hops were removed.  This wort was divided into 5 equal batches.  One batch was set aside, covered, as the control batch.  Since I had two burners available, I did the 160°F (71.1°C) and 180°F (82.2°C) hop stands simultaneously (uncovered), then cooled them to pitching temperature.  I then did the 150°F (65.6°C) and 170°F (76.7°C) hop stands simultaneously (uncovered), and cooled them and the control batch to pitching temperature. Each batch was transferred immediately after cooling to a 1-gallon (4-liter) container, with 0.70 to 0.87 G (2.6 to 3.3 liters) of wort remaining after separation from the break. The 13 grams of yeast were prepared in ½ cup (120 ml) of 80°F (26.7°C) water for 15 minutes, and this volume of slurry was equally divided amongst all five batches.  I shook each batch vigorously to aerate.  Airlocks were applied.  Time passed.  When bottling after 3 weeks with 2.40 oz (68 g) of sucrose divided amongst the five batches, I labeled each bottle with its hop-stand temperature.  Yield was 6 to 7 bottles per condition.

The hop stands were kept as close as possible to the target temperature using low heat.  Each was cooled initially to the target temperature using an immersion wort chiller, and then put on low heat to maintain the temperature.  For the most part, the target temperature was maintained to within a few degrees.

The measured OG of the 160°F (71.1°C) and 180°F (82.2°C) batches was 1.066.  The measured OG of the 150°F (65.6°C) and 170°F (76.7°C) batches was 1.065.  I forgot to measure the OG of the control batch, which in hindsight was unfortunate, since it had the least evaporation (due to covering) and hence lowest OG.  The measured final gravities (FG) of the batches were: 1.012, 1.014, 1.014, 1.013, and 1.015 for the control, 150°F, 160°F, 170°F, and 180°F batches, respectively.  According to the Tinseth formula, there should be 77 IBUs in this beer, for a BU:OG ratio of 1.18.

Comparing the Batches
Over the period of about one month (beginning two weeks after bottling), I did a series of 15 pairwise comparisons of different batches.  The question in each case was the same: which tastes better?  Of course, I also kept notes about the flavors and qualities of each batch.  With the number of possible combinations being very large and the number of bottles much smaller, I used the following procedure to select comparisons.  First, the control was compared against 180°F (maximum expected difference), 150°F against 180°F (maximum hop-stand difference), 150°F against 160°F (minimum difference), 160°F against 170°F (minimum difference), and 170°F against 180°F (minimum difference).   The remaining 10 tests were determined sequentially, based on the results up until that point and what comparison at that point would provide the most cumulative information about optimal temperature. (I’ll convert to Celsius one more time, and then refer to the conditions by Fahrenheit only: 150°F = 65.6°C, 160°F = 71.1°C, 170°F = 76.7°C, 180°F = 82.2°C.)

I did the taste comparisons knowing which batch was in each glass.  At times I tried a semi-blind tasting (shuffling the glasses around behind cover and waiting until I forgot which was which); these tastings generally confirmed my other, much easier, tests.

Pretty Pictures
Here are some pictures of the one-gallon batches.

This is the 160°F hop stand, with hops in a mesh bag.

This is the 180°F hop stand.

Here are the resulting five one-gallon batches, just after applying the airlocks.  Like a police lineup… except they’re all guilty!

The difficult job of performing a pairwise taste comparison. Note the color difference, which was much less pronounced when fermentation began. The control was the lightest, 180°F was the darkest.

150°F was lighter than 170°F, although I had to improve my methods and take this picture against a non-brown background, in order to make the color difference more apparent.

Results:
The table below shows the results of the 15 comparisons.  Each cell corresponds to the pairwise comparison of the condition in its row with the condition in its column, and the value in the cell indicates which of the two tasted better.  Multiple values indicate multiple comparisons at those two temperatures, which I did to detect possible random variation.

Condition | Control | 150 | 160 | 170 | 180
Control |  | 150 | 160 |  | 180
150 |  |  | 160 | 170, 170 | 180, 180
160 |  |  |  | 170, 170, 170 | 180
170 |  |  |  |  | 170, 170
180 |  |  |  |  | 
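Tallying wins makes the ranking in the table easier to see. The Python sketch below uses the ratings as they appear in the table. Only 14 individual ratings can be read from it, one fewer than the 15 comparisons mentioned in the text, so treat the totals as approximate. The list layout and the `tally` function are illustrative.

```python
from collections import Counter

# (condition, condition, winner) for each tasting; hop stands labeled by °F
COMPARISONS = [
    ("control", "150", "150"),
    ("control", "160", "160"),
    ("control", "180", "180"),
    ("150", "160", "160"),
    ("150", "170", "170"), ("150", "170", "170"),
    ("150", "180", "180"), ("150", "180", "180"),
    ("160", "170", "170"), ("160", "170", "170"), ("160", "170", "170"),
    ("160", "180", "180"),
    ("170", "180", "170"), ("170", "180", "170"),
]


def tally(comparisons):
    """Return (wins, appearances) counters for each condition."""
    wins, appearances = Counter(), Counter()
    for a, b, winner in comparisons:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return wins, appearances


if __name__ == "__main__":
    wins, appearances = tally(COMPARISONS)
    for cond in ("control", "150", "160", "170", "180"):
        print(f"{cond:>7}: won {wins[cond]} of {appearances[cond]}")
```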

I feel that I was able to detect a difference in nearly all cases, although the 150°F and 160°F samples were nearly identical.  The taste differences between hop-stand temperatures were usually small but distinct.  Multiple tests of the same conditions always yielded the same results, increasing my confidence that the differences were not just random guessing.  All of the hop stand conditions tested had a noticeable, positive difference over the control.

If these were all the results I obtained, then we’d be done and there’d be no need for a follow-up experiment.  However, I was surprised at what the beers tasted like.  Prior to running this experiment, I was expecting the hop-stand beers to have a more citrusy flavor than the control, due to the Cascade and Citra hops, but for them to have the same level of bitterness due to the below-boiling temperatures.  I expected the flavor difference between the control and the hop-stand batches to be large, and for there to be smaller differences between the different target temperatures.  I also was expecting the color of the beers to be more or less the same, with the hop stands possibly darker than the control.  What I found instead was that the control was more “dry”, “light”, and “thin/fragile”, and that the hop-stand batches were somehow “sweeter”, more “full”, and somehow more “rounded/robust” with more body.  The different target temperatures had different levels of this dry/light or sweet/full character, with the 170°F batch having the most of the “full” character.  While I wasn’t expecting the addition of hops to make the beer more bitter, I really wasn’t expecting the addition of hops to make the beer seem somehow sweeter.  And the “full”, “smooth” taste was not quite the citrus taste I was expecting.  It’s almost as if the addition of hops had the same effect as the addition of more malt.  I found the fuller, somehow sweeter flavor more pleasant, and therefore I rated it higher.  (Perhaps I should just skip the hop stand and increase my malt levels!)  The color did increase more than expected with the hop-stand beers, and darkness correlated with temperature.

While the control batch did have the lowest final gravity (1.012) and the 180°F batch had the highest final gravity (1.015), there wasn’t a linear relationship between temperature and final gravity (possibly due to variance in my FG measurements).  If the sweeter taste were due to a higher final gravity, then the 170°F batch (FG of 1.013) should have been rated worse than all batches other than the control.  It seems that the measured final gravity did not correlate well with perceived taste.  My guess is that my FG measurements are accurate to about ±0.001 (I should be so lucky… reading a hydrometer can seem like an imprecise art form), and that the actual FG of the hop-stand batches was 1.014 across the board.  The control batch may have had a slightly lower FG (for unknown reasons), or my FG measurement error might be closer to ±0.002.
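The "1.014 across the board" guess can be checked against the stated measurement uncertainty. The short Python sketch below uses the five measured final gravities; the dictionary and function names are illustrative. The four hop-stand readings fall within ±0.001 of 1.014. The control reading of 1.012 only fits if the error is closer to ±0.002.

```python
# Measured final gravities, Experiment #1 (hop stands labeled by °F)
FG = {"control": 1.012, "150": 1.014, "160": 1.014, "170": 1.013, "180": 1.015}
HOP_STANDS = ["150", "160", "170", "180"]


def all_within(readings, center: float, tolerance: float) -> bool:
    """True if every reading is within ±tolerance of center (with a tiny float margin)."""
    return all(abs(r - center) <= tolerance + 1e-9 for r in readings)


hop_stand_fgs = [FG[t] for t in HOP_STANDS]
mean_hop_stand_fg = sum(hop_stand_fgs) / len(hop_stand_fgs)

if __name__ == "__main__":
    print(f"mean hop-stand FG: {mean_hop_stand_fg:.4f}")
    print("hop stands within ±0.001 of 1.014:", all_within(hop_stand_fgs, 1.014, 0.001))
    print("control within ±0.001 of 1.014:", all_within([FG["control"]], 1.014, 0.001))
    print("control within ±0.002 of 1.014:", all_within([FG["control"]], 1.014, 0.002))
```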

Conclusion and Future Work:
The use of a hop stand was found to be beneficial, relative to the batch without any hop stand.  The hop-stand temperature that yielded the best flavor was 170°F (76.7°C).

Why did the use of a hop stand result in a fuller-bodied beer?  I can think of a number of possible causes, which were tested in Experiment #2.  In no particular order, here are some possibilities:

  • The evaporation during the hop stand increased the gravity of the wort.  The addition of hops had little or no effect, perhaps because too few hops were added.  If this were the case, I would expect the 180°F batch to have had the most evaporation, and hence the highest rating.  Since the 170°F batch was perceived as fuller and sweeter than the 180°F batch, this option is unlikely.
  • The heat applied during the hop stand resulted in nonenzymatic browning (Maillard reaction) of the wort.  Again, the addition of hops had little or no effect, perhaps because too few hops were added.  According to Fix (Principles of Brewing Science, p. 76), three levels of complexity of nonenzymatic browning can produce a range of flavors including “toasty”, “cooked cabbage” (not due to SMM), “sweet corn” (not due to DMS), “burnt”, “nutty”, and “smoky”.  Palmer describes the flavors as being “malty, toasty, biscuity” (How to Brew, 3rd edition, p. 122), with low temperatures associated more with malt and fresh bread flavors.  It is possible that the full, sweet perception I had was actually a toasty flavor caused by just the right amount of heat applied for just the right amount of time.  This browning would also explain the color increasing with each hop-stand temperature.  However, there’s no better way to say this: the hop-stand beers tasted “fuller”, not more “toasty”.  And I have no idea if, or how much, nonenzymatic browning occurs in the temperature range of 150°F to 180°F… Palmer says that the Maillard reaction can occur as low as 120°F (49°C) (How to Brew, 3rd edition, p. 122); certainly the color of the batches was affected.
  • Maybe a hop stand really doesn’t add the flavors I was expecting, but the hop oils and resins that are extracted during the hop stand cause a fuller flavor that brings out other flavors from late hopping and/or dry hopping and/or the malt.  This doesn’t seem entirely likely, however, since hops are well studied and no previous work has described this effect… although, now that I think about it, I recall that Havig did report that the combination of a hop stand and dry hopping was more effective than the hop stand alone.
  • Maybe it’s important to have some of the hop oils at near-boiling temperatures in order to maximize the hop taste, even though this will add some level of bitterness.  This might explain why the hop-stand batches didn’t have more of the expected hoppy flavor, but it wouldn’t explain why I rated all of them as better than the control batch.  This hypothesis might also contradict Mogan’s results (in which he preferred the 175°F batch over the 200°F batch).

These hypotheses lead to the setup for Experiment #2.  In this next experiment, a hop stand with temperature of 170°F will be the control.  The hop stand will be covered to prevent evaporation.   The amount of hops added in the stand will be increased somewhat from the amount in Experiment #1.  To test whether the taste difference is due to evaporation and a higher-gravity beer, the first test condition will be a hop stand at 170°F with no cover (and compared with the control).  To test whether the taste difference is due to nonenzymatic browning, the second test condition will be a “hop stand” at 170°F, covered, but with no hops actually added (and compared with the control).  To test whether the taste difference is due to interactions with other compounds, two final test conditions will be late hopping for 5 minutes followed by (a) no hop stand and (b) a hop stand conducted at 170°F, covered.  These tests will be compared against each other (does the hop stand enhance the hop flavor caused by late hopping?) and against the control (which is better if you can only pick one, late hopping or hop stand?).  A number of other questions about the effectiveness of a hop stand will have to wait, since I can only brew five one-gallon batches in one brew session.