Partnership Variability

by Matthew Kidd
November 2, 2013

“Who do you think were the two best heavyweights who ever fought? I don’t really care who you pick, but take those two fighters, both at the peak of their careers, put them in a ring and let them slug it out for 15 rounds. Whoever wins is the champ. That’s IMPS. Now take the same two fighters, blindfold them and tie one hand behind their backs. Divide the ring diagonally with a solid barrier. Now go down to the local tavern and collect 20 drunks. Place 10 drunks on each side of the ring and let the fighters go at it. Whoever knocks out his drunks first is the winner. That’s matchpoints!”

Bob Hamman (quoted in Bridge World, April 1991)

How much variability does the field create in matchpoints? How variable is the performance of a typical regular partnership? What is the range of variability? Are stronger partnerships less variable? Is some degree of variability good for the game?

The partnership statistics generated by the Payoff Matrix software provide enough data to investigate these questions. Let’s begin by looking at histograms of the performance of several regular partnerships in the La Jolla unit game over the six-year period from the start of 2008 through October 2013, as shown in the figure below, where partnership strength decreases from left to right. The shapes are roughly Gaussian, i.e. bell-shaped. The standard deviations are 5.70, 4.94, 5.53, and 5.64 percent from left to right.

Histograms of partnership percentages from 2008-2013 for four regular partnerships in the La Jolla unit game

As an aside, it is worth noting that the matchpoint results on any given board are not at all Gaussian. Because matchpoint scoring ranks all raw scores from top to bottom, no matter how small the difference in the raw score, e.g. +630 vs. +620, the distribution of matchpoint scores should be fairly uniform, i.e. flat rather than bell-shaped. However, the distribution is not absolutely uniform due to ties. For example, if the results on a board are +630, +620, +620, +620, and −50, the pair with +630 will receive a matchpoint score of 4 (a “top”), all three pairs with +620 will receive a score of 2 (an average), and the last pair will receive a 0 (a “bottom”). In this scenario, there is a spike in the matchpoint award distribution at 2 and no awards of 1 or 3 on the board. But over a large number of boards, ties will occur near the bottom and top as well as in the middle, and the distribution is likely to end up fairly uniform. Still, it is good to check against actual data. The plot below shows the distribution of N-S matchpoint scores for all boards from a collection of 299er events during the first nine months of 2013 at the Charlotte Bridge Studio. To avoid binning effects, the data has been restricted to events with a matchpoint top of 7.

Distribution of matchpoint scores on boards with a top of 7 in 299er events from the Charlotte Bridge Studio

The distribution of integer matchpoint scores is fairly uniform. Half integer matchpoint scores are less frequent because they can only be generated by a two, four, or six-way tie of raw scores. The exception is the peak at 3.5, the board average, which occurs much more frequently, presumably because even 299ers sometimes bid, play, and defend a board normally, or at least achieve the normal result through a cancellation of errors. Excluding the center peak, half integer matchpoint scores are 40% less frequent than integer matchpoint scores. But the fractional matchpoint score at the center is 18% more frequent than an integer matchpoint score.
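The ranking rule, including the tie handling that produces the half integer scores, is easy to express in a few lines of code. Here is a minimal sketch in Python (the function and its name are my own illustration, not part of any scoring software): each pair receives one matchpoint for every pair it beats and half a matchpoint for every pair it ties.

```python
def matchpoints(raw_scores):
    """One matchpoint for every pair beaten, half for every pair tied."""
    return [sum(s > other for other in raw_scores)   # pairs beaten
            + 0.5 * (raw_scores.count(s) - 1)        # pairs tied
            for s in raw_scores]

print(matchpoints([630, 620, 620, 620, -50]))   # [4.0, 2.0, 2.0, 2.0, 0.0]
print(matchpoints([650, 620, 620, 100, -100]))  # two-way tie: [4.0, 2.5, 2.5, 1.0, 0.0]
```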

A partnership’s percentage in an event is the mean of its percentage on all the boards it plays, i.e. a sum divided by the number of boards. That we obtain roughly Gaussian distributions for any given partnership’s percentages, despite summing many non-Gaussian distributions, follows from the Central Limit Theorem.
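A quick simulation illustrates the point. Purely for illustration, assume a 24-board session and a flat distribution of board percentages; the session averages come out nearly Gaussian, with a standard deviation in the same ballpark as the partnership figures above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: 24 boards per session, each board's matchpoint percentage
# drawn from a flat distribution; the session score is the board mean.
sessions = rng.uniform(0, 100, size=(100_000, 24)).mean(axis=1)

# Uniform(0,100) has std 100/sqrt(12) = 28.9%; averaging 24 boards
# predicts 28.9/sqrt(24) = 5.9%.  A histogram of sessions is visibly
# bell-shaped even though each board's distribution is flat.
print(f"mean = {sessions.mean():.1f}%, std = {sessions.std():.2f}%")
```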

Distributions of Variability

The first figure above showed the variability of four partnerships. The figure below summarizes the variability of all partnerships who have played 20 or more sessions in the La Jolla unit game from 2008-2013 (57 partnerships) or 15 or more sessions during the first ten months of 2013 in the Charlotte 199er or 299er games (61 partnerships). The average partnership variability, expressed as the standard deviation of the partnership’s session results, is 5.65% and 6.45% respectively.

Distribution of standard deviations of partnership percentages from the La Jolla Unit game and the Charlotte Bridge Studio 199er and 299er games combined

Sources of Variability

The variability of a partnership’s results can be broken down into the variability caused by the field and the variability caused by the partnership, i.e. the intrinsic partnership variability. You have surely experienced field variability. One session, two or three opponents give you gifts, making mistakes that will not be replicated at other tables. When you bid a pushy slam, the opponents are put to a guess on the opening lead and fail to cash their two immediate tricks, giving you a top instead of a bottom. Another session, nobody seems to give you any gifts. Even weaker pairs perform annoyingly competently at your table even though you know they’re still handing out gifts, just not to you. Better pairs take shots that happen to work. None of your results are terrible and all of them will be shared by several other pairs, and yet they all feel like slightly below average results and the final result bears this out.

Most field variability is due to randomness, i.e. it is field noise. To a lesser extent, variability may be caused by poor seeding. For example, in a Mitchell movement the N-S and E-W directions should be balanced, but if too many pairs request to sit N-S and the director is not vigilant, field balance can be thrown off. At other times, masterpoint holdings may not accurately reflect skill and the director may not compensate for this properly. Or significant disparities in skill may simply make field balance difficult.

Variability caused by a partnership comes from many sources. Alertness and focus vary from session to session, such that the partnership strength varies. Emotional factors may come into play. Bidding methods play a role, especially if a pair employs methods such as the weak no-trump, or even a full system such as Precision, that are not used by most of the field. Bidding style can make a difference—a pair that makes many close “matchpoint” doubles or aggressive preempts will have more variable results. Nearly pure guesses in bidding or card play can cause additional variability; some decisions will split expert panels down the middle, and even simulations may reveal that a given circumstance is a nearly 50-50 guess.

Partnership variability may also come from long term changes in partnership skill. Strictly speaking, this is a trend in the signal, i.e. the true partnership strength, rather than noise, i.e. random variability. It could be removed by some sort of detrending procedure, perhaps one as simple as a linear fit over time. The six year period covered by the La Jolla unit data is certainly long enough for partnerships to evolve significantly. However, if the La Jolla unit data is limited to just 2012-2013, the average partnership variability falls by only 0.01%, suggesting that changing partnership skill is not a significant factor in the analysis. This isn’t necessarily surprising. Most partnerships do not span the entire six year period. Moreover, the La Jolla unit game is fairly strong; many of the participants were already experienced players in 2008 and so have had less room to improve since then.

Field Variability

Quantifying Hamman’s complaint amounts to quantifying the field variability. The field variability must be lower than the lowest partnership variability. However, it is not quite that simple because the standard deviation computed for each partnership is itself just an estimate of the true standard deviation that would be observed if it were based on an unlimited, or at least very large, number of sessions. For example, the freakishly low standard deviation of 3.10% for the partnership of Peter and Katherine Moyer is probably not accurate (this result doesn’t appear in the histogram above because it is based on only 14 sessions, fewer than the cut of 20 imposed above). The three partnerships with the lowest variability based on 50 or more sessions in the La Jolla unit game are Lynne Anderson and Ursula Kantor (4.16%), Dave and Jules Borack (4.66%), and Bill Grant and Lynne O’Neill (4.94%). The La Jolla unit field variability is therefore almost certainly under 4.16%. Similarly, the 199er+299er field variability is almost certainly under 4.5%. Not surprisingly, the field variability appears to be higher in the weaker game.

Can we also place a lower bound on the field variability? The total variability caused by two independent sources of variability is not the sum of the two standard deviations; rather the standard deviations add in quadrature, mathematically σtot² = σa² + σb². The field variability is really just the variability introduced by all the other partnerships. Since the field is composed of partnerships much like the one being measured, a typically variable pair’s intrinsic variability should be about equal to the field noise, making its observed variability √2 × σfield. This puts a lower bound of 5.65% ⁄ √2 = 4.00% on the field variability for the La Jolla unit game and 6.45% ⁄ √2 = 4.56% for the combined Charlotte 199er and 299er games. The upper and lower bounds are in good agreement, bracketing the field noise at roughly 4% and 4.5% respectively for the two games.
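The quadrature rule is easy to verify numerically. A minimal sketch, with both noise sources set to 4% to match the estimates above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent noise sources add in quadrature: with intrinsic and field
# noise both at 4%, the combined spread is 4 * sqrt(2) = 5.66%, not 8%.
intrinsic = rng.normal(0, 4, size=1_000_000)
field = rng.normal(0, 4, size=1_000_000)
print(f"combined std = {(intrinsic + field).std():.2f}%")  # about 5.66
```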

What does a field standard deviation of 4% mean in practice? It means that about 32% of the time your session result will be more than 4% above or below your true performance, i.e. more than one standard deviation away. Here is a little chart that shows the influence of the field in more detail.

% of sessions Score versus true performance
6.7 % Score is higher than expected by 6% or more
9.2 % Score is higher than expected by 4-6%
15 % Score is higher than expected by 2-4%
19 % Score is higher than expected by 0-2%
19 % Score is lower than expected by 0-2%
15 % Score is lower than expected by 2-4%
9.2 % Score is lower than expected by 4-6%
6.7 % Score is lower than expected by 6% or more
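The chart is nothing more than the Gaussian distribution sliced into 2% bands, so it can be reproduced from the normal CDF. Here is a sketch using NormalDist from the Python standard library (available from Python 3.8 on):

```python
from statistics import NormalDist

field = NormalDist(mu=0, sigma=4)   # 4% field noise, in percentage points

# Probability that a session score lands in each band above the true
# performance; the bands below are the same by symmetry.
for lo, hi in [(6, None), (4, 6), (2, 4), (0, 2)]:
    p = 1 - field.cdf(lo) if hi is None else field.cdf(hi) - field.cdf(lo)
    band = f"{lo}% or more" if hi is None else f"{lo}-{hi}%"
    print(f"{100 * p:4.1f}% of sessions: score high by {band}")
```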

Intrinsic Partnership Variability

Each partnership’s observed variability is a result of the field noise and the partnership’s intrinsic variability. We can remove the field noise via the following formula:

σintrinsic = √( σobserved² − σfield² )

where σobserved is the partnership’s measured variability and σfield is the field noise. That leads to this pair of histograms from the same data shown above, based on partnerships that have played at least 20 or 15 sessions respectively. The removal of the field noise emphasizes the range of intrinsic variability across different partnerships. Note: small errors in the field noise will have a big impact on partnerships with low intrinsic variability. Partnerships in the leftmost 0–0.5% bin could easily be in the 0.5–1.0% bin.

Distribution of standard deviations of partnership percentages due to the partnership (i.e. intrinsic) from the La Jolla Unit game and the Charlotte Bridge Studio 199er and 299er games combined
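In code, the correction is a one-liner. As a sketch, applying it to the Grant–O’Neill figure quoted above, and assuming 4% field noise for the La Jolla game, recovers the roughly 3% intrinsic variability mentioned below:

```python
from math import sqrt

def intrinsic_sd(observed_sd, field_sd):
    """Remove field noise from an observed standard deviation
    by subtraction in quadrature."""
    return sqrt(observed_sd**2 - field_sd**2)

# Grant-O'Neill: 4.94% observed in the La Jolla game, 4% field noise
print(f"{intrinsic_sd(4.94, 4.0):.2f}%")   # about 2.90%
```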

What makes for a partnership with low intrinsic variability? I know something about Bill Grant and Lynne O’Neill, a strong partnership with an intrinsic variability of about 3%. Both are calm people who play a logical game. They are aggressive in the bidding when the vulnerability merits it but their bidding system is very ordinary. Lynne’s bidding is so old school that she even plays Strong Two bids with some partners, though not with Bill, who enjoys some aggressive preempts. My opinion is that they gain more from solid card play than sophisticated bidding. I also know something about Lynne Anderson and Ursula Kantor, because I’ve played a lot with Ursula. They are both calm people. Ursula’s bidding is pretty basic. She will almost never miss a game (1♠ 2♠ 3♠—self alert; theoretical invite that will never be passed! 4♠) and is quick to run to 3NT even with distribution that might give slam a chance if her shape were bid out. Her contracts are seldom anti-field.

Ron and Mary Huffaker are a strong pair with an intrinsic variability of about 4.0%, a little bit on the high side. Ron is excitable but Mary is excellent at calming him down. They play 4½ card majors (1♠ promises five), which may lead to some anti-field results. Steve Johnson and Diana Marquardt are a strong pair with an intrinsic variability of about 4.7%. Both of them can be quite emotional. Alan and Debbie Gailfus, a very strong pair, also have a high intrinsic variability of 4.7%. One look at Debbie’s face during competition tells that story. Marvin French and Alice Leicht have an intrinsic variability of 4.9%. I attribute this in part to a significant skill mismatch in the partnership; random fluctuations in which one declares or is on lead may account for a lot of variability.

I’m curious why the Manoochehr Bahmanian–Sally Ishihara and Joyce Bailey–Barbara Blake partnerships have the highest intrinsic variabilities at 5.7% and 6.4% respectively. I don’t know either partnership well but I do think Sally’s game has improved significantly over the 2008-2013 time period analyzed; a trend may be masquerading as variability.

Are stronger partnerships less variable?

One might expect strong partnerships to be more disciplined and thus perhaps to have less intrinsic variability. And yet even a casual glance at the spreadsheet suggests otherwise. Consider Maritha Pottenger’s partnerships with Greg House (56.25 ± 1.04%) and Kent Hartman (54.21 ± 1.28%). Both partnerships are respectable and yet their intrinsic variabilities are wildly different at 2.1% and 5.6% respectively. Let’s look at some scatter plots.

Scatter plot of intrinsic partnership variability versus average partnership percentage in both the La Jolla Unit game and the Charlotte Bridge Studio 199er and 299er games combined

At first glance, hardly any correlation is evident. Let’s check this more closely by binning the data in 4% bins. The error bars shown are errors on the mean value for the bin. Only one pair in the La Jolla unit game falls in the 58-62% bin so there is no error bar on that bin. In the strong game, there is no evidence of reduced partnership variability after the first bin, with the intrinsic variability seeming to approach a floor at 3%. However, the 199er and 299er combination shows a downward trend towards 4%, meeting the variability at the weaker end of the stronger La Jolla unit game.

Binned intrinsic partnership variability versus average partnership percentage in both the La Jolla Unit game and the Charlotte Bridge Studio 199er and 299er games combined
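For the curious, here is roughly how the binned points and error bars above can be computed. This is a sketch; the array names avg_pct and intrinsic are stand-ins for columns of the Payoff Matrix output, not a real interface.

```python
import numpy as np

def binned_means(pct, intrinsic, edges):
    """Mean intrinsic variability per bin of average percentage, with the
    standard error on the mean (sample std / sqrt(n)) as the error bar."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        inbin = intrinsic[(pct >= lo) & (pct < hi)]
        if len(inbin) > 1:
            sem = inbin.std(ddof=1) / np.sqrt(len(inbin))
            print(f"{lo}-{hi}%: {inbin.mean():.2f} ± {sem:.2f}% (n={len(inbin)})")
        elif len(inbin) == 1:
            print(f"{lo}-{hi}%: {inbin.mean():.2f}% (one pair, no error bar)")

# hypothetical usage with 4% bins, as in the plot above:
# binned_means(avg_pct, intrinsic, edges=np.arange(42, 67, 4))
```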

Variability as a strategy

To the victor go the spoils. In matchpoints, the top flight A masterpoint award scales linearly with the number of tables in play and is further adjusted by the game type, e.g. an ordinary club game, unit championship, a charity game, etc. The second place award is 70% of the first place award and third place is 50% of the first place award. This applies similarly to flight B and flight C except that the top awards are based on fewer tables, as determined by the total number of pairs in flights B and C, and just in flight C, respectively. A partnership in flight A will collect far more masterpoints if it alternates between 50% and 60% sessions than if it consistently scores 55%. It’s more complicated for a partnership in flight B, but in general variability is a good strategy if masterpoints are the goal. If some of your partnership variability is caused by emotional issues, work on that since fixing it will pull your percentage up. But you might want to keep your aggressive preempting style, weak no-trumps, crazy slam bidding, or whatever else your partnership does to generate variability, as long as the actions are not anti-percentage, at least not significantly so.
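How much alternating beats consistency depends on the shape of the field, but a toy Monte Carlo shows the effect. Everything here is made up for illustration: 11 opposing pairs whose session percentages are normal around 50% with a 6.6% spread (pair strengths differ, and each pair’s own results vary by 5-6%), and awards in the 1 : 0.7 : 0.5 ratio for the top three places. In this model the alternating pair collects noticeably more, because steady 55% sessions rarely place.

```python
import numpy as np

rng = np.random.default_rng(1)

def average_award(scores, sessions=100_000, field_pairs=11, field_sd=6.6):
    """Average award per session for a pair posting the percentages in
    `scores` in rotation against a field of normally distributed scores."""
    awards = {0: 1.0, 1: 0.7, 2: 0.5}               # 1st, 2nd, 3rd place
    ours = np.resize(scores, sessions)
    others = rng.normal(50, field_sd, size=(sessions, field_pairs))
    places = (others > ours[:, None]).sum(axis=1)   # 0 means first place
    return np.mean([awards.get(int(p), 0.0) for p in places])

print(f"steady 55%:        {average_award([55.0]):.3f}")
print(f"alternating 50/60: {average_award([50.0, 60.0]):.3f}")
```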

Married Partnerships

My regular partner Trish Lane asked me if married partnerships are more variable than other partnerships. “Wouldn’t you think, looking at some of them?” was her guess. Books have even been written. I recommend Roselyn Teukolsky’s humorous How to Play Bridge With Your Spouse… and Survive even if you are not in a married bridge partnership.

Based on a set of 19 married partnerships in the La Jolla unit game, after lowering the session cut from 20 to 10 to increase the number of married partnerships, the average partnership variability was 5.63%. This is very slightly lower than the 5.65% for the set of partnerships above. To make sure we are comparing apples to apples, I lowered the session cut to 10 and excluded the married couples, which resulted in an average variability of 5.69% for unmarried partnerships. Removing the field noise results in an average intrinsic partnership variability of 3.96% and 4.04% for married and unmarried partnerships respectively. Despite the special ways in which married couples can bicker at the bridge table, their play is not more variable than that of unmarried partnerships.

However, I don’t think this is the last word. Married partnerships have usually been playing together for a long time. This means they don’t have the variability of new partnerships that comes from guessing in common but undiscussed bidding and card play situations. It would be more meaningful to compare married and unmarried partnerships that have been playing bridge for similar amounts of time. This is a project for another day.

Is variability inherently bad?

Poker is maddeningly variable. One bad beat for a big pot might wipe out a night of competent play. In an aggressive game it can be hard to tell for a while whether one is playing well or merely lucky, or conversely playing poorly or merely unlucky. Compared to poker, Hamman’s complaint about matchpoints seems picayune.

David Sklansky and Mason Malmuth have written a lot about poker. They even have their own press, Two Plus Two. Years ago Sklansky wrote an article about the balance of skill and luck in poker. If it were solely a game of skill, the inexperienced wouldn’t play because they would lose every time. If it were too much about luck, the professional players would give up. Sklansky argued that No Limit Texas Hold’em went too far in the direction of skill. Bridge is a game of considerable skill. If we want to keep the tournament game alive we may need to embrace the degree of luck rather than shun it.