No, seriously: what the heck is expected goals (xG)?

You’ve seen it on Twitter, been confused by it on blogs and enraged by it on Match of the Day – at least if you're Jeff Stelling. But how does expected goals actually work?

Bayern Munich probably had good reason to rue their luck after bowing out in the semi-finals of the 2015/16 Champions League against Atletico Madrid – they had lost by the finest of margins. Pep Guardiola’s side, having been beaten 1-0 in the first leg, knew they had to win the return clash in Bavaria by two clear goals.

The hosts unleashed an almighty siege on the Rojiblancos’ goal – 33 shots to Atleti’s seven, 11 of which hit the target to the visitors’ four. Yet, most tellingly, they scored only two goals to their opponents’ one, which left the tie level at 2-2 on aggregate, and were unceremoniously dumped out on the away goals rule.

Those statistics alone hinted that Diego Simeone’s men might have been a touch fortunate – but a measure of the quality of the chances, rather than just their number, suggested that their progression actually bordered on the miraculous.

"Seems I've upset the nerds"

The next day, speaking on American sports network ESPN, Italian journalist Gabriele Marcotti mentioned in passing that, on another night, the Bundesliga giants would have achieved the result they required to reach the final – after all, their expected goals rating for the two-legged tie was 4.2 to Atleti’s 1.7.

“You are talking to me about expected goals in the Champions League semi-final they’ve just lost? What an absolute load of nonsense,” came the incredulous reply from pundit Craig Burley, the former Chelsea and Scotland midfielder clearly unimpressed with the writer’s use of the increasingly popular analytical tool.

“I expect things at Christmas from Santa Claus, but they don’t come, right? What I deal in is facts!”

David Alaba at the end of the Bayern/Atletico game which provoked Marcotti vs Burley

Before Marcotti could calmly expand on the finer points of expected goals – or xG, as it’s also known – the agitated Scotsman let rip again.

“Look at the results! That’s what the game is all about. Whether [or not] you or I or anybody likes it, the game is about results. That is why managers get the sack – not all this nonsense about expected goals.”

As video of the heated exchange went viral, Burley posted on Twitter: “Seems I’ve upset the nerds.”

A predictable response

Marcotti and football’s burgeoning analytics community then took a deep breath – funnily enough, this was exactly the kind of reaction they had come to expect. To the uninitiated, expected goals can look like little more than an overwhelmingly complex equation. However, when you break it down, the essence of the idea is one that fans, pundits and managers have been skirting around for decades.

“The reason I like expected goals is that it’s quite intuitive when you try to strip the math out,” says the writer and analytics expert Michael Caley, who has been exploring expected goals for a number of years. He has shared his findings in articles and social media posts that have helped popularise xG among number-crunching supporters and journalists.

“Basically, it’s the idea of trying to evaluate the quality of scoring chances,” he explains. “When a pundit on television claims a team was a bit unlucky and that they could have won a game, what they’re trying to say is that the team created better scoring chances, but the goals just didn’t come.”

Bayern found out the hard way

It may only have started appearing on Match of the Day this season (more on that later), but xG has been around for more than five years and is still being refined as more matches are played.

“Opta first came up with the concept of expected goals when one of our data scientists – Sam Green, who has since gone on to work at a Premier League club – devised an analytical model based on similar things being done in American sport,” says Duncan Alexander, Opta’s head of data editorial.

“Once the theory existed, various people in the analytics community worked on and adjusted it – making a few little tweaks to the model to try to perfect it. So there are actually several different xG models in existence, but there is only really a very slight difference with the numbers.”

Among those to have tweaked the xG model is Caley, who originally began toying with football analytics in his spare time while studying for a PhD in the History of Religion at Harvard University. He’s therefore well placed to explain, in layman’s terms, how the whole thing works.

“Expected goals uses a whole bunch of indicators based on Opta’s on-ball event data – where on the pitch the shot had been taken from, what part of the body was used, the type of pass that had set up the chance, how quickly the move progressed down the pitch before the shot, the proximity of the opposition players, and so on – to determine exactly how likely it is that a particular opportunity will result in a goal.

“For example, if it’s a cross onto a player’s head, that’s going to have lower expected goals because those are more difficult to score from. If it’s a through-ball to feet, which is going to eliminate a number of defenders, that’s going to increase the chances of a goal. And if it's a corner-kick, there’ll be a load of defenders in the box so you’re less likely to score.

“You essentially pull all of that into one math equation that then spits out a number – expected goals – which can be tallied up over the course of a game or a season, and for a player or a team.”
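For the curious, here is roughly what that "one math equation" might look like in code. This is only a minimal sketch: it assumes a simple logistic formula, and the feature names and weights are invented for illustration. They are not Opta's or Caley's actual model, which is fitted to large volumes of historical shot data.

```python
import math

# Hypothetical weights, chosen only for illustration. A real xG model
# would estimate these from many thousands of historical shots.
INTERCEPT = -1.0
WEIGHTS = {
    "distance_m": -0.10,    # the farther out, the less likely a goal
    "header": -0.8,         # headed chances are harder to finish
    "through_ball": 0.9,    # a through-ball takes defenders out of the game
    "from_corner": -0.5,    # a crowded box after a corner
}


def shot_xg(shot):
    """Return the estimated probability (between 0 and 1) that a shot is scored."""
    score = INTERCEPT + sum(w * shot.get(name, 0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))  # logistic function


def total_xg(shots):
    """Tally xG over any collection of shots: a game, a season, a player, a team."""
    return sum(shot_xg(shot) for shot in shots)


# Three made-up chances: a through-ball from 8m, a header from 8m, a 25m effort.
chances = [
    {"distance_m": 8, "through_ball": 1},
    {"distance_m": 8, "header": 1},
    {"distance_m": 25},
]
for chance in chances:
    print(chance, round(shot_xg(chance), 2))
print("Total xG:", round(total_xg(chances), 2))
```

With these invented weights, the through-ball chance rates higher than the header from the same distance, which matches the intuition Caley describes; the exact values mean nothing beyond that.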

Crystal Palace lost at Turf Moor in spite of outperforming Burnley on xG

Crystal Palace’s xG for their 1-0 defeat at Burnley in September, which ultimately cost Frank de Boer his briefly held job, was 1.74. Over the course of the 90 minutes, they spurned several presentable chances that on another day they would have buried. Burnley’s xG in the same match was a mere 0.43. The Clarets were evidently far more clinical.
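How can a side with 0.43 xG beat one with 1.74? A rough sketch of the arithmetic is below. It treats every shot as an independent coin-flip weighted by its xG, which is a simplifying assumption, and the individual shot values are invented: only the totals of 1.74 and 0.43 come from the match itself.

```python
def goal_distribution(shot_probs):
    """Probability of scoring exactly 0, 1, 2, ... goals from a list of shots,
    assuming each shot is an independent event."""
    dist = [1.0]  # before any shot, zero goals is certain
    for p in shot_probs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1 - p)   # shot missed
            nxt[goals + 1] += prob * p     # shot scored
        dist = nxt
    return dist


def outcome_probs(home_shots, away_shots):
    """Return (home win, draw, away win) probabilities."""
    home = goal_distribution(home_shots)
    away = goal_distribution(away_shots)
    win = draw = loss = 0.0
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            if h > a:
                win += ph * pa
            elif h == a:
                draw += ph * pa
            else:
                loss += ph * pa
    return win, draw, loss


# Hypothetical per-shot xG values; they sum to 0.43 and 1.74 respectively.
burnley_shots = [0.20, 0.10, 0.08, 0.05]
palace_shots = [0.40, 0.35, 0.30, 0.25, 0.20, 0.12, 0.07, 0.05]

burnley_win, draw, palace_win = outcome_probs(burnley_shots, palace_shots)
print(f"Burnley win {burnley_win:.0%}, draw {draw:.0%}, Palace win {palace_win:.0%}")
```

Under these assumptions Palace win more often than not, yet a Burnley victory remains a real possibility rather than a freak event, which is the point xG's defenders tend to make.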

At this stage, it's also worth making a key distinction – that between statistics and analytics.

“The thing that really irks me when I hear it is the word ‘stats’,” says Billy Beane – a man who certainly speaks with authority. Beane, as many readers will be aware, was at the heart of the data revolution in baseball during his time as general manager of the Oakland A’s.

His use of sabermetrics (“the use of objective data – what we would now call analytics – and mathematically finding a more efficient way of putting together a baseball team”) allowed the A’s to go toe-to-toe with Major League Baseball’s richest franchises, despite their own financial limitations. His tale was told in the book Moneyball and the 2011 movie of the same name. He’s also a huge football fan.

An important distinction

“Stats are results,” Beane tells FFT. “You can have the same outcome, such as a goal, from two different events but both of them can be very different in terms of how difficult they were. Take a [Lionel] Messi goal, where he has weaved through nine guys, versus a tap-in. Those goals are the same statistically, but they require two different skill sets – one was harder to score than the other.”