FORUMS - COASTERFORCE


"Statistics aren't everything"; to what extent is this true? A statistical analysis into how roller coaster statistics affect ranking

Matt N

Disclaimer: This is a long, geeky post. If you don't like statistics or maths talk, turn back now! If you'd like a more concise summary, a short summarising paragraph can be found at the bottom.
Hi guys. In theme park enthusiast circles, one of the most common comments you will hear about roller coasters is "statistics aren't everything". It is usually said in contrast to the common non-enthusiast view that rides with more impressive statistics are automatically better, and at least one enthusiast will often say it derisively whenever a tall, fast ride is announced. Most recently, many have said it about Falcon's Flight, the monstrous new multi-record-breaking roller coaster currently under construction in Saudi Arabia. But with parks continuing to chase records, and with the entire worldwide coaster industry locked into the one-upmanship battle of the Coaster Wars from the late 1980s to the late 2000s, one does have to wonder: how true is this common enthusiast view? To what extent does the old adage "statistics aren't everything" actually hold?

With this in mind, I've decided to perform a good old bit of statistical analysis to put this common enthusiast adage to the test! Join me for this two-part analysis, in which I aim to determine how much statistics actually affect how highly a ride is ranked by the collective "hive mind" of enthusiasts, and whether they actually mean anything when it comes to a roller coaster's community ranking.

Before we begin, let me introduce you to the dataset I'm using, and the methods that I used to filter said dataset...
Methodology
The dataset I'm using for this analysis is the May 2025 World Roller Coaster Ranking from Captain Coaster: https://captaincoaster.com/en/ranki...s[manufacturer]=&filters[openingDate]=&page=1

There are 1,978 roller coasters ranked in this list in total. Captain Coaster only ranks rides that have a certain level of ridership and that are not deemed to be "kiddie" coasters by the site management (although the definition of a kiddie coaster appears to be a little inconsistent, I'll admit).

I should point out, however, that I only used 1,863 of these listed rides in my analysis. My main reason for filtering out the other 115 was that I decided to stick more rigidly to the RollerCoaster DataBase (RCDB) definition of a distinct roller coaster, with RCDB being a very trusted source on these matters. As a result, the following things that were included in Captain Coaster's list were filtered out of my dataset:
  • Relocations: While Captain Coaster ranks each location of a relocated roller coaster separately (e.g. Traumatizer at Pleasureland Southport and Infusion at Blackpool Pleasure Beach would be ranked separately), I decided not to. Therefore, I included the location that had the highest ranking in the list, with all others being excluded.
  • Like-for-like retracks: Captain Coaster ranks original rides and like-for-like retracks separately (e.g. the original Shaman at Gardaland and the retracked Shaman at Gardaland are ranked separately), but as RCDB does not consider them distinct roller coasters, I decided not to. Therefore, I included the track version that had the highest ranking in the list, with all others being excluded.
  • Dubious creds: Captain Coaster's list includes some rather... debatable roller coasters that are not on RCDB (e.g. Maximus Blitz Bahn at Toverland is ranked despite not being on RCDB). I'm mainly talking about you, odd little bobsleigh contraptions! For the most part, anything that I could not find a listing for on RCDB was excluded. The only notable exception was travelling roller coasters: RCDB does not list travelling coasters unless they appear at a permanent park, but many were ranked both at permanent park locations and as travelling rides, so I included the highest-ranking list entry for each individual travelling coaster.
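All three filtering rules above boil down to one operation: group the Captain Coaster entries by distinct ride and keep only the best-ranked entry in each group. The Python sketch below shows that step under stated assumptions; it is not the code from the linked notebook. It assumes each entry has already been matched by hand to a hypothetical `ride_id` (e.g. the RCDB ID shared by every location or like-for-like track version of one coaster), and that entries with no RCDB match were already dropped. The ride names and ranks in the usage example are made up.

```python
from typing import Iterable


def keep_best_entry(entries: Iterable[dict]) -> list[dict]:
    """Keep the highest-ranked (lowest rank number) entry per distinct ride.

    Each entry is assumed to be a dict with a hypothetical "ride_id" key
    (shared by all locations/track versions of one coaster) and a "rank"
    key holding its position in the Captain Coaster list.
    """
    best: dict[str, dict] = {}
    for entry in entries:
        current = best.get(entry["ride_id"])
        # Rank 1 is the top of the list, so a smaller number is better.
        if current is None or entry["rank"] < current["rank"]:
            best[entry["ride_id"]] = entry
    # Return the surviving entries in list order.
    return sorted(best.values(), key=lambda e: e["rank"])


# Hypothetical entries: one ride listed twice (relocated), one listed once.
entries = [
    {"name": "Coaster A (original park)", "ride_id": "A", "rank": 1200},
    {"name": "Coaster A (after relocation)", "ride_id": "A", "rank": 900},
    {"name": "Coaster B", "ride_id": "B", "rank": 50},
]
print([e["name"] for e in keep_best_entry(entries)])
# → ['Coaster B', 'Coaster A (after relocation)']
```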
To source statistics for this analysis, I mostly used RCDB (hence my rigid adherence to RCDB's definition of a distinct roller coaster). Where I could not use RCDB (mainly for travelling coasters), I used an alternative source such as Coasterpedia, or, as a last resort in a couple of cases, Captain Coaster itself. For reference, height, speed and length were logged in imperial units (i.e. ft for height and length, mph for speed).

The term "statistics" is very broad; technically, anything about a roller coaster can be a statistic! But for the purposes of this analysis, I limited my dataset to five key statistics: height, speed, length, number of inversions and opening year. I chose these five because I felt they were likely to be among the most influential concrete statistics in terms of swaying people's opinion of a ride, and they are also by far the most consistently listed roller coaster statistics on RCDB.

Before I begin, I should also stress a few caveats. These are:
  • The data entry and filtering were performed manually by myself, so there is the potential for a touch of error. I tried my level best to be rigorous, but I am only human!
  • I'm aware that some may question my use of Captain Coaster as a source. However, I do feel that it's the most representative sample of the community hive mind I could think of. It has a wide user base, and it now encompasses a great number of different rides. The only alternatives I could think of were the Mitch Hawker poll, which hasn't run in over 10 years, and the Golden Ticket Awards roller coaster ranking, which is commonly dismissed as being too biased in favour of the USA, so I felt that Captain Coaster was a favourable choice.
  • Whatever my experiments find, it does not in any way counteract your own individual opinion! I am simply relaying the opinions of the enthusiast community's overall "hive mind" (or more specifically, the opinions of my chosen sample of the enthusiast community) on the matter; you are of course free to hold your own views on the matters discussed!
With some of these caveats out of the way, let's get into the fun side of the analysis, shall we? For Part 1 of this analysis, I decided to stick to our good old friend... correlation analysis!
Part 1: Correlation Analysis of the Relationship between Various Statistics and Ranking
Initially, I'm going to stick to the (relatively) simple end of the spectrum and perform some correlation analyses on various statistics and their relationship with ranking in the list.

For those who haven't read any of my previous theme park data analyses or need a refresher, let me once again briefly explain correlation. In essence, correlation measures the strength and direction of the relationship between two variables. It falls between -1 and 1, with -1 representing a perfect negative correlation (i.e. "as variable 1 increases in value, variable 2 decreases in value"), 0 representing no correlation (i.e. "variable 2 shows no consistent tendency to rise or fall as variable 1 increases"), and 1 representing a perfect positive correlation (i.e. "as variable 1 increases in value, variable 2 increases in value in tandem").
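For reference, the standard formula for Pearson's coefficient over n paired values (x_i, y_i), with means x̄ and ȳ, is given below; Spearman's coefficient is the same quantity calculated on the ranks of the values instead of the raw values.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```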

For each of these analyses, I also decided to use two correlation coefficients, for the sake of transparency. Pearson's correlation coefficient measures the strength of the linear (straight-line) relationship between two variables, while Spearman's correlation coefficient is calculated on the ranks of the values and measures the strength of any monotonic relationship (one where variable 2 consistently rises, or consistently falls, as variable 1 increases), regardless of whether that relationship is linear.
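To make the Pearson/Spearman distinction concrete, here is a from-scratch Python sketch of both coefficients. It is not the code from the linked Colab notebook, which may well use a library instead. Tied values receive the average of the ranks they span, which matters for statistics such as inversions and opening year, where many rides share the same value. The toy data is a curve where y always rises with x, but not in a straight line.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's r: strength of the linear relationship between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j share ranks i+1..j+1, so use their average.
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks (monotonic relationship)."""
    return pearson(average_ranks(x), average_ranks(y))


# A perfectly monotonic but curved relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(round(pearson(x, y), 3), spearman(x, y))
```

This prints 0.938 for Pearson and 1.0 for Spearman: the relationship is perfectly monotonic, so Spearman reaches 1, but because it isn't a straight line, Pearson falls short of 1.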

For the purposes of differentiation, I should define the following thresholds of absolute correlation coefficients in terms of denoting strength. I can't remember where I heard these thresholds, but I quite liked them:
  • Below 0.3: No significant correlation
  • 0.3-0.5: Weak correlation
  • 0.5-0.7: Moderate correlation
  • 0.7-0.9: Strong correlation
  • Above 0.9: Very strong correlation
Finally, I should point out that because a "high" ranking corresponds to a low number in this particular instance (#1 is the top spot), a negative correlation coefficient actually represents a positive effect on ranking here: as the statistic increases, the ranking number tends to fall, meaning the ride sits higher up the list.
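The thresholds and the sign flip can be combined into a single labelling rule. Below is a minimal Python sketch, assuming each band includes its lower bound (the thresholds above don't say how a value of exactly 0.5, say, should be treated). The labels mirror the wording used in the tables in this post.

```python
def describe(r: float) -> str:
    """Label a correlation with ranking using the thresholds above.

    Assumption: each band includes its lower bound, e.g. exactly 0.5
    counts as moderate.
    """
    size = abs(r)
    if size < 0.3:
        return "No Significant Correlation (No Significant Effect on Ranking)"
    if size < 0.5:
        strength = "Weak"
    elif size < 0.7:
        strength = "Moderate"
    elif size < 0.9:
        strength = "Strong"
    else:
        strength = "Very Strong"
    # Rank 1 is the best position, so a negative coefficient means the
    # statistic tends to go with a better (numerically smaller) ranking.
    direction = "Negative" if r < 0 else "Positive"
    effect = "Positive" if r < 0 else "Negative"
    return f"{strength} {direction} Correlation ({strength} {effect} Effect on Ranking)"


for r in (-0.55, -0.49, -0.24):
    print(r, describe(r))
```

Run on the Pearson values for height, length and inversions, it reproduces the labels given in their tables.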

Let's dive into it, shall we? I firstly looked at the big one... height!
Height
When plotted on a graph, the relationship between height and ranking looked something like this:
[Scatter graph: Relationship between Height and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Height and Ranking, the results were as follows:
Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.55 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)
Spearman | -0.60 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)

So from this, we can conclude that Height is moderately correlated with Ranking. As a general rule, taller roller coasters will rank higher than shorter ones on average, with height having a moderate effect on ranking. This is not a perfect correlation, so this rule will not apply in every instance, but I think we can safely infer a general rule of taller coasters being ranked more highly from this. This could be because in general, taller coasters are inherently targeted more towards thrillseekers and pitched at a higher intensity level, so taller rides are more likely to appeal to thrillseekers than shorter ones.

Let's now look at speed...
Speed
When plotted on a graph, the relationship between Speed and Ranking looked something like this:
[Scatter graph: Relationship between Speed and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Speed and Ranking, the results were as follows:
Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.65 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)
Spearman | -0.69 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)

As such, I think we can infer that Speed and Ranking are moderately, bordering on strongly, correlated. Faster rides, in general, rank more highly. Interestingly, the correlation coefficients for speed are roughly 0.1 larger in magnitude than their equivalents for height (-0.65 vs -0.55 for Pearson, -0.69 vs -0.60 for Spearman), suggesting that speed is more closely tied to how people rank a ride than height is. This could be explained by looking at some patterns among high-ranking coasters; one thing I notice is that highly ranked rides that aren't especially tall often have launches or terrain usage that give them a top speed that punches above their height. One high-ranking example I'd cite is Taron at Phantasialand, which sits in the #21 spot; at only 98ft tall, it is not an especially tall coaster, but its top speed of 73mph is far more impressive, so its speed compensates heavily for its lack of stature. With this in mind, it could be argued that speed is a more direct determinant of thrill level than height; a tall coaster will usually be fast, but a fast coaster will not always be tall.
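The "a fast coaster will not always be tall" point can be backed up with some basic physics. Ignoring friction and air resistance, a train dropping from height h can reach at most v = √(2gh), so a purely gravity-driven ride's top speed is capped by its drop. The Python sketch below treats Taron's 98ft height as if it were a single straight drop, which is an illustrative simplification rather than a description of its actual layout.

```python
from math import sqrt

G = 9.80665          # standard gravity, m/s^2
FT_TO_M = 0.3048     # exact definition of the international foot
MPH_TO_MS = 0.44704  # exact: 1 mph expressed in metres per second


def max_speed_from_drop_mph(drop_ft: float) -> float:
    """Upper bound on speed gained from a drop, ignoring friction and drag.

    Energy conservation (m*g*h = 1/2*m*v^2) gives v = sqrt(2*g*h).
    Real coasters lose some energy, so actual speeds come in lower.
    """
    v_ms = sqrt(2 * G * drop_ft * FT_TO_M)
    return v_ms / MPH_TO_MS


# Hypothetical: treat Taron's 98ft height as one frictionless drop.
print(round(max_speed_from_drop_mph(98), 1))  # → 54.1
```

That works out to roughly 54mph, well short of Taron's 73mph, which fits with its launches, rather than its height, supplying most of its speed.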

Let's take a look at length now...
Length
When plotted on a graph, the relationship between Length and Ranking looked something like this:
[Scatter graph: Relationship between Length and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Length and Ranking, the results were as follows:

Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.49 | Weak Negative Correlation (Weak Positive Effect on Ranking)
Spearman | -0.57 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)

From this, we can infer that Length and Ranking share a weak-to-moderate correlation. In general, longer rides are ranked more highly, but interestingly, this relationship is not quite as strong as it is for height and speed. Following the same reasoning I used for height and speed, I would suggest this is because length is a less direct indicator of thrill level and intensity pitching. While tall and fast coasters are often proportionally long, they are not always, and there are also long coasters that aren't pitched especially high in terms of thrill level. For example, family mine train coasters, such as the Big Thunder Mountain rides at Disney parks, often have long track lengths but are relatively tame. And in this particular dataset, alpine coasters are also included, which can be very long but are rarely targeted at a particularly high intensity level. People always say "it's not the length that matters, it's what you do with it", and while the correlation between length and ranking implies that increased length does generally make a ride rank more highly, the relatively weaker nature of this correlation compared to height and speed suggests that there might be something in this saying!

Let's move away from height, speed and length and look at something a tad more discrete... inversions!
Inversions
When plotted on a graph, the relationship between Inversions and Ranking looks something like this:
[Scatter graph: Relationship between Number of Inversions and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Inversions and Ranking, the results were as follows:

Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.24 | No Significant Correlation (No Significant Effect on Ranking)
Spearman | -0.26 | No Significant Correlation (No Significant Effect on Ranking)

From this, I think we can see that number of inversions is a much weaker predictor of how highly a ride will be ranked, with no significant correlation with ranking. In my view, the blame for this can be placed squarely at the door of how wide the range of non-inverting roller coasters is; if you look at the scatter graph, you can see that 0 inversions covers pretty much every inch of the ranking continuum! Fury 325 at Carowinds, the #10 ranked roller coaster in the list and one of the tallest and fastest operating coasters in the world, has no inversions, as do many of the other tallest, fastest and longest coasters in the world... but virtually every low-ranked, tin-pot family coaster also has no inversions. Considering that fact alone, number of inversions is a poor discriminator when it comes to determining ranking. If you were to take non-inverting coasters out of the equation, or control for the effect of another statistic such as height or speed, it might prove more influential, but without any adjustment, it doesn't have any significant effect on ranking.
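One standard way to "control for" another statistic is partial correlation: the correlation between inversions (x) and ranking (y) after removing the linear effect of, say, height (z). It only needs the three pairwise coefficients. This wasn't part of the analysis above, and the post doesn't report the inversions/height correlation, so the inputs in the example below are made-up placeholder values, not results.

```python
from math import sqrt


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z.

    Standard first-order partial correlation formula, built from the three
    pairwise correlation coefficients.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Placeholder inputs for illustration only (not values from this dataset).
print(round(partial_correlation(0.5, 0.5, 0.5), 3))  # → 0.333
```

The other option mentioned above, taking non-inverting coasters out of the equation, needs no new formula: filter out the rides with 0 inversions and recompute the same two coefficients on what remains.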

Let's finally look at a slightly different metric to see if age has any influence... opening year!
Opening Year
When plotted on a graph, the relationship between Opening Year and Ranking looks something like this:
[Scatter graph: Relationship between Opening Year and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Opening Year and Ranking, the results were as follows:

Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.17 | No Significant Correlation (No Significant Effect on Ranking)
Spearman | -0.25 | No Significant Correlation (No Significant Effect on Ranking)

With this in mind, I think we can say that Opening Year has no significant correlation with Ranking, and thus a very limited effect on how highly a ride is ranked. I would put this down to the very broad range of rides opening in any given year; while thrillseekers' minds might be drawn to the big new thrill rides each year, small, lowly-ranked family coasters open every year as well. Therefore, opening year alone, like number of inversions, is quite a poor discriminator without adjusting for the effect of another statistic such as height or speed, and its effect on ranking is very limited.

Let's now summarise some final conclusions for Part 1...
Conclusions
So, having performed some correlation analyses on various ride statistics, let's reflect on our base question. To what extent is the phrase "Statistics aren't everything" true, and to what extent do statistics influence how highly a roller coaster is ranked?

Well, I think the analysis shows us that it strongly depends on the statistic you choose to focus on.

Some statistics did have significant correlations with ranking. Height and speed had moderate correlations with ranking, suggesting that on average, taller and faster rides are ranked more highly. Speed had a particularly strong correlation with ranking, with its coefficients bordering on the threshold of a strong correlation. To a slightly lesser degree, length also had a significant correlation with ranking: on average, longer rides are ranked more highly, but the relationship was not quite as strong as for height and speed.

However, other statistics did not have overly significant correlations with ranking. Both number of inversions and opening year had little significant effect on ranking, likely due to the breadth of rides being covered by certain values. Without normalisation for the effect of another more influential statistic, these two are unlikely to be having any major effect.

So overall, then, I'd perhaps argue that the phrase "Statistics aren't everything" is true to an extent. The correlations shown were not perfect correlations in any case, so there will always be exceptions, and in the case of how old the ride is or how loopy the ride is, those statistics aren't really having much effect at all. With that being said, I would argue that the phrase is not entirely true; the correlations ranking shared with height and speed in particular would suggest that building taller and faster can have some positive effect on ranking!
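Squaring Pearson's coefficient gives r², the share of the variation in ranking that a straight-line fit on that one statistic would account for, which is one way to quantify how imperfect these correlations are. The Python sketch below simply re-uses the coefficients reported in the five tables above, orders the statistics by the strength of their Spearman coefficient, and prints r² for each.

```python
# Pearson and Spearman coefficients as reported in the tables above.
reported = {
    "Height": (-0.55, -0.60),
    "Speed": (-0.65, -0.69),
    "Length": (-0.49, -0.57),
    "Inversions": (-0.24, -0.26),
    "Opening year": (-0.17, -0.25),
}

# Order statistics by the strength (absolute value) of Spearman's coefficient.
for name, (pearson_r, spearman_rho) in sorted(
    reported.items(), key=lambda item: abs(item[1][1]), reverse=True
):
    # r squared: share of the variation in ranking that a straight-line
    # fit on this one statistic would account for.
    print(f"{name:<13} rho={spearman_rho:+.2f}  r^2={pearson_r ** 2:.2f}")
```

Even for speed, r² comes out at about 0.42, leaving well over half of the variation in ranking unaccounted for by speed alone.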

Thank you for reading Part 1; I hope you've enjoyed this analysis! I'm receptive to any feedback, good or bad! In Part 2, I'm going to do something a little bit more technical and delve into some regression algorithms to see whether statistics can be used to predict how highly a ride will rank within the list, as well as to see how much of a ride's ranking they actually explain!

If you'd like to take a look at the dataset underpinning this analysis, the Google Sheet is here: https://docs.google.com/spreadsheets/d/1OfxlATMBJJwN1VjW7sb0jKwIFywJcACbZYJHIxbjdVc/edit?usp=sharing

And if you are Pythonically inclined at all, here is a Google Colab notebook showcasing my Python code for you to look at, if you'd like to: https://colab.research.google.com/drive/1lWPWocnkxwvkT706QbZdz6B9SSwk-kB3?usp=sharing
Summary: In enthusiast circles, people commonly dismiss statistics as an important factor in how highly a roller coaster is rated, so I decided to analyse the effect that 5 different statistics (height, speed, length, number of inversions and opening year) have on a roller coaster's community ranking, using data from Captain Coaster. Correlation analyses found that height, speed and, to a lesser extent, length were all significantly correlated with ranking. Speed had a particularly strong correlation, with its coefficients bordering on the threshold of a strong correlation. Inversions and opening year showed no significant correlation with ranking, meaning that neither has a significant effect in isolation. As such, we can conclude that "statistics aren't everything" holds to some extent, depending on the statistic you choose, but that some statistics do have a definite positive effect on ranking on average.
 
It seems to me that your data, Matt, roughly backs up what I've always thought - that stats do matter, but not the extremes. I think you need a minimum of about 60ft for a great ride, with (for me) a maximum of 150ft. I think your data says you get diminishing returns after about 130ft, which sounds about right. Any bigger than that and you're getting something that's harder to maintain for questionable benefits, apart from the initial marketing.

Length is about 3,000ft which, again, sounds about right. I think with length you also get that thing where it has to be long enough compared to the height, otherwise there's something that's just not satisfying about it.

Inversions at 5-7 also sounds about right, otherwise you run the risk of it flipping you upside-down just for the sake of it, with no non-inverting sections in between.

I am more surprised at opening year, as your stats appear to show that anything from the mid-90s has an equal chance of ranking high. I say I'm surprised because I got the impression the more fluid, RMC-inspired, multi-launching modern coasters were a lot more highly regarded. But it just goes to show, if you do it right in the first place, it lasts.
 
I apologise, as I didn’t realise that the post pasted in with weird formatting last night… I’ve now sorted out the formatting, and it should now look a bit better and be a bit easier to read!
 