"Statistics aren't everything"; to what extent is this true? A statistical analysis into how roller coaster statistics affect ranking

Matt N

Disclaimer: This is a long, geeky post. If you don't like statistics or maths talk, turn back now! If you'd like a more concise summary, a short summarising paragraph can be found at the bottom.
Hi guys. In theme park enthusiast circles, one of the most common comments you will hear in relation to roller coasters is "statistics aren't everything". This is often said in contrast to the common non-enthusiast view that rides with more impressive statistics are automatically better, and is often uttered derisively by at least one enthusiast whenever a tall, fast ride is announced. Most recently, many have said this in relation to Falcon's Flight, the monstrous new multi-record-breaking roller coaster currently under construction in Saudi Arabia. But with parks continuing to go for records, and with the entire worldwide coaster industry having been locked into the one-upmanship battle of the Coaster Wars between the late 1980s and the late 2000s, one does have to wonder: how true is this common enthusiast view? To what extent does the old adage that "statistics aren't everything" actually hold?

With this in mind, I've decided to perform a good old bit of statistical analysis to put this common enthusiast adage to the test! Join me over this two-part analysis, where I aim to work out how much statistics actually affect how highly a ride is ranked by the collective "hive mind" of enthusiasts, and whether they mean anything at all in terms of determining a roller coaster's community ranking.

Before we begin, let me introduce you to the dataset I'm using, and the methods that I used to filter said dataset...
Methodology
The dataset I'm using for this analysis is the May 2025 World Roller Coaster Ranking from Captain Coaster: https://captaincoaster.com/en/ranki...s[manufacturer]=&filters[openingDate]=&page=1

There are 1,978 roller coasters ranked in this list in total. Captain Coaster only ranks rides with a certain level of ridership, as well as rides not deemed to be "kiddie" coasters by the site management (the definition appears to be a little inconsistent, I'll admit).

I should point out, however, that I only used 1,863 of these listed rides in my analysis. My main reason for filtering out 115 listed roller coasters was that I decided to stick more rigidly to the RollerCoaster DataBase (RCDB) definition of a distinct roller coaster, RCDB being a very trusted source on these matters. As a result, the following things that were included in Captain Coaster's list were filtered out of my dataset:
  • Relocations: While Captain Coaster ranks each location of a relocated roller coaster separately (e.g. Traumatizer at Pleasureland Southport and Infusion at Blackpool Pleasure Beach would be ranked separately), I decided not to. Therefore, I included the location that had the highest ranking in the list, with all others being excluded.
  • Like-for-like retracks: Captain Coaster ranks original rides and like-for-like retracks separately (e.g. the original Shaman at Gardaland and the retracked Shaman at Gardaland are ranked separately), but as RCDB does not consider them distinct roller coasters, I decided not to. Therefore, I included the track version that had the highest ranking in the list, with all others being excluded.
  • Dubious creds: Captain Coaster's list includes some rather... debatable roller coasters that are not on RCDB (e.g. Maximus Blitz Bahn at Toverland is ranked despite not being on RCDB). I'm mainly talking about you, odd little bobsleigh contraptions! For the most part, anything that I could not find a listing for on RCDB was excluded, with the only notable exception being travelling roller coasters (RCDB does not list travelling coasters unless they appear at a permanent park, but I did include the highest ranking list entry for individual travelling coasters, as many were ranked in both permanent park locations and as travelling rides).
To source statistics for this analysis, I mostly used RCDB (hence my rigid adherence to RCDB's definition of a distinct roller coaster). Where I could not use RCDB (mainly for travelling coasters), I used an alternative source such as Coasterpedia, or as a last resort Captain Coaster itself in a couple of cases. In case it's relevant: height, speed and length statistics were logged in imperial units (ft for height and length, mph for speed).

The term "statistics" is very broad; technically, anything about a roller coaster can be a statistic! But for the purposes of this analysis, I limited my dataset to five key statistics; height, speed, length, number of inversions and opening year. I chose these five as I felt that they were likely to be five of the most influential concrete statistics in terms of swaying people's opinion on a ride, and they are also by far the most prevalently present roller coaster statistics on RCDB.

Before I begin, I should also stress a few caveats. These are:
  • The data entry and filtering were performed manually by myself, so there is the potential for a touch of error. I tried my level best to be rigorous, but I am only human!
  • I'm aware that some may question my use of Captain Coaster as a source. However, I do feel that it's the most representative sample of the community hive mind I could think of. It has a wide user base, and it now encompasses a great number of different rides. The only alternatives I could think of were the Mitch Hawker poll, which hasn't run in over 10 years, and the Golden Ticket Awards roller coaster ranking, which is commonly dismissed as being too biased in favour of the USA, so I felt that Captain Coaster was a favourable choice.
  • Whatever my experiments find, it does not in any way counteract your own individual opinion! I am simply relaying the opinions of the enthusiast community's overall "hive mind" (or more specifically, the opinions of my chosen sample of the enthusiast community) on the matter; you are of course free to hold your own views on the matters discussed!
With some of these caveats out of the way, let's get into the fun side of the analysis, shall we? For Part 1 of this analysis, I decided to stick to our good old friend... correlation analysis!
Part 1: Correlation Analysis of the Relationship between Various Statistics and Ranking
Initially, I'm going to stick to the (relatively) simple end of the spectrum and perform some correlation analyses on various statistics and their relationship with ranking in the list.

For those who haven't read any of my previous theme park data analyses or need a refresher, let me once again briefly explain correlation. In essence, correlation measures the strength and direction of relationship between two variables. It can fall between -1 and 1, with -1 representing a perfect negative correlation (i.e. "as variable 1 increases in value, variable 2 decreases in value"), 0 representing no correlation (i.e. "variable 1 increasing in value has no effect on the value of variable 2"), and 1 representing a perfect positive correlation (i.e. "as variable 1 increases in value, variable 2 increases in value in tandem").

For each of these analyses, I also decided to use two correlation coefficients, for the sake of transparency. Pearson's correlation coefficient measures the strength of the linear relationship between two variables, while Spearman's correlation coefficient measures the strength of any monotonic relationship between two variables (i.e. it doesn't require the relationship to be linear, only consistently increasing or decreasing).
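
For the Pythonically inclined, here's a minimal sketch of how these two coefficients can be computed with pandas; the file and column names are illustrative rather than the exact ones in my sheet:

```python
# Minimal sketch: computing Pearson and Spearman correlations with pandas.
# File and column names are illustrative.
import pandas as pd

df = pd.read_csv("coaster_rankings.csv")  # hypothetical export of the dataset

pearson_r = df["Height"].corr(df["Ranking"], method="pearson")
spearman_r = df["Height"].corr(df["Ranking"], method="spearman")

print(f"Pearson:  {pearson_r:.2f}")
print(f"Spearman: {spearman_r:.2f}")
```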

For the purposes of differentiation, I should define the following thresholds of absolute correlation coefficients in terms of denoting strength. I can't remember where I heard these thresholds, but I quite liked them:
  • Below 0.3: No significant correlation
  • 0.3-0.5: Weak correlation
  • 0.5-0.7: Moderate correlation
  • 0.7-0.9: Strong correlation
  • Above 0.9: Very strong correlation
Finally, I should also probably point out that as a "high" ranking equates to a low number in this particular instance (the top coaster in the list is ranked #1), a negative correlation coefficient will actually represent a positive effect on ranking here, if that makes any sense at all?

Let's dive into it, shall we? I firstly looked at the big one... height!
Height
When plotted on a graph, the relationship between height and ranking looked something like this:
[Scatter plot: Relationship between Height and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Height and Ranking, the results were as follows:
Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.55 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)
Spearman | -0.60 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)
So from this, we can conclude that Height is moderately correlated with Ranking. As a general rule, taller roller coasters will rank higher than shorter ones on average, with height having a moderate effect on ranking. This is not a perfect correlation, so this rule will not apply in every instance, but I think we can safely infer a general rule of taller coasters being ranked more highly from this. This could be because in general, taller coasters are inherently targeted more towards thrillseekers and pitched at a higher intensity level, so taller rides are more likely to appeal to thrillseekers than shorter ones.

Let's now look at speed...
Speed
When plotted on a graph, the relationship between Speed and Ranking looked something like this:
[Scatter plot: Relationship between Speed and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Speed and Ranking, the results were as follows:
Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.65 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)
Spearman | -0.69 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)

As such, I think we can infer that Speed and Ranking are moderately, bordering on strongly, correlated. Faster rides, in general, rank more highly. Interestingly, the correlation coefficients for speed are a good 0.1 stronger in magnitude than their equivalents for height, indicating that speed is a more influential statistic than height in determining people's ranking of a ride. This could be explained by looking at some patterns among high-ranking coasters; one thing I notice is that highly ranked rides that aren't especially tall often have launches or terrain usage that give them speed that punches above their height statistic. One high-ranking example of this I'd cite is Taron at Phantasialand, which ranks in the #21 spot; at only 98ft tall, it is not an especially tall coaster, but its top speed of 73mph ranks much more highly in the speed stakes, so its speed compensates heavily for its lack of stature. With this in mind, it could be argued that speed is perhaps a more direct determinant of thrill level than height; a tall coaster will usually be fast, but a fast coaster will not always be tall.

Let's take a look at length now...
Length
When plotted on a graph, the relationship between Length and Ranking looked something like this:
[Scatter plot: Relationship between Length and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Length and Ranking, the results were as follows:

Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.49 | Weak Negative Correlation (Weak Positive Effect on Ranking)
Spearman | -0.57 | Moderate Negative Correlation (Moderate Positive Effect on Ranking)

As a result, we can infer that Length and Ranking share a weak-to-moderate correlation. In general, longer rides are ranked more highly, but interestingly, this relationship is not quite as strong as in the case of height and speed. I would suggest that, similarly to what I said about height and speed, this is because length is a slightly less obvious determinant of thrill level and intensity pitching. While tall and fast coasters are often proportionally long, they are not always, and there are also long coasters that aren't pitched overly high in terms of thrill level. For example, family mine train coasters, such as the Big Thunder Mountain rides at Disney parks, are often long in track length, but relatively tame in thrill level. And in this particular dataset, alpine coasters are also included, which can be very long but are often not targeted at a particularly high intensity level. People always say "it's not the length that matters, it's what you do with it", and while the correlation between length and ranking would suggest that increased length does generally make a ride rank more highly, the relatively weaker nature of this correlation compared to height and speed would suggest that there might be something in this saying!

Let's move away from height, speed and length and look at something a tad more discrete... inversions!
Inversions
When plotted on a graph, the relationship between Inversions and Ranking looks something like this:
[Scatter plot: Relationship between Number of Inversions and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Inversions and Ranking, the results were as follows:

Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.24 | No Significant Correlation (No Significant Effect on Ranking)
Spearman | -0.26 | No Significant Correlation (No Significant Effect on Ranking)

From this, I think we can see that inversions are a significantly weaker statistic in terms of predicting how highly a ride will be ranked, not really having any overly significant correlation with ranking. In my view, the blame for this can be placed squarely at the door of how wide the range of non-inverting roller coasters is; if you look at the scatter graph, you can see that 0 inversions covers pretty much every inch of the ranking continuum! Fury 325 at Carowinds, the #10 ranked roller coaster in the list and one of the tallest and fastest operating coasters in the world, has no inversions, along with many of the other tallest, fastest and longest coasters in the world... but every single low-ranked, tin-pot family coaster also has no inversions. So when you consider that fact alone, number of inversions is really quite a poor discriminator in terms of determining ranking. If you were to take non-inverting coasters out of the equation or normalise for the effect of another statistic such as height or speed, it might be more influential, but without any adjustment, it doesn't really have any overly significant effect on ranking.

Let's finally look at a slightly different metric to see if age has any influence... opening year!
Opening Year
When plotted on a graph, the relationship between Opening Year and Ranking looks something like this:
[Scatter plot: Relationship between Opening Year and Ranking]

And when I looked at the Pearson and Spearman correlation coefficients for the relationship between Opening Year and Ranking, the results were as follows:

Correlation Coefficient Type | Coefficient Value (2dp) | Strength and Nature of Correlation
Pearson | -0.17 | No Significant Correlation (No Significant Effect on Ranking)
Spearman | -0.25 | No Significant Correlation (No Significant Effect on Ranking)

With this in mind, I think we can say that Opening Year has no significant correlation with Ranking, and thus, a very limited effect on how highly a ride is ranked. I would blame this on the fact that each opening year has a very broad range of rides opening within it; while thrillseekers' minds might be drawn to the big new thrill rides each year, there are also small, lowly-ranking family coasters opening each year as well. Therefore, opening year alone, like number of inversions, is quite a poor discriminant without adjusting for the effect of another statistic such as height or speed, and its effect on ranking is very limited.

Let's now summarise some final conclusions for Part 1...
Conclusions
So, having performed some correlation analyses on various ride statistics, let's reflect on our base question. To what extent is the phrase "Statistics aren't everything" true, and to what extent do statistics influence how highly a roller coaster is ranked?

Well, I think the analysis shows us that it strongly depends on the statistic you choose to focus on.

Some statistics did have significant correlations with ranking. Height and speed had moderate correlations with ranking, suggesting that on average, taller and faster rides are generally ranked more highly. Speed had a particularly significant correlation with ranking, with its own correlation coefficients bordering on the threshold of a strong correlation. To a slightly lesser degree, length also had a significant correlation with ranking. On average, longer rides are ranked more highly, but the relationship was not quite as strong as for height and speed.

However, other statistics did not have overly significant correlations with ranking. Both number of inversions and opening year had little significant effect on ranking, likely due to the breadth of rides being covered by certain values. Without normalisation for the effect of another more influential statistic, these two are unlikely to be having any major effect.

So overall, then, I'd perhaps argue that the phrase "Statistics aren't everything" is true to an extent. The correlations shown were not perfect correlations in any case, so there will always be exceptions, and in the case of how old the ride is or how loopy the ride is, those statistics aren't really having much effect at all. With that being said, I would argue that the phrase is not entirely true; the correlations ranking shared with height and speed in particular would suggest that building taller and faster can have some positive effect on ranking!

Thank you for reading Part 1; I hope you've enjoyed this analysis! I'm receptive to any feedback, good or bad! In Part 2, I'm going to do something a little bit more technical and delve into some regression algorithms to see whether statistics can be used to predict how highly a ride will rank within the list, as well as to see how much of a ride's ranking they actually explain!

If you'd like to take a look at the dataset underpinning this analysis, the Google Sheet is here: https://docs.google.com/spreadsheets/d/1OfxlATMBJJwN1VjW7sb0jKwIFywJcACbZYJHIxbjdVc/edit?usp=sharing

And if you are Pythonically inclined at all, here is a Google Colab notebook showcasing my Python code for you to look at, if you'd like to: https://colab.research.google.com/drive/1lWPWocnkxwvkT706QbZdz6B9SSwk-kB3?usp=sharing
Summary: In enthusiast circles, people commonly dismiss statistics as an important factor in how highly a roller coaster is rated, so I decided to analyse the effect that 5 different statistics (height, speed, length, number of inversions and opening year) have on a roller coaster's community ranking using data from Captain Coaster. When correlation analyses were performed, it was found that height, speed and to a lesser extent length were all significantly correlated with ranking. Speed had a particularly strong correlation, with its coefficient bordering on the threshold of a strong correlation. Inversions and opening year were less significantly correlated with ranking, meaning that neither have an overly significant effect in isolation. As such, we can conclude that "statistics aren't everything" to some extent depending on the statistics you choose, but that some do have a definite positive effect on ranking on average.
 
It seems to me that your data, Matt, roughly backs up what I've always thought - that stats do matter, but not the extremes. I think you need a minimum of about 60ft for a great ride, with (for me) a maximum of 150ft. I think your data says you get diminishing returns after about 130ft, which sounds about right. Any bigger than that and you're getting something that's harder to maintain for questionable benefits, apart from the initial marketing.

Length is about 3,000ft which, again, sounds about right. I think with length you also get that thing where it has to be long enough compared to the height, otherwise there's something that's just not satisfying about it.

Inversions at 5-7 also sounds about right, otherwise you run the risk of it flipping you upside-down just for the sake of it, with no non-inverting sections in between.

I am more surprised at opening year, as your stats appear to show that anything from the mid-90s has an equal chance of ranking high. I say I'm surprised because I got the impression the more fluid, RMC-inspired, multi-launching modern coasters were a lot more highly regarded. But it just goes to show, if you do it right in the first place, it lasts.
 
I apologise, as I didn’t realise that the post pasted in with weird formatting last night… I’ve now sorted out the formatting, and it should now look a bit better and be a bit easier to read!
 
Disclaimer: This is, once again, a long, geeky post. If you don't like statistics or maths talk, turn back now! If you'd like a more concise summary, a short summarising paragraph can be found at the bottom.
Part 2: Regression Models (How accurate are raw statistics alone at predicting a coaster's ranking?)

Right then, folks; let's delve into the second part of this analysis! And in this part, I'm going to go slightly more technical and develop some regression models to see just how accurate raw statistics alone are at predicting a coaster's ranking!

Before we dive in deeply, let me firstly explain in relatively basic terms what regression actually is, for those who might not know. In short, regression is a way of quantifying the impact that a series of predictive variables have upon a continuous, numerical target variable and using this to make predictions of values of the target variable based on the predictors. For example, a regression model could be used to predict the sale price of a house, with variables such as square footage, number of bedrooms, distance from a train station etc being used as predictors. In this instance, Ranking will be our target variable and Height, Speed, Length, Inversions and Opening Year will be our predictive variables.

Let me now explain some of my choices in the first vital step of any machine learning project; pre-processing!
Pre-Processing
Before you do anything with a machine learning model, you always need to pre-process your data. Certain data might need handling in a certain way, and there are always things that need to be considered before you dive straight into model implementation!

The first thing that I felt was particularly important to consider with this particular dataset was missing values. Quite a number of these rides had at least 1 statistic missing out of the 5; in fact, a total of 660 roller coasters were missing at least one statistic, which equates to around 35.4% of the data points. There are multiple ways in which you can handle records with missing values; you can impute missing values and replace them with another value (e.g. the mean value), you can interpolate them (fill in the gaps based on the other variable values), or you can remove any records with missing values. It was a tough choice, but I opted to remove in this instance. While this did sacrifice a considerable number of data points, I felt that imputation of a value such as the mean was an overly blunt instrument for this use case and would likely skew relationships (for example, it would say that a ride 400ft tall with 5,000ft of track but no recorded speed only goes 45mph, if 45mph was the mean speed). With many of the affected coasters missing multiple statistics (the majority were missing at least 2, and quite a few were missing 3 or 4 out of the 5), I also felt that interpolation was likely to skew relationships. For example, many of the alpine coasters were missing height and speed statistics, but had some of the longest lengths. Interpolation on an alpine coaster with 7,000ft of track, for instance, would likely return some absolutely obscene height and speed values that are unlikely to parallel reality. Thus, I felt that losing the 660 records entirely was the least-bad option here; while it reduced the dataset size to 1,203, it at least ensured accurate relationships for the data that remained.
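
To illustrate, the removal step is essentially a one-liner in pandas; here's a rough sketch (file and column names are illustrative, not necessarily those in my sheet):

```python
# Minimal sketch: dropping any coaster that is missing one or more of the five statistics.
# File and column names are illustrative.
import pandas as pd

stat_cols = ["Height", "Speed", "Length", "Inversions", "Opening Year"]

df = pd.read_csv("coaster_rankings.csv")    # hypothetical export of the dataset
complete = df.dropna(subset=stat_cols)      # keep only rows with all five statistics present

print(f"Kept {len(complete)} of {len(df)} coasters")
```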

The other thing that I felt was important to handle was feature scaling. In many regression algorithms (particularly gradient-based ones such as neural networks), features on the largest numerical scale can end up dominating an unscaled dataset. So in this instance, length and opening year would likely have had an outsized influence due to their raw values being the largest. As such, I applied min-max scaling to normalise the feature scales and prevent this from happening. Min-max scaling rescales each feature to the range 0 to 1: the lowest value becomes 0, the highest becomes 1, and everything in between is scaled proportionally to where it sits within that range (e.g. a height exactly halfway between the shortest and tallest coaster would become 0.5).

Furthermore, I should also add that I split the data into training and test sets, with 80% of the data allocated to training and 20% held out and allocated to testing. This ensures that the generalisation ability of the regressors can be tested; to be truly accurate, a regressor needs to be able to perform on unseen data.
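
In scikit-learn terms, the scaling and splitting steps look roughly like the sketch below. Note that I'm showing the scaler being fitted on the training split only (the standard way to avoid leaking information from the test set); treat this as an illustrative sketch rather than my exact code.

```python
# Minimal sketch: min-max scaling the features and holding out 20% of the data for testing.
# Builds on the "complete" DataFrame from the previous sketch; column names are illustrative.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X = complete[["Height", "Speed", "Length", "Inversions", "Opening Year"]]
y = complete["Ranking"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()                         # rescales each feature to the [0, 1] range
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from the training data only
X_test_scaled = scaler.transform(X_test)        # apply the same scaling to the test data
```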

So now we've handled pre-processing, let's move onto the meat of the regression models themselves!
Regression Models
Before I dive too deeply into the regression models, I should probably familiarise everyone with the statistics I am going to use to assess model performance and truly answer the base question. The three performance metrics I am going to use are:
  • R Squared (R2): R2 is essentially a measure of how much of the variation in the target variable can be explained by the model and the given predictors. It is a % value, and can go up to 100%. An R2 value of 100% would denote a model that could explain all of the target variable variation, an R2 value of 0% would denote a model that explains the variation no better than simply predicting the mean ranking every time, and a negative R2 value would denote a model that does even worse than that baseline.
  • Root Mean Squared Error (RMSE): The gold standard in regression assessment metrics, RMSE is a measure of the average prediction error. It is slightly different from simply averaging the prediction errors (that would be mean absolute error, or MAE) in that it penalises larger errors more heavily, leading to slightly more of a "worst case" performance measure. It is popular because it can be directly mapped to the unit of the target variable (so in this case, RMSE can be expressed and easily interpreted in terms of number of ranking spots). An RMSE of 0 equates to no error, so lower is better.
  • Scatter Index (SI): Scatter Index is essentially the RMSE as a percentage of the mean target variable value in the test set, to provide a sense of scale and put the RMSE into perspective. An SI of 0% equates to no error, so lower is better.
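For reference, all three of these metrics can be computed in a few lines of Python; here's a minimal sketch (scikit-learn provides R2 and the squared error directly, and the Scatter Index is just the RMSE divided by the mean of the test-set target):

```python
# Minimal sketch: the three evaluation metrics used throughout this part.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    r2 = r2_score(y_true, y_pred)                       # share of ranking variation explained
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # typical error, in ranking spots
    si = rmse / np.mean(y_true)                         # RMSE relative to the mean ranking
    return r2, rmse, si
```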
Now I've explained that, let's dive into the models!

The first approach I tested was linear regression. This is by far the simplest form of regression; if you're familiar with the straight line equation y = mx + c, where m denotes the gradient (the slope of the line) and c denotes the y intercept (the value of y when x equals 0), linear regression builds heavily on this concept. A linear regression model effectively consists of the straight line that minimises the (squared) prediction error in a case where there's only one predictor. As we have multiple predictors, it's a little bit more complicated here, but the same basic principle applies. The model, in basic terms, is a line of the form y = m1x1 + m2x2 + ... + c, where each x represents a variable value, each m represents its coefficient (the impact that variable has upon y), and c represents the y intercept. The model is fitted to the dataset, and as discussed, it is the line that produces the minimum squared prediction error.
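
A minimal scikit-learn sketch of fitting a linear regression on the pre-processed data and scoring it with the evaluate() helper sketched above (again, illustrative rather than my exact code):

```python
# Minimal sketch: fitting a linear regression and scoring it on the held-out test set.
# Uses X_train_scaled, y_train etc. from the pre-processing sketches above.
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train_scaled, y_train)

r2, rmse, si = evaluate(y_test, lin_reg.predict(X_test_scaled))
print(f"R2: {r2:.1%}, RMSE: {rmse:.1f}, SI: {si:.1%}")
```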

When linear regression was tested on the dataset in question, the results on the test set were as follows:
Performance Metric | Value (1dp)
R Squared (R2) | 42.5%
Root Mean Squared Error (RMSE) | 423.3
Scatter Index (SI) | 50.6%

So from this, I think we can conclude that linear regression is not making the statistics terribly accurate at predicting ranking. The linear regression model is only explaining around 42% of the variation in ranking, which still leaves over half unaccounted for, and on average, its predictions are off by 423 ranking spots, which is not that great! The model does provide meaningful information compared to a random guess, but it's not terribly accurate at this point.

I should stress, though, that linear regression is by far the simplest tool in the shed of regression algorithms! With its simplicity, there is naturally a degree of bluntness that comes along with that; it makes assumptions that often don't hold up in the real world, such as that all relationships are completely linear.

As such, I think we should move on to a different regression algorithm... I next decided to try an ensemble method in the form of random forest regression. Random forests are built off of a different kind of model called decision trees, which effectively work kind of like moving through a decision-based flowchart. In a regression context, decision trees split the data down into increasingly refined categories and assign a y value to each category (e.g. "if height is above 200ft, y = 100; if height is below 200ft, y = 500"). As variable values are ascertained, the predictor moves further and further down the tree until a final branch is reached and a y value is assigned to that specific combination of variable values. However, standalone decision trees are quite unstable and prone to overfitting (memorising the training data word-for-word as opposed to learning general trends), so to mitigate this, random forest regression combines many decision trees, each trained on a slightly different random sample of the data, and averages their predictions to come up with a consensus vote of sorts. It's kind of like asking a panel to vote on something rather than asking just one person.
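
Sketched in scikit-learn, a random forest tuned with a grid search (which I'll explain in a moment) looks roughly like this; the parameter grid shown is purely illustrative, not the exact one behind my reported results:

```python
# Minimal sketch: a random forest regressor tuned via a grid search.
# The parameter grid is illustrative, not the exact grid used for the reported results.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train_scaled, y_train)

r2, rmse, si = evaluate(y_test, search.best_estimator_.predict(X_test_scaled))
print(f"R2: {r2:.1%}, RMSE: {rmse:.1f}, SI: {si:.1%}")
```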

I optimised my random forest regressor using a grid search (a computational approach that finds the best parameters by testing every combination of candidate values in a grid), and when this was applied to my dataset, the results on the test set were as follows:
Performance Metric | Value (1dp)
R Squared (R2) | 65.2%
Root Mean Squared Error (RMSE) | 329.0
Scatter Index (SI) | 39.3%

So from this, I think we can ascertain that random forest regression improved quite considerably upon linear regression, explaining 65% of the variation in rankings using the statistics provided! But still, I'd argue it has considerable room for improvement in terms of accuracy; 35% is not an insignificant percentage of variation to be left unaccounted for.

There's one more approach that I decided to try... I decided to bring out the big guns and try a neural network! Neural networks are probably the most sophisticated form of machine learning around today; they fall into a sub-category known as deep learning, and at their most complex level, they even function as the tech underpinning the likes of ChatGPT!

In essence, neural networks are inspired by the human brain. They are complex models made up of many layers, and each layer contains many neurons. Data is passed through the layers, with mathematical operations applied at each one, until an output is eventually produced. Neural networks are trained via backpropagation and gradient descent: the model is shown the training data over a number of passes (known as epochs) and gradually adjusts the weights of its neural connections so that it gets better at the task it's being trained to do (predicting a coaster's ranking, in this case). At the end of each epoch, the model is evaluated on a validation set, and that feedback guides its learning in future epochs. It's kind of like a human learning a new skill; it starts off terrible, but gradually practices and gets better over time.

Now I should add that building neural networks is a very complicated science, and not at all exact. There are many, many parameters you can optimise and play around with, and many different types of layers and enhancements you can add! I played around with quite a few different settings, but I settled on 5 regular dense layers with differing numbers of neurons, supplemented by a dropout layer in the middle. The dropout layer randomly switches off a fraction of neurons during each training pass, which discourages the network from memorising the training set too intensely and fosters better generalisation. I also added an early stopping callback, to prevent the model from training beyond the point where optimal validation accuracy is attained, as well as a learning rate scheduler that drops the learning rate when performance plateaus, to home in on a good learning rate. Learning rate is a very important parameter; if it's too high, training can overshoot and settle on sub-optimal accuracy (or fail to converge at all), while if it's too low, training becomes extremely slow and may never reach a good solution in a reasonable number of epochs.
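
To give a flavour of the kind of architecture described above, here's a rough Keras sketch; the exact layer sizes, dropout rate and callback settings are illustrative, not necessarily the configuration behind my reported results:

```python
# Rough Keras sketch of the sort of network described above.
# Layer sizes, dropout rate and callback settings are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(X_train_scaled.shape[1],)),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),          # randomly switches off 20% of neurons to aid generalisation
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),              # single output: the predicted ranking
])
model.compile(optimizer="adam", loss="mse")

callbacks = [
    keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=10),
]

model.fit(
    X_train_scaled, y_train,
    validation_split=0.2,   # hold out part of the training data for per-epoch validation
    epochs=500,
    callbacks=callbacks,
    verbose=0,
)
```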

When my neural network was applied to my dataset, the results on the test set were as follows:
Performance Metric | Value (1dp)
R Squared (R2) | 54.5%
Root Mean Squared Error (RMSE) | 376.5
Scatter Index (SI) | 45.0%

Interestingly, the neural network was actually worse than the random forest regressor, despite being more complicated. It only explained 54% of the variation in rankings, and its predictions were off by 377 ranking spots on average, so it's still not particularly accurate! I guess that's proof that the most complex approaches aren't always the best approaches! However, I have two theories as to why a neural network didn't work as well here:
  1. The training dataset was too small. Neural networks are data-hungry; ideally, they want datasets in the tens or hundreds of thousands of records, if not millions, to train on, so this ~1,000-strong training dataset was definitely suboptimal in terms of size for a neural network.
  2. The model may not have been optimal, unlike the other two. The possibilities are practically endless with a neural network, so while I did test various approaches, who knows if the approach I picked is optimal? You could technically do a grid search with a neural network, but it would be painfully, painfully slow...
But wait! That's not all! One final machine learning tool we could test out here is feature selection...
Feature Selection
When you have a dataset, it's very common for some features to provide little value, or even to be actively misleading. Feature selection is a way of mitigating this; it helps sort the most useful variables from those that aren't providing much value. Done well, it can reduce model complexity while sacrificing as little accuracy as possible, and it can even improve accuracy by helping to prevent overfitting.

Many feature selection methods exist, but I'm going to use a simpler one that builds on Part 1 somewhat; correlation. Using my significant correlation threshold of 0.3, I decided to remove Inversions and Opening Year from the dataset (these had correlation magnitudes of below 0.3), as they weren't really showing significant relationships with the target variable of Ranking.
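
In code terms, the filter itself is very simple; a minimal sketch (again building on the earlier DataFrame, with illustrative column names):

```python
# Minimal sketch: keep only features whose absolute correlation with Ranking is at least 0.3.
stat_cols = ["Height", "Speed", "Length", "Inversions", "Opening Year"]

correlations = complete[stat_cols].corrwith(complete["Ranking"])
selected = correlations[correlations.abs() >= 0.3].index.tolist()

print(selected)  # expected to keep Height, Speed and Length given the Part 1 results
X_selected = complete[selected]
```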

When feature selection was applied to a linear regression model, the results on the test set were as follows:
Performance Metric | Value (1dp)
R Squared (R2) | 36.8%
Root Mean Squared Error (RMSE) | 443.8
Scatter Index (SI) | 53.0%

When feature selection was applied to a random forest regression model, the results on the test set were as follows:
Performance Metric | Value (1dp)
R Squared (R2) | 58.3%
Root Mean Squared Error (RMSE) | 360.5
Scatter Index (SI) | 43.1%

When feature selection was applied to a neural network, the results on the test set were as follows:
Performance Metric | Value (1dp)
R Squared (R2) | 38.7%
Root Mean Squared Error (RMSE) | 437.0
Scatter Index (SI) | 52.2%

With these results in mind, I think it's safe to conclude that feature selection probably didn't have any benefit in this instance, with R2 values falling by roughly 6-16 percentage points and RMSEs increasing by a fair bit. In all honesty, the number of variables was probably a bit too small to warrant feature selection; it works better when you have a number of variables in double figures to sift through!

Let's now wrap things up and summarise the conclusions of this part...
Conclusions
So to conclude, let's loop back to the original question we started this part with: "To what extent can raw statistics accurately predict a roller coaster's community ranking?".

In answer to this question, I think it's safe to say; not a particularly large extent. The models did all improve on a random guess, with some like random forest regression accounting for up to 65% of the variation in ranking, but all of them were returning RMSEs in the order of triple figures, so there is considerable room for improvement. This proves that statistics are by no means meaningless, but that they are also pretty far from telling the whole story.

As much as these particular regression models are not going to win "Accurate Regression Model of the Year", I still think that they provide a valuable insight in that they prove that raw numbers alone are useful to an extent, but have their considerable limitations. It's really important to acknowledge that raw numbers have limits, in my view, and in a case like coaster ranking, there's so much that goes into it that simply can't be quantified. You may be able to put a number on height or speed, but you can't attach a number to "vibes", and they (or at least, the woollier qualitative aspects of a coaster) do account for a considerable amount of a coaster's reception.

Thank you for reading; I hope you've enjoyed this two-part analysis! I hope that it's provided some interesting insights; even though I'll admit that I still have a lot to learn about data science, with my only current qualifications/experience in the matter being an undergraduate degree in Computer Science and a currently-in-progress postgraduate degree in Data Science and Analytics, I hope I've still helped to provide more quantitative clarity to this question!

If you'd like to see my Python code, I once again have a Google Colab notebook available for your perusal: https://colab.research.google.com/drive/15yk8F-norc_CbgdfyBT7LBwNjKnYRrhz?usp=sharing
Summary: Following on from Part 1, I decided to try and build some regression models to determine how accurate raw statistics are as a predictor of a coaster's community ranking. I tested three model types; linear regression, random forest regression and a neural network. The random forest regressor returned the highest accuracy, being able to account for around 65% of the variation in rankings. Correlation-based feature filtering was also attempted, with height, speed and length alone being tested as predictors, but this did not really add any benefit. These results prove that raw statistics are useful to an extent, but that they alone cannot provide a particularly accurate prediction of community ranking, with qualitative aspects of a coaster also playing a highly important role.
 
Great work @Matt N!

On Part 1 - Interesting to see the correlation results, which mostly align with correlation I've run on my own personal rankings (Pearson correlation being used here):

[Attached image: Pearson correlation results from my personal rankings]

FWIW, I'd argue the Spearman correlation is actually the more apt approach, as Pearson inherently assumes a linear relationship between the variables - Spearman gives more wiggle room for jumps and gaps in height, speed, inversion, etc. stats. Inversion count, for instance, is particularly non-linear, as some world-class coasters will have zero inversions, while other top-fivers could have a flip or two.

Moving into Part 2, it's exciting to see the different model designs. It's not surprising that as more complexity is added to the model, such as with neural networks, the risk of overfitting increases. This is likely due to having too few data points relative to the model's capacity, which can lead regression models to chase noise or find spurious patterns. It would be interesting to explore whether adding new data fields, such as drop height or drop angle, could improve the R^2 values and overall model performance.

However, it's crucial to remember a fundamental concept in statistical regression models: "regression to the mean." This principle assumes that the sample of all roller coaster statistics follows an average distribution. This is significant because it implies that roller coasters with extremely high or low statistics (e.g., Falcon's Flight) are outliers. Their rankings may be influenced by other, external factors not accounted for in the model. In contrast, more "average" roller coasters can be predicted with greater confidence, suggesting that their length, height, and other factors are more likely to be the determining elements of their ranking.

For instance, Falcon's Flight might have a high ranking due to its massive height, length, and speed. However, as an outlier in statistics, there are likely other external factors also impacting its rank. For example, "shorter, slower" roller coasters (e.g., RMCs like Zadra, Steel Vengeance, or Iron Gwazi) could still achieve high rankings due to other qualitative aspects such as airtime or layout design. There are always other qualitative factors at play, especially when roller coasters are vying for top rankings.

Agree overall that all of these models:
  1. Are fun!
  2. Can predict general sentiment for a roller coaster with given statistics.
  3. Predict ranking more easily for roller coasters with more average statistics than for those with more outlying statistics/attributes.
Something that would be fun is to run new-for-2025 roller coasters through your models @Matt N before they open and see which model most accurately predicts their rank. Siren's Curse, for instance, could be a good candidate given its late opening!
 