At the risk of beating a dead horse, anyone who thinks that one or two testers, however superb and experienced they are, can scientifically evaluate a boot, a ski, or a banana for that matter, is, ah, optimistic. Choosing the variables is the easy part. The real problem is associating those variables with on-slope performance. And that requires some thought about how you SAMPLE on-slope performance. With apologies to those who already know all this, an example:
Each tester will have a different slope-side take on how a variable, say lateral boot stiffness, influences on-snow performance. So if he/she awards a score, say a 4.5, that score includes a "real" component (the "actual" performance) and an error component ("error" doesn't mean "mistake," but rather how replicable and accurate his/her scores are). There are ways of getting at this error, but without them you'll never know how much of that 4.5 is real and how much is noise.
The noise in turn can be random or systematic (termed bias, again with no pejorative meaning), where one skier unintentionally tends to score one brand higher or lower than another. I'd guess expert skiers have fairly low random error, but bias is another story. Peter K on Realskier, for instance, loves Heads and says so. Then there are tester differences in size, build, strength, and experience. Female experts may lean toward a lighter ski at any length, as in the recent scores for Fischers in the Ski Canada tests. That's informative, obviously, but because it includes both real and error terms, we can't be sure HOW informative.
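For the stats-inclined, here's a minimal sketch of that score model in Python. Every number in it (the true score, the bias, the noise level) is made up purely for illustration:

```python
import random

random.seed(42)

TRUE_SCORE = 4.3     # the ski's "actual" performance on this variable (invented)
TESTER_BIAS = 0.2    # systematic error: this tester quietly favors the brand
NOISE_SD = 0.3       # random error: run-to-run scatter in the same tester's scores

def one_run_score():
    """One tester, one run: real component plus bias plus random noise."""
    return TRUE_SCORE + TESTER_BIAS + random.gauss(0, NOISE_SD)

scores = [round(one_run_score(), 1) for _ in range(5)]
print(scores)
# Any single 4.5 in there is part real (4.3), part bias (+0.2), part noise,
# and from one score alone you cannot tell the three apart.
```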
Put another way, if a Fischer gets a 3.7 for long turns and a Head gets a 3.9, how much of that difference is "true" (Heads really do make better long turns), and how much is noise (no real difference)? To find out, you can either run hundreds of tests of Fischers and Heads, which reduces the impact of noise, or you can start out by correcting for the noise, and then you'll be OK with a smaller number of tests.
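To see why the hundreds-of-tests route works, here's a toy simulation where the two skis are defined to be identical, so any apparent gap between them is pure noise (all numbers invented):

```python
import random
import statistics

random.seed(7)
TRUE_FISCHER = TRUE_HEAD = 3.8   # defined to be identical: NO real difference
NOISE_SD = 0.3                   # same per-run noise as before (invented)

def mean_score(true_score, n_runs):
    """Average of n_runs noisy test scores for one ski."""
    return statistics.mean(random.gauss(true_score, NOISE_SD) for _ in range(n_runs))

for n in (1, 10, 400):
    gap = mean_score(TRUE_HEAD, n) - mean_score(TRUE_FISCHER, n)
    print(f"{n:4d} runs per ski -> apparent Head advantage: {gap:+.2f}")
# With 1 run per ski, a 0.2-point "Head advantage" is easy to get from noise
# alone; with hundreds of runs, the apparent gap shrinks toward zero
# (roughly as 1 over the square root of the number of runs).
```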
So how do we "correct" all this stuff? Well, one way is after the fact: huge numbers of testers (hundreds) across large numbers of conditions and terrain types, with the scores regressed afterward; the biases and errors partly cancel out. I deduce that Realskiers does some version of this first approach.
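One crude flavor of that after-the-fact adjustment, sketched below. This is just the general idea, not a claim about how Realskiers actually does it, and the data are invented:

```python
from collections import defaultdict

# (tester, ski, raw score) -- entirely invented data; tester C is the Head fan
raw = [
    ("A", "Fischer", 3.5), ("A", "Head", 3.7),
    ("B", "Fischer", 4.1), ("B", "Head", 4.2),
    ("C", "Fischer", 3.6), ("C", "Head", 4.1),
]

# Each tester's personal average, i.e., how high or low they score in general.
by_tester = defaultdict(list)
for tester, _, score in raw:
    by_tester[tester].append(score)
tester_mean = {t: sum(s) / len(s) for t, s in by_tester.items()}

# Subtract the tester's personal level before averaging by ski, so a generous
# or stingy scorer stops dragging the ski averages around. Note this removes
# only a tester's overall level; brand-specific favoritism like C's takes a
# fuller regression with tester-by-brand terms and lots more data.
adjusted = defaultdict(list)
for tester, ski, score in raw:
    adjusted[ski].append(score - tester_mean[tester])

for ski, deviations in adjusted.items():
    print(ski, round(sum(deviations) / len(deviations), 2))
```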
Another way is before the fact. Take a much smaller number of expert skiers and measure each one's test-retest reliability and accuracy against a standard. You actually "teach" each skier beforehand to evaluate each ski with similar error. When you read a newspaper article about some big new study on heart disease or nutritional status, anything that involves more than one person collecting data, that's the approach they've honed before the techs ever go out the door.
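The test-retest piece is easy to picture: have a tester score the same skis twice without knowing they're the same, then check whether he/she agrees with himself/herself. A sketch with invented scores (statistics.correlation needs Python 3.10 or later):

```python
import statistics

first_pass  = [3.5, 4.2, 3.9, 4.6, 3.1]   # one tester's scores, day 1 (invented)
second_pass = [3.6, 4.0, 3.8, 4.7, 3.3]   # same skis, same tester, day 2

# Pearson correlation between the two passes; requires Python 3.10+.
r = statistics.correlation(first_pass, second_pass)
print(f"test-retest r = {r:.2f}")
# Near 1.0: the tester's scores are replicable and worth calibrating further.
# Near 0: the scores are mostly noise, however expert the skier.
```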
I haven't found any ski site that does this. Ironically, when we read and compare evals from a dozen different mags and sites, we're doing a third approach, where the errors of each "study" are lumped together in our heads and (we hope) evened out. Sort of like reading all the college football polls and deciding who the real #3 is. (Go Blue. Sigh.)
But you just don't get a grip on this stuff by having a half dozen great skiers do one run on a taped ski or boot. That's why magazines and websites have such bizarrely divergent takes on the same ski. It isn't "preference," it's plain old error.
Now I happen to like reading tea leaves among all the ski tests, and mulling them over on these threads. But then, I also like football polls more than a national championship. More stuff to argue about on cool fall nights at the local pub.
But if you claim to want to associate all these numbers about flex and weight and such with real performance, in a way that is technically superior to other approaches, your MAIN task is to handle slope-test error. Just the way it is, folks...