 Thread (44 posts)
Mahni  4/20/08 8:32:03 PM

Rank: 4/100

Novice Member

Joined: 1/31/08
Posts: 58

Originally posted by apwrsmage

In statistical analysis, depending on what you're doing, the ideal sample size is 150 to 3,000. A sample size in the millions starts to pollute the numbers.

"Ideal" sample size depends on the estimated effect size in the population. There could be cases where a very small effect poses such a great risk (say, a small % increase in aircraft failures) that it would warrant a large sample size to detect it.

But for this case, your point is valid - a sample size of 150 would be enough to detect an effect size of interest (assuming no analysis of higher order interactions).
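To make the dependence on effect size concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The rates below (20% vs 35%, and 0.10% vs 0.12%) are made-up illustrations, not numbers from the finder tool or from any real failure data, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Assumed "large" effect: 20% vs 35% of searches picking some option.
large_effect = n_per_group(0.20, 0.35)

# Assumed "tiny but important" effect: failure rate of 0.10% vs 0.12%.
tiny_effect = n_per_group(0.0010, 0.0012)

print(f"large effect: about {large_effect} per group")
print(f"tiny effect:  about {tiny_effect} per group")
```

Under these assumptions the large effect needs a sample in the low hundreds, while the tiny one needs hundreds of thousands per group, which is the sense in which there is no single "ideal" sample size.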

Larger sample sizes (for example, "in the millions") would not "pollute the numbers" or introduce any type of systematic bias. If you have a large sample (say, hundreds of thousands of transaction records) and you fit the *entire* sample with certain techniques (for example, a decision tree algorithm such as CART or CHAID), you could end up with a result that is not *generalizable* due to overfitting. But that would be the fault of the statistician for not using an appropriate validation technique, *not* of having "too much data".
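A rough sketch of why validation matters, under loud assumptions: the "model" below is just a lookup table that memorizes every training record. It stands in for an unpruned decision tree and is not CART or CHAID. The records are synthetic, with labels that are pure noise, so there is nothing real to learn.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical records: (record_id, label). The label is random with respect
# to record_id, so any apparent "pattern" is overfitting.
records = [(i, 1 if random.random() < 0.7 else 0) for i in range(10_000)]
random.shuffle(records)
train, holdout = records[:8_000], records[8_000:]


def fit_memorizer(rows):
    """Memorize every training row; fall back to the majority label otherwise."""
    table = dict(rows)
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda x: table.get(x, majority)


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


model = fit_memorizer(train)
train_acc = accuracy(model, train)      # scored on the data it was fit to
holdout_acc = accuracy(model, holdout)  # scored on records it never saw

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on holdout data:  {holdout_acc:.2f}")
```

Scored on its own training data the memorizer looks perfect; on the holdout set it does no better than always guessing the majority label. The gap comes from skipping validation, not from the number of records.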

As I said in my previous post, my greatest concerns here are the representativeness of the data and statistical validity. I don't have concerns about power or whether differences are statistically significant.

 

 
Czzarre  4/20/08 9:31:37 PM

Rank: 61/100

Hard Core Member

Joined: 9/10/07
Posts: 2042

MMORPG Character Monuments

...When it's time for your character to take a well-deserved rest...

I appreciate everyone's input, especially the latter statistical analysis. Thankfully, I will not be publishing any of this data in epidemiology journals. As for validity... well, as for the degree to which this data supports the intended conclusion, all I can say is that these numbers represent the searches of players who at least had some inclination to look for other MMORPGs.

Limitations would include:
Single players doing multiple searches.
"Playing around": people just plugging in different criteria to see what comes up, with no desire to actually play a "P2P 2D graphic Pirate MMORPG" (yes, that's a real search).
Categories are not all-inclusive.

To this point, the majority of visitors have found the finder code helpful and easy to use. And although that may not be statistically significant, it's the result I care most about.

Thanks for everyone's input. I will keep working on it.

Torrential

P.S. "you need millions of searches for anyone to care or listen" ..... Boy? What are you smoking?!?! :)

apwrsmage  4/20/08 11:13:37 PM

Rank: 1/100

Novice Member

Joined: 4/04/08
Posts: 38


Using a sample set in the millions, unless you're dealing with a very straightforward, black-and-white situation, can introduce a certain amount of random variance that, depending on the situation, can play with power. Or, again depending on the situation, the differences in conclusions from the larger sample size are negligible enough to not warrant the extra effort... say, a 20.05% trend versus a 20.03% trend.

Nevertheless, a control certainly can't be established here, given the nature of the tool and its uses. But the tool isn't being used for clinical trials, and there are so many possible ways it can be used that it can't declare a definitive result; it can only show a trend.

Which brings us all back to the original point: A trend. The tool shows the trend of curiosity among those that have used it as to what games and systems they're currently looking for. It seems to me that was the whole point of the tool in the first place. No more, no less.

 
Vendayn  4/21/08 12:54:14 AM

Rank: 95/100

Guide

Joined: 12/30/04
Posts: 2650

Ask a question and you are stupid for 30 seconds, never ask and you are stupid for life.

I enjoy your site; I have it saved under my favorites. You did a great job on it.

 

And it's interesting to see what players look for. It will be interesting to see what it looks like after you get a lot more searches.

------------------Signature-----------------
My future site for the Fantasy stories I've typed(in development):
www.dragon-masters.com

My MMORPG.com blog: http://www.mmorpg.com/blogs/Vendayn

Mahni  4/21/08 12:54:24 AM

Rank: 4/100

Novice Member

Joined: 1/31/08
Posts: 58

Sorry for the derail here...

Large sample sets do not "introduce" "random variance". If you are saying large sample sets somehow increase unexplained variance OR variance from individual differences (think error terms in structural equation modeling), they do not. If you are saying they introduce some form of systematic bias, they do not. If you are saying that large sample sizes can create a problem due to sampling error, they can, but that's a sampling error problem, not a sample size problem. As sample sizes go up, confidence intervals narrow, whether it's a sample size of one hundred or ten million.
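As a quick illustration of the confidence interval point, here is a small sketch using the normal approximation for a proportion. The 20% figure borrows the "20.05% versus 20.03%" example from the quoted post, and the list of sample sizes is an arbitrary choice for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


sizes = [100, 150, 3_000, 1_000_000, 10_000_000]
margins = [margin_of_error(0.20, n) for n in sizes]

for n, m in zip(sizes, margins):
    print(f"n = {n:>10,}: 20% +/- {100 * m:.3f} percentage points")
```

The half-width only shrinks as n grows. At n = 3,000 it is still roughly 1.4 percentage points, so a 20.05% result and a 20.03% result are indistinguishable there. More data tightens the estimate rather than polluting it, although past some point the extra precision may not be worth the effort.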

Not having a control (group) is irrelevant here. Descriptives and estimation of population parameters (population mean, variance) can always be done with a sample of the population without a control group. If you are implying that only "clinical trials" give definitive results, that is incorrect. Observational or pseudo-experimental designs are just as valid as (and in some cases *more* valid than) experimental designs (such as a factorial design including a control group).

When you say "there are so many possible ways the tool can be used...", if you are saying this is not a controlled experiment, I completely agree. But that's a validity / generalizability issue (that I pointed out in earlier posts). It doesn't lead me to conclude that this is directional (in your words, a trend), but it does lead me to be *very* hesitant in drawing any conclusions from the data (how respondents use the tool doesn't necessarily translate to what people are looking for from "new" MMOs).

Just to reiterate, I think the tool is great and has a lot of utility.

 
apwrsmage  4/21/08 12:58:46 AM

Rank: 1/100

Novice Member

Joined: 4/04/08
Posts: 38


And this is the part where you overthink it and miss my point, and most of my meanings, entirely.

 
Mahni  4/21/08 2:15:51 AM

Rank: 4/100

Novice Member

Joined: 1/31/08
Posts: 58

Originally posted by apwrsmage

 

 

And this is the part where you overthink it and miss my point, and most of my meanings, entirely.

I'll try again.

Originally posted by apwrsmage

A sample size in the millions starts to pollute the numbers.

Not true.

 

 
apwrsmage  4/21/08 2:57:18 AM

Rank: 1/100

Novice Member

Joined: 4/04/08
Posts: 38


In your opinion. My opinion is otherwise. It's obvious you're trying to "win" an intellectual "debate". But, this is the last I'm posting on this thread. The point that the tool may not be scientific but provides good information and is interesting has been ground into a fine paste. So feel free to say, "Nuh uh! You're all kinds of wrong! Cuz I know!" though I continue to disagree. If you want to keep bashing away at the subject in an attempt to puff yourself up, be my guest.

 
Mahni  4/21/08 11:45:54 AM