Statistics Gurus...........help

maverick2112 · Feb 21, 2004

What exactly is the rejection v in a normal distribution curve

KotysDad · Feb 21, 2004

Re: Statistics Gurus...........help

maverick2112 said:
What exactly is the rejection v in a normal distribution curve

Mav,

Not exactly sure what you mean by "rejection v", but I'll try to give an answer that might clear up some confusion about the normal distribution curve.

Suppose you have a hypothesis that you want to test, and you determine the data is "normally distributed". Mathematically (which I can't type here without an equation editor), this means that the distribution of the random variable you are testing follows a specific bell-shaped function that is fully determined by its mean and standard deviation. Ok, that aside.......

If you test data that is normally distributed, you set up your null hypothesis (say, that a coin is fair), then you flip the coin a certain number of times. You conclude from that data whether to accept or reject your hypothesis.

Before you determine your rejection region, you must first choose your "level of significance" (your "level of confidence" is 100% minus that). There are times when you will reject your hypothesis when in fact it was correct (TYPE 1 error). There are times when you accept the hypothesis when in fact you should have rejected it (TYPE 2 error). In testing your hypothesis, the maximum probability with which you would be willing to risk a TYPE 1 error is called your level of significance.

You set a level of significance - usually either 5% or 1% - let's choose 5% here. That corresponds to a 95% level of confidence.

So you run your experiment and record your data and compute your sample mean and standard deviation from your data. Using the standardized variable Z, you compute your test statistic.

Now, if you chose a 5% level of significance and are doing a two-tailed test, this means that if your statistic falls between -1.96 and +1.96 you will accept your initial hypothesis. If it falls outside that range, you reject it. (These numbers come from stat tables for Normal Dist).

The 5% means that you accept a "risk" of 5% that you will reject your initial hypothesis even though it was actually correct (a TYPE 1 error). In other words, when the initial hypothesis is true, you will make the right decision about it 95% of the time.
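To make the coin example concrete, here is a minimal sketch of the two-sided z-test described above. The function name `coin_z_test` and the 60-heads-in-100-flips input are made up for illustration; the only library call is `NormalDist` from Python's standard `statistics` module, and the normal approximation to the coin-flip count is assumed to be good enough.

```python
from math import sqrt
from statistics import NormalDist

def coin_z_test(heads, flips, alpha=0.05):
    """Two-sided z-test of H0: the coin is fair (p = 0.5)."""
    expected = flips * 0.5          # mean number of heads if H0 is true
    sd = sqrt(flips * 0.5 * 0.5)    # standard deviation of the count if H0 is true
    z = (heads - expected) / sd     # standardized test statistic
    # Cutoff from the normal table: about 1.96 when alpha = 0.05
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical

# Hypothetical experiment: 60 heads in 100 flips
z, critical, reject = coin_z_test(heads=60, flips=100)
print(f"z = {z:.2f}, critical = {critical:.2f}, reject H0: {reject}")
```

With these made-up numbers z works out to (60 - 50) / 5 = 2, which is just outside the ±1.96 band, so the fair-coin hypothesis would be rejected at the 5% level.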

I don't know if this clears up your confusion, or if I just made it worse.

If I understood your question, it's a tough question to answer in 25 words or less. lol

If you're still unclear about it, post a follow up question and I'll try to answer it.

Penguinfan · Feb 21, 2004

Yea, that's what I was gonna say.

hellah10 · Feb 21, 2004

wtf

:shrug:

TORONTO-VIGILANTE · Feb 21, 2004

ughhhhhhhhhh..my eyes are bleeding.....:shrug:

KMA · Feb 21, 2004

Depends if you are doing a one-tail or two-tail test of significance.
If one tail, it's either the 5% of the "bell" curve at the left tail or the 5% at the right tail. If two-tailed, it's 2.5% of the curve on each side.

It also depends upon the level of significance you want. 5% is conventional but some use 1%.

If the test statistic used falls into those regions, you reject whatever null hypothesis you are testing. That was a quick rendition.
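A quick way to see where those regions start is to ask the standard normal distribution for the cutoffs directly. This is only a sketch; `critical_values` is a name invented here, and it relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_values(alpha):
    """Return z cutoffs for left-tailed, right-tailed and two-tailed tests."""
    left = std_normal.inv_cdf(alpha)         # reject if z < left
    right = std_normal.inv_cdf(1 - alpha)    # reject if z > right
    two = std_normal.inv_cdf(1 - alpha / 2)  # reject if |z| > two
    return left, right, two

for alpha in (0.05, 0.01):
    left, right, two = critical_values(alpha)
    print(f"alpha={alpha}: left {left:.3f}, right {right:.3f}, two-tailed +/-{two:.3f}")
```

At 5% the one-tailed cutoff is about 1.645 and the two-tailed cutoff about 1.96, which is why the same data can be "significant" in a one-tailed test and not in a two-tailed one.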

maverick2112 · Feb 22, 2004

Kotys DAD.........Heres the activity I am doing.............

Activity Objective:
This activity will help show you an application of simulation and statistics in real scientific research.

Science Background:
In 1997, the magazine Nature featured an article describing a recent find of a nest of 22 Troodon eggs at Egg Mountain near Choteau, Montana. The eggs had hatched, but the lower two-thirds of each shell was intact and fossilized in limestone rock. These eggs provide valuable clues to the nesting and reproductive habits of the Troodon. Based on the arrangement and physical characteristics of the nest, paleontologists determined that the eggs were laid in pairs. If this were indeed the case, the nest would provide valuable evidence for scientists trying to determine evolutionary links between dinosaurs and birds.

If they were laid in a random manner, it would suggest that the reproductive system of the Troodon was like that of modern-day crocodiles. This, in turn, would represent evolutionary evidence for a dinosaur-reptile link.

On the other hand, a pairing of the eggs would suggest that the Troodon laid the eggs in pairs and over an extended period of time. Modern day birds lay their eggs in a similar manner, only they do so one at a time. However, the evolutionary forerunners of birds possessed two working ovaries. Thus, a paired arrangement of the eggs would represent evolutionary evidence for a dinosaur-bird link.

So, here's the big question: "Was the placement of eggs the result of purposeful biological behavior or just a random event - a lucky coincidence that occurred by chance?" In this activity, we will see how scientists used simulation and statistics to answer this question.

Math Notes:
First, you need to find out what a "normal" distribution is. I am also interested in something called the Empirical Rule. This rule tells us the proportion of data that will be within one, two and three standard deviations of the mean - provided the distribution is normal.
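The proportions in the Empirical Rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is just for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```

This gives roughly 68%, 95% and 99.7%, the figures usually quoted for the rule.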

OK, back to the eggs: The question facing the consulting statistician was how to establish that the eggs were actually paired. Of course, it is possible that the eggs were laid in a random manner and they just appeared to be laid in pairs.

The statistician first developed a quantitative measure of "pairedness." The minimum average paired distance (MPD) is obtained by examining each of the possible pairings of the eggs in the nest, measuring the distance between the two eggs in each pair, and selecting the pairing that minimizes the sum of those distances. The MPD is this sum divided by the number of pairs.

The statistician then used simulation to construct 1000 randomly laid dinosaur nests. For each of these "virtual" nests, he determined the MPD. Thereafter, he constructed a histogram of the 1000 MPD's to create a distribution of MPD's for randomly-laid nests. In other words, the histogram (constructed via simulation) represented a picture of what nests would result if the eggs were truly laid in a random manner.

He then compared the MPD of the original nest with the histogram of MPD's and determined that it was very unlikely that the nest was laid in a random manner. The reasoning underlying this is as follows:

From the graph, it can be inferred that if a dinosaur nest is laid by random means, the MPD of the nest will probably be about 3.5. It is possible that a random nest could have an MPD of about 2 or 5, but both of these events are less likely than the random nest with MPD of 3.5. Likewise, an MPD of 1 is even less likely to occur than is an MPD of 2. In general, obtaining an MPD of 1 or less is so unlikely to happen that, were it to happen in an actual nest, we would question the assumption that the nest were truly laid by random means.

This is exactly what happened in the actual research. The distribution of random MPD's was shaped much like the image above. However, the MPD of the actual nest was very small. In fact, if we use the scale of the figure above, the MPD of the actual nest was about 0.5! The chance of this happening if the eggs were truly placed in the nest at random is remote ... so remote that the researchers can justifiably question whether or not the nest is truly random. That leaves the alternative ... the eggs in the nest are paired.
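The simulation idea described above can be sketched in a few lines. Everything specific here is an assumption made for illustration, not a detail from the research: eggs are dropped uniformly in a 10 x 10 square, each nest has 6 eggs (trying every pairing by brute force is only practical for small nests, not for 22 eggs), and the "observed" MPD of 1.0 is a made-up value.

```python
import random
from math import dist

def min_pair_sum(points):
    """Smallest total distance over all ways to split the points into pairs."""
    if not points:
        return 0.0
    first, rest = points[0], points[1:]
    best = float("inf")
    # Pair the first egg with every other egg, then pair up whatever is left.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        best = min(best, dist(first, partner) + min_pair_sum(remaining))
    return best

def mpd(points):
    """Minimum average paired distance: minimal pair sum / number of pairs."""
    return min_pair_sum(points) / (len(points) // 2)

def simulate_mpds(n_nests=1000, n_eggs=6, size=10.0, seed=0):
    """MPDs of random nests; uniform placement in a square is an assumption."""
    rng = random.Random(seed)
    return [
        mpd([(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_eggs)])
        for _ in range(n_nests)
    ]

mpds = simulate_mpds()

# Text histogram with bins one unit wide (one '#' per 10 nests).
counts = {}
for value in mpds:
    counts[int(value)] = counts.get(int(value), 0) + 1
for left_edge in sorted(counts):
    print(f"{left_edge:2d}-{left_edge + 1:<2d} {'#' * (counts[left_edge] // 10)}")

observed = 1.0  # hypothetical MPD of a real nest, in the same units as `size`
tail = sum(v <= observed for v in mpds) / len(mpds)
print(f"share of random nests with MPD <= {observed}: {tail:.3f}")
```

The last line is the key number: if only a tiny share of random nests have an MPD as small as the real one, randomness becomes hard to believe.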

The Activity:
To determine whether or not the eggs in the nest were paired, the consulting statistician simulated the laying of 1000 nests. In each nest, the eggs were laid by random means. Although the original nest contained 22 eggs, one can consider a smaller number of eggs and still develop an understanding of the research and the role of simulation in the research. In our activity, we will consider nests of 4 eggs.

The Scenario

News Flash: Researchers have stumbled across a nest of 4 Troodon eggs. The eggs appeared to be paired, but the researchers would like more conclusive evidence.

The Research

You are going to simulate the laying of "random" nests of four eggs each, determine the MPD of each nest, and construct a histogram of your results. Based upon your results, you will develop a rule: If the MPD of the actual nest is less than (SOME VALUE), then I reject the claim that the eggs in the actual nest were laid in a random manner.


Calculate the MPD for each nest. To do so, label the eggs in each nest (1, 2, 3, and 4). There are 3 possible pairings of the eggs in your nest:

Pairing 1: egg 1 with egg 2, and egg 3 with egg 4

Pairing 2: egg 1 with egg 3, and egg 2 with egg 4

Pairing 3: egg 1 with egg 4, and egg 2 with egg 3

For each pairing, find the average paired distance between the centers of the paired eggs. For example, for pairing 1, use your ruler to measure the distance (in cm) from the center of egg 1 to the center of egg 2, and the distance from the center of egg 3 to the center of egg 4. Add the two distances and divide by two. This average is the average paired distance for pairing 1. Repeat the process for pairings 2 and 3. The MPD is the minimum of the three average paired distances.
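For a four-egg nest the whole calculation is just three averages and a minimum. Here is a small sketch of it; the egg coordinates are invented purely to show the arithmetic, and `average_paired_distances` and `mpd` are names chosen for this example.

```python
from math import dist

# The three ways to split eggs 1-4 into two pairs (as 0-based indices):
# pairing 1 = (1,2)+(3,4), pairing 2 = (1,3)+(2,4), pairing 3 = (1,4)+(2,3)
PAIRINGS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def average_paired_distances(eggs):
    """Average paired distance for each of the three pairings, in order."""
    return [
        (dist(eggs[a], eggs[b]) + dist(eggs[c], eggs[d])) / 2
        for (a, b), (c, d) in PAIRINGS
    ]

def mpd(eggs):
    """Minimum of the three average paired distances."""
    return min(average_paired_distances(eggs))

# Hypothetical egg centers in cm (not measured data)
nest = [(0, 0), (3, 4), (10, 0), (10, 2)]
for number, avg in enumerate(average_paired_distances(nest), start=1):
    print(f"pairing {number}: {avg:.2f} cm")
print(f"MPD: {mpd(nest):.2f} cm")
```

In this made-up nest, pairing 1 has distances of 5 cm and 2 cm, so its average is 3.5 cm, and that is smaller than the other two pairings, so it is the MPD.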

Repeat this process for 6 nests.

Based upon your results, identify a rejection value V. The rejection value is a number V such that if the MPD of the actual nest were less than V, then you would conclude that the four eggs in the actual nest were not laid in a random manner. Justify your value of V.

According to the Law of Large Numbers, true trends in the data are only revealed by a sufficiently large sample.

What is the connection between the Law of Large Numbers and this simulation activity?

OK.......I did this activity and came up with the following numbers

7.8, 5.3, 8.5, 6.5, 4.5, 5.0.................MPD NUMBERS.

I need to answer the following questions............

1. How would a nest that is laid by random means differ from one in which the eggs are paired?

2. What is the connection between the Law of Large Numbers and this simulation activity?

3. What is my rejection v...????

PS.....I did this from 54 nests and my numbers are

7.8,5.3,8.5,6.5,4.5,5.0
5.8,4.8,7.0,5.5,6.5,7.1
3.4,5.1,5.5,4.6,8.4,3.2
2.6,4.3,4.5,3.2,3.8,4.0
6.0,5.2,4.1,5.2,6.9,4.9
4.2,8.4,8.7,8.9,7.4,3.9
7.8,6.4,7.8,9.4,4.2,3.5
7.9,5.7,9.6,10.2,9.6,9.9
2.5,3.5,6.5,6.5,6.5,4.0

Since the eggs are pretty close together for this number of times, I would have to conclude they are being laid in pairs and not randomly, since if they were being laid randomly these numbers would be more erratic. Is this correct?

hellah10 · Feb 22, 2004

am i the only person that just scrolled through all that jibberish and didnt read nothing except for the confused smileys?

maverick2112 · Feb 22, 2004

Hellah.......dont worry I am taking this statistics class and It took me about 5 times reading thru this stupid assignment to understand it and I still am not quite sure what I am doing................Thank heavens for Madjacks and Kotys Dad

KotysDad · Feb 22, 2004

Mav,

I printed out a copy of your assignment to look at. I'll get you some feedback by tonight unless someone else responds earlier.

Hellah LOL......Only the true geeks of this world can read that assignment and appreciate its true aesthetic value.

Color me geeky. lol

maverick2112 · Feb 22, 2004

Thanks KD............I appreciate all the help I can get........

Captain Crunch · Feb 22, 2004

KotysDad said:
I'll get you some feedback by tonight unless someone else responds earlier.

Don't bet on that happening!!!!!!! I'm thinking that you are his only source of help on this one.

TJBELL · Feb 22, 2004

Captain Crunch said:
KotysDad said:

I'll get you some feedback by tonight unless someone else responds earlier.

Don't bet on that happening!!!!!!! I'm thinking that you are his only source of help on this one.


:lol2 :142smilie :142smilie :142smilie Too funny!!!

Koty should be real proud of her Dad!

Omar: Jibberish. LOL!!!!!!!!!

SixFive · Feb 22, 2004

hellah10 said:
am i the only person that just scrolled through all that jibberish and didnt read nothing except for the confused smileys?

LMFAO!!! Add me to that list! I have a father that is rather scholarly, genius type, who understands all that stuff, but I don't know it.

KotysDad · Feb 22, 2004

Mav,

Here is my solution, but I am a bit confused about one or two points. You listed 54 MPD numbers in your data set, but the last 6 numbers are followed by "since the eggs are pretty close together....". Are these last 6 MPD numbers supposed to be from the actual nest of 4 Troodon eggs that were mentioned in the News Flash statement?

I assumed the answer to my question is no, since if that was the case you would have had 60 data points......54 for your simulation followed by the 6 actual data points from the eggs recovered. If this isn't the case, then I have to run the numbers again. Your statement after those last 6 data points is what makes me wonder if they were simulation data points or real data points. The final answer doesn't change drastically, but for accuracy let me know if I made a wrong assumption.

So, here is my solution under the assumption that all 54 of those data points were from your simulation.

Ho = null hypothesis that eggs were placed at random.
H1 = alternative hypothesis that eggs were paired.

Using your 54 data points, I computed the mean and standard deviation of the points and came up with

mean = 5.97
sd = 2.04

Since the alternative hypothesis implies that you will only reject the null if the MPD of the real nest is well below the mean of the simulated MPD's, we only need to consider a one-sided rejection region. I am going to use 5% as my significance level. This 5% is pretty standard, but some people choose to use 1%. I'll explain a bit more on this later.

Considering the normal bell curve, you have to translate that 5% left tail to an actual value (rejection value V). You have to take the standardized normal variable:

(X - mean) / sd < -1.645

and solve for X. The -1.645 comes off the normal tables with a left tail probability of .05.

Solving for X, you get X < 2.61. This means that if the MPD of the real nest is less than 2.61, then you will reject the null hypothesis that the placement is random and conclude there is pairing.

I mentioned the 5%. Some people like to use 1%. If you use 1%, you get X < 1.21. With a lower rejection value, you increase the chance of having a real nest taken as random when in fact it really is paired (a Type 2 error). On the other hand, if you do get a rejection, you have more confidence that the eggs are paired. There are pros and cons to either choice of 5% and 1%. I mentioned this earlier when I talked about Type 1 and Type 2 errors in my previous post. I used 5%. In reality the statistician knows ahead of time whether a false positive or false negative is more damaging, and from that makes his choice of 5% vs. 1%. Your problem didn't make that distinction, so I chose 5%.
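The calculation above can be reproduced from the 54 simulated MPD values posted earlier in the thread. This is a sketch under the same assumption the post makes, namely that the simulated MPD's are roughly normal. It uses the population standard deviation (`pstdev`) because that is the version that comes out at about 2.04; the sample version (`stdev`) would be slightly larger. The function name `rejection_value` is made up for this example.

```python
from statistics import NormalDist, fmean, pstdev

# The 54 simulated MPD values (cm) posted in the thread
mpds = [
    7.8, 5.3, 8.5, 6.5, 4.5, 5.0,
    5.8, 4.8, 7.0, 5.5, 6.5, 7.1,
    3.4, 5.1, 5.5, 4.6, 8.4, 3.2,
    2.6, 4.3, 4.5, 3.2, 3.8, 4.0,
    6.0, 5.2, 4.1, 5.2, 6.9, 4.9,
    4.2, 8.4, 8.7, 8.9, 7.4, 3.9,
    7.8, 6.4, 7.8, 9.4, 4.2, 3.5,
    7.9, 5.7, 9.6, 10.2, 9.6, 9.9,
    2.5, 3.5, 6.5, 6.5, 6.5, 4.0,
]

mean = fmean(mpds)
sd = pstdev(mpds)  # population SD (divides by n), matching the 2.04 quoted above

def rejection_value(alpha):
    """Left-tail cutoff V: solve (X - mean) / sd = z_alpha for X."""
    z_alpha = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    return mean + z_alpha * sd

print(f"n = {len(mpds)}, mean = {mean:.2f}, sd = {sd:.2f}")
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject randomness if MPD < {rejection_value(alpha):.2f}")
```

Because the code carries unrounded values for the mean, sd and z, the cutoffs it prints can differ in the last decimal place from a hand calculation done with rounded inputs.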

Now for the rest of your questions.

1. Your data shows that when eggs are placed at random, the MPD averages approximately 5.97 cm. How would a nest that is laid at random differ from one paired? We answered that above. The paired eggs will have a much smaller MPD. The above analysis shows how much smaller it has to be to conclude the eggs are not random.

2. The connection between the law of large numbers and your simulation is simply that the more data points you use in your simulation, the closer your sample mean will approach the true mean of randomly placed eggs. In other words, your sample mean will more accurately reflect random data. (Consider flipping a coin. If you flip it twice and get 2 heads, you won't conclude that the coin is biased towards heads even though the proportion of heads in your sample was 100%. But if you flipped it 10,000 times, the proportion of heads will be much closer to the true value for an unbiased coin - 50%. That's the law of large numbers.)

3. What is the rejection value? We answered that above. At a 5% significance level, the rejection value is 2.61. Anything below that and we reject the null.
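The coin illustration in answer 2 is easy to try out. The sketch below simulates a fair coin with Python's `random` module; the seed and the list of flip counts are arbitrary choices, and `heads_proportion` is a name invented for this example.

```python
import random

def heads_proportion(flips, seed=0):
    """Proportion of heads in `flips` simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

for flips in (2, 10, 100, 10_000, 100_000):
    print(f"{flips:>7} flips: proportion of heads = {heads_proportion(flips):.4f}")
```

Small runs can land far from 0.5, while the large runs settle close to it. The same thing applies to the nests: 54 simulated nests pin down the random-nest MPD distribution better than 6 do, and 1000 better still.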

Hope this helps. Let me know if you have any questions on what I did.

maverick2112 · Feb 23, 2004

Kotys Dad............You dont know how much I appreciate your help on the stats class. Maybe after this class is over(in a couple of months) I will get your email from madjack and send you a little something for all of the help you have been. again thanks a lot.

Those last six numbers were my numbers from my simulation so your assumption was correct. Again thanks.

By the way how do you know so much about this stuff. Do you do this for a living? Statistics, projections etc.

KotysDad · Feb 23, 2004

maverick2112 said:
Kotys Dad............You dont know how much I appreciate your help on the stats class. Maybe after this class is over(in a couple of months) I will get your email from madjack and send you a little something for all of the help you have been. again thanks a lot.

Those last six numbers were my numbers from my simulation so your assumption was correct. Again thanks.

By the way how do you know so much about this stuff. Do you do this for a living? Statistics, projections etc.

Mav,

You're very welcome. I enjoy this stuff, so ask a question anytime.

Yes, I somewhat do this for a living. My field of expertise is cryptography - which involves a lot of math. I have a B.S. and M.S. in Applied Math. I also teach part time in the evenings.
