Test Question Possibilities
ü What are the various ways we can graph univariate data? I count 4 major ways. Give a brief explanation of the process involved in each way.
ü What are the two competing quantitative methods used to describe univariate data? Give a brief definition or explanation of each parameter within the two competing methods.
ü What are the main things we look for in graphs of distributions? What additional things should we also consider in these graphs?
ü How do you know if a data set is skewed left? Skewed right? Be specific. How would a boxplot look that was skewed in each direction?
ü What is meant by resistant? Non-resistant? Which of the univariate parameters that we have studied would fall into each category?
ü What is the formula for mean of a sample? For standard deviation of a sample? What symbols are used for the mean and standard deviation of a sample? What symbols (don’t worry about formulas) are used for the mean and standard deviation of a population? How are the variance and standard deviation of a sample (or population) related? Be sure to include a unit of measurement relationship for the last question.
ü What happens to the mean, median, standard deviation and IQR of a data set if you add a fixed number to each member of a data set? What happens to these same measures if you take a data set and multiply each value by a fixed number?
ü What conclusions can you make about a distribution if the mean = median? Mean > median? Mean < median? Briefly explain why you can make these conclusions.
ü What does a density curve represent? What are the two basic requirements for a density curve?
ü What does the empirical rule say about all normal distributions? How do you standardize observations from a normal distribution with mean mu and standard deviation sigma into one that is normally distributed with mean 0 and standard deviation 1?
ü Describe how you would use Table A in the book in two ways. First, to determine the proportions of observations within two given values in a normally distributed population. Second, to determine the observation values that would account for a particular proportion of the data in a normally distributed population.
ü What is a percentile score? What can you say about the percentile score of a median? Quartile 3? Quartile 1? Mean?
ü Briefly describe how you would use a histogram of a data set to assess normality. Would using a dotplot instead of a histogram be a more accurate assessment? Why or why not? Yes, I agree using a dotplot may be more tedious, but that is not the question :)
ü Describe how a normal probability plot can be used to assess normality. Be sure to address what you are plotting on the y-axis and the x-axis of the plot. Also be sure to briefly explain not only WHAT you are looking for in this plot but also WHY this might be useful in assessing normality.
ü What are the qualitative things you should look for in a scatterplot of bivariate data? How can you add a categorical variable to a scatterplot? WHY would you want to add a categorical variable to a scatterplot? Give an example.
ü How do you calculate r for a data set? What does this have to do with Z-scores? What is the interpretation of this r value? What are the units of an r value? What are the units of a z-score?
ü What are the properties of r?
ü What is the difference between a correlation coefficient and a coefficient of determination?
ü In addition to r and r^2, what things should you report with your analysis of bivariate data? HINT: One of these is a graph and the other things are from Chapter 1.
ü What is our GOAL in obtaining a LSRL? That is, what have we optimized or minimized? Would the LSRL be different if we switched the explanatory and the response variable? What about r and r^2?
ü What is the relationship between the slope of the line and the correlation coefficient? How can you find the y-intercept of the LSRL given only the correlation coefficient and the mean based statistics for both x and y?
ü What (x,y) can you GUARANTEE will always be on the LSRL? Why can you guarantee this?
ü What the heck is an outlier? An influential point? Can a point be one and not the other?
ü How do you calculate residual values? What is the sum of the residuals? What is the sum of the squares of the residuals and what does it have to do with the LSRL?
ü When analyzing a residual plot, what should you look for?
ü What is the interpretation of r^2?
ü What are the three main properties of logarithms?
ü If you have a suspected (exponential) relationship of y = AB^x, how do you use the properties of logarithms to obtain a linear relationship? What graph would you suspect to be a straight line? That is, what would you plot on the dependent and independent axes? How do you then complete the inverse transformation to put the model back into the context of the problem?
ü Same as the question above, but for a power function.
ü What is an initial test for exponential growth? How can you tell by looking at an equation if you have exponential growth or exponential decay?
ü What is a marginal distribution? What is a conditional distribution? If you have a two-way table that measures “n” of one characteristic and “m” of another characteristic, how many marginal distributions will there be? How many conditional distributions will there be?
ü Part 1 - What is extrapolation? Why should you care? Part 2 – What warnings do you have for someone performing linear regression on a data set of averages? Why do you make this warning?
ü Describe Simpson’s paradox. What is the main lesson learned from Simpson’s paradox in relation to analysis of statistical data?
ü What is a lurking variable? A common response variable? A confounding variable? Which of the above is a subset of another?
· Voluntary response
· Convenience sampling
· Undercoverage
· Nonresponse
· Positive Response
· Wording of questions
ü What are the two major components of a probability model?
ü What is a sample space?
ü What is the multiplication principle and how would you use it?
ü IF DISJOINT – P(A or B) = ?
ü IF NOT DISJOINT – P(A or B) = ?
ü IF INDEPENDENT – P(A and B) = ?
ü IF NOT INDEPENDENT P(A and B) = ?
ü IF NOT DISJOINT and also INDEPENDENT, P(A or B) =
ü Draw a Venn Diagram to represent the following situation: P(A) = .3, P(B) = .6, P(A and B) = .2
o Are events A and B independent? How do you know?
o Are events A and B disjoint? How do you know?
o Find the following:
§ P(A and B)
§ P(A and B^c)
§ P(A^c and B)
§ P(Neither A nor B)
§ P(Not A and B)
ü The probability of guessing a correct answer on a multiple choice test is .2 (assuming each question has 5 choices). You are going to take a 6 question multiple-choice test and guess the answers on each. Find the following (expressions are fine):
o P(getting all correct)
o P(getting all incorrect)
o P(not getting all correct)…note this is not the same as P(getting all incorrect)
o P(not getting all incorrect)…note this is not the same as P(getting all correct)
o P(getting at least one correct)…aha…this is the same as one of the above :)
o P(getting at least one incorrect)…aha…this is the same as one of the above :)
ü If the P(A and B) = .4 and the P(A) = .7, what is the P(B given A)?
ü If the P(A and B) = .3 and the P(A Given B) = .6, which can you find, the P(A) or the P(B)? Find it.
ü If P(A given B) = 1/3 and the P(B) = .3, what is the P(A and B)?
ü If P(A or B) = .8, P(B Given A) = .3 and P(A) = .5, what is the P(B)? Hint: You cannot find directly – first find P(A and B) from given, then find P(B) from P(A or B) given. I absolutely LOVE this question.
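The two-step hint in the last question can be checked with a few lines of Python. This is only a sketch added for self-checking, not part of the original question; the variable names are made up for illustration.

```python
# Given values from the question
p_a = 0.5           # P(A)
p_b_given_a = 0.3   # P(B given A)
p_a_or_b = 0.8      # P(A or B)

# Step 1: general multiplication rule, P(A and B) = P(A) * P(B given A)
p_a_and_b = p_a * p_b_given_a

# Step 2: general addition rule, P(A or B) = P(A) + P(B) - P(A and B),
# rearranged to solve for P(B)
p_b = p_a_or_b - p_a + p_a_and_b

print(round(p_a_and_b, 2), round(p_b, 2))
```

The same two rules, rearranged differently, handle the three conditional probability questions just before this one.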
Chapter 7 –
ü What is the definition of a Random Variable? What two things will you need in order to DEFINE your random variable?
ü What is a probability histogram? A density curve? Are the two related in any way?
ü What is the difference between a discrete and a continuous RV? What are the two requirements for the probabilities of a discrete RV? Of a continuous RV?
ü What is a uniform probability distribution? How would the function be defined if the continuous RV can take on the values from 0 to n inclusive?
ü Is a normal distribution a pdf? Why or why not?
ü How do you determine (by formula) the mean, variance and standard deviation of a discrete RV? What is the difference between a mean of a RV and an expected value of a RV? [Note – recall that FINDING the mean and the variance of a continuous RV is beyond the scope of this class and requires methods from calculus.]
ü What does the law of large numbers state? Are there any assumptions about the observations from the population that are crucial to this law?
ü Complete the following rules for means and variances of RV’s
o mu_(X+Y) =
o mu_(X-Y) =
o mu_(a+bX) =
o variance(a + bX) =
o variance(X + Y) =
o variance (X – Y) =
o standard deviation(X – Y) =
ü What are the four conditions that need to be satisfied for a situation to be a binomial setting?
ü How is the RV X defined in a binomial distribution? What values can X take? What are the two parameters that you need in order to determine a probability such as P(X = 3)?
ü What is the difference between a pdf (probability distribution function) and a cdf (cumulative distribution function)? If X is a RV that is B(10, .3) give an example of how the two functions are related.
ü Do the concepts of a pdf and a cdf make sense outside of the context of a binomial random variable setting? If not, why not. If so, give a quick example.
ü Explain what the calculator commands binompdf(50, .43, 23) and binomcdf(50, .43, 23) would represent in words. You may put it into the context of a problem if you wish.
ü What is the formula (not calculator command) for determining the probability of (X = k) in a B(n,p)?
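For checking calculator output against the formula, the binomial pdf and cdf can be sketched in Python. This is an illustration only, not part of the original sheet: the function names `binom_pdf` and `binom_cdf` are made up here to mirror the calculator commands, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def binom_pdf(n, p, k):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n, p, k):
    """P(X <= k): the pdf values added up over 0, 1, ..., k."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

# Example with X ~ B(10, .3): a single pdf value is the difference
# between two neighboring cdf values.
print(binom_pdf(10, 0.3, 3))
print(binom_cdf(10, 0.3, 3))
print(binom_cdf(10, 0.3, 3) - binom_cdf(10, 0.3, 2))
```

The first and third printed values should agree up to floating-point rounding, which is one way to see how the two functions are related.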
ü How does a geometric random variable differ from a binomial random variable? How are they similar? What values can a geometric RV take?
ü What are the mean, variance and standard deviation of a random variable that is B(n,p)? What is the mean of a geometric RV that has a probability of success = p? [Note: the variance of a geometric distribution is given by (1-p)/p^2 if you care, but it is not given in the text.]
ü If Mr. G has a 24% chance of making a free throw and shoots each one independently, confirm the following probabilities:
o P(It takes Mr. G Exactly 8 attempts to make his first) =.035
o P(It takes Mr. G more than 8 attempts to make his first shot)=.111
o P(It takes Mr. G less than 8 attempts to make his first shot) = .854
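One way to confirm the three free-throw probabilities is to compute them directly from the geometric setting. The sketch below assumes, as the question states, that each shot is independent with a 24% chance of success; the variable names are illustrative only.

```python
p = 0.24     # chance of making any one free throw
q = 1 - p    # chance of missing any one free throw

# First make comes on attempt 8: seven misses, then a make
p_exactly_8 = q**7 * p
# More than 8 attempts needed: the first eight shots are all misses
p_more_than_8 = q**8
# Fewer than 8 attempts needed: NOT all of the first seven are misses
p_less_than_8 = 1 - q**7

print(round(p_exactly_8, 3), round(p_more_than_8, 3), round(p_less_than_8, 3))
```

The three events cover every possibility, so the unrounded values add up to 1. The last value comes out as 0.8535..., which is .854 when rounded to three places.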
ü Describe what a sampling distribution represents.
ü How will the spread of the sampling distribution of x-bar change as you increase the sample size?
ü How will the spread of the sampling distribution of p-hat change as you increase the sample size?
ü How will the spread of the sampling distribution of x-bar change as you increase the number of samples?
ü What is the difference between a parameter and a statistic?
ü What does it mean for a statistic to be unbiased?
ü Describe in words what p-hat represents.
ü What does the Central Limit Theorem say?
ü What does the Law of Large Numbers say?
ü Under what conditions can you assume that the sampling distribution of x-bar is N(mu, sigma/sqrt(n))?
ü Under what conditions can you assume that the sampling distribution of p-hat is N(p,sqrt[ p(1-p)/n])?
ü Mr. G’s Geometry class has grades that are normally distributed with a mean of 84.4% and a standard deviation of 3.3%.
o What is the probability that a randomly chosen student will have a grade that is below 80%? What assumptions/conditions are necessary in order to ensure this probability is an accurate calculation?
o What is the probability that a randomly chosen group of 5 students will have a mean grade that is greater than 90%? What assumptions/conditions are necessary in order to ensure this probability is an accurate calculation?
o What is the probability that a randomly chosen group of 30 students has a mean grade that is less than 82%?
o Which of the answers above would change if the population of students’ grades were distinctly NONNORMAL?
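To check Table A work on the grade questions, the same areas can be computed with Python's standard library. This is a sketch, not the method from the book: it assumes the grades really are normal and the students are chosen at random and independently, and the variable names are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 84.4, 3.3   # population mean and standard deviation (percent)

# One randomly chosen student: use the population distribution itself
p_below_80 = NormalDist(mu, sigma).cdf(80)

# Mean of n students: x-bar is N(mu, sigma / sqrt(n))
p_mean5_above_90 = 1 - NormalDist(mu, sigma / sqrt(5)).cdf(90)
p_mean30_below_82 = NormalDist(mu, sigma / sqrt(30)).cdf(82)

print(p_below_80, p_mean5_above_90, p_mean30_below_82)
```

Notice that only the standard deviation passed to `NormalDist` changes between the single-student case and the group-mean cases.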
1. What is meant by a critical value (z*)?
2. What are we assuming about sigma (the standard deviation of the population) throughout this chapter?
3. What is the formula for sigma_x-bar, the standard deviation of the sampling distribution of x-bar?
4. What is the formula for the margin of error for the statistic x-bar?
5. How do you find a confidence interval from x-bar and your margin of error?
6. What is the interpretation of a confidence interval? Yes, I am asking for those two magic sentences!
7. You weigh 30 bags of peanut M&Ms and get a mean weight for those 30 bags of 12.4 oz. You are told that the standard deviation for bag weights, sigma, is known to be .3 oz. We are interested in a 94% confidence interval. Find the following:
a. z*
b. sigma_x-bar
c. margin of error
d. 94% confidence interval
e. What is the interpretation of this interval?
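The pieces of the M&M interval can be checked numerically. The sketch below is an illustration rather than the book's Table A method: it takes the stated population standard deviation as 0.3 oz, uses `statistics.NormalDist` in place of the table, and the variable names are made up.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 12.4   # sample mean (oz)
sigma = 0.3    # population standard deviation (oz), assumed known
n = 30         # number of bags weighed
conf = 0.94    # confidence level

# Critical value: leaves (1 - conf)/2 in each tail of the standard normal curve
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
# Standard deviation of the sampling distribution of x-bar
sigma_x_bar = sigma / sqrt(n)
# Margin of error, then the interval as estimate +/- margin of error
margin = z_star * sigma_x_bar
interval = (x_bar - margin, x_bar + margin)

print(round(z_star, 2), round(sigma_x_bar, 4), round(margin, 3))
print(tuple(round(v, 3) for v in interval))
```

Changing `conf` or `n` and rerunning shows how the margin of error responds to the confidence level and the sample size.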
8. Suppose you would like to cut the margin of error in 7c in half. How many total bags should you weigh in order to do this?
9. How many bags should you weigh in 7 in order to ensure you have a margin of error no more than .03 oz at a confidence level of 94%?
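The sample-size question comes down to solving the margin of error formula for n. A minimal sketch, assuming sigma = 0.3 oz is known and using `statistics.NormalDist` instead of Table A (variable names are illustrative):

```python
from math import ceil
from statistics import NormalDist

sigma = 0.3            # population standard deviation (oz), assumed known
conf = 0.94            # confidence level
target_margin = 0.03   # largest acceptable margin of error (oz)

z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# Solve z* * sigma / sqrt(n) <= target_margin for n,
# then round UP because you cannot weigh a fraction of a bag
n_needed = ceil((z_star * sigma / target_margin) ** 2)
print(n_needed)
```

Because the margin of error shrinks with the square root of n, cutting it in half takes four times as many bags.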
10. How would you get the margin of error for a statistic from the confidence interval only?
11. Name three cautions you have for someone who picks up a calculator and calculates a confidence interval. (see pages 524-525!)
12. What is the interpretation of a P-value?
13. When will you reject H0? Accept H0?
14. When is a result called statistically significant?
15. Back to our M&M bags from problem 7. If H0: and Ha: and your x-bar = 12.4 oz, n = 30, alpha = 5% …
a. What is the P-value?
b. Will you accept or reject H0?
c. Is the result x-bar = 12.4 statistically significant?
d. Redo a, b and c if you are given H0: and Ha:
16. What is the definition of a Type I error?
17. What is the definition of a Type II error?
18. How do you find the P(Type I error)?
19. What is the procedure for finding P(Type II error)?
20. What is the definition of the Power of a significance test?
21. Back to the M&M bag example. Use the following parameters: x-bar = 12.4 oz, n = 30, H0: and Ha: , alpha = .05, and find the following:
a. P(Type I error)
b. P(Type II error) assuming that the true mean is 12.35 oz
c. Power of the test.
1. When should you use a T-distribution INSTEAD of a Z distribution when finding a Confidence Interval or doing a Significance Test?
2. KEY CONCEPT – CONSIDER THIS AN ESSAY QUESTION! - What is the difference between these three different kinds of T-tests (conceptual and formula)? A) 1-sample B) matched pairs C) 2-sample
3. What are the degrees of freedom for a 1-sample or a matched pairs T-Statistic? What should you use for the degrees of freedom for a 2-sample T-Statistic if you don’t have a calculator? Briefly explain how the degrees of freedom on a calculator would be different and why.
4. Is a 2-sample T-statistic unbiased? Why or why not? What assumptions/conditions must be met in order to use the T-statistic? (see page 606)
5. What can you say about the sampling distribution of the difference of two means (x-bar1 - x-bar2)? Specifically, what are its center and standard deviation? What assumptions/conditions must be satisfied in order for you to make these statements?
6. What is the standard error for an x-bar? How do you use this standard error to get a C% confidence interval?
7. What is the GENERAL FORM for a confidence interval of any statistic?
8. Quite often the null hypothesis on a 2-sample T-statistic is what? How does this affect the T-statistic?
9. Suggestion – Review the concepts of Type I error, Type II error and Power in context of a T-dist whether it be 1-sample, matched pairs or 2-sample.
10. Note – we skipped pages 633-639 since they are optional. Still, you should understand that a calculator uses a different degrees of freedom and the concept of why it is either a conservative or a non-conservative estimate!