Statistics How To: Elementary Statistics for the rest of us!


Calculus deals with continuous functions using the limit process, which you can think of as a “stopping point”. In the atomic clock example below, the limit of the clock’s accuracy is 19 decimal places. Limits give us a workable number to stop at; otherwise we would be calculating (with ordinary mathematics) forever and ever.

The remaining two concepts are usually defined as solutions to geometry problems: the derivative is the slope at a point, and the definite integral is the area under a curve. These two core concepts allow you to take a simple graph, like one of the velocity of a car, and mine it for information you couldn’t get with ordinary mathematics. For example, you can find acceleration or “jerk”—that feeling you get in an elevator when it suddenly slows down or speeds up.
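To make this concrete, here is a minimal numerical sketch in Python. The quadratic velocity function v(t) = 3t² is an assumption chosen for illustration, not an example from the text: the slope of the sampled velocity approximates acceleration, and the area under it approximates distance traveled.

```python
import numpy as np

# Hypothetical velocity readings: v(t) = 3t^2 (m/s), sampled every 0.1 s.
t = np.linspace(0, 5, 51)
v = 3 * t ** 2

# Derivative of velocity ~ acceleration: the slope at each sample point.
a = np.gradient(v, t)

# Definite integral of velocity ~ distance traveled: the area under the
# curve, approximated here with the trapezoid rule.
distance = float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))

print(a[25])     # acceleration at t = 2.5; the exact value is 6t = 15
print(distance)  # distance over 5 s; the exact value is 5^3 = 125
```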

Looking for elementary statistics help? You’ve come to the right place. Statistics How To has more than 1,000 articles and hundreds of videos covering elementary statistics, probability, AP and advanced statistics topics.

Acceptance-rejection sampling is a way to simulate random samples from an unknown (or difficult to sample from) distribution, called the target distribution, by using random samples from a similar, more convenient probability distribution. Each generated sample is either rejected or accepted by a probabilistic test; the goal is for the accepted samples to be distributed as if they were drawn from the target probability distribution.

Let’s say you wanted to sample from the inverse of a particular function, but an explicit inverse doesn’t exist. You could use a similar distribution—making sure that (Carroll & Hamrick, 2011):

- the proposal distribution is easy to sample from, and
- the target density f(x) is bounded by a scaled version of the proposal density g(x); that is, f(x) ≤ M·g(x) for every x and some constant M.
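Here is a minimal sketch of the method in Python, under assumptions chosen purely for illustration: the target density is shaped like a Beta(2, 5) and is known only up to a constant, and the proposal is Uniform(0, 1).

```python
import numpy as np

rng = np.random.default_rng(42)

def target_pdf(x):
    # Target density, known only up to a constant: shaped like Beta(2, 5).
    return x * (1 - x) ** 4

# Envelope constant M, chosen so that target_pdf(x) <= M * g(x) for the
# Uniform(0, 1) proposal g(x) = 1. The mode of x(1-x)^4 is at x = 0.2.
M = target_pdf(0.2) * 1.05  # small padding above the maximum

def sample(n):
    accepted = []
    while len(accepted) < n:
        x = rng.uniform(0, 1)       # draw a candidate from the proposal
        u = rng.uniform(0, 1)       # uniform draw for the accept/reject test
        if u * M <= target_pdf(x):  # accept with probability f(x) / M
            accepted.append(x)
    return np.array(accepted)

samples = sample(10_000)
print(samples.mean())  # close to 2 / (2 + 5) ≈ 0.286, the Beta(2, 5) mean
```

The acceptance test is what makes the accepted draws follow the target: a candidate x survives with probability proportional to f(x).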

Acceptance-rejection sampling is a Monte Carlo method that has largely been replaced by newer Markov Chain Monte Carlo methods. It’s usually only used when it’s challenging to sample individual distributions within a larger Markov chain (Christensen et al., 2011).


You can never make exact measurements in an experiment (even the atomic clock isn’t exact: it loses a second every 15 billion years). How far away from the “mark” you are is described by accuracy, and how consistently you measure is described by precision.

If you want to tell which set of data is more precise, find the range (the difference between the highest and lowest scores) for each set; the set with the smaller range is the more precise one.

Accurate and precise: If a weather thermometer reads 75°F outside and it really is 75°F, the thermometer is accurate. If the thermometer consistently registers the exact temperature for several days in a row, the thermometer is also precise.

Precise, but not accurate: A refrigerator thermometer is read ten times and registers degrees Celsius as: 39.1, 39.4, 39.1, 39.2, 39.1, 39.2, 39.1, 39.1, 39.4, and 39.1. However, the real temperature inside the refrigerator is 37 degrees C. The thermometer isn’t accurate (it’s more than two degrees off the true value), but because the readings are all close to 39.2, it is precise.
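A few lines of Python make the distinction concrete, using the refrigerator readings above (the true temperature is taken from the example; the range is used as the quick precision measure described earlier):

```python
# The ten refrigerator thermometer readings from the example above.
readings = [39.1, 39.4, 39.1, 39.2, 39.1, 39.2, 39.1, 39.1, 39.4, 39.1]
true_temp = 37.0  # the real temperature inside the refrigerator (deg C)

mean = sum(readings) / len(readings)
spread = max(readings) - min(readings)  # the range: a quick precision check

print(f"mean reading: {mean:.2f} C, off by {mean - true_temp:.2f} C -> not accurate")
print(f"range: {spread:.2f} C, a small spread -> precise")
```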

An active variable is a variable that is manipulated by the investigator. It’s designed to shed light on some part of a question or problem, and its usefulness comes from the way it can be controlled by a researcher.

 Consider a research project on the effect of water on greenhouse tomatoes. The amount of water provided to each tomato is an active variable because it is controlled by the investigator. In fact, their manipulation of that variable is the force that drives the experiment.

 In an investigation on the responses of middle-class families to different types of advertising propaganda, the type of advertising shown for each recorded response is an active variable. The researcher may not be able to manipulate the social status of the family or their predispositions toward advertising, but he can decide which advertisement to test in each instance.

Consider two hypothetical studies of children’s electronic device use. The designers of study 1 ask their participants to keep a record of the amount of time their children spend using tablets, computers, and other electronic devices. They also tell participants not to change anything about their daily routine during the course of the study.

The designers of study 2 divide their participants into three groups. They ask group 1 to eliminate electronic use during the course of the study. Group 2 is asked to allow their children devices for 1 hour a day, and group 3 to give unlimited access. Only study 2 involves an active variable: the researchers directly manipulate each child’s access to devices, while study 1 merely records behavior as it occurs.

 An additive tree is a general way to represent clusters of data in a graph. It is used when your data table is composed of rows and columns that represent the same units; the measure must be a distance or a similarity.

A “tree” is a finite, connected graph in which any two nodes are connected by exactly one path. In a tree with nodes labeled A through F, for example, node B is connected by a single path to node E, and node E by a single path to node F. The additive tree is a similar technique to cluster analysis: both techniques have the “leaves” of the tree representing units. Where the additive tree differs is that dissimilarity is shown graphically: the distance between two units along the tree’s branches represents the distance between them in the data.

Cluster analysis creates the clusters but does not create a graph that represents the results. Hierarchical cluster analysis also has an additional limitation: objects in the same cluster must be exactly the same distance from each other, and the distances between clusters must be larger than the within-cluster distances. Additive trees do not have these limitations.
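One way to make “additive” concrete: a table of distances can be represented exactly by an additive tree when it satisfies the four-point condition, which requires that for every four units, the two largest of the three pairwise distance sums are equal. Here is a minimal Python check; the small distance matrix is a hypothetical example, not data from the text.

```python
from itertools import combinations

def satisfies_four_point(d, tol=1e-9):
    # d: symmetric matrix (list of lists) of pairwise distances.
    # For every quadruple, form the three pairwise sums; the two
    # largest must match for the distances to fit an additive tree.
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        s = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(s[2] - s[1]) > tol:
            return False
    return True

# Distances read off a small hypothetical 4-leaf tree with unit-length edges.
d = [[0, 2, 3, 3],
     [2, 0, 3, 3],
     [3, 3, 0, 2],
     [3, 3, 2, 0]]
print(satisfies_four_point(d))  # True: an additive tree reproduces d exactly
```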


The Matlab command for fitting additive trees is addtree.m, and the command for displaying them is displaytree.m. Michael Lee’s page on the UC Irvine website has a downloadable zip file with all of the Matlab commands for representing similarity data.

A decision rule is admissible if no other rule dominates it; that is, if no other rule does at least as well for every state of nature and strictly better for at least one. If it’s not admissible, then it’s inadmissible. An inadmissible decision rule is never worth using, since by definition there is always another rule that is better. But just because a rule is admissible doesn’t mean it’s the best or even the most sensible rule. Although no rule is uniformly better than it, there may be decision rules that are better over specific ranges—and those ranges might be the ranges that are most interesting to our research.

Let δ be our decision rule, and R(θ, δ) the risk function. The risk function quantifies the risk we take in using δ: roughly, the expected loss we incur if we use that rule. Here θ ranges over Θ, the set of all possible states of nature.
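In symbols (a standard formulation, stated here for clarity rather than quoted from the source), a rule δ′ dominates δ when

```latex
R(\theta, \delta') \le R(\theta, \delta) \ \text{ for all } \theta \in \Theta,
\quad \text{and} \quad
R(\theta, \delta') < R(\theta, \delta) \ \text{ for at least one } \theta \in \Theta.
```

The rule δ is admissible exactly when no such dominating δ′ exists.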

The Airy functions, usually written Ai(x) and Bi(x), are the two independent solutions of the Airy differential equation y″ − xy = 0. They were first developed by the astronomer Sir George Biddell Airy (1801-1892) during his investigations into optical diffraction. They have a wide variety of applications and are used in fields as diverse as quantum mechanics, electromagnetics and combinatorics. A different function developed by Airy is important in astronomy and microscopy.

That different function is related to diffraction through a circular aperture. The function, which describes the shape of an “Airy disk”, is unrelated to the definitions above (which concern solutions to the Airy equation). This particular function is closely related to the Bessel function. Because of the possibility of confusing the two different functions, this version is sometimes called the Airy Disk Function.
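If you need numerical values of the Airy functions themselves, SciPy provides them. A small sketch (the sample points are arbitrary):

```python
import numpy as np
from scipy.special import airy

# Evaluate the Airy functions Ai(x), Bi(x) and their first derivatives.
x = np.array([-2.0, 0.0, 2.0])
ai, aip, bi, bip = airy(x)

print(ai)  # Ai(x): oscillatory for x < 0, rapidly decaying for x > 0
print(bi)  # Bi(x): oscillatory for x < 0, rapidly growing for x > 0
```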


 Alternate forms reliability is a measure of reliability between two different forms of the same test. Two equivalent (but different) tests are administered, scores are correlated, and a reliability coefficient is calculated. A test would be deemed reliable if differences in one test’s observed scores correlate with differences in an equivalent test’s scores.

Parallel forms are very similar, but with one major difference: the observed scores on the two forms must have the same mean and variance. This isn’t a requirement for alternate forms reliability, which just uses different versions of the same test. That said, you can only interpret the correlation between tests in a meaningful way if the alternate forms are also parallel. Proving that two tests are parallel is practically impossible (Furr & Bacharach, 2008); although interpreting correlations is theoretically possible, it isn’t usually a feasible “real life” option. In addition, although two tests might seem equivalent, a different question here and there might result in the tests measuring completely different constructs.

As noted above, it’s extremely challenging to interpret reliability with parallel forms. However, you can take steps to ensure that your reliability estimate is as good as possible. One of the most important is counterbalancing the order of the tests:

 Practice and transfer effects can be eliminated if half the subjects take test A followed by test B, and half the subjects take test B followed by test A. Note that although this seems a little strange (what’s the point in subjects taking two different tests instead of one?), remember that you’re assessing reliability here, not subject performance. Once you’ve determined that the tests are reliable, you can administer test A or test B to a subject, with the knowledge that the two tests are equivalent in every way.
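Once each subject has a score on both forms, the reliability coefficient is just the correlation between the two sets of scores. A minimal sketch with simulated (not real) test data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores: 20 subjects take both form A and form B,
# administered in counterbalanced order as described above.
true_ability = rng.normal(50, 10, size=20)
form_a = true_ability + rng.normal(0, 3, size=20)  # form A = ability + noise
form_b = true_ability + rng.normal(0, 3, size=20)  # form B = ability + noise

# Alternate forms reliability: correlate the two sets of observed scores.
r = np.corrcoef(form_a, form_b)[0, 1]
print(f"reliability coefficient: {r:.2f}")
```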

If you’re working on data analysis, there are many tools available to provide insights into your data, including ANOVA and regression analysis. At first glance, the two methods may look similar—so similar, in fact, that you wouldn’t be the first to completely confuse the two.

Both result in a continuous output (Y) variable, and both can have continuous or categorical variables as (X) inputs. If you use exactly the same structure for both tests (dummy coding, described below, is one way to do this), they are effectively the same; in fact, ANOVA is a “special case” of multiple regression.

The preferred inputs for ANOVA are categorical variables; you can think of ANOVA as a regression with categorical predictors (Pruim, n.d.). However, you can choose to use continuous variables. The opposite is true for regression: continuous variables are preferred, with categorical variables as a second option. The reason categorical variables are a second option in regression analysis is that you can’t just plug categorical data into your regression model; you have to code dummy variables first. Dummy coding is where you give your categorical variables a numeric value, like “1” for black and “0” for white.
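Here is a minimal sketch of that equivalence, with made-up data for three groups: fitting ordinary least squares to a dummy-coded design matrix recovers exactly the group means that a one-way ANOVA compares.

```python
import numpy as np

# Made-up outcome scores for three groups of five subjects each.
groups = np.array(["a"] * 5 + ["b"] * 5 + ["c"] * 5)
y = np.array([4, 5, 6, 5, 5, 8, 7, 9, 8, 8, 2, 3, 2, 3, 2], dtype=float)

# Dummy coding: group "a" is the baseline; two 0/1 columns flag "b" and "c".
X = np.column_stack([
    np.ones(len(y)),                # intercept = baseline group mean
    (groups == "b").astype(float),  # 1 if group b, else 0
    (groups == "c").astype(float),  # 1 if group c, else 0
])

# Ordinary least squares on the dummy-coded design matrix.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [mean(a), mean(b) - mean(a), mean(c) - mean(a)] = [5.0, 3.0, -2.6]
```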

A bounded region has a boundary or some set of constraints placed upon it. In other words, a bounded shape cannot have an infinitely large area; it’s defined by a set of measurements or parameters. A square drawn on a Cartesian plane has a natural boundary (four sides). Other shapes and surfaces can be more challenging to visualize. For example, the surface area of a cylinder is constrained by its height and circumference.

In general, look for regular shapes (triangles, rectangles, squares), or shapes as close to regular as you can get, with bases that follow a line horizontal to the x-axis; these shapes are easier to integrate. Caution: sometimes it’s actually easier to divide the shape up horizontally instead of vertically.

The left-hand and right-hand bounds of the region are often easy to read from a graph (in the example below, they are x = 0 and x = 4); otherwise, you can find them with the intersection feature of a graphing calculator or with algebra (see: How to find the intersection of two lines).
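For instance, once the bounds are known, the area of the bounded region is a definite integral. In this sketch the curve f(x) = 4x − x² is a hypothetical choice whose x-intercepts are x = 0 and x = 4:

```python
from scipy.integrate import quad

# A hypothetical curve whose x-intercepts are x = 0 and x = 4.
f = lambda x: 4 * x - x ** 2

# Area of the region bounded by the curve and the x-axis.
area, abserr = quad(f, 0, 4)
print(area)  # 32/3, about 10.67
```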
