MATHCAD, Geomagic Freeform, Mecsoft VisualCAM & CAD CAM CNC software support discussion forum. Moderated.


#776 by evanish
Fri Sep 25, 2009 7:38 am
Descriptive statistics is widely used in medical research studies. It quantitatively describes the main features of a set of data, presenting large amounts of data in a clear and understandable way. Inferential statistics differs in that it is used to reach conclusions that generalise beyond the immediate data. MathCAD offers a variety of descriptive statistical functions that reduce large amounts of data to a much simpler summary. Descriptive statistics are generally presented alongside more formal analyses to give the audience an overall sense of the data being analysed.

I plan to go through the following basic functions:
1. mean(A,B,C,...)
2. median(A,B,C,...)
3. mode(A,B,C,...)
4. percentile(v,p)

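The first three functions take one or more scalars or arrays and return the mean, median, and mode, respectively. These are measures of the central location of the data. The percentile function takes a vector v and a fraction p and returns the value below which that fraction of the data falls, which locates a value relative to the rest of the distribution. The best choice of location estimator depends on the spread and shape of the distribution of your data.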

Mean
In statistics this is also called the "arithmetic mean", and it is the most commonly used type of average. To calculate the mean of a set of numbers, add up all the numbers in the set and divide the sum by the number of items in the set. The other types of average, the median and the mode, are discussed later on.
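For comparison outside MathCAD, here is a minimal Python sketch of the same sum-then-divide calculation. The function name `arithmetic_mean` and the data values are hypothetical (the worksheet's own data is only available as an image); Python's standard library also provides `statistics.mean` for the same purpose.

```python
def arithmetic_mean(values):
    """Sum the values and divide by how many there are."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical example data, not the data from the worksheet image.
data = [4, 8, 15, 16, 23]
print(arithmetic_mean(data))  # sum is 66, count is 5, so 66 / 5 = 13.2
```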

Example:
Consider the following numeric data:

Image


The arithmetic mean or average of N values is given by the following formula:

Image
Image
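In conventional notation, writing the N data values as x_1, ..., x_N (symbols chosen here for illustration, which may differ from those in the worksheet), the formula is:

```latex
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + \cdots + x_N}{N}
```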

The mean is sensitive to changes in values of one or more data points:

Image
Image
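One way to quantify this sensitivity: changing a single one of N values by an amount d shifts the mean by exactly d/N. A small Python check with hypothetical numbers, using the standard library's `statistics.mean`:

```python
from statistics import mean

data = [2, 4, 6, 8]        # hypothetical data, mean 5
changed = [2, 4, 6, 12]    # the last value increased by d = 4

# With N = 4 values, the mean should rise by d / N = 4 / 4 = 1.
shift = mean(changed) - mean(data)
print(shift)
```

A single large change (such as an outlier) therefore moves the mean in proportion to its size, which is why the next paragraph considers trimming.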


The mean is strongly affected by outliers, so if your data contain significant outliers the mean may be a poor description of the central location.
In that case you may choose to trim the outliers and compute a "trimmed mean" as a better estimate.

Consider the "trimmed" numeric data:

Image

As you can see, I have chosen to leave out the value 46, which was a significant outlier in this set.

Image


MathCAD automatically re-evaluates the formulas and recalculates the mean for the new data set.
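The post trims by hand, dropping the single outlier 46. A common, more systematic variant removes the same number k of values from each end of the sorted data before averaging. Below is a minimal Python sketch of that symmetric variant; the function name `trimmed_mean`, the parameter `k`, and the data values other than 46 are hypothetical.

```python
def trimmed_mean(values, k):
    """Mean after discarding the k smallest and the k largest values."""
    ordered = sorted(values)
    if k < 0 or 2 * k >= len(ordered):
        raise ValueError("k must be non-negative and leave at least one value")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [3, 5, 6, 7, 9, 46]      # hypothetical data with one large outlier
print(sum(data) / len(data))    # plain mean: 76 / 6, pulled upward by 46
print(trimmed_mean(data, 1))    # drops 3 and 46: (5 + 6 + 7 + 9) / 4 = 6.75
```

Note that symmetric trimming also discards the smallest value (3 here), unlike the hand-trimmed example above. SciPy's `scipy.stats.trim_mean` implements a proportion-based version of the same idea.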
#777 by evanish
Fri Sep 25, 2009 7:54 am
Median
The median, or "middle value," of a set of data is another description of central location. The median depends on the relative positions of the data, not on the actual values of every data point, and so is relatively insensitive to small changes in individual data values. Mathcad's median function does not accept complex numbers.

The median is the value falling in the middle when the data are sorted in ascending order (smallest to largest).

Example:

Image

If there is an odd number of data values, a single data value is the median.
e.g. if a<b<c then the median for the list {a,b,c} is b

Example:

Image
Image

If there is an even number of data values, the median is taken to be the mean of the two middle values.

e.g. if a<b<c<d then the median for the list {a,b,c,d} is the mean of b and c i.e. (b+c)/2.

Example:

Image
Image
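The odd and even cases above can be sketched in Python as follows. The function name `median` is our own for illustration (the standard library's `statistics.median` follows the same rule). The last call shows the median ignoring a large change to an extreme value.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd count: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even count: average the two middle values


print(median([7, 1, 4]))        # sorted [1, 4, 7]: middle value is 4
print(median([7, 1, 4, 10]))    # sorted [1, 4, 7, 10]: (4 + 7) / 2 = 5.5
print(median([7, 1, 4, 1000]))  # outlier replaces 10, median is still 5.5
```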


Mode
The mode refers to the value that occurs the most frequently in a data set.

Example:
In a data set where there are no repeated values, you see an error message like the one shown below.

Image

Example:
In the case where more than one data value is repeated with the same frequency, the mode function also gives an error, with the message "multimodal."

Image
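The error behaviour described above can be mimicked in Python. This `mode` function is hypothetical and written for illustration only; it is not MathCAD's implementation, and the error messages are our own wording. Python's own `statistics.mode` behaves differently: since Python 3.8 it returns the first mode encountered instead of raising an error, and `statistics.multimode` returns all of the most frequent values.

```python
from collections import Counter


def mode(values):
    """Most frequent value; raises an error if nothing repeats or if there is a tie."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    top = max(counts.values())
    if top == 1:
        raise ValueError("no value is repeated")
    winners = [v for v, c in counts.items() if c == top]
    if len(winners) > 1:
        raise ValueError("multimodal")
    return winners[0]


print(mode([2, 3, 3, 5, 7]))  # 3 occurs twice, every other value once
```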


Percentiles and quartiles
A percentile is the value below which a given percentage of the observations in a data set fall. E.g. the 20th percentile of a set of data is the value below which 20% of the observations may be found.

The 25th percentile is also known as the first quartile (Q1); the 50th percentile as the median or second quartile (Q2); the 75th percentile as the third quartile (Q3).

Example:
If your data set has 11 points, the 50th percentile is the median, i.e. the 6th point when the data are sorted in ascending order.
The percentile function takes a vector of data and a percentage expressed as a fraction between 0 and 1, and returns the value of that percentile.

Image

Note that if the requested percentile falls between two data points, it is calculated from those two neighbouring points (by adding to the lower point a share of the gap to the next one), so the result need not be a value that actually occurs in the data set. A quartile is one of the three percentiles that divide the sorted data into four equal parts:

Example:
Image
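A minimal Python sketch of a percentile computed by linear interpolation between neighbouring sorted values, assuming the 0-based rank convention p·(n − 1). This is one of several conventions in use; it agrees with the 11-point example above (p = 0.5 gives rank 5, the 6th point), but we have not confirmed that it is exactly what MathCAD's percentile uses. It is the same rule as NumPy's default `numpy.percentile` (which takes p in percent, 0 to 100). The function name `percentile_interp` and the data are hypothetical.

```python
import math


def percentile_interp(values, p):
    """Percentile of the data for a fraction p in [0, 1], by linear interpolation."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty data set is undefined")
    rank = p * (len(ordered) - 1)   # 0-based fractional position in the sorted data
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    # Start at the data point just below the rank and add that share of the gap to the next point.
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


data = list(range(1, 12))                 # hypothetical 11 points: 1, 2, ..., 11
print(percentile_interp(data, 0.5))       # rank 5, i.e. the 6th sorted point (value 6)

# Quartiles Q1, Q2, Q3 of six hypothetical points: ranks 1.25, 2.5 and 3.75.
quartiles = [percentile_interp([1, 2, 3, 4, 5, 6], q) for q in (0.25, 0.5, 0.75)]
print(quartiles)                          # [2.25, 3.5, 4.75]
```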

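Quartiles are commonly used in graphical summaries such as box plots, and quantiles in general are the basis of graphical comparisons of distributions in quantile-quantile (Q-Q) plots.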