Difference between revisions of "Student Distribution"

Revision as of 01:59, 11 May 2024


GauStudent12345big.png
\(y=\mathrm{Student}(n,x)\) for \(n\!= 1,2,3,4,5,\infty \)
Student5map.png
\( u+\mathrm i v = \mathrm{Student}(5, x+\mathrm i y)\)
GauMap.png
\( u+\mathrm i v = \mathrm{Student}(\infty, x+\mathrm i y)\)

Student Distribution is a term from scientific slang; it may refer to the holomorphic function of two variables

\(\displaystyle \mathrm{Student}(\nu,t) =\mathrm{Student}_\nu(t) = \) \(\displaystyle \frac{\Gamma\left(\frac{\nu+1}{ 2 }\right)}{\sqrt{\pi\ \nu}\ \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{~ t^2}{\nu }\right)^{-(\nu+1)/2}\)

Here \(\Gamma\) refers to the Gamma function, \(\Gamma(z)\) \(=\) Factorial\((z\!-\!1)\) .

In TORI, the Factorial is implemented better than the Gamma function, so the Editor uses

\(\displaystyle \mathrm{Student}(\nu,t) = \) \(\displaystyle \frac{\mathrm{Factorial}\left(\frac{\nu-1}{2}\right)}{\sqrt{\pi\ \nu}\ \mathrm{Factorial}\left(\frac{\nu}{2}-1\right)} \left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}\)
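For readers outside TORI, the same density can be sketched with the standard C++ library. The block below is only an illustration under stated assumptions: the name StudentGamma is hypothetical, the first argument is assumed real and positive, and std::lgamma replaces the TORI Factorial routine (using \(\mathrm{Factorial}(z)=\Gamma(z\!+\!1)\)).

```cpp
#include <cmath>

// Sketch (hypothetical name): density Student(nu,t) for real nu > 0 and real t.
// The normalization coefficient is evaluated through logarithms of the Gamma
// function, lgamma((nu+1)/2) - lgamma(nu/2), to avoid overflow at large nu.
double StudentGamma(double nu, double t)
{ const double pi = 3.14159265358979323846;
  double logc = std::lgamma(0.5*(nu+1.)) - std::lgamma(0.5*nu) - 0.5*std::log(pi*nu);
  // (1 + t^2/nu)^(-(nu+1)/2), written through log1p for accuracy at small t
  return std::exp(logc - 0.5*(nu+1.)*std::log1p(t*t/nu));
}
```

At \(t=0\) this should reproduce the normalization coefficients, for example \(1/\pi\) for \(\nu=1\) and \(3/8\) for \(\nu=4\).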

For integer values of the first argument and real values of the second argument, function \(\mathrm{Student}\) is shown in the figure above.

The complex map for \(\nu=5\) is shown in the figure at right.

The second figure shows the modified version of functions \(\mathrm{Student}\); they are scaled in such a way, that the second moment (id est, the mean square deviation) is unity.

The more official and rigorous (but not so easy to handle) name for the Student Distribution is «Probability density of the Student's t-distribution».
This article uses material copied from https://en.wikipedia.org/wiki/Student%27s_t-distribution [1].

Ctudent

The integral of the Student Distribution with respect to the second argument may be referred to as Student Cumulative, or Ctudent, or, more formally, «Cumulative Student distribution function»:

\(\displaystyle \mathrm{Ctudent}(\nu,t) = \int_{-\infty}^t \mathrm{Student}(\nu,x) \mathrm{d} x \)

It can be expressed through the Incomplete beta function or through the Hypergeometric function. As of year 2024, neither of these two functions is implemented at TORI.
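While the special functions are not available, the integral can be evaluated numerically. The block below is a minimal sketch, not the TORI implementation: the names StudentDensity and CtudentSimpson and the default number of subintervals are assumptions. It uses the fact that the density is even, so \(\mathrm{Ctudent}(\nu,t)=\frac{1}{2}+\int_0^t \mathrm{Student}(\nu,x)\,\mathrm d x\), and applies the composite Simpson rule to the finite integral.

```cpp
#include <cmath>

// Hypothetical helper: density Student(nu,x) for real nu > 0, normalized through std::lgamma.
static double StudentDensity(double nu, double x)
{ const double pi = 3.14159265358979323846;
  double logc = std::lgamma(0.5*(nu+1.)) - std::lgamma(0.5*nu) - 0.5*std::log(pi*nu);
  return std::exp(logc - 0.5*(nu+1.)*std::log1p(x*x/nu));
}

// Ctudent(nu,t) = 1/2 + integral of the density from 0 to t,
// by the composite Simpson rule with 2*M subintervals (M is an assumed default).
// For negative t the step h is negative, so the integral changes sign as it should.
double CtudentSimpson(double nu, double t, int M = 1000)
{ double h = t/(2*M);
  double s = StudentDensity(nu, 0.) + StudentDensity(nu, t);
  for(int k=1; k<2*M; k++) s += (k%2 ? 4. : 2.)*StudentDensity(nu, k*h);
  return 0.5 + s*h/3.;
}
```

The result can be checked against the closed forms for small \(\nu\), for example \(\mathrm{Ctudent}(1,t)=\frac{1}{2}+\frac{1}{\pi}\arctan(t)\). For large \(|t|\), more subintervals may be needed.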

The Editor did not find any convenient and commonly used notation for functions Student and Ctudent; in future, names Student and Ctudent may be replaced with something more suitable and convenient.

Special cases

Wikipedia suggests simple expressions for the first 5 Students:

Certain values of \(\ \nu\ \) give a simple form for Student's t-distribution.

\(\ \nu\ \) PDF CDF notes
1 \(\ \frac{\ 1\ }{\ \pi\ (1 + t^2)\ }\ \) \(\ \frac{\ 1\ }{ 2 } + \frac{\ 1\ }{ \pi }\ \arctan(\ t\ )\ \) See Cauchy distribution
2 \(\ \frac{ 1 }{\ 2\ \sqrt{2\ }\ \left(1+\frac{t^2}{2}\right)^{3/2}}\ \) \(\ \frac{ 1 }{\ 2\ }+\frac{ t }{\ 2\sqrt{2\ }\ \sqrt{ 1 + \frac{~ t^2\ }{ 2 }\ }\ }\ \)
3 \(\ \frac{ 2 }{\ \pi\ \sqrt{3\ }\ \left(\ 1 + \frac{~ t^2\ }{ 3 }\ \right)^2\ }\ \) \(\ \frac{\ 1\ }{ 2 } + \frac{\ 1\ }{ \pi }\ \left[ \frac{ \left(\ \frac{ t }{\ \sqrt{3\ }\ }\ \right) }{ \left(\ 1 + \frac{~ t^2\ }{ 3 }\ \right) } + \arctan\left(\ \frac{ t }{\ \sqrt{3\ }\ }\ \right)\ \right]\ \)
4 \(\ \frac{\ 3\ }{\ 8\ \left(\ 1 + \frac{~ t^2\ }{ 4 }\ \right)^{5/2}}\ \) \(\ \frac{\ 1\ }{ 2 } + \frac{\ 3\ }{ 8 } \left[\ \frac{ t }{\ \sqrt{ 1 + \frac{~ t^2\ }{ 4 } ~}\ } \right] \left[\ 1 - \frac{~ t^2\ }{\ 12\ \left(\ 1 + \frac{~ t^2\ }{ 4 }\ \right)\ }\ \right]\ \)
5 \(\ \frac{ 8 }{\ 3 \pi \sqrt{5\ }\left(1+\frac{\ t^2\ }{ 5 }\right)^3\ }\ \) \(\ \frac{\ 1\ }{ 2 } + \frac{\ 1\ }{\pi}{ \left[ \frac{ t }{\ \sqrt{5\ }\left(1 + \frac{\ t^2\ }{ 5 }\right)\ } \left(1 + \frac{ 2 }{\ 3 \left(1 + \frac{\ t^2\ }{ 5 }\right)\ }\right) + \arctan\left( \frac{ t }{\ \sqrt{\ 5\ }\ } \right)\right]}\ \)
\(\ \infty\ \) \(\ \frac{ 1 }{\ \sqrt{2 \pi\ }\ }\ e^{-t^2/2}\) \(\ \frac{\ 1\ }{ 2 }\ {\left[ 1 + \operatorname{erf}\left( \frac{ t }{\ \sqrt{2\ }\ } \right) \right]}\ \) See Normal distribution, Error function

C++ implementation

Usually, the Student Distribution appears with integer (and not so big) values of the first argument (let it be \(n\)) and real values of the second argument (let it be \(x\)).

For this case with \(0<n<11\), the implementation Student.cin is suggested; it is repeated below.

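#include <cmath>

// Density of the Student Distribution for integer n, 0<n<11, and real x.
// c[n] is the normalization coefficient
// Factorial((n-1)/2) / ( sqrt(pi n) Factorial(n/2-1) ); c[0] is not used.
// The caller is responsible for keeping n in the range 1..10.
float Student(int n,float x)
{static const float c[11]={0.,
0.31830988618379,
0.35355339059327,
0.36755259694786,
0.37500000000000,
0.37960668982249,
0.38273277230987,
0.38499145083227,
0.38669902096139,
0.38803490887167,
0.38910838396603};
return c[n]*std::pow(1.+x*x/n, -0.5*(n+1));
}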
SledgehammerNut.png

For \(n>10\), the Gaussian distribution is recommended instead of the Student Distribution. In most cases, there is no need to use a sledgehammer to crack a nut (as shown in the figure at right). The following relation takes place:

\(\displaystyle \mathrm{Student}(\infty,x) = \mathrm{Gau}(x)= \frac{\exp(-x^2/2)}{\sqrt{2\pi}} \)

The implementation above evaluates the density of the Student Distribution for real values of the argument. However, the case of complex argument is not difficult either.

As the Factorial function is already implemented, the complex double implementation of the Student Distribution is straightforward for complex values of the arguments. That implementation had been used to evaluate the array of coefficients in the code above.
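The complex double version mentioned above is not reproduced in this article. As a rough illustration only, the sketch below substitutes a complex second argument into the same formula with the standard library; the name StudentComplex is hypothetical, the first argument is restricted to real positive values (an assumption of the sketch, the TORI routine may be more general), and std::lgamma replaces the TORI Factorial.

```cpp
#include <cmath>
#include <complex>

// Sketch (hypothetical name): Student(n,z) for real n > 0 and complex z.
// std::pow takes the principal branch of the power, so, when (n+1)/2 is not
// an integer, the cuts run along the imaginary axis beyond z = +-i sqrt(n);
// for odd integer n the function has poles at these points instead.
std::complex<double> StudentComplex(double n, std::complex<double> z)
{ const double pi = 3.14159265358979323846;
  double c = std::exp(std::lgamma(0.5*(n+1.)) - std::lgamma(0.5*n)) / std::sqrt(pi*n);
  return c*std::pow(1. + z*z/n, -0.5*(n+1.));
}
```

For real z, the real part of the result should coincide with the real implementation, and the imaginary part should vanish.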

Application

Large number of data

The Student Distribution is a basic tool in the statistical analysis of experimental data, when some value is measured many times, and the errors of the measurements are supposed to be independent and to have the Gaussian distribution.

Roughly, if some quantity is measured \(N\) times, and values \(a_n\) are obtained, with \(n\) from \(1\) to \(N\),
then the quantity is estimated to be

\( \displaystyle a_0 = \frac{1}{N} \sum_{n=1}^N a_n \)

and the «expected error», «possible deviation» of this estimate from some «true value», in its turn, is estimated as

\( \displaystyle D = \sqrt{\frac{1}{N (N\!-\!3)} \sum_{n=1}^N (a_n-a_0)^2} \)

At large \(N\), the error of the estimate of \(A\), id est, \(a_0-A\), is supposed to have the Gaussian distribution. Assuming this, the colleagues actually use the \(\mathrm{Student}_\infty \) without mentioning it.

Mean-square deviation

At small \(N\), the peak of the density of distribution of deviations of \(a_0\) from the true value is sharper, and its «wings» are wider, than those of the Gaussian with the same mean-square deviation; for \(N=2\) and \(N=3\), the mean-square deviation is not defined as a real number.

The Student Distribution makes sense for the analysis of rare phenomena, when a new, more accurate measurement happens to be far away from the expected «Gaussian». The new value may happen to be still at the tail of the Student Distribution.

In the primitive approach, to show the statistical significance of the estimate, instead of the Gaussian bell, the Student Distribution of the same width should be used; the first parameter \( \nu=N-1 \), where \(N\) is the number of independent measurements performed.

For \(N\) measurements, the distribution of the averaged results can be expressed with function \(\mathrm{Student}_{N-1} \).

For the Student Distribution, the variance is

\(\displaystyle V_n=\int_{-\infty}^{\infty} \mathrm{Student}(n,x) \ x^2 \mathrm d x = \frac{n}{n-2} \)

with \( n = N\!-\!1 \).

The mean square deviation is

\( \displaystyle S_n = \sqrt{V_n}=\sqrt{\frac{n}{n\!-\!2}}=\sqrt{\frac{N\!-\!1}{N\!-\!3}} \)

At least 4 measurements (\(n\!=\!3\) , \(N\!=\!4\)) are necessary in order to characterize, in terms of the mean-square deviation, how close to the «true value» \(A\) the estimate \(a_0\) is expected to be.
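As a quick arithmetic illustration of the formula for \(S_n\) (the values below are worked out by hand and rounded to three decimals):

\( \displaystyle S_3=\sqrt{3}\approx 1.732 , \quad S_4=\sqrt{2}\approx 1.414 , \quad S_{10}=\sqrt{5/4}\approx 1.118 , \quad S_n \to 1 \ \ \mathrm{at} \ \ n\to\infty \)

So, with only 4 measurements, the mean-square deviation is about 1.7 times that of the unit Gaussian, and this factor approaches unity as the number of measurements grows.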

Advanced

The less data one has, the more advanced formalism should be applied to use them well.

Let     \( \displaystyle q = \frac{1}{N\!-\!1} \sum_{m=1}^{N} (a_m - a_0)^2 \)

Then, quantity     \( \displaystyle t=\frac{a_0-A}{\sqrt{q/N}} \)

is distributed according to a Student's t-distribution with     \(n=N-1\) degrees of freedom.
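The three definitions above translate directly into code. The block below is a minimal sketch; the names SampleSummary, Summarize and TStatistic are hypothetical, and the sample is assumed to contain at least two measurements that are not all equal.

```cpp
#include <cmath>
#include <vector>

// Hypothetical container for the quantities defined above.
struct SampleSummary
{ double a0;  // mean of the measurements
  double q;   // sum of squared deviations from a0, divided by N-1
  double s;   // sqrt(q/N), the scale of the resulting bell
};

// Evaluates a0, q and sqrt(q/N) for the measurements a[0..N-1]; requires N >= 2.
SampleSummary Summarize(const std::vector<double>& a)
{ const double N = static_cast<double>(a.size());
  double a0 = 0.;  for(double v : a) a0 += v;  a0 /= N;
  double q = 0.;   for(double v : a) q += (v-a0)*(v-a0);  q /= (N-1.);
  return { a0, q, std::sqrt(q/N) };
}

// t = (a0 - A)/sqrt(q/N) for a supposed «true value» A.
double TStatistic(const SampleSummary& r, double A)
{ return (r.a0 - A)/r.s; }
```

For example, for the measurements 1, 2, 3, 4, the sketch gives \(a_0=2.5\), \(q=5/3\), and, for the supposed \(A=2\), \(t=\sqrt{0.6}\).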

Letter t, used as the identifier above, happens to be part of the full name «Student's t-distribution density».

In such a way, instead of the bell \(\mathrm{Gau}\), the \(\mathrm{Student}_{N-1} \) should be used.

This correctly takes into account the broadening of the distribution in comparison with the Gaussian.

This is the main reason not to adjust the second moment of the Student Distribution; for this reason, the Student function itself (and not the «scaled Student») is used.

The resulting bell    

\(\displaystyle F(x)=\mathrm{Student}_{N-1}\!\left( \frac{x-a_0}{\sqrt{q/N}} \right) \frac{1}{\sqrt{q/N}} \)

characterizes the precision of approximation of the «true value» \(A\) with estimate \(a_0\).

Collecting the notations above, it can be written as follows:

\(\displaystyle F(x)=\mathrm{Student}_{N-1}\left( \frac{x-a_0}{\sqrt{\frac{1}{\displaystyle(N\!-\!1) N} \sum_{m=1}^{N} (a_m - a_0)^2}} \right) \frac{1}{\displaystyle{\sqrt{\frac{1}{(N\!-\!1) N} \sum_{m=1}^{N} (a_m - a_0)^2}}} \)

or even

\(\displaystyle F(x)=\mathrm{Student}_{N-1}\left( (x-a_0) \sqrt{ \frac{(N\!-\!1) N} { \sum_{m=1}^{N} (a_m - a_0)^2} } \right) \sqrt{ \frac{(N\!-\!1) N} { \sum_{m=1}^{N} (a_m - a_0)^2} } \)
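The last expression can be evaluated as written. The block below is a sketch under assumptions: the names StudentNu and StudentBell are hypothetical, the density is normalized through std::lgamma rather than through the TORI Factorial, and the sample must contain at least two measurements that are not all equal.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: density Student(nu,t) for real nu > 0, normalized through std::lgamma.
static double StudentNu(double nu, double t)
{ const double pi = 3.14159265358979323846;
  double logc = std::lgamma(0.5*(nu+1.)) - std::lgamma(0.5*nu) - 0.5*std::log(pi*nu);
  return std::exp(logc - 0.5*(nu+1.)*std::log1p(t*t/nu));
}

// F(x) = Student_{N-1}( (x-a0) w ) w,  with  w = sqrt( (N-1) N / sum_m (a_m-a0)^2 ).
// Requires N >= 2 and a nonzero sum of squared deviations.
double StudentBell(const std::vector<double>& a, double x)
{ const double N = static_cast<double>(a.size());
  double a0 = 0.;   for(double v : a) a0 += v;  a0 /= N;
  double sum = 0.;  for(double v : a) sum += (v-a0)*(v-a0);
  double w = std::sqrt((N-1.)*N/sum);
  return StudentNu(N-1., (x-a0)*w)*w;
}
```

The bell is symmetric with respect to \(x=a_0\), and its maximum \(F(a_0)\) is the normalization coefficient of \(\mathrm{Student}_{N-1}\) multiplied by the factor \(w\).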

In the slang of physicists, function \(F(x)\) is considered as the probability density that the «true value» \(A\) happens to be \(x\).

Such an approach is practical and easy to interpret.

However, from the point of view of rigorous mathematics, such a vulgarization is nonsense.
In a similar way, the speculations about «derivatives» in the times of Isaac Newton were nonsense, because at that time, the Mathematical Analysis still did not exist. In a similar way, even in century 21, some colleagues repeat the wrong statement
«Probability to win in a casino is always less than half at any strategy».
Even such a wrong statement may make sense, as long as it keeps the adept from gambling.

For the Student Distribution, the variance is

\( \displaystyle V_n=\int_{-\infty}^{\infty} \mathrm{Student}_n(x) \ x^2 \ \mathrm d x = \frac {n}{n-2} \)

with \(n\!=\!N\!-\!1\). The bell \(F\) has the scale \(\sqrt{q/N}\), so, quantity

\(\displaystyle \sqrt{V_{N-1}\ q/N}=\sqrt{ \frac { \sum_{m=1}^{N} (a_m - a_0)^2} {(N\!-\!3)\ N} } \)

can be treated as a «mean-square error of the evaluation of quantity \(A\) with set of \(N\) independent measurements».

This quantity can be interpreted also as «The mean square deviation» of evaluated \(a_0\) from the «true value» \( A \). However, it assumes that \(N\) is integer and \(\ N\!>\!3 \ \) .
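The estimate \(\sqrt{ \sum_{m} (a_m - a_0)^2 / ((N\!-\!3)\, N) }\) and the restriction \(N\!>\!3\) can be sketched as follows. The name MeanSquareError is hypothetical, and returning NaN for \(N \le 3\) is a design choice of this sketch, not something prescribed by the text above.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Returns sqrt( sum_m (a_m-a0)^2 / ((N-3) N) ) for the measurements a[0..N-1].
// For N <= 3 the second moment of Student_{N-1} is not finite, so NaN is returned.
double MeanSquareError(const std::vector<double>& a)
{ if(a.size() <= 3) return std::numeric_limits<double>::quiet_NaN();
  const double N = static_cast<double>(a.size());
  double a0 = 0.;   for(double v : a) a0 += v;  a0 /= N;
  double sum = 0.;  for(double v : a) sum += (v-a0)*(v-a0);
  return std::sqrt(sum/((N-3.)*N));
}
```

For the measurements 1, 2, 3, 4, the sum of squared deviations is 5, and the sketch gives \(\sqrt{5/4}\approx 1.118\).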

References

Keywords

«Density of probability», «Probability», «Student Distribution»