8. Pearson r & Spearman \(\rho\)

8.1. Pearson correlation coefficient

import numpy as np
from scipy import stats
x, y = [1, 2, 3, 4, 5, 6, 7], [10, 9, 2.5, 6, 4, 3, 2]
res = stats.pearsonr(x, y)
res

Inspect the object res. Use dir(res) and help(res) to see which attributes and methods it provides.
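To see where the number in res comes from, here is a minimal sketch that computes Pearson r by hand from its textbook definition (covariance divided by the product of the standard deviations) and compares it with SciPy. The variable names dx, dy, and r_manual are ours, not SciPy's.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([10, 9, 2.5, 6, 4, 3, 2])

# Deviations of each sample from its mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: sum of cross-deviations, normalized by both spreads
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

res = stats.pearsonr(x, y)
print(r_manual, res.statistic, res.pvalue)
```

The two values of r should agree to floating-point precision; the p-value is the part that the hand calculation does not give you.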

8.1.1. p-values

Much of statistics uses p-values rather than \(\sigma\) when measuring significance. A p-value is the probability, assuming the null hypothesis is true (for Pearson, that there is no correlation), of observing a result at least as extreme as the one actually observed. It is not the probability that the null hypothesis is true. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis.

P-values can be converted to \(\sigma\) by using the area under a Gaussian curve. For the basic 1-, 2-, and 3-\(\sigma\) levels you can use the 68–95–99.7 rule. But pay attention to whether you want the area in one or both of the tails (are you testing for a positive correlation or just a non-zero correlation?). Assuming you are testing for a non-zero correlation (two-tailed area of interest), the 2- and 3-\(\sigma\) p-values are approximately 0.05 and 0.003.
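For values that are not on the 68–95–99.7 list, the conversion can be done with the Gaussian tail functions in scipy.stats.norm. This is a minimal sketch; the helper names p_to_sigma and sigma_to_p are hypothetical (they are not SciPy functions), and the two_sided flag encodes the one-tail versus two-tail choice discussed above.

```python
from scipy import stats

def sigma_to_p(sigma, two_sided=True):
    """Gaussian tail area beyond `sigma` standard deviations."""
    tail = stats.norm.sf(sigma)          # area in one upper tail
    return 2 * tail if two_sided else tail

def p_to_sigma(p, two_sided=True):
    """Number of standard deviations whose tail area equals `p`."""
    tail = p / 2 if two_sided else p     # split p across both tails if needed
    return stats.norm.isf(tail)          # inverse of the upper-tail area

for s in (1, 2, 3):
    print(s, sigma_to_p(s))              # two-tailed p-values for 1, 2, 3 sigma

print(p_to_sigma(0.05))                  # sigma for a two-tailed p of 0.05
print(p_to_sigma(0.05, two_sided=False)) # same p, one-tailed
```

Note that the same p-value maps to a smaller \(\sigma\) in the one-tailed case, so always state which convention you used.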

8.1.2. More pearsonr examples

For smaller samples, a permutation test can lead to a more accurate estimate of the p-value:

rng = np.random.default_rng()
method = stats.PermutationMethod(n_resamples=np.inf, random_state=rng)
stats.pearsonr(x, y, method=method)
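To make the permutation idea concrete, here is a hand-rolled sketch of an exact permutation test: shuffle y in every possible order, recompute r each time, and ask how often the shuffled |r| is at least as large as the observed one. This is an illustration under our own assumptions, not SciPy's implementation. In particular, the two-sided p-value is defined here through |r|, and SciPy's convention may differ in detail, so the numbers need not match exactly.

```python
import itertools
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([10, 9, 2.5, 6, 4, 3, 2])

def pearson_r(a, b):
    """Pearson r from the off-diagonal of the 2x2 correlation matrix."""
    return np.corrcoef(a, b)[0, 1]

r_obs = pearson_r(x, y)

# Null distribution: r for every one of the 7! = 5040 orderings of y
r_null = np.array([pearson_r(x, np.array(perm))
                   for perm in itertools.permutations(y)])

# Fraction of orderings at least as extreme as the data
# (small tolerance guards against floating-point ties)
tol = 1e-12
p_exact = np.mean(np.abs(r_null) >= abs(r_obs) - tol)

print(len(r_null), r_obs, p_exact)
```

Enumerating every ordering is only feasible for small samples, since the number of permutations grows as n!; for larger samples a random subset of permutations is used instead.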

To perform the test under the null hypothesis that the data were drawn from uniform distributions:

rng = np.random.default_rng()
method = stats.MonteCarloMethod(rvs=(rng.uniform, rng.uniform))
stats.pearsonr(x, y, method=method)

pearsonr also accepts N-dimensional arrays; the axis argument selects the axis along which the correlation is computed:

rng = np.random.default_rng()
x = rng.standard_normal((8, 15))
y = rng.standard_normal((8, 15))
stats.pearsonr(x, y, axis=0).statistic.shape  # between corresponding columns
stats.pearsonr(x, y, axis=1).statistic.shape  # between corresponding rows

SciPy can also compute a confidence interval for the correlation coefficient. Here is an asymptotic 90% confidence interval:

x, y = [1, 2, 3, 4, 5, 6, 7], [10, 9, 2.5, 6, 4, 3, 2]
res = stats.pearsonr(x, y)
res.confidence_interval(confidence_level=0.9)

The interval can also be built via bootstrap, which will be defined later in the class.

8.2. Spearman’s rank correlation coefficient

import numpy as np
from scipy import stats
res = stats.spearmanr([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
res.statistic
res.pvalue

About how many \(\sigma\) does this p-value correspond to?
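It also helps to know what the statistic itself measures. Spearman's \(\rho\) is the Pearson correlation of the ranks of the data rather than of the raw values. The sketch below checks this on the same five points, using scipy.stats.rankdata (which assigns tied values their average rank); the variable names are ours.

```python
import numpy as np
from scipy import stats

a = [1, 2, 3, 4, 5]
b = [5, 6, 7, 8, 7]

# Replace each value by its rank; the two 7s in b share rank 3.5
rank_a = stats.rankdata(a)
rank_b = stats.rankdata(b)

rho = stats.spearmanr(a, b).statistic
r_on_ranks = stats.pearsonr(rank_a, rank_b).statistic

print(rank_b)
print(rho, r_on_ranks)
```

Because only the ordering matters, \(\rho\) is insensitive to any monotonic rescaling of either variable, unlike Pearson r.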

rng = np.random.default_rng()
x2n = rng.standard_normal((100, 2))
y2n = rng.standard_normal((100, 2))
res = stats.spearmanr(x2n)
res.statistic, res.pvalue
res = stats.spearmanr(x2n, y2n)
res.statistic
res.pvalue

What is going on in this last example?
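One way to investigate is to look at the shapes of what comes back. The sketch below repeats the calls with a fixed seed (the seed 0 is our choice, only for reproducibility) and prints the dimensions of the results. With 2-D input, spearmanr treats each column as a variable and each row as an observation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)           # fixed seed: an assumption for repeatability
x2n = rng.standard_normal((100, 2))      # 100 observations of 2 variables
y2n = rng.standard_normal((100, 2))      # 100 observations of 2 more variables

res_one = stats.spearmanr(x2n)           # 2 variables in total
res_two = stats.spearmanr(x2n, y2n)      # 2 + 2 = 4 variables in total

print(np.ndim(res_one.statistic))        # dimensionality of the single-array result
print(res_two.statistic.shape)           # shape of the two-array result
print(res_two.pvalue.shape)
print(np.diag(res_two.statistic))        # each variable against itself
```

Comparing res_two.statistic[0, 1] with res_one.statistic shows how the smaller result is embedded in the larger one.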

8.3. Suggested Reading