7. Correlation Testing#

https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

https://www.geeksforgeeks.org/spearmans-rank-correlation/

https://en.wikipedia.org/wiki/Spearman’s_rank_correlation_coefficient

(https://en.wikipedia.org/wiki/Monotonic_function)

https://ui.adsabs.harvard.edu/abs/2019ApJ…874…32R/abstract - you can’t fit a line sometimes.

(Following text is from https://www.scribbr.com/statistics/pearson-correlation-coefficient/)

7.1. When to use the Pearson correlation coefficient#

The Pearson correlation coefficient (r) is one of several correlation coefficients that you need to choose between when you want to measure a correlation. The Pearson correlation coefficient is a good choice when all of the following are true:

  • Both variables are quantitative: You will need to use a different method if either of the variables is qualitative.

  • The variables are normally distributed: You can create a histogram of each variable to verify whether the distributions are approximately normal. It’s not a problem if the variables are a little non-normal.

    • scipy has ways to remove this assumption.

  • The data have no outliers: Outliers are observations that don’t follow the same patterns as the rest of the data. A scatterplot is one way to check for outliers—look for points that are far away from the others.

  • The relationship is linear: “Linear” means that the relationship between the two variables can be described reasonably well by a straight line. You can use a scatterplot to check whether the relationship between two variables is linear.

7.2. Spearson vs. Spearman’s rank correlation coefficients#

Spearman’s rank correlation coefficient is another widely used correlation coefficient. It’s a better choice than the Pearson correlation coefficient when one or more of the following is true:

  • The variables are ordinal.

  • The variables aren’t normally distributed.

  • The data includes outliers.

  • The relationship between the variables is non-linear and monotonic.