Lately I've been frustrated by attempting to rewrite the regression coefficients of a linear model in terms of the correlation coefficients between my explanatory variables. I was hoping, rather, to find some resource somewhere with the formulae I require. Everything I've found, au contraire, in textbooks, lecture notes, the intarwebs, etc., has fallen terribly short. For now I'll concentrate on one little problem that confused me greatly for a few days of otherwise-blissful vacation.

In the *statement* of multiple linear regression, there are three *different* uses of the word *independence*:

- Linear regression literature will speak of the *independent variables*, usually denoted X_ij for all j, which are also called the explanatory variables, the regressors, or the predictors. Regression likes to call them "independent" variables simply to contrast them with the "dependent" variable, y_i, also called the response variable, the regressand, etc.
- A fundamental assumption of most linear regression analysis is that each sampled data point is *statistically independent* of the others. That is, the random error terms epsilon_i are statistically independent of each other. This says nothing of the explanatory variables (but does help to describe the response variable y_i).
- In *multiple* linear regression (where we have multiple explanatory variables, that is, X_ij for a range of j), the "independent" explanatory variables must be *linearly independent* of each other. If this is not true then there is the problem of "multicollinearity", and the regression coefficients are not uniquely specified.

In particular, one can quite well perform a linear regression when some explanatory variables are statistically *correlated*. But not if they are *perfectly* correlated (a correlation coefficient of +/- 1), as that would mean they are *linearly dependent*, and we violate rule 3.
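To make that distinction concrete, here is a small sketch in NumPy (my own illustration, not something from any of the references — the variable names and numbers are arbitrary). Two merely correlated regressors pose no problem for least squares, but a perfectly correlated pair makes X'X rank-deficient, so the coefficients stop being uniquely specified:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two statistically correlated (but not perfectly correlated) regressors.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlation with x1 is about 0.8
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("correlated regressors, fitted coefficients:", beta)

# Perfectly correlated regressors: correlation coefficient exactly +1.
x3 = 2.0 * x1
X_bad = np.column_stack([np.ones(n), x1, x3])
rank = np.linalg.matrix_rank(X_bad.T @ X_bad)
# Rank is 2, not 3: X'X is singular, so no unique solution exists.
print("rank of X'X under perfect correlation:", rank, "of", X_bad.shape[1])
```

The first fit recovers coefficients near the true values (1, 2, -3) despite the correlation; the second design matrix has a rank-deficient X'X, which is exactly the multicollinearity problem of rule 3.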

What boggles my mind is that, after gouging through dozens of references, I have not found *any* resource that mentions these multiple inconsistent uses of the same word in the same setting! At least Wikipedia, on its regression analysis page, actually states that rule 3 means *linear* independence. But some texts may as well have written: "The independent variables may be non-independent so long as they are independent", although that might have answered my question if I knew how to interpret it.

And in case you were worried, there are of course ways to continue an analysis if the assumptions of 2 and/or 3 are not met, although these methods may be undesirable. Mwaha.
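One such remedy for a violated rule 3 is ridge regression — my pick of example, not one the text names. Adding a small penalty lam * I to X'X makes the system invertible even under perfect collinearity, at the cost of biased (shrunken) coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 2.0 * x1                      # perfectly collinear with x1 (rule 3 violated)
y = 1.0 + 3.0 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares has no unique solution here, since X'X is singular.
# Ridge regression solves (X'X + lam*I) beta = X'y, which is always invertible
# for lam > 0; the penalty picks out one well-defined answer.
lam = 0.1
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print("ridge coefficients:", beta_ridge)

# The effect of x1 is spread across the two collinear columns, but the
# combined slope beta1 + 2*beta2 still recovers roughly the true value 3.
print("combined slope:", beta_ridge[1] + 2.0 * beta_ridge[2])
```

The individual coefficients are no longer the "true" ones — ridge splits the shared effect between the collinear columns — which is one sense in which these fixes can be undesirable.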
