Saturday, February 25, 2012

Splines in Linear Regression (LJ 2/27)

I worry about you, Averyl, having to read some of my learning journals. This one will be brutal.

A lot of the papers I have read (skimmed) written by Dr. Woods involve modern regression methods, such as splines (surprisingly, that graph is an example of a truly linear model that uses splines). Currently, I am taking a class on modern regression models (although in my case, modern means 1980, so not cutting-edge methods like Dr. Woods's), and I realized I will definitely need to understand this core material for my project. In order to make sense of what I have learned, I will briefly detail some modern regression methods and how they relate to my project.

Data is not linear, regardless of how much statisticians wish it were. Yet there is so much clean and intuitive theory about linear models. Mathematically, linearity is tractable. Instead of wading through murky mathematics and kooky calculations (many of which have no closed-form solutions), statisticians have adapted linear ideas to fit curvy data. Some methods include splines, smoothing kernels, transformations, and automatic smoothers. Current thought on experimental design relies heavily on such methods, especially splines and smoothers.
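To make the key point concrete, here is a minimal sketch (made-up data, nothing from class or Dr. Woods's papers): a model is "linear" because it is linear in the coefficients, not in x, so even a bendy polynomial fit counts.

    import numpy as np

    # Made-up data for illustration only.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

    # A cubic polynomial fit is still a *linear* model: predictions are
    # X @ beta, linear in the coefficients beta, even though the curve bends.
    X = np.column_stack([np.ones_like(x), x, x**2, x**3])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta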

Splines: Sometimes, between different experimental groups, the effects of a drug are different, i.e. the slopes are different. Therefore, one needs to use different lines to predict for the different groups. But separate lines run into problems with continuity at the boundaries, so you extend the basis with terms that let the slope change while the fit stays continuous. (NOTE to self: review linear algebra; Dr. Woods is very mathematical.) When designing experiments, it is important to take the supposed differences into account in order to reduce both variance and bias, the central trade-off of statistics.
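Here is a hedged sketch of that basis extension, using a truncated power basis with a single hand-picked knot (the knot location and the data are hypothetical):

    import numpy as np

    def linear_spline_basis(x, knots):
        # Columns: intercept, x, then (x - k)_+ for each knot k.
        # Each (x - k)_+ term is zero before its knot, so it changes
        # the slope after the knot without breaking continuity there.
        cols = [np.ones_like(x), x]
        for k in knots:
            cols.append(np.maximum(x - k, 0.0))
        return np.column_stack(cols)

    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 100)
    # True trend: slope 1 up to x = 4, slope 0.2 afterward (continuous at 4).
    y = np.piecewise(x, [x < 4, x >= 4],
                     [lambda t: t, lambda t: 4 + 0.2 * (t - 4)])
    y = y + rng.normal(scale=0.3, size=x.size)

    X = linear_spline_basis(x, knots=[4.0])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta  # one continuous fit whose slope changes at the knot

Because the extra column is just another regressor, the whole thing is still ordinary least squares, which is exactly why the model stays linear.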

Smoothers: When the underlying trend in the points is not linear, or even a piecewise collection of lines, you can only predict point by point. Using the same knot-selection idea as in splines, an alternative approach is to put knots at all distinct x values and control the fit (of an actual linear model) through regularization. Using a smoother matrix formulated from the data, the effective degrees of freedom can be chosen. In design theory and in Dr. Woods's papers, there is some comment about the selection of effective degrees of freedom. In class, we use a greedy algorithm to choose them, but I would like to learn more about other efficient and conservative ways of choosing the effective degrees of freedom.
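A sketch of that idea (hypothetical penalty matrix and lambda; real smoothing splines penalize curvature more carefully): knots at every distinct x value, a ridge penalty to tame the fit, and the effective degrees of freedom read off as the trace of the smoother matrix S.

    import numpy as np

    def smoother_matrix(x, lam):
        # Knots at the interior distinct x values.
        knots = np.unique(x)[1:-1]
        X = np.column_stack([np.ones_like(x), x] +
                            [np.maximum(x - k, 0.0) for k in knots])
        # Penalize only the knot coefficients, not the intercept and slope.
        D = np.diag([0.0, 0.0] + [1.0] * len(knots))
        # S maps the observed y to the fitted values: yhat = S @ y.
        return X @ np.linalg.solve(X.T @ X + lam * D, X.T)

    x = np.sort(np.random.default_rng(2).uniform(0, 10, 60))
    S = smoother_matrix(x, lam=1.0)
    edf = np.trace(S)  # effective degrees of freedom for this lambda

Sliding lambda from huge to tiny moves the effective degrees of freedom from about 2 (a plain line) toward n (interpolation), and that is the knob being chosen.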

Such methods, or at least their more sophisticated and elegant relatives, are very applicable in design theory. It is important for me to have a strong foundation in the above methods so that I have somewhere to build from.
