The central idea is to extract linear combinations of the inputs as derived features, and then model the target as a nonlinear function of these features. Assume we have an input vector $X$ with $p$ components, and a target $Y$. Let $\omega_m$, $m = 1, 2, \ldots, M$, be unit $p$-vectors of unknown parameters. The projection pursuit regression (PPR) model has the form $f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)$. This is an additive model, but in the derived features $V_m = \omega_m^T X$ rather than the inputs themselves. The functions $g_m$ are unspecified and are estimated along with the directions $\omega_m$ using some flexible smoothing method. The scalar variable $V_m = \omega_m^T X$ is the projection of $X$ onto the unit vector $\omega_m$, and we seek $\omega_m$ so that the model fits well, hence the name "projection pursuit." Because each input can enter the fit through several projections and nonlinear functions, the fitted model is usually hard to interpret. As a result, the PPR model is most useful for prediction, and not very useful for producing an understandable model for the data.

How do we fit a PPR model, given training data $(x_i, y_i)$, $i = 1, 2, \ldots, N$? We seek the approximate minimizers of the error function
$$\sum_{i=1}^{N} \left[ y_i - \sum_{m=1}^{M} g_m(\omega_m^T x_i) \right]^2.$$
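To make the criterion concrete, the following is a minimal sketch of how the PPR prediction and its squared-error criterion could be evaluated for given directions and ridge functions. The function names `ppr_predict` and `ppr_squared_error`, the toy data, and the choice of `np.tanh` and `np.square` as ridge functions are all assumptions made for illustration; in an actual fit the functions would be estimated by a smoother rather than supplied in advance.

```python
import numpy as np


def ppr_predict(X, directions, ridge_functions):
    """Evaluate f(x_i) = sum_m g_m(omega_m^T x_i) for each row x_i of X.

    directions: list of p-vectors (rescaled to unit length here).
    ridge_functions: list of callables g_m acting elementwise on arrays.
    Both arguments are hypothetical inputs for this sketch.
    """
    X = np.asarray(X, dtype=float)
    total = np.zeros(X.shape[0])
    for omega, g in zip(directions, ridge_functions):
        omega = np.asarray(omega, dtype=float)
        omega = omega / np.linalg.norm(omega)  # enforce the unit-vector constraint
        v = X @ omega                          # derived feature V_m = omega_m^T x_i
        total += g(v)
    return total


def ppr_squared_error(X, y, directions, ridge_functions):
    """Sum over i of [y_i - sum_m g_m(omega_m^T x_i)]^2."""
    residuals = np.asarray(y, dtype=float) - ppr_predict(X, directions, ridge_functions)
    return float(np.sum(residuals ** 2))


# Toy data (assumed): two inputs, two ridge terms, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_directions = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
true_functions = [np.tanh, np.square]
y = ppr_predict(X, true_directions, true_functions)

# The error vanishes at the generating directions and grows when they are wrong.
err_true = ppr_squared_error(X, y, true_directions, true_functions)
wrong_directions = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
err_wrong = ppr_squared_error(X, y, wrong_directions, true_functions)
print(err_true, err_wrong)
```

The sketch only evaluates the criterion; it does not search over directions or estimate the functions, which is what a fitting procedure would have to do.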