
P-Spline Implementation in R

Objective

Use P-splines (penalized smoothing splines) to draw a smooth curve through a two-dimensional scatter plot in R.

Code Examples

Sample code from Package ‘pspline’

library(pspline)  # provides sm.spline()
data(cars)
speed <- cars$speed  # explicit columns instead of attach(cars)
dist <- cars$dist
plot(speed, dist, main = "data(cars) & smoothing splines")  # scatter plot of the original data
lines(sm.spline(speed, dist, df = 10), lty = 1, col = "red")  # P-spline curve with 10 degrees of freedom
lines(sm.spline(speed, dist, df = 100), lty = 1, col = "green")  # P-spline curve with 100 degrees of freedom
legend("topleft", legend = c("df=10", "df=100"), col = c("red", "green"), lty = c(1, 1))

[Figure: scatter plot of the cars data with P-spline curves for df=10 and df=100]

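The plot confirms that a larger df (degrees of freedom) produces a more zigzagged curve: the fit follows the data more closely (lower bias) but becomes more sensitive to noise (higher variance). This is the bias-variance tradeoff.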
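The effect of df can also be checked numerically. The following is a minimal sketch, assuming base R's `smooth.spline` (from the `stats` package) as a stand-in for `sm.spline`; the helper name `summarise_fit` and the chosen df values are illustrative only. For each df it reports the sum of squared residuals and an approximate integral of the squared second derivative, used here as a measure of wiggliness.

```r
# Sketch: how df changes goodness of fit and wiggliness on the cars data.
# Assumption: stats::smooth.spline stands in for pspline::sm.spline.
data(cars)

# Fine grid on which the second derivative of each fit is evaluated
grid <- seq(min(cars$speed), max(cars$speed), length.out = 400)

# Hypothetical helper: fit with a given df and summarise the result
summarise_fit <- function(df) {
  fit <- smooth.spline(cars$speed, cars$dist, df = df)
  sse <- sum((cars$dist - predict(fit, cars$speed)$y)^2)
  d2 <- predict(fit, grid, deriv = 2)$y            # second derivative on the grid
  roughness <- sum(d2^2) * (grid[2] - grid[1])     # Riemann-sum approximation
  c(df = df, sse = sse, roughness = roughness)
}

result <- t(sapply(c(3, 10, 15), summarise_fit))
print(result)
```

As df grows, the residual sum of squares should fall while the roughness rises, which is the numerical counterpart of the zigzag seen in the plot.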

Artificial Data

$$y = \sin(x) - 1.5\cos(x/2 - 5) + 0.3\sin(10x)$$
x <- (1:50) / 3  # 50 equally spaced points
y <- sin(x) - 1.5 * cos(x / 2 - 5) + 0.3 * sin(x * 10)  # slow trend plus a fast oscillation
plot(x, y)
lines(sm.spline(x, y, df = 5), lty = 1, col = "red")  # smoother fit
lines(sm.spline(x, y, df = 10), lty = 1, col = "green")  # more flexible fit
legend("bottomleft", legend = c("df=5", "df=10"), col = c("red", "green"), lty = c(1, 1))

[Figure: artificial data with P-spline curves for df=5 and df=10]

Application on Real Data

For example, we can apply a P-spline to stock transaction data to extract intraday seasonality.

X-axis: timestamp of each transaction.
Y-axis: time elapsed since the previous trade.

[Figure: trade intervals over the trading day with the fitted P-spline]

The P-spline confirms the intraday seasonality: the intervals between transactions are shorter right after the market opens and right before it closes.
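The original transaction data are not included here, so the following is only a sketch on simulated data. The session length, the U-shaped `mean_gap` function, and the exponential waiting times are all assumptions made for illustration, and base R's `smooth.spline` is used as a stand-in for `sm.spline`.

```r
# Simulated (not real) trade data with a hypothetical intraday pattern.
set.seed(1)
session <- 6.5 * 3600  # assumed session length in seconds

# Assumed mean gap between trades: 5 s at the open and close, 25 s at midday
mean_gap <- function(s) 5 + 20 * (1 - ((s - session / 2) / (session / 2))^2)

# Generate trade timestamps one by one with exponential waiting times
times <- numeric(0)
now <- 0
repeat {
  now <- now + rexp(1, rate = 1 / mean_gap(now))
  if (now > session) break
  times <- c(times, now)
}

stamp <- times[-1]   # x-axis: timestamp of each trade
gap <- diff(times)   # y-axis: time elapsed since the previous trade

# Smooth the noisy gaps and read off the fitted level at three times of day
fit <- smooth.spline(stamp, gap, df = 6)
pred <- predict(fit, c(300, session / 2, session - 300))$y
print(round(pred, 1))
```

Calling `plot(stamp, gap)` followed by `lines(fit, col = "red")` draws the fitted curve over the raw intervals. With this simulated pattern, the fitted interval near midday should be longer than near the open and the close.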

Mathematical Background

Consider the model

$$y_j = x(t_j) + \varepsilon_j, \quad j = 1, \ldots, n.$$

Here, \(x(t_j)\) is the value of the spline at \(t_j\), \(y_j\) is the observed value, and \(\varepsilon_j\) is the error term.

How do we choose \(x(t_j)\)?

The first method that comes to mind is probably least squares, which minimizes the sum of squared errors:

$$\mathrm{SSE}(x \mid y) = \sum_j \left(y_j - x(t_j)\right)^2$$

P-splines build on this least-squares idea.

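Next, consider what kind of curve we want to draw.

[Figure: scatter plot of the sample points]

Given the points above, the curve we want to draw looks like the one below.

[Figure: smooth curve through the sample points]

We could also draw a curve like this:

[Figure: zigzag curve fitted to the same points]

But the zigzag curve above is not what we want, even though it follows the points closely. The P-spline penalizes this kind of zigzag.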
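The contrast between the two curves can be reproduced numerically. This is a rough sketch on the artificial data from earlier, assuming base R's `splinefun` for a curve that passes through every point and `smooth.spline` as a stand-in for a penalized fit; neither is the routine used inside `pspline`.

```r
# Artificial data from earlier in the post
x <- (1:50) / 3
y <- sin(x) - 1.5 * cos(x / 2 - 5) + 0.3 * sin(x * 10)

interp <- splinefun(x, y, method = "natural")  # interpolates every point
smooth <- smooth.spline(x, y, df = 5)          # penalized fit

# Fine grid for approximating the integral of the squared second derivative
grid <- seq(min(x), max(x), length.out = 2000)
step <- grid[2] - grid[1]

sse_interp <- sum((y - interp(x))^2)
sse_smooth <- sum((y - predict(smooth, x)$y)^2)
rough_interp <- sum(interp(grid, deriv = 2)^2) * step
rough_smooth <- sum(predict(smooth, grid, deriv = 2)$y^2) * step

print(c(sse_interp = sse_interp, sse_smooth = sse_smooth))
print(c(rough_interp = rough_interp, rough_smooth = rough_smooth))
```

The interpolating curve has an SSE of essentially zero but a much larger roughness, so minimizing the SSE alone favors the zigzag curve.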

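We define the penalized criterion as the SSE plus a roughness penalty:

$$\sum_{j=1}^{n} \left(y_j - x(t_j)\right)^2 + \lambda \int \left(x''(t)\right)^2 dt$$

Here, \(\lambda \ge 0\) controls the tradeoff: a larger \(\lambda\) penalizes curvature more heavily and gives a smoother curve, which corresponds to a smaller df.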
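To make the role of \(\lambda\) concrete, here is a hand-rolled sketch of a penalized least-squares fit. It is not the algorithm used inside `pspline`: it assumes a cubic B-spline basis from `splines::bs` and replaces the integral of the squared second derivative with a second-difference penalty on the basis coefficients, which is a common discrete approximation. The basis size (20), the \(\lambda\) values, and the helper name `fit_penalized` are illustrative choices.

```r
library(splines)  # ships with R; provides bs()

# Artificial data from earlier in the post
x <- (1:50) / 3
y <- sin(x) - 1.5 * cos(x / 2 - 5) + 0.3 * sin(x * 10)

B <- bs(x, df = 20, intercept = TRUE)       # cubic B-spline basis with 20 columns
D <- diff(diag(ncol(B)), differences = 2)   # second-difference operator on coefficients

# Hypothetical helper: minimise ||y - B beta||^2 + lambda * ||D beta||^2
fit_penalized <- function(lambda) {
  beta <- solve(t(B) %*% B + lambda * t(D) %*% D, t(B) %*% y)
  fitted <- as.vector(B %*% beta)
  c(lambda = lambda,
    sse = sum((y - fitted)^2),          # goodness-of-fit term
    penalty = sum((D %*% beta)^2))      # discrete roughness term
}

result <- t(sapply(c(0.01, 1, 100), fit_penalized))
print(result)
```

As \(\lambda\) increases, the SSE term should grow while the roughness term shrinks, mirroring the two parts of the criterion.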

References