Ordinary Least Squares Linear Regression

Suppose each instance in the dataset $D$ is described by $n$ attributes. The hypothesis function of linear regression is:

$$h_{\boldsymbol{w}, b}(\boldsymbol{x})=\sum_{i=1}^{n} w_{i} x_{i}+b=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b$$

where $\boldsymbol{w}\in \mathbb{R}^n$ and $b\in \mathbb{R}$ are the model parameters.

For convenience, we usually absorb $b$ into the weight vector $\boldsymbol{w}$ as $w_0$, and prepend a constant 1 to the input vector $\boldsymbol{x}$ as $x_0$:

$$\begin{array}{c}\boldsymbol{w}=\left(b, w_{1}, w_{2}, \ldots, w_{n}\right)^{\mathrm{T}} \\ \boldsymbol{x}=\left(1, x_{1}, x_{2}, \ldots, x_{n}\right)^{\mathrm{T}}\end{array}$$

The hypothesis function then becomes:

$$h_{\boldsymbol{w}}(\boldsymbol{x})=\sum_{i=0}^{n} w_{i} x_{i}=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}$$

where $\boldsymbol{w}\in \mathbb{R}^{n+1}$. Once the parameters $\boldsymbol{w}$ have been determined by training, the model can be used to predict on new input instances.

We use the mean squared error (MSE) as the loss function. Suppose the training set $D$ contains $m$ samples; the MSE loss is defined as

$$\begin{aligned}J(\boldsymbol{w}) &=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\boldsymbol{w}}\left(\boldsymbol{x}_{i}\right)-y_{i}\right)^{2} \\&=\frac{1}{2 m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right)^{2}\end{aligned}$$
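As a quick sanity check, the loss above can be evaluated directly in NumPy. The data and weights below are hypothetical numbers chosen only for illustration:

```python
import numpy as np

# Hypothetical data: 3 samples, with the constant x0 = 1 already prepended,
# so w[0] plays the role of the bias b.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 5.0])
w = np.array([1.0, 1.0])          # hypothesis h(x) = 1 + x1

residuals = X @ w - y             # h_w(x_i) - y_i for each sample
mse_loss = (residuals ** 2).sum() / (2 * len(y))  # J(w) = (1/2m) * sum of squares
```

Here the predictions are `[2, 3, 4]`, so only the third sample contributes to the loss.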

Since $J(\boldsymbol{w})$ is convex, its minimum is attained at a stationary point. We can therefore compute the gradient of $J(\boldsymbol{w})$ with respect to $\boldsymbol{w}$, set it to zero, and solve the resulting equations.

Computing the gradient of $J(\boldsymbol{w})$:

$$\begin{aligned}\nabla J(\boldsymbol{w}) &=\frac{1}{2 m} \sum_{i=1}^{m} \frac{\partial}{\partial \boldsymbol{w}}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right)^{2} \\&=\frac{1}{2 m} \sum_{i=1}^{m} 2\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \frac{\partial}{\partial \boldsymbol{w}}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \\&=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \boldsymbol{x}_{i}\end{aligned}$$
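One way to verify the final line is a finite-difference check. The sketch below, on arbitrary random data, compares the analytic gradient $\frac{1}{m}\sum_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i-y_i)\boldsymbol{x}_i$ against a numerical estimate:

```python
import numpy as np

# Arbitrary small problem for the check.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)
m = len(y)

# Analytic gradient: (1/m) * sum_i (w^T x_i - y_i) x_i = (1/m) X^T (Xw - y).
grad_analytic = X.T @ (X @ w - y) / m

def loss(w):
    """MSE loss J(w) = (1/2m) * sum of squared residuals."""
    return ((X @ w - y) ** 2).sum() / (2 * m)

# Central finite differences along each coordinate direction.
eps = 1e-6
grad_numeric = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```

The two gradients should agree to within the finite-difference error.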

The formula above is more concise when written with matrix operations. Let:

$$\boldsymbol{X}=\left[\begin{array}{ccccc}1 & x_{11} & x_{12} & \ldots & x_{1 n} \\1 & x_{21} & x_{22} & \ldots & x_{2 n} \\\vdots & \vdots & \vdots & \ddots & \vdots \\1 & x_{m 1} & x_{m 2} & \ldots & x_{m n}\end{array}\right]=\left[\begin{array}{c}\boldsymbol{x}_{1}^{\mathrm{T}} \\\boldsymbol{x}_{2}^{\mathrm{T}} \\\vdots \\\boldsymbol{x}_{m}^{\mathrm{T}}\end{array}\right]$$

$$\boldsymbol{y}=\left[\begin{array}{c}y_{1} \\ y_{2} \\ \vdots \\ y_{m}\end{array}\right] \qquad \boldsymbol{w}=\left[\begin{array}{c}b \\ w_{1} \\ w_{2} \\ \vdots \\ w_{n}\end{array}\right]$$

The gradient can then be written as:

$$\begin{aligned}\nabla J(\boldsymbol{w})&=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \boldsymbol{x}_{i}\\&=\frac{1}{m}\left[\boldsymbol{x}_1,\boldsymbol{x}_2,\dots,\boldsymbol{x}_m\right]\left[\begin{array}{c}\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{1}-y_{1} \\\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{2}-y_{2} \\\vdots \\\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{m}-y_{m}\end{array}\right]\\&=\frac{1}{m}\left[\boldsymbol{x}_1,\boldsymbol{x}_2,\dots,\boldsymbol{x}_m\right]\left(\left[\begin{array}{c}\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{1} \\\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{2} \\\vdots \\\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{m}\end{array}\right]-\left[\begin{array}{c}y_{1} \\y_{2} \\\vdots \\y_{m}\end{array}\right]\right)\\&=\frac{1}{m} \boldsymbol{X}^{\mathrm{T}}(\boldsymbol{X} \boldsymbol{w}-\boldsymbol{y})\end{aligned}$$
Setting the gradient to zero and solving gives:

$$\boldsymbol{\hat{w}}=\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}$$
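As a sketch on hypothetical random data, the closed-form solution can be checked against NumPy's built-in least-squares solver; when $\boldsymbol{X}^{\mathrm{T}}\boldsymbol{X}$ is invertible, the two should agree:

```python
import numpy as np

# Hypothetical data: 10 samples, 2 features, plus the constant x0 = 1 column.
rng = np.random.default_rng(42)
X = np.hstack([np.ones((10, 1)), rng.normal(size=(10, 2))])
y = rng.normal(size=10)

# Closed-form OLS: w_hat = (X^T X)^{-1} X^T y.
w_closed = np.linalg.inv(X.T @ X) @ X.T @ y

# NumPy's least-squares solver for comparison.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In practice `np.linalg.lstsq` (or `np.linalg.pinv`) is preferred over an explicit inverse, since it also handles the case where $\boldsymbol{X}^{\mathrm{T}}\boldsymbol{X}$ is singular or ill-conditioned.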

$\boldsymbol{\hat{w}}$ is the $\boldsymbol{w}$ that minimizes the loss function (the mean squared error). This method of solving for the optimal $\boldsymbol{w}$ is known as Ordinary Least Squares (OLS).

import numpy as np


class OLSLinearRegression:

    def _ols(self, X, y):
        '''Estimate w by ordinary least squares: w = (X^T X)^{-1} X^T y.'''
        tmp = np.linalg.inv(np.matmul(X.T, X))
        tmp = np.matmul(tmp, X.T)
        w = np.matmul(tmp, y)
        return w

    def _preprocess_data(self, X):
        '''Preprocess the data: prepend the constant feature x0 = 1.'''
        m, n = X.shape
        X_ = np.ones((m, n + 1))
        X_[:, 1:] = X
        return X_

    def train(self, X, y):
        '''Train the model.'''
        X = self._preprocess_data(X)
        self.w = self._ols(X, y)

    def predict(self, X):
        '''Predict.'''
        X = self._preprocess_data(X)
        y = np.matmul(X, self.w)
        return y
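A minimal end-to-end run of the same steps (preprocessing, OLS fit, prediction), inlined here so it is self-contained, on hypothetical noise-free data generated from $y = 2x + 1$:

```python
import numpy as np

# Hypothetical noise-free training data from y = 2*x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

# Preprocess: prepend the constant column x0 = 1 (as _preprocess_data does).
X_ = np.hstack([np.ones((X.shape[0], 1)), X])

# Fit: w = (X^T X)^{-1} X^T y (as _ols does); w[0] is the bias b.
w = np.linalg.inv(X_.T @ X_) @ X_.T @ y

# Predict at x = 4 (as predict does): h(x) = w^T x.
pred = np.hstack([np.ones((1, 1)), np.array([[4.0]])]) @ w
```

Because the data is noise-free and exactly linear, the fit recovers the bias 1 and slope 2, and the prediction at $x = 4$ is 9.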