In a linear layer, every input is connected to every output through a matrix multiplication followed by a vector addition. Each connection is weighted by an entry of the weight matrix. The added vector is called the bias and is broadcast during the addition. These layers are sometimes called dense or fully connected layers; they form the basis of a multilayer perceptron.
Given an input x of shape (batch size, input features), a weight matrix W of shape (input features, output features), and a bias vector b of shape (output features), the output is computed as below. The bias vector is broadcast across the batch dimension during the addition.
y_{ij} = \sum_{k=1}^{n} x_{ik} W_{kj} + b_{j}
where n is the number of input features.
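The formula maps directly onto a single NumPy expression. The following sketch (plain NumPy, independent of this layer class) illustrates how the bias is broadcast:
import numpy as np
batchSize, inputFeatures, outputFeatures = 2, 3, 4
x = np.random.random((batchSize, inputFeatures))       # shape (2, 3)
W = np.random.random((inputFeatures, outputFeatures))  # shape (3, 4)
b = np.random.random(outputFeatures)                   # shape (4,)
y = x @ W + b      # b is broadcast across the batch dimension
print(y.shape)     # (2, 4)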
This layer is derived from the base layer class and complies with the learning layer protocol, enabling it to adjust its weights and biases through optimizers. It additionally provides forward and backward methods and is callable.
The construction arguments are:
- inputSize: number of input features, must be an integer
- outputSize: number of output features, must be an integer
- weights: (optional) the weight matrix to assign; if 'None', it will be initialized from a uniform distribution. Should be a NumPy array or ArrayLike.
- bias: (optional) the bias vector to assign; if 'None', it will be initialized from a uniform distribution. Should be a NumPy array or ArrayLike. Set it to 'False' to omit the bias.
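As a sketch of how these arguments might be combined (the keyword names weights and bias are inferred from the list above, so treat the exact signature as an assumption):
import numpy as np
from machineLearning import nn
linear = nn.Linear(3, 4)                          # weights and bias initialized uniformly
W = np.random.random((3, 4))
b = np.zeros(4)
withCustom = nn.Linear(3, 4, weights=W, bias=b)   # assumed keyword arguments
withoutBias = nn.Linear(3, 4, bias=False)         # assumed: omits the bias entirely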
The forward method takes:
- input: a NumPy array or ArrayLike of shape (batchSize, inputSize)
- it returns a NumPy array of shape (batchSize, outputSize)
The backward method takes:
- gradient: a NumPy array or ArrayLike of shape (batchSize, outputSize), the gradient propagated back from the following layer
- it returns a NumPy array of shape (batchSize, inputSize), the gradient with respect to the input
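Assuming the forward and backward methods behave as described above, the shape contract can be sketched like this (the upstream gradient is a stand-in for whatever the following layer propagates back):
import numpy as np
from machineLearning import nn
batchSize, inputSize, outputSize = 64, 5, 3
linear = nn.Linear(inputSize, outputSize)
x = np.random.random((batchSize, inputSize))
y = linear.forward(x)             # shape (64, 3)
upstream = np.ones_like(y)        # stand-in gradient from the next layer
grad = linear.backward(upstream)  # shape (64, 5), gradient w.r.t. the input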
The class can be constructed like this:
linear = Linear(5, 5)
Afterwards we can use it like this:
output = linear(input)
given that input has shape (batchSize, 5).
Example usage:
import numpy as np
from machineLearning import nn
features = 5
batchSize = 64
linear = nn.Linear(features, features)
input = np.random.random((batchSize, features))
output = linear(input)
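A quick sanity check of the resulting shape:
print(output.shape)  # (64, 5)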
The source code can be found here.