Bayesian Neural Networks (BNNs) refers to extending standard networks with posterior inference in order to control over-fitting. From a broader perspective, the Bayesian approach uses the statistical methodology so that everything has a probability distribution attached to it, including model parameters (weights and biases in neural networks). In programming languages, variables that can take a specific value will turn the same result every-time you access that specific variable. Let’s begin with the revision of a simple linear model, which will predict the output by the weighted sum of a series of input features. In comparison, in the Bayesian world, you can have similar entities also known as random variables that will give you a different value every time you access it. In Bayesian terms, the historical data represents our prior knowledge of the overall behavior with each variable having its own statistical properties which vary with time. Let’s assume that X is a random variable which represents the normal distribution, every time X gets accessed, the returned result will have different values This process of getting a new value from a random variable is called sampling. What value comes out depends on the random variable’s associated probability distribution. That means, in the parameter space, one can deduce the nature and shape of the neural network's learned parameters. Recently there has been a lot of activity in this area, with the advent of numerous probabilistic programming libraries such as PyMC3, Edward, Stan etc. Bayesian methods are used in lots of fields: from game development to drug discovery.
Instead of taking into account a single answer to one question, Bayesian methods allow you to consider an entire distribution of answers. With this approach, you can naturally address issues such as: