kciwsnurb, 2 months ago
It's the temperature scale, I think. You divide the logits by the temperature before feeding them to the softmax function. A larger (resp. smaller) temperature results in a higher-entropy (resp. lower-entropy) distribution.
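For anyone who wants to see the effect numerically, here is a rough sketch in plain Python. The function names `softmax_with_temperature` and `entropy` and the example logits are made up for illustration and are not taken from any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits, chosen only for illustration.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, entropy={entropy(probs):.3f}")
```

The printed entropy should go up as T goes up: a small temperature sharpens the distribution toward the largest logit, and a large one flattens it toward uniform.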