I am implementing a neural network for a classification problem. Plain backpropagation takes too long to converge, so I thought of using RPROP. In my test setup, RPROP works fine for an AND gate simulation, but it never converges for an OR or XOR gate simulation.
- How and when should I update the bias with RPROP?
- Here is my weight update logic:
for (int l_index = 1; l_index < _total_layers; l_index++) {
    layer* curr_layer = get_layer_at(l_index);

    //iterate through each neuron
    for (unsigned int n_index = 0; n_index < curr_layer->get_number_of_neurons(); n_index++) {
        neuron* jth_neuron = curr_layer->get_neuron_at(n_index);

        double change        = jth_neuron->get_change();
        double curr_gradient = jth_neuron->get_gradient();
        double last_gradient = jth_neuron->get_last_gradient();

        int grad_sign = sign(curr_gradient * last_gradient);

        //iterate through each weight of the neuron
        for (int w_index = 0; w_index < jth_neuron->get_number_of_weights(); w_index++) {
            double current_weight    = jth_neuron->give_weight_at(w_index);
            double last_update_value = jth_neuron->give_update_value_at(w_index);
            double new_update_value  = last_update_value;

            if (grad_sign > 0) {
                new_update_value = min(last_update_value * 1.2, 50.0);
                change = sign(curr_gradient) * new_update_value;
            } else if (grad_sign < 0) {
                new_update_value = max(last_update_value * 0.5, 1e-6);
                change = -change;
                curr_gradient = 0.0;
            } else if (grad_sign == 0) {
                change = sign(curr_gradient) * new_update_value;
            }

            //update neuron values
            jth_neuron->set_change(change);
            jth_neuron->update_weight_at((current_weight + change), w_index);
            jth_neuron->set_last_gradient(curr_gradient);
            jth_neuron->update_update_value_at(new_update_value, w_index);

            double current_bias = jth_neuron->get_bias();
            jth_neuron->set_bias(current_bias + _learning_rate * jth_neuron->get_delta());
        }
    }
}
In principle I don't treat the bias any differently than I did with plain backpropagation: it's still learning_rate * delta, which is what the code above seems to be doing.
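For comparison, here is a minimal sketch of the other common option: treating the bias as just another weight (one whose input is always 1) and giving it its own RPROP step size and stored gradient, instead of the plain learning_rate * delta update. The function below is hypothetical, not part of the posted neuron class, and it uses the same sign convention as the weight branch above.

#include <algorithm>   // std::min, std::max

// Hypothetical sketch: RPROP update for a single bias, treated exactly like
// any other weight. Per-bias state (step size, previous gradient) is passed
// in by reference and kept between epochs.
void rprop_update_bias(double& bias, double grad,
                       double& update_value, double& last_grad)
{
    auto sgn = [](double x) { return (x > 0.0) - (x < 0.0); };

    if (grad * last_grad > 0) {                       // same sign: accelerate
        update_value = std::min(update_value * 1.2, 50.0);
        bias += sgn(grad) * update_value;
    } else if (grad * last_grad < 0) {                // sign flip: slow down, skip this step
        update_value = std::max(update_value * 0.5, 1e-6);
        grad = 0.0;                                   // same trick as for the weights
    } else {                                          // first step or zero gradient
        bias += sgn(grad) * update_value;
    }
    last_grad = grad;
}

Called once per neuron right after the weight loop, this would remove the learning-rate dependency from the bias as well.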
One possible source of error: the sign of the weight change depends on how you calculate your error. There are different conventions, and using (t_i - y_i)
instead of (y_i - t_i)
should result in returning (new_update_value * sign(grad))
instead of -(new_update_value * sign(grad)).
Try switching the sign. I'm unsure how you implemented the rest, since a lot of it is not shown here. Here's a snippet of my Java implementation that might be of help:
// gradient didn't change sign:
if (weight.previousErrorGradient * errorGradient > 0)
    weight.lastUpdateValue = Math.min(weight.lastUpdateValue * step_pos, update_max);
// changed sign:
else if (weight.previousErrorGradient * errorGradient < 0) {
    weight.lastUpdateValue = Math.max(weight.lastUpdateValue * step_neg, update_min);
}
else
    weight.lastUpdateValue = weight.lastUpdateValue; // no change

// depending on the language, you should check for NaN here.

// multiply by -1 depending on the error signal's sign:
return (weight.lastUpdateValue * Math.signum(errorGradient));
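To make the sign-convention point concrete, here is a small sketch (in C++ to match the question's code; the helper name and boolean flag are hypothetical): whether the returned step gets negated depends on whether the stored gradient is dE/dw or was accumulated from (t - y) deltas.

// Hypothetical helper, not taken from either posted implementation.
// Stand-in for whatever sign()/signum() the real code uses.
static int sign(double x) { return (x > 0.0) - (x < 0.0); }

// RPROP step for one weight. If grad is dE/dw (deltas built from (y - t)),
// step against the gradient; if grad was built from (t - y) deltas, its sign
// is already flipped and the step goes with it.
double rprop_step(double grad, double update_value, bool grad_is_dE_dw)
{
    double step = update_value * sign(grad);
    return grad_is_dE_dw ? -step : step;
}

Whichever convention you pick, it has to be used consistently in both the gradient computation and the update step.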
Also, keep in mind that 50.0, 1e-6, 0.5 and 1.2 are empirically gathered values that might need to be adjusted. You should print out the gradients and weight changes to see if anything weird is going on (e.g. exploding gradients leading to NaN, although that's unlikely since you're only testing AND/XOR). Finally, the last_gradient
value should be initialized to 0
at the first timestep.
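As a rough sketch of that initialization, using the accessors already shown in the question (the class name network is a placeholder for whatever owns get_layer_at, and 0.1 as the starting step size is the usual RPROP default delta_0, so treat it as an assumption for your setup):

// Run once before the first RPROP epoch: give every weight a starting step
// size and zero the stored previous gradient, so the first update falls into
// the grad_sign == 0 branch.
void network::init_rprop_state()
{
    for (int l_index = 1; l_index < _total_layers; l_index++) {
        layer* curr_layer = get_layer_at(l_index);
        for (unsigned int n_index = 0; n_index < curr_layer->get_number_of_neurons(); n_index++) {
            neuron* jth_neuron = curr_layer->get_neuron_at(n_index);
            for (int w_index = 0; w_index < jth_neuron->get_number_of_weights(); w_index++)
                jth_neuron->update_update_value_at(0.1, w_index);  // delta_0 = 0.1 (common default)
            jth_neuron->set_last_gradient(0.0);
        }
    }
}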