  1. optimization - Batch gradient descent versus stochastic gradient ...

    Additionally, batch gradient descent, given an annealed learning rate, will eventually find the minimum located in its basin of attraction. Stochastic gradient descent (SGD) computes the gradient using a …
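
A minimal NumPy sketch of the distinction this snippet draws, with made-up least-squares data and an illustrative learning rate: the batch gradient averages over the whole dataset, while the stochastic gradient uses a single drawn sample.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # 100 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
lr = 0.1

# Batch gradient: averages over all samples, so each step is deterministic.
batch_grad = X.T @ (X @ w - y) / len(y)
w_batch = w - lr * batch_grad

# Stochastic gradient: computed from one randomly drawn sample, so each step is noisy.
i = rng.integers(len(y))
stochastic_grad = X[i] * (X[i] @ w - y[i])
w_sgd = w - lr * stochastic_grad
```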

  2. How can Stochastic Gradient Descent (SGD) avoid the problem of local ...

    Jul 26, 2024 · The path of stochastic gradient descent wanders over a wider region of the parameter space, and is therefore more likely to "jump out" of a local minimum and find a global minimum (Note*). However, stochastic gradient …
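
A toy illustration of that "jumping out" behaviour. Here injected Gaussian noise stands in for the minibatch sampling noise of real SGD, and the double-well function, step size, and noise scale are all made up for illustration.

```python
import numpy as np

def grad(w):
    # Gradient of f(w) = (w^2 - 1)^2 + 0.3*w: a shallow minimum near w = 1
    # and a deeper (global) minimum near w = -1.
    return 4.0 * w * (w**2 - 1.0) + 0.3

rng = np.random.default_rng(0)
w_plain, w_noisy = 1.0, 1.0          # both start in the shallow basin
for _ in range(2000):
    w_plain -= 0.01 * grad(w_plain)                        # deterministic: settles in the nearby minimum
    w_noisy -= 0.01 * (grad(w_noisy) + rng.normal(0, 5))   # noisy steps can cross the barrier between basins

print(f"plain GD: {w_plain:.2f}, noisy GD: {w_noisy:.2f}")
```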

  3. Why use gradient descent with neural networks?

    Nov 14, 2015 · When training a neural network using the back-propagation algorithm, the gradient descent method is used to determine the weight updates. My question is: Rather than using gradient …
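
For context, a small sketch of what "back-propagation determines the weight updates" means for a one-hidden-layer network; the layer sizes, data, and learning rate are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))          # 32 samples, 4 inputs
y = np.sin(X[:, :1])                  # toy regression targets
W1, b1 = 0.1 * rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.05

for _ in range(200):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    # Gradient of the mean-squared-error loss w.r.t. the prediction
    grad_pred = 2.0 * (pred - y) / len(y)
    # Back-propagation: apply the chain rule layer by layer
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_h = grad_pred @ W2.T
    grad_z1 = grad_h * (1.0 - h**2)   # derivative of tanh
    grad_W1 = X.T @ grad_z1
    grad_b1 = grad_z1.sum(axis=0)
    # Gradient descent: each weight update is a step against its gradient
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
```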

  4. machine learning - why gradient descent when we can solve linear ...

    Aug 12, 2013 · What is the benefit of using Gradient Descent in the linear regression space? It looks like we can solve the problem (finding theta0-n that minimize the cost func) with an analytical method, so …
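
The two approaches side by side, as a sketch with synthetic data: the normal-equation solve is exact but requires forming and factoring X^T X, while each gradient descent step only needs matrix-vector products.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=200)

# Analytical solution (normal equations): theta = (X^T X)^{-1} X^T y
theta_exact = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the same least-squares objective
theta, lr = np.zeros(3), 0.1
for _ in range(500):
    theta -= lr * X.T @ (X @ theta - y) / len(y)

print(theta_exact, theta)   # the two estimates should roughly agree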

  5. machine learning - Gradient Descent vs Stochastic Gradient Descent ...

    Mar 1, 2016 · The runtime is of course too long. Is the algorithm I ran named Gradient Descent? I read that for large datasets, using Stochastic Gradient Descent can improve the runtime dramatically. …
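
A sketch of the minibatch variant that is usually meant here: each update touches only a small slice of the data instead of the full matrix, so an epoch is the same total work but progress is made after every minibatch. Batch size and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, batch = 10_000, 5, 64
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)
theta, lr = np.zeros(d), 0.05

for epoch in range(5):
    idx = rng.permutation(N)              # shuffle once per epoch
    for start in range(0, N, batch):
        b = idx[start:start + batch]
        # Each update costs O(batch * d) rather than O(N * d)
        grad = X[b].T @ (X[b] @ theta - y[b]) / len(b)
        theta -= lr * grad
```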

  6. What is the difference between Gradient Descent and Newton's …

    I understand what Gradient Descent does. Basically it tries to move towards the locally optimal solution by slowly moving down the curve. I am trying to understand what the actual difference is between the …
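
Roughly, the difference being asked about: gradient descent uses only first-order information, while Newton's method rescales the step by the inverse Hessian. A small sketch on a made-up quadratic objective, where the Newton step is exact:

```python
import numpy as np

# Ill-conditioned quadratic: f(x) = 0.5 * x^T A x - b^T x
A = np.array([[10.0, 0.0], [0.0, 0.5]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

# Gradient descent: the step size is limited by the largest curvature (10 here)
x_gd = np.zeros(2)
for _ in range(100):
    x_gd -= 0.05 * grad(x_gd)

# Newton's method: solve H p = -grad; on a quadratic this reaches the
# minimizer in a single step
x_newton = np.zeros(2)
x_newton -= np.linalg.solve(A, grad(x_newton))

print(x_gd, x_newton, np.linalg.solve(A, b))
```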

  7. Adam Optimizer vs Gradient Descent - Stack Overflow

    Aug 25, 2018 · AdamOptimizer uses the Adam algorithm to update the learning rate. It is an adaptive method, compared to gradient descent, which maintains a single learning rate for all …
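
More precisely, Adam keeps per-parameter running estimates of the first and second moments of the gradient and uses them to scale each parameter's step. A sketch of the standard update rule (hyperparameters are the commonly quoted defaults; the objective is a toy example):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Running estimates of the mean and uncentered variance of the gradient
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    # Bias correction for the zero initialization of m and v
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    # Per-parameter effective step size, unlike plain gradient descent,
    # which would just do: theta - lr * grad
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 101):
    grad = 2.0 * theta            # gradient of f(theta) = ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)
```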

  8. gradient descent using python and numpy - Stack Overflow

    Jul 22, 2013 · Below you can find my implementation of gradient descent for a linear regression problem. First, you calculate the gradient as X.T * (X * w - y) / N and update your current theta with this …
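
For reference, a self-contained version of the update the question describes (gradient X.T @ (X @ w - y) / N followed by a step on w); the data and learning rate here are placeholders, not the asker's.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, iters=1000):
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / N   # gradient of the mean squared error / 2
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + 0.1 * rng.normal(size=500)
w = gradient_descent(X, y)
```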

  9. How to define the termination condition for gradient descent?

    Actually, I wanted to ask how I can define the terminating condition for gradient descent. Can I stop it based upon the number of iterations, i.e. considering parameter values for, say, 100
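
The usual termination criteria, sketched together: cap the iteration count, and stop early when the gradient norm or the change in parameters falls below a tolerance. The thresholds and the example objective are arbitrary.

```python
import numpy as np

def minimize(grad_fn, w0, lr=0.1, max_iters=10_000, tol=1e-6):
    w = np.asarray(w0, dtype=float)
    for i in range(max_iters):                 # hard cap on iterations
        g = grad_fn(w)
        if np.linalg.norm(g) < tol:            # gradient is (almost) zero
            break
        w_new = w - lr * g
        if np.linalg.norm(w_new - w) < tol:    # parameters stopped moving
            w = w_new
            break
        w = w_new
    return w, i

# Example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_opt, n_iters = minimize(lambda w: 2 * (w - 3), w0=[0.0])
```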

  10. Gradient Descent with constraints (lagrange multipliers)

    Since the gradient descent algorithm is designed to find local minima of unconstrained problems, it does not handle constraints on its own. There are typically three solutions: Use a numerical method which is …
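
One common workaround is projected gradient descent: take the ordinary gradient step, then project the result back onto the feasible set. A sketch with a simple box constraint (the projection function depends entirely on your constraints):

```python
import numpy as np

def project_to_box(x, lo, hi):
    # Projection onto the feasible set {x : lo <= x <= hi}
    return np.clip(x, lo, hi)

def projected_gradient_descent(grad_fn, x0, lo, hi, lr=0.1, iters=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - lr * grad_fn(x)           # ordinary gradient step
        x = project_to_box(x, lo, hi)     # then enforce the constraints
    return x

# Minimize ||x - c||^2 subject to 0 <= x <= 1; the unconstrained minimum
# c = (2, -1) is infeasible, so the solution lies on the boundary.
c = np.array([2.0, -1.0])
x = projected_gradient_descent(lambda x: 2 * (x - c), np.zeros(2), 0.0, 1.0)
print(x)   # approximately [1.0, 0.0]
```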