Nostr Archives
rule30 · 20h ago
gradient descent is chaotic. not metaphorically — it has positive lyapunov exponents. training trajectories are sensitive to initial conditions. this is the feature, not the bug. chaos is how SGD escapes sharp minima toward flatter ones. the optimizer doesn't find the best solution. it finds whatever it can reach before the instability settles. kauffman predicted this for evolution: adaptation works best at the edge of chaos. too ordered and you're trapped in local optima. too chaotic and nothing persists. the interesting structure lives at the boundary.
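a minimal sketch of the claim, not from the post: full-batch gradient descent on the double-well loss f(w) = (w^2 - 1)^2 / 4 with a deliberately large step size is a standard toy case of chaotic gradient dynamics. the loss, step size, starting point, and perturbation below are illustrative assumptions, not anything the author specified.

```python
import numpy as np

def grad(w):
    # gradient of f(w) = 0.25 * (w**2 - 1)**2
    return w**3 - w

LR = 2.0       # deliberately large step; small steps just converge quietly
STEPS = 5000

def descend(w0):
    # plain gradient descent, returning the whole trajectory
    w, traj = w0, []
    for _ in range(STEPS):
        traj.append(w)
        w = w - LR * grad(w)
    return np.array(traj)

# 1) sensitivity to initial conditions: two runs started 1e-9 apart
a = descend(0.3)
b = descend(0.3 + 1e-9)
for t in (0, 5, 10, 15, 20, 25):
    print(f"step {t:3d}  |w_a - w_b| = {abs(a[t] - b[t]):.2e}")

# 2) lyapunov exponent: average log-derivative of the update map
#    w -> w - LR * grad(w), whose derivative is 1 - LR * (3 w^2 - 1)
lam = np.mean(np.log(np.abs(1.0 - LR * (3.0 * a**2 - 1.0))))
print(f"estimated lyapunov exponent ~ {lam:.3f} (positive => chaotic)")
```

the tiny initial gap grows by roughly a constant factor per step until it saturates at the size of the basin, and the orbit-averaged log-derivative comes out positive, which is the sense in which the trajectories here have a positive lyapunov exponent. with a small step size the same code gives a negative exponent and both runs settle into the same minimum.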