You just independently derived the core insight of regularization theory — and the Buddhist version is older.
In machine learning: L1 regularization (lasso) penalizes the absolute size of the weights, which drives many of them to exactly zero — the model throws features away. L2 (ridge) penalizes squared weights, shrinking them all toward zero without eliminating any. Both are formalized versions of "don't compress prematurely." The regularization parameter λ is a knob that controls how much residual you're willing to sit with.
Too low λ: you fit everything, including noise. That's the autodidact memorizing instead of understanding.
Too high λ: you fit nothing; the model is too simple to capture real structure. That's the student who simplifies every concept into platitudes.
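The two extremes are easy to see numerically. Here's a minimal sketch (my own toy example, not from any particular library's docs): closed-form ridge regression on a cubic signal plus noise, with enough polynomial features that the model *can* memorize. At tiny λ the training residual nearly vanishes — the noise got compressed into the model; at huge λ the weights are crushed and even the real structure ends up in the residual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: cubic signal plus noise, expanded into degree-9
# polynomial features so the model is flexible enough to overfit.
x = np.linspace(-1, 1, 30)
y = x**3 + 0.3 * rng.standard_normal(30)
X = np.vander(x, 10)  # columns x^9, x^8, ..., x^0

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def train_mse(lam):
    w = ridge_fit(X, y, lam)
    return np.mean((X @ w - y) ** 2)

# lam ~ 0: residual is nearly zero -- the model has memorized
# the noise along with the signal (the autodidact's trap).
# lam huge: weights shrink toward zero and the residual swallows
# the cubic structure too (the platitude trap).
print(f"lam=1e-8 train MSE: {train_mse(1e-8):.4f}")
print(f"lam=1e3  train MSE: {train_mse(1e3):.4f}")
```

Note that only training error is measured here — which is exactly why low λ *looks* like success from the inside.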
The sweet spot is where the model captures real structure but leaves genuine noise in the residual. In Zen this is shoshin — beginner's mind. Not ignorance, but calibrated openness. The residual you sit with is exactly the territory where your current model is wrong, and that wrongness is information.
Here's the punchline from statistical learning theory: the optimal λ depends on the true complexity of the data-generating process, which you don't know. You can only approximate it by cross-validation — testing your model against data it hasn't seen.
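Continuing the toy ridge example above (same hypothetical setup, not a prescribed recipe), cross-validation makes this concrete: hold out each fold in turn, score the model only on data it never fit, and let that held-out residual choose λ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy setup: cubic signal, noisy samples, degree-9 features.
x = rng.uniform(-1, 1, 60)
y = x**3 + 0.3 * rng.standard_normal(60)
X = np.vander(x, 10)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(lam, k=5):
    """k-fold cross-validation: hold each fold out in turn and
    score the model on data it never saw during fitting."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for hold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[hold] = False
        w = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((X[hold] @ w - y[hold]) ** 2))
    return np.mean(errs)

# Sweep lambda over orders of magnitude; the held-out error,
# not the training error, picks the winner.
lams = [10.0**p for p in range(-8, 4)]
best = min(lams, key=cv_error)
print(f"cross-validated lambda: {best:g}")
```

The point of the design: the selection criterion is computed entirely on data the model couldn't memorize, so it approximates the unknown true complexity from the outside.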
This is why kōans work. The teacher IS your cross-validation set. They present cases your model can't handle, and the residual tells you where to grow.
The muscle soreness analogy is apt because muscles also have a regularization regime: overtraining (λ too low) causes injury; undertraining (λ too high) causes atrophy. Growth happens at the edge.