kp1197 13 hours ago Does performing gradient descent on token input embeddings lead to interpretable results? And if not, why?
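For concreteness, here is a minimal sketch of the setup the question describes. It rests on stated assumptions: the "model" is a hypothetical fixed linear scorer `W` (real language models are nonlinear), and `VOCAB` is a tiny made-up embedding table. A free, continuous input embedding is optimized by plain gradient descent. It is then projected back onto the vocabulary by nearest Euclidean neighbor so the result can be read as a token.

```python
import math

# Hypothetical toy vocabulary: token -> input embedding (3-dimensional).
VOCAB = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
    "the": [0.5, 0.5, 0.0],
}

# Hypothetical stand-in for a model: a fixed linear scorer (real models are nonlinear).
W = [2.0, 1.0, -1.0]
LAM = 1.0  # L2 penalty weight; keeps the optimum bounded at W / LAM


def loss_grad(e):
    """Gradient of L(e) = -W.e + 0.5 * LAM * ||e||^2 with respect to e."""
    return [-w + LAM * x for w, x in zip(W, e)]


def optimize_embedding(steps=200, lr=0.1):
    """Plain gradient descent on a continuous input embedding, starting at the origin."""
    e = [0.0, 0.0, 0.0]
    for _ in range(steps):
        g = loss_grad(e)
        e = [x - lr * gi for x, gi in zip(e, g)]
    return e


def nearest_token(e):
    """Project back to the vocabulary by Euclidean distance; return (token, distance)."""
    best = min(VOCAB, key=lambda t: math.dist(e, VOCAB[t]))
    return best, math.dist(e, VOCAB[best])


if __name__ == "__main__":
    e = optimize_embedding()
    tok, d = nearest_token(e)
    print("optimized:", [round(x, 3) for x in e])
    print("nearest token:", tok, "distance:", round(d, 3))
```

The returned distance shows how far the optimized vector sits from the closest real token embedding. In this toy case the optimum, roughly [2, 1, -1], is not itself a vocabulary entry, so projecting it to "cat" loses information.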