Is it OK to ask for help here? Getting lots of "loss: NaN" when training on Automatic1111. All training files come out garbage.

The Bard in Green@lemmy.starlightkel.xyz · 1 year ago


BrianTheeBiscuiteer@lemmy.world · 1 year ago

Depending on how the dependencies (e.g. xformers) are versioned, you can make a new clone of A1111 and check out a commit from ~8 months ago to see if it works again. Of course, I would also recommend trying a fresh install.
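For anyone who wants to script the "new clone, old commit" idea, here is a rough sketch. The helper names (`old_checkout_commands`, `clone_old_version`) and the target folder are made up for illustration, and it assumes the upstream repo URL and a `master` branch. It leans on `git rev-list --before`, which picks the newest commit older than a given date, so treat the result as a starting point rather than a known-good version.

```python
import subprocess
from pathlib import Path

# Assumption: the upstream A1111 repository and its default branch name.
REPO_URL = "https://github.com/AUTOMATIC1111/stable-diffusion-webui.git"


def old_checkout_commands(target_dir, before="8 months ago", branch="master"):
    """Build the git commands (argv lists) for a separate clone and an old-commit lookup."""
    target = str(Path(target_dir))
    return [
        # Clone into its own folder so the current install is left alone.
        ["git", "clone", REPO_URL, target],
        # Print the newest commit on `branch` that is older than `before`.
        ["git", "-C", target, "rev-list", "-n", "1", f"--before={before}", branch],
    ]


def clone_old_version(target_dir, before="8 months ago", branch="master"):
    """Clone the repo and check out the last commit made before `before`."""
    clone_cmd, rev_cmd = old_checkout_commands(target_dir, before, branch)
    subprocess.run(clone_cmd, check=True)
    result = subprocess.run(rev_cmd, check=True, capture_output=True, text=True)
    commit = result.stdout.strip()
    if not commit:
        raise RuntimeError(f"no commit found on {branch} before {before!r}")
    subprocess.run(["git", "-C", str(Path(target_dir)), "checkout", commit], check=True)
    return commit
```

Calling `clone_old_version("a1111-old")` would need network access and git on the PATH; the date string can also be an absolute date such as `"2023-01-01"`.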

The Bard in Green@lemmy.starlightkel.xyz · edit-2 1 year ago

> Of course I would also recommend trying a fresh install.

Way ahead of you there. I’ve reinstalled the current version four or five times at this point.

> make a new clone of A1111 and checkout a commit from ~8 months ago

This is a good idea. So far I’ve tried two old versions from old commit hashes, and both crashed with other problems. It seems like (lol) both versions of A1111 put their venv in the same place, so the old versions barf on dependencies whose installed versions are too new, and they ALSO broke my current version by downgrading other dependencies (easy fix: wipe it out and reinstall it again). I’m still trying to debug this, because I COULD see a world where an old version of A1111 trains on one card while the NEW version generates on the other.
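A rough sketch of what that two-instance setup might look like, in case it helps anyone else debugging the same thing. `instance_env` is a made-up helper, not part of A1111. It assumes the launcher honours a `VENV_DIR` environment variable (check your own launcher script for the exact name) and relies on the standard `CUDA_VISIBLE_DEVICES` variable to pin each process to one card.

```python
import os
from pathlib import Path


def instance_env(clone_dir, gpu_index, base_env=None):
    """Build an environment for one A1111 clone with its own venv and a single GPU."""
    env = dict(os.environ if base_env is None else base_env)
    clone = Path(clone_dir)
    # ASSUMPTION: the launcher reads VENV_DIR; verify this in your launcher script.
    # Keeping the venv inside the clone stops two versions from sharing packages.
    env["VENV_DIR"] = str(clone / "venv")
    # Standard CUDA variable: the process only sees the listed device(s).
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Hypothetical folder names: old version trains on card 0, new one generates on card 1.
train_env = instance_env("a1111-old", 0, base_env={})
generate_env = instance_env("a1111-new", 1, base_env={})
```

Each dict could then be passed as `env=` to `subprocess.Popen` when starting the launcher from inside the matching clone folder. Whether the old version's dependencies actually install cleanly into its own venv is a separate question this does not answer.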