>>65623991
I mean, it's "direct preference optimization", but it also depends on whose preference it is. It's a nice training method, but it has to follow the preferences of an individual or group that cares about the same things you do, and I don't expect hired randos to share our values (prompt adherence vs. stylization, illustrated vs. realism, etc.).
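Rough sketch of why the labeler matters, in plain Python. It uses the standard per-pair DPO loss, -log sigmoid(beta * margin). The function name `dpo_loss`, `beta=0.1`, and all the log-probability numbers are made up for illustration. The point is that the pair of images and the model are identical, and the only thing that changes is which one the annotator marked as "chosen". That choice alone decides which way the model gets pushed.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)) (beta is an assumed value)."""
    # Implicit reward margin: how much more the policy (relative to the frozen
    # reference model) favours the labeler's "chosen" sample over the "rejected" one
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-beta * margin)

# Hypothetical log-probs for one prompt and two outputs:
# image A (stylized) and image B (realistic)
policy_a, policy_b = -10.0, -12.0
ref_a, ref_b = -11.0, -11.0

# Labeler 1 likes stylization and picks A; labeler 2 likes realism and picks B
loss_a = dpo_loss(policy_a, policy_b, ref_a, ref_b)
loss_b = dpo_loss(policy_b, policy_a, ref_b, ref_a)
print(loss_a, loss_b)
```

Same model and same pair, but opposite training signals. Minimizing `loss_a` pushes the model further toward stylization, and minimizing `loss_b` pushes it toward realism. So "preference optimization" only optimizes for whatever taste the people picking "chosen" happen to have.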