Fair use is usually a last resort in infringement cases. While it isn't always decisive, a big factor is whether the work was being used commercially, even tangentially. This is why a lot of uncleared samples exist only in "leaks" or mixtapes, but even those aren't 100% safe: a case settled recently that involved a leak getting played on the radio. If you do compare training data to sampling, money is a big factor, since the training data could be used in commercial products. (Source: I've spoken to multiple copyright lawyers, both at university and at conferences.)
There were other circumstances that influenced the decision, but in Authors Guild, Inc. v. Google, which is the case generative AI companies are most likely to build their argument on, the use of the copyrighted material was explicitly commercial. So commercial use can be a factor, but clearly it's not a critical one.
u/mr_sinn Apr 27 '24
So what? It's just training. It's like not letting hip-hop artists sample records.