Whoa, this is a really good point. That brings up an interesting question: can Reddit stipulate in its TOS that a search engine may crawl the site for search indexing, but may not train its models on the content? I don't see why they couldn't add such language.