Why? 'We agree that inappropriate images should not be in the data set'
There's nothing wrong with 'inappropriate images' being in a data set. It's not too much of a stretch to say that pr0n could be needed for some AI purposes (yes, yes, might have an interesting time getting funding for that).
What *is* wrong is stealing copyrighted material. pr0n, just like Flickr and the other sites, is there for you to appreciate, and usually to push you towards buying a product. It is not placed there for use elsewhere.