It is not the amount of data that matters, it is the labelling
Copious amounts of data are easy to get, but rather harder to turn into information. To train most AI or ML systems you need copious data with a reliable ground truth. The latter is very, very hard to come by, and requires lots of very, very careful, and usually dull, work labelling data items as belonging to different classes. If the ground truth on which you train your method is suspect, you will end up with over-fitting problems, because the ML/AI method will faithfully try to reproduce erroneous human decisions. For deep learning methods like convolutional neural networks (CNNs) to yield their (often impressive) results, you need hundreds of thousands, or preferably millions, of accurately labelled data items. CNNs have been around for quite a long time, but only the advent of large, labelled databases of images and the like made the methodology take off. Labelling hundreds of thousands of data items automatically would be ideal, but isn't always possible. Usually some poor sod has to do lots and lots of unglamorous work.
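To make the point about suspect ground truth concrete, here is a minimal sketch (not from the original text, using scikit-learn and a synthetic dataset purely for illustration) of what happens when a slice of the training labels is wrong: the model dutifully reproduces the erroneous labels it was shown, and pays for it on cleanly labelled held-out data.

```python
# Minimal sketch: corrupted "ground truth" labels inflate apparent training
# performance while hurting generalisation. Dataset and model are arbitrary
# stand-ins, not anything from the original text.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic two-class problem standing in for carefully labelled data.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Flip 20% of the training labels to mimic sloppy or erroneous labelling.
noisy = y_train.copy()
flip = rng.random(len(noisy)) < 0.20
noisy[flip] = 1 - noisy[flip]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, noisy)

# The model faithfully reproduces the erroneous labels it was trained on...
print("accuracy on the noisy training labels:", model.score(X_train, noisy))
# ...but generalises poorly to cleanly labelled test data.
print("accuracy on clean test labels:        ", model.score(X_test, y_test))
```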
Apart from these problems (which are daunting enough), there is the problem of getting all the parameter choices right (learning rates, the number and type of layers in deep networks, and so on).
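As a hedged sketch of the kind of choices involved, the snippet below builds a small CNN in Keras where the learning rate, the number of convolutional blocks, and the filter counts are all free parameters that have to be tuned. The specific values and the helper function are illustrative assumptions, not anything prescribed by the original text.

```python
# Illustrative only: a small image classifier whose shape is governed by a
# handful of hyperparameters, each of which has to be chosen somehow.
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(learning_rate=1e-3, n_conv_blocks=2, filters=32):
    """Build a toy CNN; every argument is a tunable hyperparameter."""
    model = keras.Sequential([keras.Input(shape=(32, 32, 3))])
    for i in range(n_conv_blocks):                      # how many conv/pool blocks?
        model.add(layers.Conv2D(filters * (2 ** i), 3,
                                activation="relu", padding="same"))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))     # how wide a dense layer?
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),  # which learning rate?
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn(learning_rate=1e-3, n_conv_blocks=3, filters=32)
model.summary()
```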