Re: Total light?
You are right in that image quality is ultimately limited by the total number of photons detected, but that's over the whole image area (and, for a common output resolution, the same holds per output pixel). In principle that's purely a function of the lens. A smaller sensor requires a proportionately shorter focal length in order to get the same field of view. However, to collect the same number of photons, it will need a proportionately wider f-stop (a smaller f-number). Take the example of a 35mm so-called "full frame" sensor, 24 x 36mm, and imagine you mount a 50mm lens with an aperture of f4. Now imagine a sensor of half the dimensions, 12 x 18mm (not a usual sensor size, but it makes the arithmetic easier). You will now need a 25mm lens to get the same field of view, and to collect the same total number of photons in a given exposure time (and get the same depth of field characteristics), it will have to be f2. This is all part of what's called "the principle of equivalence". As the f-stop is simply the focal length divided by the aperture diameter, you can see the physical diameter of the aperture will be exactly the same in both cases (50/4 = 25/2 = 12.5mm). As the maximum (physical) diameter of the aperture is the primary factor that dictates the lens diameter, you can see that for the same light gathering power the two lenses will (broadly) be of similar diameter (although not length).
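If it helps to see the arithmetic written out, here is a minimal Python sketch of the example above. The function names are my own, and "crop factor" is just my label for the ratio of the linear sensor dimensions (2 in this example); treat it as an illustration, not a complete model of equivalence.

```python
def aperture_diameter_mm(focal_length_mm, f_number):
    """Physical aperture diameter: focal length divided by the f-number."""
    return focal_length_mm / f_number


def equivalent_lens(focal_length_mm, f_number, crop_factor):
    """Lens needed on a sensor that is `crop_factor` times smaller (linearly)
    to keep the same field of view and the same total light.

    `crop_factor` is an illustrative name, not a term from the post.
    """
    return focal_length_mm / crop_factor, f_number / crop_factor


# Full frame (24 x 36mm): 50mm at f4.
full_frame = (50.0, 4.0)

# Half-dimension sensor (12 x 18mm): linear dimensions are halved.
small = equivalent_lens(*full_frame, crop_factor=2.0)

print(small)                              # (25.0, 2.0)
print(aperture_diameter_mm(*full_frame))  # 12.5
print(aperture_diameter_mm(*small))       # 12.5
```

Both lenses come out with the same 12.5mm aperture diameter, which is the point of the paragraph above.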
So the question might be asked: why do we need large sensors, if we can just use smaller sensors with faster lenses? Leaving aside the issue that lenses with very small f-numbers become increasingly difficult and expensive to design (only partly ameliorated by the smaller image circle), there is a major sensor limitation. That is the ability of a sensor to detect photons before saturating. Broadly speaking, a sensor with 4 times the surface area can detect 4 times the number of photons before saturating (or blowing highlights). Note that this applies not just to sensors, but also to film. Slide film, especially, "blows" highlights, and to collect more light in total you need bigger film.
Of course there is another issue: for any given output resolution, the smaller sensor will have to have smaller photosites (clearly half the dimensions in this case) and that, in turn, means the 25mm lens would have to resolve twice as much detail per millimetre at the sensor.
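A rough sketch of that in Python, for what it's worth. The 6000-pixel width is a number I've picked purely for illustration, and I'm using the usual sampling (Nyquist) limit of one line pair per two photosites as a stand-in for "what the lens has to resolve", which is a simplification.

```python
def photosite_pitch_um(sensor_width_mm, pixels_across):
    """Photosite pitch in microns for a given sensor width and pixel count."""
    return sensor_width_mm * 1000.0 / pixels_across


def nyquist_lp_per_mm(pitch_um):
    """Sampling limit in line pairs per mm: one line pair spans two photosites."""
    return 1000.0 / (2.0 * pitch_um)


PIXELS_ACROSS = 6000  # hypothetical output width, same for both sensors

large_pitch = photosite_pitch_um(36.0, PIXELS_ACROSS)  # 36mm wide sensor
small_pitch = photosite_pitch_um(18.0, PIXELS_ACROSS)  # 18mm wide sensor

print(large_pitch, small_pitch)  # 6.0 3.0
print(nyquist_lp_per_mm(small_pitch) / nyquist_lp_per_mm(large_pitch))
```

The ratio on the last line is the factor by which the smaller format's lens has to out-resolve the larger one for the same output.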
As the ultimate dynamic range of the sensor is defined by the ratio between the saturation level and what's called the "noise floor", there is an advantage to the larger sensor. It has the potential for four times the number of detected photons before saturation which means, all other things being equal (in particular, the same noise floor), it can achieve two more EV of dynamic range, since each EV is a doubling.
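Here is that calculation as a short Python sketch. The saturation and noise floor figures are made-up round numbers, not measurements of any real sensor, and I'm assuming the noise floor is identical for both, which is the "all other things being equal" part.

```python
import math


def dynamic_range_ev(saturation_electrons, noise_floor_electrons):
    """Dynamic range in EV (stops): log2 of saturation over noise floor."""
    return math.log2(saturation_electrons / noise_floor_electrons)


# Sensor areas from the example: 12 x 18mm versus 24 x 36mm.
area_ratio = (24 * 36) / (12 * 18)  # 4.0

# Hypothetical figures, chosen only to illustrate the ratio.
small_saturation = 20_000.0                       # electrons before saturating
large_saturation = small_saturation * area_ratio  # scales with surface area
noise_floor = 5.0                                 # assumed equal for both

gain = (dynamic_range_ev(large_saturation, noise_floor)
        - dynamic_range_ev(small_saturation, noise_floor))
print(round(gain, 6))  # 2.0
```

Whatever numbers you plug in for saturation and noise floor, the difference stays at log2 of the area ratio, so long as the noise floor really is the same.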
There's a lot more to it than that of course but, essentially, the reason "big is better" just comes down to that ability to detect more photons by dint of the greater surface area.