The essay generator could pass its output to the essay marker, get its score back as feedback, and then adjust the output to maximise that score. Done properly, this would produce a 'perfect' essay.

If the generator is simply running the evaluator's model backwards, as Shannon suggested decades ago, there's no need for it to check the output against the evaluator. It already knows what the result will be.

If, on the other hand, the generator uses a different model, then the situation is the same as with today's "Turing Test" chatbots, many of which are trained by running them against NLP evaluators in just this manner.

It's a standard technique, in other words.
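
For anyone who wants to see the feedback loop spelled out, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `toy_marker` is a hypothetical stand-in for a real essay marker (it just rewards a few keywords), and `improve` is a plain hill-climbing loop, not how any actual generator or chatbot is trained.

```python
import random

def toy_marker(essay: str) -> float:
    """Hypothetical marker: fraction of four keywords present in the essay."""
    keywords = {"thesis", "evidence", "analysis", "conclusion"}
    words = set(essay.lower().split())
    return len(words & keywords) / len(keywords)

def improve(essay, marker, vocabulary, steps=200, seed=0):
    """Hill climbing: change one word at a time, keep the change only if
    the marker's score goes up."""
    rng = random.Random(seed)
    best = essay.split()
    best_score = marker(" ".join(best))
    for _ in range(steps):
        candidate = best.copy()
        # Replace one randomly chosen word with a random vocabulary word.
        candidate[rng.randrange(len(candidate))] = rng.choice(vocabulary)
        score = marker(" ".join(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return " ".join(best), best_score

# Assumed toy inputs, not real data.
draft = "the cat sat on the mat today"
vocabulary = ["thesis", "evidence", "analysis", "conclusion", "the", "and"]
final, final_score = improve(draft, toy_marker, vocabulary)
print(final, final_score)
```

Note that the result is only 'perfect' by the marker's own standards: the loop will happily produce keyword soup if that is what the marker rewards.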

