Oh the Irony of the Irony
You are saying that the author is wrong, and that the figure of 4.3% DOESN'T represent the probability of this event occurring by chance.
I think you are incorrect and that you are slightly misrepresenting what the ASA said. In the interests of accuracy, what it actually said was:
“2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.”
It seems to me that the author was using the P value to make a statement about data in relation to a specific hypothesis (that the original population had a 1:1 ratio of males to females).
But the good news is that we don’t have to argue about it; indeed, doing so is pointless. Why on Earth argue when we can test it and find out who is correct? We can measure the probability of finding 2,262 or more of one gender in a random sample of 4,390 taken from a balanced (1:1) population, and then compare it to the 4.3% given by the Chi Squared test.
How? We can use a Monte Carlo simulation. We set up a population with equal numbers of males and females, take a random sample of 4,390 people from it, and count the number of females we get. We check whether the number of females in our sample is equal to or greater than the one observed (2,262). We then repeat that sampling over and over again (say, 1,000,000 times) and see how often we get a result as extreme as (or more extreme than) 2,262.
Monte Carlo simulations are not guaranteed to give exactly the right answer but, if you run enough trials, they should give an answer that is close to reality. The more trials you run, the closer (on average) the estimate should get.
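To put a rough number on "closer": a Monte Carlo estimate of a probability p from N independent runs has a standard error of about sqrt(p(1 - p)/N). A quick sketch in Python (the value 0.045 is just an illustrative probability near the one being estimated here; 1,000,000 is the number of runs used below):

```python
import math

# Standard error of a Monte Carlo probability estimate: sqrt(p(1-p)/N)
p = 0.045        # illustrative probability, close to the one estimated here
N = 1_000_000    # number of simulation runs

se = math.sqrt(p * (1 - p) / N)
print(f"standard error ~ {se:.5f} ({se * 100:.3f} percentage points)")
```

With a million runs the standard error works out to about 0.02 percentage points, so repeated runs of the simulation should cluster tightly around the true value.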
This is really simple to do. You can write your own code and test it for yourself. Or you can use mine (which is in R) and check it out. If I have made any errors, I apologize; please do correct them and try it for yourself. When I ran this code the answer came out at around 4.47%, which suggests to me that the author was correct in his statement. (Bear in mind that Chi Squared never promises to give an EXACT figure; all these probabilities are estimates.)
# Monte Carlo estimate of the probability of drawing 2,262 or more
# of one gender in a sample of 4,390 from a balanced (1:1) population
PReached <- 0            # runs reaching the observed count
startingset <- c(1, 0)   # 1 = female, 0 = male
People <- 4390           # sample size
Runs <- 1000000          # number of simulated samples

for (i in 1:Runs) {
  Foo <- sample(startingset, People, replace = TRUE)  # draw one sample
  Baa <- sum(Foo)                                     # count the females
  if (Baa > 2261) {      # i.e. at least 2,262 females (upper tail)
    PReached <- PReached + 1
  }
}

PReached <- PReached * 2  # double to allow for both ends of the curve
Prob <- PReached / Runs * 100
Prob                      # estimated two-tailed probability, in percent
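As a cross-check that needs no simulation at all: under the 1:1 hypothesis each person in the sample is female with probability 1/2, so the female count follows a Binomial(4390, 0.5) distribution, and the two-tailed probability can be summed exactly. A sketch in Python (this exact-summation approach is my addition, not the author's method; it uses only the figures from the text):

```python
from math import comb

n = 4390  # sample size
k = 2262  # observed count of one gender

# P(X >= k) for X ~ Binomial(n, 1/2), summed exactly with big integers
upper_tail = sum(comb(n, j) for j in range(k, n + 1))
prob = 2 * upper_tail / 2**n  # double for the symmetric lower tail
print(f"exact two-tailed probability: {prob * 100:.3f}%")
```

The exact sum lands close to the 4.47% the simulation produced, which is what we would expect if both are estimating the same tail probability.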