Dunning-Kruger is no excuse.
<blockquote>projecting future road use, capacity for public services such as schools, parks, libraries, demographic projections - you know, the things that are the whole purpose of having a census in the first place.</blockquote>
I'll preface this by saying that the next paragraph is not just pure 'skiting': it's a potted way of establishing that I'm qualified to make declarative assertions about data accuracy and quality, and the extent to which any attempt to create 'noiseless' data will help in formulating 'accurate' projections.
I've done a bunch of stuff on projects that did projections for housing demand (and supply of residential and industrial land), demography (regional migration; catchments for major retailers; proximity models for house prices near proposed railway stations; changes in household composition by small-area aggregates). Geospatial analysis is one of the things I understand reasonably well - up to and including analysing annual changes by individual cadastre parcel for the 31 Melbourne metro LGAs for 2004-2012, and the 5 Geelong-area LGAs for 2006-2015. My 'strongest suit', though, is the statistical analysis of data. My 'formal' training - Honours, Masters and PhD (incomplete) - was in Economics and Econometrics. I won the ABS prize in my Honours year, and got an RBA cadetship (one of only 4 offered in the entire country) and the Vice-Chancellor's undergraduate research award (the only student in the faculty who got one). I got straight Firsts for my Masters coursework subjects. One of the papers I co-authored resulted in Treasury asking our team to help them implement rational expectations in their macroeconometric model (TRYM). My PhD dissertation spent several sections demonstrating how using central-tendency measures as 'exogenous' inputs to a non-linear model was a waste of time.
So with that by way of background... let's get to the idea that using the census data gives a better estimate of forward numbers than a standard exponential curve with completely artificial noise (of the form x[t]=x[t-1]+e[t], where x[t] is the log of the variable of interest X at time t, and e[t] is a lognormal random variate).
In other words, the key question is
<blockquote>how much additional accuracy in projections would be obtained by using 'accurate' census data, versus modelling percentage changes in literally any metric of interest by dlog(X)=e where e is a vector of lognormal variates?</blockquote>
Award yourself dix points if you realised that I was sneaking up on the idea that the correct answer is "None. There is literally zero reduction in forecast MAPE from using historical survey data, over a Monte Carlo simulation using 'sensible' estimates for the conditioning parameters for the distribution of e."
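For anyone who wants to see what that Monte Carlo looks like, here is a minimal sketch of x[t]=x[t-1]+e[t] read literally, with e[t] drawn as a lognormal variate. The function name `simulate_log_paths`, the starting level of 100,000, the 10-period horizon and the values of `mu` and `sigma` are all placeholders I have assumed for illustration, not estimates of anything. Note that, read literally, a lognormal e[t] is always positive, so every simulated path grows in every period.

```python
import numpy as np

def simulate_log_paths(x0, horizon, n_paths, mu, sigma, seed=0):
    """Simulate x[t] = x[t-1] + e[t] in logs and return the levels X = exp(x).

    mu and sigma are the parameters of the underlying normal for the
    lognormal shock e[t]; both are assumed, illustrative values.
    """
    rng = np.random.default_rng(seed)
    # One lognormal shock per path per period (always strictly positive).
    e = rng.lognormal(mean=mu, sigma=sigma, size=(n_paths, horizon))
    # Cumulate the shocks onto the starting log level.
    x = x0 + np.cumsum(e, axis=1)
    return np.exp(x)

# Hypothetical setup: start at 100,000 with a median log-change of 1.5% per period.
paths = simulate_log_paths(
    x0=np.log(100_000.0), horizon=10, n_paths=5000,
    mu=np.log(0.015), sigma=0.5,
)

# Summarise the distribution of the final-period level rather than a single path.
lo, mid, hi = np.percentile(paths[:, -1], [5, 50, 95])
print(f"5%: {lo:,.0f}  median: {mid:,.0f}  95%: {hi:,.0f}")
```

The output of interest is the spread between the percentiles, not any one path; changing `mu` and `sigma` is where the 'sensible' conditioning parameters would go.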
Award yourself another soixante points if you understand what variables cause the correct answer to be the case. Those variables are technological and preference changes and policy variables. Future values of these variables are literally impossible to estimate at an aggregate level, and even more impossible-er at a sectoral level... and they are not geographically constant (so Frankston and Brisbane will not have common tech change, preference and policy parameters in a regional model).
If you have snaffled all the points on offer up to now, you are barely at 'HIIB' level, which means that I would not listen to you if you were a government advisor (most government advisors are IIAs, but that's still a very low bar).
Another dix points will get you an HI (but only in one subject). These can be garnered by grokking the footnote.
Footnote... This is also true in a linear model, because linear models are not bijective from the exogenous variable space to any subset of the endogenous variables; policy analysis is only ever interested in 'key' subsets of the entire endogenous variable matrix.
To see why this non-bijectivity is the case a fortiori, change the closure (swap the endogenous variables-of-interest for the same number of 'naturally' exogenous variables - so the system remains mathematically solvable).
Force the swapped endo-vars to remain unchanged, then perturb the rest of the exogenous variables by some arbitrary percentage and solve the model. Do that several hundred times, and you will have several hundred sets of all variables where the endogenous variables of interest take the same value, but the exogenous variables are different. Bijectivity... categorically rejected.
Congratulations... you just proved that there are multiple vectors of values for the 'exo-vars' that are consistent with the same vector of values for the endo-vars of interest. (This is why I stopped being interested in 'point' (or 'single-path') forecasting: saying anything meaningful about the statistical properties of the endogenous variables of interest requires a stochastic sensitivity analysis.)
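The closure-swap experiment can be sketched on a toy linear model y = A x. Everything here is assumed for illustration: the matrix `A` is random rather than taken from any real model, the dimensions and the 10% perturbation range are arbitrary, and `solve_swapped_closure` is a hypothetical helper name. The first two endogenous variables are the ones 'of interest', and the first two exogenous variables are the ones swapped into the endogenous set.

```python
import numpy as np

rng = np.random.default_rng(0)

n_endo, n_exo, n_interest = 6, 5, 2
A = rng.normal(size=(n_endo, n_exo))   # hypothetical reduced-form multipliers
x_base = rng.normal(size=n_exo)        # baseline exogenous values
y_base = A @ x_base                    # baseline solution

interest = np.arange(n_interest)           # endo-vars of interest (held fixed)
swapped = np.arange(n_interest)            # exo-vars made endogenous by the swap
remaining = np.arange(n_interest, n_exo)   # exo-vars that stay exogenous

target = y_base[interest]

def solve_swapped_closure(x_rest):
    """Solve for the swapped exo-vars so the endo-vars of interest hit target."""
    rhs = target - A[np.ix_(interest, remaining)] @ x_rest
    x_swapped = np.linalg.solve(A[np.ix_(interest, swapped)], rhs)
    x = np.empty(n_exo)
    x[swapped] = x_swapped
    x[remaining] = x_rest
    return x

# Perturb the remaining exo-vars by up to +/-10% and re-solve, 300 times.
draws = []
for _ in range(300):
    shock = 1.0 + rng.uniform(-0.10, 0.10, size=remaining.size)
    draws.append(solve_swapped_closure(x_base[remaining] * shock))

X = np.array(draws)   # 300 different exogenous vectors
Y = X @ A.T           # the implied endogenous solutions

max_gap = np.abs(Y[:, interest] - target).max()   # should be numerically zero
exo_spread = X.std(axis=0)                        # should be strictly positive
print("largest deviation in endo-vars of interest:", max_gap)
print("std dev of each exo-var across draws:", exo_spread)
```

Every row of `X` is a different exogenous vector, yet the endogenous variables of interest are identical across all 300 solutions (up to floating-point error), which is the non-bijectivity described above.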
Award yourself the last dix points and join the Firsts. You still need a further douze points to finish next to me in 4th year. (OK, so that last bit was pure skiting).