The most widely used task fMRI analyses rely on parametric statistical methods that depend on a variety of assumptions. While individual aspects of these fMRI models have been evaluated, they have not been evaluated comprehensively with empirical data. In this work, a total of 2 million random task fMRI group analyses were performed using resting-state fMRI data to compute empirical familywise error rates for the software packages SPM, FSL, and AFNI, as well as for a standard non-parametric permutation method. While there is some variation, for a nominal familywise error rate of 5% the parametric statistical methods are shown to be conservative for voxel-wise inference and invalid for cluster-wise inference; in particular, cluster-size inference with a cluster-defining threshold of p = 0.01 generates familywise error rates of up to 60%. We conduct a number of follow-up analyses and investigations that suggest the cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the non-parametric permutation test, which rests on a small number of assumptions, is found to produce valid results for both voxel-wise and cluster-wise inference. Using real task data, we compare the results of one parametric method with those of the permutation test, and find stark differences between the conclusions the two methods draw under cluster-wise inference. These findings speak to the need to validate the statistical methods used in the neuroimaging field.
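The core validation strategy described above can be sketched in a few lines: because resting-state data contain no task effect, any "significant" finding in a random group analysis is a false positive, so repeating many such analyses yields an empirical familywise error (FWE) rate to compare against the nominal 5%. The sketch below uses synthetic null data in place of real resting-state contrasts, a one-sample t-test in place of the SPM/FSL/AFNI pipelines, and a Bonferroni threshold as a simple stand-in for voxel-wise FWE control; the pool size, voxel count, and number of analyses are illustrative choices, not values from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_pool, n_subjects, n_voxels = 100, 20, 1000
# Synthetic null data standing in for resting-state contrast maps:
# there is no true effect, so every detection is a false positive.
pool = rng.standard_normal((n_pool, n_voxels))

n_analyses = 500
alpha = 0.05
# Bonferroni voxel-wise threshold: a simple parametric stand-in for
# the FWE-corrected voxel-wise inference evaluated in the study.
p_thresh = alpha / n_voxels

n_familywise_errors = 0
for _ in range(n_analyses):
    # Draw a random "group" of subjects and run a one-sample t-test
    # at every voxel, as in a second-level group analysis.
    group = pool[rng.choice(n_pool, n_subjects, replace=False)]
    _, p = stats.ttest_1samp(group, 0.0, axis=0)
    # Any voxel surviving the threshold is a familywise error.
    if (p < p_thresh).any():
        n_familywise_errors += 1

fwe = n_familywise_errors / n_analyses
print(f"empirical FWE: {fwe:.3f}")  # close to alpha when inference is valid
```

With independent Gaussian voxels, as here, the empirical rate lands near the nominal 5%; the paper's point is that with real fMRI data, whose spatial autocorrelation violates the parametric assumptions, the cluster-wise analogue of this experiment yields rates far above 5%, whereas a permutation test (recomputing the statistic under random sign flips of the subject maps) remains valid.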