Grok3 and ChatGPT differ in evaluating the National Academies' report and Williams' CRF