Ask an AI Accountant

Version 2.0 by

How it works:
  • This AI has ingested US tax law     

  • Every few days, a (human) tax professional reviews all answered questions     

  • So far, the AI has been broadly correct 97% of the time     

  • AI has been fine-tuned on US tax law  

    The base model is OpenAI’s GPT-4, but it has been fine-tuned on US federal and state tax legislation, and IRS announcements.

  • Answers are periodically graded by tax pros  

    Graded answers will show a symbol indicating whether they were correct or incorrect.

  • The AI has 96% accuracy (vs 94% for human tax pros) 

    The AI response accuracy will be continuously updated as more questions get submitted. So far, 129 questions have been graded.

I'm here to help with tax-related questions. If you have a question about taxes, please feel free to ask, and I'll do my best to assist you.

This answer will be reviewed by a certified tax professional. Once it has been reviewed, you’ll see a green checkmark next to it if the AI’s answer was correct and a red "stop" icon if it was wrong.



How should I use this tool?

We trained GPT-4 on the latest tax updates for 2023 and made some important technical updates from V1. (More on those below!)

When will I get an answer?

You’ll see the AI accountant’s answer right away (or… in about 10-40 seconds). It’ll show up right under the window where you typed your question.

Will the answers be fact-checked?

A (human!) CPA, EA, or JD will review responses periodically. (Read about one CPA's experience fact-checking AI responses here!)

We’ll incorporate their feedback — if they had any — into the version of the answer published above.

How do I know which answers have been graded?

When you’re browsing past questions, a symbol next to the answer means a tax professional has already reviewed it. A filled-in green checkmark means the AI got it right, while a red "stop" icon means it got it wrong. You'll also see the tax pro's comments on the individual question's page.

Who are our graders?

Below is a list of the graders used in this assessment. They were pre-vetted by the Keeper team for reputability and reliability. “Accuracy” is determined using a blind test — by comparing the answers of each grader to a source-of-truth answer. There is no difference between how the AI is graded and how the humans are graded.

Grader            Qualification        Accuracy
Ieva Ivanauskas   IRS Enrolled Agent   93%
David Bailey      IRS Enrolled Agent   96%
Louis Colasurdo   CPA                  95%
Brent Jackson     CPA                  90%
Andrew Walsh      CPA                  94%

Please also keep in mind that the questions being evaluated are designed to be tricky, and that the single Q&A format of this experiment is not entirely representative of the typical way accountants engage with clients.
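
As a rough illustration of the blind-test idea described above, a grader's accuracy can be computed as the share of questions where their verdict matches the source-of-truth verdict. This is only a sketch: the function name, the question IDs, and the "yes"/"no" verdict format are assumptions made for the example, not details of the actual grading pipeline.

```python
def blind_accuracy(answers, source_of_truth):
    """Share of reference questions where the grader's verdict matches the reference."""
    if not source_of_truth:
        raise ValueError("need at least one source-of-truth answer")
    matches = sum(
        1 for qid, truth in source_of_truth.items() if answers.get(qid) == truth
    )
    return matches / len(source_of_truth)


# Hypothetical data: question id -> verdict. Unanswered questions count as misses.
reference = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}
grader = {"q1": "yes", "q2": "no", "q3": "no", "q4": "no"}

print(f"{blind_accuracy(grader, reference):.0%}")  # 3 of 4 match -> 75%
```

The same function could score the AI's answers, which matches the statement that the AI and the humans are graded the same way.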

What guidelines were used for grading?

  1. Overly ambiguous questions are not graded. This means questions that are essentially “unanswerable” without making major assumptions on behalf of the user, e.g. "I have two kids and am making 35k in California; how much will I owe in taxes?"
  2. Overly contentious topics are not graded. This includes topics without clear case law, or where two or more of the five tax professionals disagreed with the rest, e.g. “I’m a basketball referee who needs to be able to run up and down the court with the players. Can I claim my gym membership fee as a tax deduction as an ordinary and necessary business expense?”
  3. Only the accuracy and relevancy of each answer are assessed. Other factors, such as tone or the offering of information that might be helpful but doesn’t directly answer the question asked, were not factors in grading.
  4. Only questions pertaining to US individual tax law are graded. Corporate tax questions and other financial advisory inquiries are excluded from official grading.
  5. Only questions written in English are graded. While the AI can answer questions written in other languages, those responses are not graded due to the language limitations of our human graders.
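
The exclusion rules above (guidelines 1, 2, 4, and 5) could be expressed as a simple filter. The sketch below is hypothetical: the `Question` fields, the topic labels, and the function name are invented for illustration, and in practice these judgments are made by human graders rather than by code. Guideline 3 describes what is assessed rather than what is excluded, so it is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

PANEL_SIZE = 5  # the guidelines mention five tax professionals


@dataclass
class Question:
    text: str
    language: str                  # assumed ISO code, e.g. "en"
    topic: str                     # assumed label, e.g. "us_individual", "corporate"
    needs_major_assumptions: bool  # guideline 1
    dissenting_graders: int        # guideline 2, out of PANEL_SIZE
    has_clear_case_law: bool = True


def grading_exclusion(q: Question) -> Optional[str]:
    """Return the reason a question is not graded, or None if it is gradable."""
    if q.needs_major_assumptions:
        return "overly ambiguous"
    if not q.has_clear_case_law or q.dissenting_graders >= 2:
        return "overly contentious"
    if q.topic != "us_individual":
        return "outside US individual tax law"
    if q.language != "en":
        return "not written in English"
    return None


referee = Question(
    text="Can a basketball referee deduct a gym membership?",
    language="en",
    topic="us_individual",
    needs_major_assumptions=False,
    dissenting_graders=2,
)
print(grading_exclusion(referee))  # overly contentious
```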

What technical improvements were made over V1?

  • Comprehensive embeddings encompassing all federal US tax codes, every state tax code, and IRS announcements from the past ten years.
  • A chain-of-reasoning retrieval system devised to identify the most relevant vector embeddings for each question and generate a suitable response based on those embeddings.
  • A more rigorous evaluation system with an extended training period, in which dozens of human accountants graded answers and established correct responses.
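
To give a feel for the embedding lookup mentioned in the list above, here is a minimal sketch of nearest-embedding retrieval using cosine similarity. It is not Keeper's implementation: the three-dimensional vectors, passage names, and function names are made up for the example, real embeddings have far more dimensions, and the chain-of-reasoning and answer-generation steps are not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_passages(query_vec, corpus, k=2):
    """Return the ids of the k passages whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), pid) for pid, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


# Hypothetical toy embeddings: passage id -> vector.
corpus = {
    "home_office_deduction": [0.9, 0.1, 0.0],
    "state_sales_tax": [0.1, 0.9, 0.1],
    "mileage_deduction": [0.8, 0.0, 0.3],
}
query = [1.0, 0.0, 0.1]  # stands in for an embedded user question

print(top_k_passages(query, corpus))
# ['home_office_deduction', 'mileage_deduction']
```

In a full system, the retrieved passages would then be handed to the language model as context for writing the answer.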

How does V2 perform on standardized accounting tests?

We ran V2 against a sample of questions from Part 1 of the EA exam. It scored 88%, which is a passing grade. (Last year, humans only needed 66% to pass.) You can find the detailed results here.

Note that these questions have less ambiguity and involve fewer contentious topics than the questions in our sample.

What are V2's known weaknesses?

  • Certain city and state tax law intricacies. We haven’t finished embedding all state and city tax law into V2 yet. These will be part of V3.
  • Complex math. A known issue with LLMs: the AI may fail if asked to do math involving more than five or so steps. This is especially evident when asking the AI to calculate income tax for an individual in a high tax bracket. This will be addressed via a plugin in a later version.
  • Overly conservative. Due to its training and reliance on the actual underlying tax code, it can sometimes give answers that are correct but overly conservative, e.g. claiming that a business owner cannot claim per diem rates for business travel on a cruise ship due to specific IRS rules on cruise ship conventions. This is technically correct, but likely to be overlooked in practice.
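
To show why income tax for a high earner counts as multi-step math, here is a small sketch of a progressive bracket calculation: each slice of income is taxed at its own rate, so higher incomes pass through more steps. The bracket schedule below is hypothetical and deliberately uses round numbers; it is not a real IRS schedule, and the function name is an assumption for the example.

```python
# Hypothetical schedule: (upper bound of bracket in dollars, rate in percent).
# None marks the open-ended top bracket. These are NOT real IRS figures.
BRACKETS = [(10_000, 10), (50_000, 20), (None, 30)]


def progressive_tax(taxable_income, brackets=BRACKETS):
    """Tax each slice of whole-dollar income at its bracket's rate; returns dollars."""
    tax_cents = 0
    lower = 0
    for upper, rate_pct in brackets:
        if taxable_income <= lower:
            break  # no income left to tax
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax_cents += (top - lower) * rate_pct  # dollars * percent == cents of tax
        lower = top
    return tax_cents / 100


# 10,000 at 10% + 40,000 at 20% + 70,000 at 30% = 1,000 + 8,000 + 21,000
print(progressive_tax(120_000))  # 30000.0
```

Doing the arithmetic in code, rather than asking the language model to do it in its head, is one way a calculation step could be made reliable.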

Disclaimer

We’ve provided this information for educational purposes, and it does not constitute tax, legal, or accounting advice. If you would like a tax expert to clarify it for you, feel free to sign up for Keeper.