Ask an AI Accountant
Version 2.0 by
This AI has ingested US tax law
Every few days, a (human) tax professional reviews all answered questions
So far, the AI has been broadly correct 97% of the time
AI has been fine-tuned on US tax law
The base model is OpenAI’s GPT-4, but it has been fine-tuned on US federal and state tax legislation and on IRS announcements.
Answers are periodically graded by tax pros
Graded answers display a symbol indicating whether the answer was correct or incorrect.
The AI has 96% accuracy (vs 94% for human tax pros)
The AI response accuracy will be continuously updated as more questions get submitted. So far, 129 questions have been graded.
How should I use this tool?
We trained GPT-4 on the latest tax updates for 2023 and made some important technical updates from V1. (More on those below!)
When will I get an answer?
You’ll see the AI accountant’s answer right away (or… in about 10-40 seconds). It’ll show up right under the window where you typed it in.
Will the answers be fact-checked?
A (human!) CPA, EA, or JD will review responses periodically. (Read about one CPA's experience fact-checking AI responses here!)
We’ll incorporate their feedback — if they had any — into the version of the answer published above.
How do I know which answers have been graded?
When you’re browsing past questions, a symbol next to the answer means a tax professional has already reviewed it. A filled-in green checkmark means the AI got it right, while a red "stop" icon means it got it wrong. You'll also see the tax pro's comments on the individual question's page.
Who are our graders?
Below is a list of the graders used in this assessment. They were pre-vetted by the Keeper team for reputability and reliability. “Accuracy” is determined using a blind test — by comparing the answers of each grader to a source-of-truth answer. There is no difference between how the AI is graded and how the humans are graded.
Please also keep in mind that the questions being evaluated are designed to be tricky, and that the single Q&A format of this experiment is not entirely representative of the typical way accountants engage with clients.
What guidelines were used for grading?
- Overly ambiguous questions are not graded. This means questions that are essentially “unanswerable” without making major assumptions on behalf of the user, e.g. "I have two kids and am making 35k in California; how much will I owe in taxes?"
- Overly contentious topics are not graded. This includes topics without clear case law, or where two or more of the five tax professionals disagreed with the rest, e.g. “I’m a basketball referee who needs to be able to run up and down the court with the players. Can I claim my gym membership fee as a tax deduction as an ordinary and necessary business expense?”
- Only the accuracy and relevancy of each answer are assessed. Other factors, such as tone or the offering of information that might be helpful but doesn’t directly answer the question asked, were not factors in grading.
- Only questions pertaining to US individual tax law are graded. Corporate tax questions and other financial advisory inquiries are excluded from official grading.
- Only questions written in English are graded. While the AI can answer questions written in other languages, those responses are not graded due to the language limitations of our human graders.
What technical improvements were made over V1?
- Comprehensive embeddings encompassing all federal US tax codes, every state tax code, and IRS announcements from the past ten years.
- A chain-of-reasoning retrieval system devised to identify the most relevant vector embeddings for each question and generate a suitable response based on those embeddings.
- A more rigorous evaluation system with an extended training period, in which dozens of human accountants provided expert grading of answers and established correct responses.
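To make the retrieval step above concrete: an embedding-based retriever ranks stored document vectors by similarity to the question's vector and passes the top matches to the model. The sketch below is purely illustrative (not Keeper's actual pipeline) and uses tiny hand-made 3-dimensional vectors in place of real model embeddings; the document names are hypothetical stand-ins for tax-code sections.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(question_vec, doc_vecs, k=2):
    # Rank every document embedding by similarity to the question embedding
    # and return the IDs of the k closest documents.
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(question_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional embeddings standing in for real model outputs.
docs = {
    "IRC 162 (business expenses)": [0.9, 0.1, 0.0],
    "IRC 274 (travel and meals)":  [0.8, 0.3, 0.1],
    "CA FTB residency rules":      [0.1, 0.2, 0.9],
}
question = [0.85, 0.2, 0.05]  # e.g. "Can I deduct travel costs?"
print(top_k(question, docs))
```

In a real system the retrieved passages would then be inserted into the model's prompt so the answer is grounded in the actual statute text rather than the model's memory.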
How does V2 perform on standardized accounting tests?
We ran V2 against a sample of questions from Part 1 of the EA exam. It scored 88%, which is a passing grade. (Last year, humans only needed 66% to pass.) You can find the detailed results here.
Note that these questions have less ambiguity and involve fewer contentious topics than the questions in our sample.
What are V2's known weaknesses?
- Certain city and state tax law intricacies. We haven’t finished embedding all state and city tax law into V2 yet. These will be part of V3.
- Complex math. This is a known issue with LLMs: the AI may fail if asked to do math involving more than five or so steps. It’s especially evident when asking the AI to calculate income tax for an individual in a high tax bracket. This will be addressed via a plugin in a later version.
- Overly conservative. Due to its training and reliance on the actual underlying tax code, it can sometimes give answers that are correct but overly conservative, e.g. claiming that a business owner cannot claim per diem rates for business travel on a cruise ship due to specific IRS rules on conventions held aboard cruise ships. This is technically correct, but likely to be overlooked in practice.
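To see why high-bracket income tax is a many-step calculation of the kind LLMs tend to flub, here is the arithmetic in code. This is a minimal sketch using the published 2023 single-filer federal brackets only; it ignores deductions, credits, filing-status variations, and state tax.

```python
# 2023 federal brackets for single filers: (lower bound of bracket, marginal rate).
BRACKETS_2023_SINGLE = [
    (0,       0.10),
    (11_000,  0.12),
    (44_725,  0.22),
    (95_375,  0.24),
    (182_100, 0.32),
    (231_250, 0.35),
    (578_125, 0.37),
]

def tax_owed(taxable_income, brackets=BRACKETS_2023_SINGLE):
    """Sum the tax due on each bracket slice below taxable_income."""
    uppers = [b[0] for b in brackets[1:]] + [float("inf")]
    tax = 0.0
    for (lower, rate), upper in zip(brackets, uppers):
        if taxable_income <= lower:
            break
        # Tax only the portion of income that falls inside this slice.
        tax += (min(taxable_income, upper) - lower) * rate
    return tax

# A high earner crosses six bracket slices: six multiplications plus a running sum.
print(f"${tax_owed(250_000):,.2f}")  # $59,394.50
```

Each bracket slice adds a multiply-and-accumulate step, so a $250,000 earner already requires six of them, which is exactly the multi-step arithmetic chain the section above flags as a failure mode.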
Disclaimer
We’ve provided this information for educational purposes, and it does not constitute tax, legal, or accounting advice. If you would like a tax expert to clarify it for you, feel free to sign up for Keeper.