Cutler & Co Latest news: ChatGPT: Use cases and limits to its reliability

ChatGPT: Use cases and limits to its reliability

Author Liam Bastick

Publisher FM Magazine

Date Published 05.02.2024

Amongst other things, 2023 will be remembered as the year of chatbot AI, particularly ChatGPT. Though many of us are aware of ChatGPT, it’s probably safe to say most of us don’t understand how it works and aren’t using it to its full potential. Let’s try to bring you up to speed. I will consider both of these points and discuss what it means for you, your work, and possibly, your career.

How does it work?

ChatGPT, which stands for Chat Generative Pre-Trained Transformer, is what is referred to as a large language model (LLM). LLMs are a type of neural network, a form of artificial intelligence (AI). Neural networks were created to imitate the human skill of estimating solutions to complex problems and making informed yet subjective evaluations, not only of numerical data but also text and visualisations.

ChatGPT is an incredibly powerful tool that combines the computational ability of a computer with a nuanced, human-like understanding of explicit terms and patterns. ChatGPT always gives its best estimation of any given solution. It isn’t calculating or thinking things through from first principles like a human does; there is no understanding of the how or why, just a complex algorithm spitting out a solution. Naturally, this leads to some glaring issues that require human oversight and many tasks in which user input may make a significant difference.

Limits to its reliability and capability

The first of these glaring issues surrounds ChatGPT’s reliability. Due to ChatGPT’s approach to estimating, it may struggle with questions that are complex or in areas that require more specialised knowledge, such as science and law. Even worse, it has been known to just make things up or “hallucinate”, such as when a lawyer from Levidow, Levidow & Oberman was fined in June 2023 for submitting fake citations after using ChatGPT to help research past court cases for an aviation injury claim.

Mathematics can also present issues. People often wrongly assume that ChatGPT should be good at maths because it is a computer. That is like expecting English professors to be good at physics because they are at a university. ChatGPT is built to process language. When you ask ChatGPT a mathematics problem, it does not calculate it from first principles as we do; instead, it calculates it based on other examples it has seen in relevant training data. Whilst that is great for simple mathematical problems such as addition and multiplication, it will struggle with more complicated questions involving exponentials or trigonometry. With these sorts of questions, the answers are sometimes not even close (see the example of a material error in the screenshot “ChatGPT Material Error Example,” below).

ChatGPT material error example

Whereas the correct answer is:

This means that a guiding human hand is always necessary to check that it will give you the answer you want and provide it with the correct direction. This is where you as a finance professional come in: Your ability to be that expert guiding hand in the use of these tools to their full potential will set you apart.

Prompt engineering

Before I provide a specific example, please allow me to cover a few simple “prompt engineering” concepts. Prompt engineering is the art of creating good prompts (directions or questions) that will allow ChatGPT to generate the most useful, accurate, and directed answers. It does not matter whether ChatGPT can generate a good answer if you have not asked the right question.

A tip here is to provide relevant detail and context. The more information you give ChatGPT, the more reliable and accurate its response. ChatGPT cannot read context well: It cannot “read between the lines”; you need to tell it exactly what you want and how you want it.

The next key concept is known as the “chain of thought prompting”, which involves getting ChatGPT to break problems down into multiple steps. This serves three purposes:

ChatGPT is more reliable at solving many small problems rather than trying to solve one big problem in one go.
This forces ChatGPT into somewhat human-like problem-solving, as it is forced to break down its processes into fundamental steps to solve them.
This simplifies the “auditing” process, as it is easier to find where it went wrong and adjust based on that.

You should not be afraid to have multiple attempts at prompting ChatGPT. There is a level of inbuilt variability within ChatGPT called “temperature” which forces it to provide different outputs every time. If you are not satisfied with an answer, just ask it again. Furthermore, if it has the right idea, you can get it to build upon or elaborate on its previous answers to give it greater direction regarding what you seek. Try using its previous answers to adjust your prompt and ask it a modified question instead.

You should give examples of how you want your answer output whenever possible. This is called “multiple-shot prompting”. It gives ChatGPT a clearer idea of what you seek in an answer instead of generating something from scratch (called “zero-shot prompting”). This is especially useful for any sort of report or content generation that requires consistent style, language, or formatting.

Furthermore, I recommend using role-play. Have ChatGPT assume a role in its output, such as a marketer, a data scientist, or a professor (say). This gives it more context by which to frame its answer to be more specific and tailored. Alternatively, give it an audience: Ask it to explain something at a high school or professional level.

How to use it

Before I get into some practical examples, let me explain how to set up ChatGPT. You must visit the website chat.openai.com (no “www”) and set up an account. Once you are set up and logged in you will see the bar at the top with GPT-3.5 and GPT-4 — this gives you the option to switch between models if you have the paid version. We think the paid version is useful. GPT-3.5 is what is known as unimodal, which means the chatbot understands and interprets text input only. In addition to being more advanced and tending to provide better responses to more nuanced questioning (a subjective analysis, I know), the GPT-4 engine is multimodal. This means the chatbot can understand/process images too, accepting text and image/visual prompts. You should still check answers for their accuracy and veracity, though, just as you should with the free version.

The sidebar contains your chat history so you can go back and check or resume an old chat. Clicking on your email in the bottom left brings up various settings related to privacy, your account, and your interactions. Finally, at the bottom is the dialog where you may interact with the model. To interact with ChatGPT, type “prompts” in the box and either click on the arrow or press ENTER to submit your prompt. ChatGPT will reply, and from there, you can have your conversation. It’s that simple.

Use cases

I’d like to finish by going over some interesting use cases for ChatGPT — more than just your usual tasks of writing emails, trying to emulate VBA code, or asking for advice. I’d like to preface this by acknowledging that the premium version of ChatGPT has far more flexibility, functionality, and utility than the free version. It is a better model and has access to a range of useful plug-ins such as search, data analysis, and image generation. However, for now, I shall focus on the free version as that is the most accessible and arguably the one to review before paying for a more advanced alternative.

For accountants or others working in finance, one key use case is simply getting help with Excel, such as how to interpret formulas, choose functions, highlight key cells, create solutions, and so on. This can be as simple as explaining how a feature works, such as conditional formatting or providing keyboard shortcuts (see the screenshot “Simple Use Case,” below).

Simple use case

However, you may also use it to find functions to solve more complex problems or even use it to develop creative solutions. For example, imagine that I have a sentence with a date somewhere in it, and I wish to extract just the date. This is a tricky question for many Excel users, so let’s simply ask ChatGPT (see the screenshot “More Complex Use Case,” below).

More complex use case

Not only does it give us a solution that works (you’re welcome to test it yourself), but it also explains to us in a simple way how it works. Bear in mind that it came up with this on the second attempt.

Another useful ChatGPT feature is the ability to easily copy and paste Excel cells into it. Whilst you cannot paste thousands of rows Of data into ChatGPT, you can paste a sample and ask ChatGPT for advice on how to deal with it. As an illustration, here I use some antiquated Melbourne house price data — in the screenshot “Example Dataset”.

Example dataset

This dataset is 20,000 rows long, but here, I shall just copy and paste the top ten into ChatGPT and ask it for some ideas with the prompt shown in the screenshot “Copying and Pasting Excel Rows Into ChatGPT,” below.

Copying and pasting Excel rows into ChatGPT

It acts as an assistant analyst, giving you actionable suggestions and explaining the formula to apply them. Obviously, these suggestions are fairly simple, but we could just as easily ask it for more complicated suggestions.

For my final example, I want to show you something a little different (as a word of warning). Let’s get more technical and attempt to build a discounted cash flow (DCF) model in ChatGPT (see the screenshot “Discounted Cash Flow Model in ChatGPT,” below).

Discounted cash flow model in ChatGPT

If you perform these calculations yourself, you will note that there are some immediate issues:

ChatGPT is starting with $1 million in year 1, which I clearly stated was last year.
The terminal value calculation is wrong. The formula is correct, but it miscalculated the value.
After year 1, all of the discounted values are incorrect.

This example illustrates a core ChatGPT issue. This was not a complicated question: The computations were simple, yet it still could not get it correct. Imagine if this DCF was multistage or if we needed it to build more fleshed-out financial statements? If you were building a DCF model and needed assistance on how to do a segment, it could give you useful feedback, but asking it to do these things itself is perhaps a step too far for the time being. This is where it is important to distinguish between what ChatGPT can do and where you as a user need to step in.

Word to the wise

ChatGPT has flaws requiring human intervention and oversight. However, it is a fantastic tool when used effectively. As the world becomes more familiar with AI, your ability to use these tools and be that guiding hand will be a key skill in whatever technical role you might be in.

If you ensure you are using it to its full capabilities and understand its limitations, you can add value as a user, perform your work more efficiently and effectively, and develop a positive reputation in AI — which can only enhance your career prospects.