AI / Machine Learning
-
September 13, 2023

How To Query Databases Using Natural Language and Chat-GPT

Welcome to the world of seamless database management! Querying databases can often be a formidable task, especially when crafting SQL queries that perfectly align with your system's needs. Developers frequently encounter challenges, from struggling to recall the intricate web of tables and relationships to grappling with the complexity of coding elaborate queries. We’ve created this tool to make database queries in natural language and the purpose  is to enhance this consulting process. If you’re not a developer and you need to consult a database or if you’ve worked with them just make this process a little bit easy. With a simple text input entry you’ll get the data you’re looking for and this tool can display it in a friendly way.In the next sections we’re going to cover how the process of making this proof of concept was. We’ve built a NodeJS app but it could be any other language you want.

Proposed Solution

The developed project is a node.js-based web application that integrates with OpenAI API to facilitate query creation. Users can simply enter their desired query in natural language through the web app's interface. The AI component of the application will then analyze the user's input, attempt to derive the appropriate SQL query, and execute it against the database. The results of the query are then displayed to the user using HTML generated by the AI.

Considering the limitations of the Open AI Chat Completion API and trying to have a simple and useful tool to query the database, we define the following flow:  

First the user enters the desired query using natural language, something like: “Give me the 10 more played songs of June 2023”.

Behind the scenes, the system obtains the list of all the existing table names in the database.

Using that list of names and the query in natural language, the system sends a message to OpenAI API, one for giving the API information and context:

CODE: https://gist.github.com/nicmalegre/0d00eb9884826732aee67b8d37bc6ed6.js?file=firstPrompt.js

Then the API returns the list of table names and the system parses the response to obtain that list.

With that list, the system gets the definition of each table, that is: for each table, ask the database for each contained field and its data type.

Now the system sends another pair of messages to OpenAI:

CODE: https://gist.github.com/nicmalegre/27a3751649982c8eef7a7dad41a8a6e3.js?file=secondPrompt.js

Then the OpenAI API answers with the SQL query.

The system executes the query and sends the result back to the API with the following prompt:

CODE: https://gist.github.com/nicmalegre/9ec95640d37f7c4bd99e7fe1599107be.js?file=thirdPrompt.js

And voila! The Chat-GPT API gives us a response with a styled chart with the data we need to display in our web app.

Pros and Cons

This tool offers a range of advantages for users seeking to query databases using natural language:

  • Ability to execute both simple and complex queries with ease. Users can simply provide their desired query in natural language, eliminating the need for intricate SQL syntax. This streamlined approach makes the querying process more accessible and easy for anyone.
  • User-friendly presentation of data. The program ensures that the queried data is displayed in a clear and easily understandable format. This intuitive presentation enables users to quickly interpret and analyze the results, empowering them to make informed decisions and derive meaningful insights from the data.
  • The program provides flexibility in terms of database connectivity. Users have the option to connect to different databases, allowing for parameterization of the database. While the current implementation works with a specific database, the system's design enables seamless integration with other databases as needed. This adaptability enhances versatility and usability, catering to diverse database requirements.

Overall, these advantages make the program a powerful tool for natural language querying, simplifying the process, improving data comprehension, and accommodating various database systems.

While the tool created for natural language querying offers valuable functionality, there are some limitations to consider:

  • The openAI API has restrictions on the amount of data that can be received and sent. This means that if the AI component requires a substantial amount of table definitions to generate the query, the request may be rejected due to exceeding the data limit.
  • The openAI API does not create a session, requiring the system to send a list of messages each time it needs to ask a question. This lack of session management can be inconvenient and hinders the retention of historical data that may be relevant for generating accurate results.
  • The openAI API's response time can be a drawback. Each query involves interacting three times and the API's response time, which can take several seconds, can accumulate and result in a process that takes minutes to complete. This slower response time negatively impacts usability.

These limitations highlight the need for careful consideration when using the tool, particularly regarding data limits, session management, and response time. Addressing these challenges can lead to a more efficient and user-friendly natural language querying experience.

Advantages vs Disadvantages

Next Steps

This is the beginning but we want to continue working to improve this tool.The main aspects that we will focus on are:

  • Database security. In order to secure the data from the database, the idea is to be able to execute read-only queries and not allow other types of queries.
  • Database flexibility. Modify the process to abstract from the database type and use both SQL and noSQL.
  • Decreased response time. Improve the time to show a response by modifying how we interact with OpenAI API.
  • Limit the number of results. Add a control on the number of results, since with very large databases the process may fail.
  • Query history. Save the generated queries to run them again in an easy way enabling convenient reusability without the need for AI regeneration.

Summary

The information presented in this blog shows an interesting integration with the Chat Completion API. Our database consulting tool offers a user-friendly solution for retrieving data without the need for extensive coding knowledge. It opens doors for non-developers and simplifies the process for experienced professionals, making database management a breeze.

There are many other ways you can extend the API depending on your specific requirements and use cases. We are dedicated to continuous improvement and innovation. We are committed to refining and expanding our tool's capabilities to meet the evolving needs of our users. Your feedback and suggestions are invaluable to us as we strive to enhance the user experience, optimize performance, and introduce new features.

Thank you for joining us on this journey, and we look forward to the exciting developments that lie ahead!

* Authors: Nicolas Alegre & Ramiro Gavagnin