In an age of swift technological progress, a revolutionary capability has emerged with the power to fundamentally alter industries, creativity, and problem-solving as we know them: Generative Artificial Intelligence (Gen AI). In a world where innovation and automation are increasingly crucial, understanding the nuances and implications of generative AI is no longer a specialized matter but a strategic necessity.
Enterprises want to deliver products and services quickly and confidently, anytime and anywhere, and the exponential rise of AI is making that a reality. To achieve this, many organizations are undergoing a notable transformation, proactively exploring the creation of their own bespoke large language models (LLMs) to meet the growing need for greater control, data privacy, and cost-efficient solutions.
Large language models (LLMs) are a foundational technology in Gen AI that has transformed the artificial intelligence (AI) development landscape, giving developers unparalleled capabilities in a fraction of the time previously required. Despite the widespread excitement surrounding LLMs, their full potential is frequently underestimated for various reasons. This article delves into the challenges and limitations of LLMs and how Infosys, in collaboration with Google, is striving to overcome these obstacles to help enterprises build truly AI-powered organizations.
Large Language Models – Mimicking Human Intelligence
Sophisticated artificial intelligence models, known as large language models, are primarily designed to process and generate text that closely resembles human language. These models possess the ability to comprehend language structures, grammar, context, and semantic linkages, owing to their extensive training on vast amounts of textual data.
Large language models utilize deep learning methodologies, such as transformer architectures, to identify statistical correlations and patterns within textual data. This information is then utilized to generate text that closely emulates human-written content, exhibiting cohesiveness and contextual relevance.
In recent years, the market for large language models has expanded significantly. According to industry reports, this market is projected to grow from USD 11.3 billion in 2023 to USD 51.8 billion by 2028, a compound annual growth rate (CAGR) of 35.6%. The surge in demand for language-based applications such as virtual assistants, chatbots, content generation, and translation services is driving this trend.
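As a quick arithmetic check, the projected 2028 figure is consistent with compounding the 2023 base at the stated CAGR over five years:

```python
# Sanity check of the projection: USD 11.3B (2023) compounding at a
# 35.6% CAGR for the 5 years to 2028 should land near USD 51.8B.
start, cagr, years = 11.3, 0.356, 5
projected = start * (1 + cagr) ** years
print(round(projected, 1))  # 51.8
```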
The increasing demand for advanced natural language processing (NLP) capabilities is a significant driver of growth in the Large Language Model (LLM) industry. LLMs, such as GPT-3, have revolutionized the NLP sector by exhibiting remarkable language production, comprehension, and contextual understanding abilities. The demand for LLMs is fueled by the growing need for complex NLP applications, including sentiment analysis, virtual assistants, content creation, chatbots, and translation services.
Additionally, the LLM industry is significantly impacted by the exponential proliferation of data and content on the internet. Enterprises are actively seeking efficient methods to process, analyze, and extract insights from the vast amount of text-based data generated daily. Leveraging LLMs, enterprises can harness this data for various purposes, such as market research, customer insights, content curation, and data analysis. LLMs possess robust text analysis and language comprehension capabilities, making them an asset in this regard.
Challenges of Integrating LLMs into Enterprise Applications
Although large language models (LLMs) can emulate human language, human intelligence encompasses more than linguistic ability: it includes unconscious perception, skills shaped by experience, and an understanding of how the world operates. While LLMs offer several benefits, the challenges and limitations outlined below must be addressed when integrating them into applications.
- Contextual Memory and Context Limitations: Limited context is one of the most widely recognized obstacles in the field. LLM input context is often constrained, and many applications need support for a larger number of tokens. OpenAI previously addressed this by releasing 16K-token context support, and GPT-4 increased the limit to 32K tokens, which equates to several pages of text. However, many scenarios demand even more context, particularly when working with numerous documents of tens of pages each. For example, a legal-tech company may need to process dozens of legal documents to extract answers using an LLM.
- Testing and Fine-tuning: Achieving a satisfactory level of accuracy with the LLM requires extensive testing, often involving prompt engineering and trial and error, as well as fine-tuning based on user feedback. While there are tests that run as part of the CI to ensure proper integration, the real challenge lies in developing a testing method that allows the testing engineer to modify templates, incorporate necessary data, and execute the prompt with the LLM to verify desired outcomes. Additionally, continuous fine-tuning of the LLM model is necessary through user feedback.
- Data Privacy and Security Concerns: One of the most significant challenges concerning LLMs pertains to data privacy and security. These models are trained on extensive amounts of public data. For an LLM to be useful in an enterprise setting, it must be retrained on sensitive information, including personal data, financial information, and confidential business information. To address these concerns, enterprises must ensure that their data is sufficiently secured and that the models are not accessing or utilizing sensitive information without authorization.
- Bias and Hallucination: LLMs may exhibit bias inherited from the data they are trained on, leading to erroneous outcomes. Additionally, AI hallucination occurs when an LLM produces an imprecise response that is not grounded in its training data. Despite the considerable advantages LLMs offer businesses, these obstacles must be addressed to implement them effectively. Enterprises must ensure that their models are trained on impartial data and that LLM outputs are validated against actual enterprise data.
- Integration with Existing Systems: One of the challenges that enterprises encounter with LLMs pertains to their integration with pre-existing systems. LLMs have the potential to generate abundant amounts of data, which can pose difficulties in terms of management and integration with existing systems. Enterprises must ensure that the data produced by these models is appropriately stored and managed and can be seamlessly accessed and integrated with pre-existing systems, including databases and analytics platforms.
In addition to this, there are several other challenges which include development and operational cost, glitch tokens, data enrichment, and more.
Below are some of the solutions which Infosys is building, along with Google, to address a few of the above challenges.
Solution 1: Social Media Insights Solution
With the Social Media Insights Solution, Infosys is working to address the lack of personalized, contextual insights. Numerous solutions on the market can perform tasks such as real-time monitoring, targeted sentiment gathering, identification of top influencers, hashtag analysis, and engagement tracking. However, these solutions primarily offer descriptive statistical analysis, and significant custom development is needed to contextualize their output for an enterprise.
In collaboration with Google, we are developing a Gen AI-powered prediction model known as the “Social Media Insights Solution”. It is built as a cloud-native solution on GCP, leveraging key services such as Cloud Functions, GCP LLM services, and database technology. The diagram below provides a high-level overview of the solution’s technical architecture.
To deliver fast inferencing and recommendations, the model leverages Cloud Functions, which also make the solution scalable enough to handle a high volume of social feeds. To ensure data privacy, the model is hosted within the customer’s VPC.
By leveraging cloud-hosted LLMs, we can achieve cost-effectiveness and ease of maintenance and upgrades, with minimal configuration changes required as the models evolve.
We are utilizing the fine-tuned large language model (PaLM 2) for the following applications:
- Sentiment analysis
- Post prioritization
- Post type (user query, user request, suggestion, issue, etc.)
- Post category
- Key attribute extraction (customer ID, meter ID, location, etc.)
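The enrichment tasks above can be driven by a single structured prompt whose reply is parsed into fields. The sketch below is illustrative only: the field names, template wording, and the canned model reply are assumptions, not the production template or an actual PaLM 2 response.

```python
import json

# Hypothetical enrichment prompt covering the attributes listed above.
PROMPT_TEMPLATE = """Analyse the social media post below and reply with JSON
containing: sentiment, priority, post_type, category, attributes.

Post: {post}"""

def build_enrichment_prompt(post: str) -> str:
    return PROMPT_TEMPLATE.format(post=post)

def parse_enrichment(raw: str) -> dict:
    """Parse the model's JSON reply, tolerating surrounding text."""
    start, end = raw.find("{"), raw.rfind("}") + 1
    return json.loads(raw[start:end])

prompt = build_enrichment_prompt("My meter M-104 shows no reading since Monday.")
# A canned reply stands in here for the LLM's output.
reply = ('{"sentiment": "negative", "priority": "high", "post_type": "issue",'
         ' "category": "metering", "attributes": {"meter_id": "M-104"}}')
print(parse_enrichment(reply)["priority"])  # high
```

Parsing into a fixed schema is what lets the enriched fields flow into the helpdesk and ticketing workflow downstream.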
Enriched data can be seamlessly integrated with the helpdesk to facilitate manual intervention and ticketing workflow. Additionally, the solution offers basic resource prediction capabilities to support social media-based operations. This functionality can be further enhanced by leveraging shift planning, resource skills, and enterprise-specific issues. The prediction model is developed using a multivariate machine learning model.
Prior to fine-tuning, we experimented with zero-shot and few-shot prompting with the LLM for the above enrichment. Similarly, we explored a univariate model for the predictive-analysis scenarios. However, these approaches did not yield the desired output quality compared to the currently implemented approach.
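To make the multivariate prediction idea concrete, here is a minimal sketch that fits a linear model over two features and predicts staffing needs. The features (daily post volume, share of high-priority posts), the synthetic data, and the least-squares approach are all illustrative assumptions, not the production model.

```python
# Hypothetical historical data: [posts/day, high-priority share] -> agents needed.
rows = [[120, 0.10, 1.0],   # trailing 1.0 is the intercept term
        [300, 0.25, 1.0],
        [450, 0.30, 1.0],
        [220, 0.15, 1.0],
        [380, 0.40, 1.0]]
ys = [3.9, 8.25, 11.5, 6.15, 10.6]  # synthetic, illustrative targets

def solve(M, v):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(M)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

# Least squares via the normal equations (A^T A) coef = A^T y.
AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Aty = [sum(r[i] * yv for r, yv in zip(rows, ys)) for i in range(3)]
coef = solve(AtA, Aty)

def predict(posts: float, hi_share: float) -> float:
    return coef[0] * posts + coef[1] * hi_share + coef[2]

print(round(predict(350, 0.3), 1))  # 9.5
```

In practice this role would be filled by a proper ML library and richer features (shift plans, resource skills, enterprise-specific issues), as noted above.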
After a thorough analysis, we determined that this approach would yield a substantial reduction in direct costs compared with traditional hosting. It would also bring improved reputation, continued business opportunities, and higher customer satisfaction through reduced response times.
The current state of the technology lets us leverage cloud-native components and LLMs with minimal hosting and configuration hassle, heavy CapEx, specialized SME knowledge, or ongoing upgrade cycles, while supporting the evolving innovation ecosystem. The solution helps enhance client satisfaction/NPS, respond faster to concerns, and minimize interruption to the existing post-ticket-creation workflow. Although the reference demonstration covers Twitter posts, it can be extended to any other social media platform through readily available adapters, making it more generic. It can also be enhanced with Responsible AI elements and auditing capabilities, and integrated with traditional automation systems (RPA) to automate the system-driven portions of the ticket lifecycle, powered by the insights gathered from LLMs.
Solution 2: Domain-specific LLM Models/Solutions
Generalist LLM programs are often characterized as being proficient in a variety of areas yet lacking mastery in any one specific domain. In fields that demand a high degree of precision and expertise, such as medicine, finance, engineering, and law, errors or inaccuracies can have significant consequences. However, this does not necessarily mean that these domains cannot benefit from the use of LLM techniques. In fact, experts have begun developing specialized LLM programs tailored to specific domains, which utilize the same foundational techniques such as self-supervised learning and RLHF.
“The era of large models is over, and the focus will now turn to specializing and customizing these models. Real value is unlocked only when these models are tuned on customer and domain specific data.”
– Sam Altman, the CEO of OpenAI
Infosys has explored the use of Google’s large language models for solving domain-specific problems. Some domains, like legal or tax, have complex and nuanced language, which poses a unique set of challenges. Our team has developed solutions, including semantic search and structured report generation, that are tailored to two primary domains – legal and tax. Additionally, we have created solutions to evaluate the performance of these models in other tasks, such as custom classification, personalized recommendations, and tasks involving images and audio. Below are a few use cases/solutions:
Use Case 1: Semantic Search
The focus of this use case is to offer personalized legal guidance. Semantic search is an innovative method of information retrieval that endeavors to comprehend the context and significance of a user’s inquiry rather than solely relying on keyword matching. It surpasses conventional keyword-based searches by considering the associations between words, phrases, and concepts to furnish more pertinent and precise search outcomes. This technique is especially advantageous for handling ambiguous queries and understanding user intent in a more nuanced fashion.
- Data Preparation and Embedding Generation: The data used in this use case is a set of legal cases containing details such as the entities involved, the case itself, the judgement, and the justification for the judgement. These documents and articles are transformed into high-dimensional vector representations, commonly known as embeddings, which capture the semantic relationships between words and concepts in the text.
- Query Processing: When a user submits a search query, the query text is tokenized and processed. Embeddings are generated for the query using the same approach used for the documents.
- Semantic Matching: The query embedding is compared to the embeddings of the documents in the dataset. Techniques such as cosine similarity or other distance metrics are utilized to measure the similarity between the query and each document.
- Ranking and Retrieval: The ranking of documents is determined by their similarity scores to the query. The documents with the highest similarity scores are considered the most relevant and used as context to be provided to the model.
- Response Generation: The best-matching context from the previous step, along with the user query, is given to the model, which then produces advice based on both.
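The query-processing, matching, and ranking steps above can be sketched as cosine similarity over embeddings. The toy three-dimensional vectors below stand in for real embeddings (which an embedding model would produce); the document IDs are hypothetical.

```python
from math import sqrt

# Toy "embedding" vectors standing in for real high-dimensional ones.
docs = {
    "case_001": [0.9, 0.1, 0.0],
    "case_002": [0.2, 0.8, 0.1],
    "case_003": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve(query_emb, top_k=2):
    """Rank documents by similarity to the query and keep the top k."""
    scores = {doc_id: cosine(query_emb, emb) for doc_id, emb in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

query = [0.85, 0.15, 0.05]  # embedding of the user's legal question
print(retrieve(query))  # case_001 ranks first
```

The top-ranked documents then serve as the context passed to the model in the response-generation step.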
Use Case 2: Structured Report Generation
For the structured report generation, we used the Large Language Models by Google to structure the content of documentation in a format that is easy to interpret and understand. Our report generation capabilities are currently available for two domains: Legal and Finance/Tax. The input required for this functionality is a comprehensive document containing domain-specific information.
These documents are chunked into smaller parts, and an embedding is created for each chunk. When these chunks are passed along with the template prompt to the large language model, the result is a structured report with well-organized information. The user can also choose the complexity of the final report, from simple to moderately detailed. Since these documents tend to be domain sensitive, some prompt tuning is required to extract the correct information from the large documents.
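The chunking step can be sketched as overlapping word windows, so that each piece fits the model's context and no sentence is lost at a boundary. The window and overlap sizes below are illustrative assumptions; real limits depend on the model's token budget.

```python
def chunk_document(text: str, chunk_words: int = 200, overlap: int = 20):
    """Split text into overlapping word windows of at most chunk_words."""
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break
    return chunks

# A synthetic 500-word document: w0 w1 ... w499.
doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_document(doc)
print(len(chunks))  # 3 chunks: starts at words 0, 180, 360
```

Each chunk would then be embedded and sent with the template prompt, as described above.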
Use Case 3: Classification
During the process of shipping goods, customs require goods to be grouped into specific categories based on the contents of the goods and other parameters. These categories are called Harmonized System Codes. For classification using Large Language Models by Google, we provide the type of goods being shipped as a query and retrieve the classification of the goods along with the reasoning. The system generates the top 3 labels based on the input query and returns the top result along with the detailed reasoning for the classification.
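The top-3 selection step can be illustrated with a simple scoring loop. In the actual solution the scoring and the reasoning come from the LLM; the word-overlap scorer and the category descriptions below are toy stand-ins, not real HS codes or production logic.

```python
# Hypothetical candidate categories (codes and keywords are illustrative).
CATEGORIES = {
    "8517": "telephones smartphones communication devices",
    "0901": "coffee roasted beans decaffeinated",
    "6109": "t-shirts cotton knitted garments",
    "8471": "computers laptops data processing machines",
}

def top3(description: str):
    """Score every category against the goods description; keep the top 3."""
    query = set(description.lower().split())
    scores = {
        code: len(query & set(words.split()))
        for code, words in CATEGORIES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:3]

labels = top3("knitted cotton t-shirts")
print(labels[0])  # 6109 scores highest on word overlap
```

The system then returns the best label together with the LLM-generated reasoning for the classification.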
Use Case 4: Image-based Information Processing
To answer personal questions based on the contents of an image, the solution leverages large language models by Google: it runs Optical Character Recognition to extract the contents of the image. This content is the input to a language model that can answer user-specific questions such as “Can you tell me the number of calories in a packet of Lays?” and “I have a peanut allergy. Does this food item contain any peanuts?”. The user has the option to furnish additional details regarding their conditions while asking the query, and the solution considers this information when answering.
Use Case 5: Audio-based Information Processing
The audio processing solution utilizing Large Language Models from Google provides users with the ability to search for content, generate insights, or summarize audio. In the initial steps of this solution, the large audio clip is broken into smaller clips for accurate processing. The smaller clips are then passed through a Voice-To-Text module to extract the text from the clips. The contents of these clips are then run through an embedding generation module which creates a repository of the data from the audio clips. Based on the selection of the user, this repository is used to return the results for the searches, create a summary of the audio clip, or generate user-specific insights based on the contents of the audio.
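The first step of the pipeline, cutting a long recording into smaller clips before voice-to-text, can be sketched as fixed-length windows over the sample stream. The 16 kHz sample rate, the 30-second window, and the zero-valued samples are assumptions for illustration.

```python
SAMPLE_RATE = 16_000   # samples per second (assumed)
CLIP_SECONDS = 30      # window length per clip (assumed)

def split_audio(samples, sample_rate=SAMPLE_RATE, clip_seconds=CLIP_SECONDS):
    """Cut a sample stream into consecutive fixed-length clips."""
    clip_len = sample_rate * clip_seconds
    return [samples[i:i + clip_len] for i in range(0, len(samples), clip_len)]

# A fake 95-second recording (zeros standing in for real samples).
recording = [0.0] * (SAMPLE_RATE * 95)
clips = split_audio(recording)
print(len(clips), len(clips[-1]) // SAMPLE_RATE)  # 4 5  (four clips; last is 5s)
```

Each clip would then pass through the voice-to-text module, with the resulting text embedded into the searchable repository described above.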
Use Case 6: Recommendation
For the recommendation solution using large language models from Google, the user provides their persona to the model. Course information, comprising details of course contents, difficulty, duration, and other parameters, is fed to the model, which is asked for a recommendation based on the user persona. The persona is matched against a repository of courses, and the results are presented to the user. The solution works by creating an embedding from the user's persona and matching it against an embedding database of the various courses.
Google – Infosys Generative AI Living Labs
Infosys has a deep collaboration with Google Cloud and offers a portfolio of Google Cloud-powered generative AI products. Infosys Generative AI Labs, part of Infosys Topaz, brings together ready-to-use industry solutions, accelerators, and responsible design frameworks. It empowers everyone within and beyond the enterprise to lead the generative revolution and make change happen as they imagine it, so they can “Reimagine digital enterprise”, “Redefine human capability”, and “Reinforce AI ethics”.
Google Infosys Living Labs – Google’s Cloud Offering
The Way Forward / Conclusion
Developing customized LLMs entails several essential steps: gathering and organizing domain-specific data, identifying appropriate architectures, and employing advanced model-training methodologies. Leveraging open-source tools and frameworks can facilitate the creation of bespoke models. The joint solutions from Infosys and Google described above are a few examples that can enable organizations to use language models precisely tailored to their distinct requirements and goals. By adopting domain-specific models, enterprises can gain a multitude of benefits, including enhanced performance, tailored responses, and optimized operations. This transition marks a departure from exclusive reliance on generic models, ushering in a new era where organizations leverage customized LLMs to foster innovation, tackle industry-specific obstacles, and attain a competitive advantage in the ever-evolving landscape of artificial intelligence.
- Sameer Govind Joshi – Principal Product Architect, iCETS
- Dr. Varsha Jain – Senior Data Scientist, AINA
- Nikhil Nandanwar – Lead Consultant, iCETS
- Bhoomi Shah – Senior Associate Consultant, iCETS