Thursday, November 7, 2024

Pakistan to Develop Urdu LLM for Generative AI

The National University of Sciences and Technology (NUST), the National Information Technology Board (NITB) and telecom network operator Jazz have signed a Memorandum of Understanding (MOU) to develop Pakistan’s first indigenous Large Language Model (LLM) with a focus on Urdu, including datasets for the Pashto and Punjabi languages. It is aimed at empowering individuals, businesses, and organizations with advanced AI tools in their native languages. The envisioned LLM is expected to drive innovation in Generative AI applications, boosting productivity and accessibility in critical sectors like healthcare, education, and agriculture.

GPT-4 Accuracy Scores. Source: The Economist


Generative AI tools such as ChatGPT are powered by large language models, or LLMs. These models need to be trained on vast amounts of data in a given language to be useful in that language. Unfortunately, Urdu accounts for less than 0.1% of the content on the Internet. This will present a challenge for the developers of Urdu LLMs.

Online Content of Various Languages. Source: W3Techs 


The lack of Urdu content available for training ChatGPT hurts the accuracy of its results for Urdu language users. For example, GPT-4's accuracy score in question-answer tests in Urdu is just over 70%, compared with an 85% accuracy score in English, according to data from OpenAI. Other South Asian languages, including Hindi, Bengali, Punjabi, Marathi and Telugu, suffer from the same problem.

It's not just a South Asian problem. These challenges exist across the developing world. Non-European languages are generally poorly represented online. This is a major obstacle for non-European nations in developing their own generative artificial intelligence (AI) models, which rely on vast amounts of training data. Generative AI can also produce biased results due to a number of factors, including the data it's trained on, the algorithms used, and how it's deployed.

Without local-language models, the use of AI in developing nations such as Pakistan will remain limited to the small number of people proficient in English. Broadening the adoption of AI applications will require LLMs trained on local language content. The absence of this development could cost Pakistan the opportunity to take full advantage of the AI Revolution.


8 comments:

Ahmed said...

Dear Sir

Asalam Aalikum

Mash Allah, it is great news and a great post which you have just shared, but Sir, the question is: why do most people in Pakistan, especially those who are running businesses and industry, seem to have a problem with English? Mash Allah, the new generation, especially students who are studying in universities and colleges, is proficient in English to some extent.

Let's suppose we make an LLM in Urdu, which is no doubt a good initiative taken by the authorities, but don't you think that it will limit or confine the achievement of Pakistan within the circle or domain of Pakistan?




Riaz Haq said...

Ahmed: "Let's suppose we make an LLM in Urdu, which is no doubt a good initiative taken by the authorities, but don't you think that it will limit or confine the achievement of Pakistan within the circle or domain of Pakistan?"

The two are not mutually exclusive. Pakistani developers and engineers can work with both English and Urdu LLMs and apps.

Domestically, Pakistan needs to have Urdu LLMs for GenAI apps for broad deployment and use in the country.

Riaz Haq said...

VEON’s Jazz Launches FikrFree: An AI-Powered Digital Marketplace to Unlock Affordable Insurance and Healthcare in Pakistan


https://www.globenewswire.com/news-release/2024/10/24/2968536/0/en/VEON-s-Jazz-Launches-FikrFree-An-AI-Powered-Digital-Marketplace-to-Unlock-Affordable-Insurance-and-Healthcare-in-Pakistan.html

VEON Ltd. (Nasdaq: VEON, Euronext Amsterdam: VEON), a global digital operator (“VEON” or the “Company”), today announces that Jazz, its digital operator in Pakistan, has launched FikrFree, a new AI-powered digital marketplace for insurance and healthcare. The platform aims to bridge a significant gap in Pakistan, where insurance sector penetration is less than 1% of GDP according to the Securities and Exchange Commission of Pakistan, and millions lack access to essential healthcare. In comparison, insurance penetration in other countries is significantly higher (over 7% of GDP in the US and more than 9% of GDP in the UK, according to the World Bank). FikrFree helps users find accessible and affordable coverage through personalized insurance plans and healthcare services.

FikrFree aims to reach the underserved healthcare market in Pakistan through an innovative platform that seamlessly integrates insurance, healthcare, and financial services all in one mobile app. FikrFree also leverages artificial intelligence to recommend personalized insurance plans for customers. The new digital service builds on VEON’s commitment to creating innovative digital solutions as part of its Digital Operator 1440 strategy, offering customers a portfolio of connected services that are relevant for each of the 1,440 minutes in a day. In 2Q24, direct digital revenues represented over 10% of VEON Group’s total revenues.

"Access to affordable healthcare is a fundamental need. In Pakistan, where millions struggle to find suitable insurance coverage and healthcare services, VEON is addressing this challenge with connected digital services. With the launch of FikrFree, we are empowering customers to access personalized insurance plans, specialist doctors, and on-demand medicine delivery—all in one seamless platform. Our digital operator strategy focuses on investing in services that enhance lives, and with FikrFree, we aim to make affordable healthcare accessible to all Pakistanis," says Kaan Terzioglu, CEO of VEON Group.

Riaz Haq said...

Generalists vs. Specialists: Evaluating Large Language Models for Urdu


https://arxiv.org/html/2407.04459v1

In this paper, we compare general-purpose pretrained models, (OpenAI's) GPT-4-Turbo and (Meta/Facebook) Llama-3-8b-Instruct with special-purpose models fine-tuned on specific tasks, XLM-Roberta-large, mT5-large, and Llama-3-8b-Instruct. We focus on seven classification and six generation tasks to evaluate the performance of these models on Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural Language Processing (NLP). Despite the frequent advancements in Large Language Models (LLMs), their performance in low-resource languages, including Urdu, still needs to be explored. We also conduct a human evaluation for the generation tasks and compare the results with the evaluations performed by GPT-4-Turbo and Llama-3-8b-Instruct. We find that special-purpose models consistently outperform general-purpose models across various tasks. We also find that the evaluation done by GPT-4-Turbo for generation tasks aligns more closely with human evaluation compared to the evaluation by Llama-3-8b-Instruct. This paper contributes to the NLP community by providing insights into the effectiveness of general and specific-purpose LLMs for low-resource languages.

Ahmed said...


Dear Sir

As Alam Alaikum

I hope you are doing well. Sir, just recently an AI summit was held in Islamabad in which IT professionals from Pakistan and from foreign countries took part.


A Pakistani computer scientist, Mr. Basic Diaz Sheikh, also took part in this summit. He has, mashallah, contributed greatly to the field of telecom and IT, and he is the founder and CEO of the company titled "FOR LOOP".

I think the head office of the company "FOR LOOP" is in Lahore and its other office is in the state of Texas in America.
It is a software solutions company which also has expertise in ML (Machine Learning) and AI (Artificial Intelligence).

Please check his profile, which I am sharing here from LinkedIn:
______________________
An AI entrepreneur with a Ph.D. degree from the prestigious Cornell University in Computer Engineering and over 10 years of experience in C-level executive and top management positions across diverse high-tech companies, spanning the telecom, media, and artificial intelligence industries. Previously, I served as Advisor to the Prime Minister of Pakistan on Information Technology and Telecom, earning recognition as one of the 100 most influential Pakistanis in 2013 for my contributions to mobile broadband proliferation while CEO of Universal Services Fund Company.

Throughout my career, I've founded and led multiple successful ventures, including Capital TV, one of Pakistan's largest broadcast and digital news networks with over 2 billion annual video views across digital platforms. Additionally, I spearheaded InstaWorld, a venture-backed e-commerce logistics and fintech platform utilizing advanced AI algorithms to optimize delivery processes and cash-on-delivery management for thousands of clients. As the founder of Forloops, an AI software services company, we specialize in architecting, developing, deploying, and managing cutting-edge AI and generative AI solutions for clients across various sectors such as media, telecom, retail, fintech, and healthcare. Leveraging my extensive background and network in these fields, Forloops excels in delivering tailored AI solutions that drive innovation and efficiency for our clients.

My Ph.D. research at Cornell focused on asynchronous floating-point processor design, resulting in three patents and a best paper award for breakthrough research. Floating-point computations underpin all AI computations in chips, a fundamental aspect of my work that earned recognition at the Centennial Turing Award ceremony, the most esteemed forum in computer science. I've also gained valuable experience at leading semiconductor and hardware manufacturing companies, including Intel Corporation, Applied Materials Inc., and National Instruments. As a certified Artificial Intelligence expert, I have completed multiple certifications in Deep Learning, Generative AI, and Large-language models, further enhancing my expertise in the field.
_________________

Sir Riaz, I would deeply appreciate it if you could contact him and kindly discuss how students in the universities of Pakistan can benefit from his company "FOR LOOP".

Is it possible for his company to provide free and paid internships to Pakistani students of CS and IT who are interested in AI and ML after they graduate from their universities?

Don't you think his company "FOR LOOP", which is a high-standard, quality software solutions company, should have agreements with various universities in Pakistan so that students can get internships there after graduating? It could turn out to be a great learning experience and exposure for them.

Thanks

Ahmed said...

Dear Sir Riaz

I am sorry for the spelling mistake; there are some issues with my iPhone keypad. The correct name of the computer scientist who attended the AI conference in Islamabad, about whom I posted my comment above with his profile details from LinkedIn, is Basit Riaz Sheikh.

Thanks



Ahmed said...

Dear Sir

The design and development of an LLM (Large Language Model) is a very complex and hard task because AI-based machines and computers depend heavily on the LLM, just as a computer system, including its hardware, depends on its OS (Operating System).

The OS provides an interface between the user and the computer system and also provides other functionality so the user can easily use the computer.

The OS operates the computer and machine.

Similarly, the LLM (Large Language Model) works like an operating system for AI-based machines and computers.




Riaz Haq said...

Labelers training AI say they're overworked, underpaid and exploited by big American tech companies - CBS News

https://www.cbsnews.com/news/labelers-training-ai-say-theyre-overworked-underpaid-and-exploited-60-minutes-transcript/

Naftali Wambalo: I did labeling for videos and images.

Naftali and digital workers like him, spent eight hours a day in front of a screen studying photos and videos, drawing boxes around objects and labeling them, teaching the AI algorithms to recognize them.

Naftali Wambalo: You'd label, let's say, furniture in a house. And you say "This is a TV. This is a microwave." So you are teaching the AI to identify these items. And then there was one for faces of people. The color of the face. "If it looks like this, this is white. If it looks like this, it's Black. This is Asian." You're teaching the AI to identify them automatically.

Humans tag cars and pedestrians to teach autonomous vehicles not to hit them. Humans circle abnormalities to teach AI to recognize diseases. Even as AI is getting smarter, humans in the loop will always be needed because there will always be new devices and inventions that'll need labeling.

Lesley Stahl: You find these humans in the loop not only here in Kenya but in other countries thousands of miles from Silicon Valley. In India, the Philippines, Venezuela - often countries with large low wage populations - well educated but unemployed.

Nerima Wako-Ojiwa: Honestly, it's like modern-day slavery. Because it's cheap labor–

Lesley Stahl: Whoa. What do you –

Nerima Wako-Ojiwa: It's cheap labor.

Like modern day slavery, says Nerima Wako-Ojiwa, a Kenyan civil rights activist, because big American tech companies come here and advertise the jobs as a ticket to the future. But really, she says, it's exploitation.

Nerima Wako-Ojiwa: What we're seeing is an inequality.

Lesley Stahl: It sounds so good. An AI job! Is there any job security?

Nerima Wako-Ojiwa: The contracts that we see are very short-term. And I've seen people who have contracts that are monthly, some of them weekly, some of them days. Which is ridiculous.

She calls the workspaces AI sweatshops with computers instead of sewing machines.

Nerima Wako-Ojiwa: I think that we're so concerned with "creating opportunities," but we're not asking, "Are they good opportunities?"

Because every year a million young people enter the job market, the government has been courting tech giants like Microsoft, Google, Apple, and Intel to come here, promoting Kenya's reputation as the Silicon Savannah: tech savvy and digitally connected.

Nerima Wako-Ojiwa: The president has been really pushing for opportunities in AI –

Lesley Stahl: President?

Nerima Wako-Ojiwa: Yes.

--------------

Fasica: I was basically reviewing content which are very graphic, very disturbing contents. I was watching dismembered bodies or drone attack victims. You name it. You know, whenever I talk about this, I still have flashbacks.

Lesley Stahl: Are any of you a different person than they were before you had this job?

Fasica: Yeah. I find it hard now to even have conversations with people. It's just that I find it easier to cry than to speak.

Nathan: You continue isolating you-- yourself from people. You don't want to socialize with others. It's you and it's you alone.

Lesley Stahl: Are you a different person?

Naftali Wambalo: Yeah. I'm a different person. I used to enjoy my marriage, especially when it comes to bedroom fireworks. But after the job I hate sex.

Lesley Stahl: You hated sex?

---------
These three and nearly 200 other digital workers are suing SAMA and Meta over "unreasonable working conditions" that caused psychiatric problems.