Sunday, April 10, 2011

US Mining Urdu Content on Facebook and Twitter?

Why does the US government love social media and hate Wikileaks?

The answer is simple: Unlike Wikileaks, which reveals unflattering information about the government's inner workings, social media offers the US government a treasure trove of detailed data. Facebook, Twitter, blogs and other social media are of particular interest in Urdu and Arabic, in light of recent developments in the Middle East and South Asia.

The content on digital social networks can be a great source of knowledge about users and their sentiments, which governments and corporations love to exploit. It can be used not only to gather valuable intelligence for better product marketing and effective government propaganda, but also to tweak policies and offer products and services through crowd-sourcing to achieve desired outcomes.

The real challenge is how to make sense of the vast amount of information, discern meaningful patterns and use it as a guide for action. The current application programming interfaces (APIs) and tools for social media monitoring and analysis are still evolving. They focus mainly on sentiment analysis, giving governments and companies a general sense of opinion at best.

Among the researchers tackling the challenge is Indian-American computer scientist Rohini Srihari. She is working under a US grant to help mine and decipher data from Urdu social media. Her company, Janya Inc., gets funding from the Pentagon for the project.

In a recent interview with NPR Radio, Srihari explained: "What I want is to determine who are the people, places and things being talked about; Is there an opinion being expressed? Is it a positive or negative opinion being expressed?"

If the ongoing social media data mining research efforts succeed, rich and powerful corporations and governments will be even better equipped to manipulate the opinions and preferences of average consumers and voters. Such an outcome could lead to people being influenced to act against their own best interests in the name of freedom and democracy.

Here's a video clip titled "Urdu Enters the Digital Age" featuring Srihari explaining how her software works:

Related Links:

Haq's Musings

Obama's Success With Social Media

Obama on Urdu Poetry, Cricket, Daal, Keema

Case Against Wikileaks' Assange

PakAlumni-Pakistani Social Network

Media and Telecom Revolution in Pakistan

Pakistan's Telecom Boom

Pakistan Tops Text Message Growth

WiMax Rollout in Pakistan


Mayraj said...

I do not think the common man is naive outside the US (the US is another story).
Remember the failure of Al Hurra vs the success of Al Jazeera!

Anonymous said...


Please read about the HBGary case, in which CommanderX dumped the communication between the security company and the FBI.

The security company was writing malware trojans to track internet users by infecting individual PCs. So it is common for discussion blogs to be farmed in the name of national interest.

Riaz Haq said...

Here are some excerpts of an Op Ed by William Martin, US Consul General, published in The Express Tribune:

Perhaps showing the generation gap, I did not know that Pakistan has such a lively and active blogging community, with over three million citizen-journalists freely reporting on virtually every topic under the sun. Pakistan has one of the fastest-growing Facebook and Twitter-using populations in the world, with over four million Facebook users. Remarkably, the per capita internet access in Pakistan is between 10-15 per cent of the total population — more than double that of neighbouring India. Using even the most conservative estimates, 20 million Pakistanis are regularly online, or the equivalent of the population of four Singapores.

Pakistan enjoys tremendous freedom of information and online expression. As a representative of the United States, I am keenly aware of the vibrancy of that free speech every time I log in to my computer or pick up a newspaper. Although a bit bruised sometimes, I welcome it! By amplifying the diversity of voices, social media is making life a richer experience for us all. And this is possible because Pakistanis are using their freedom of expression every day, online. Blogging is reinforcing the backbone of democracy – freedom of speech – a freedom that is enshrined in the US Constitution.

In Pakistan, the freedom of the press was earned over time, through the sacrifices of its people, especially the sacrifices of those in the media community. Journalists and bloggers now play a central role in the effort to institutionalise these hard won freedoms.

We must never forget the many journalists who have been killed or injured as they sought to report on the challenges facing us today. They take extraordinary risks to enlighten us with the truth. Nobody embodied this commitment more than Syed Saleem Shahzad, who was senselessly murdered trying to pursue this truth. All of us are diminished by his passing. But there is no doubt that his work will continue and others will pick up the baton and carry on. It is up to each of us to honour his legacy and do all we can to support press freedom as a fundamental right to be enjoyed by everyone, everywhere. Blog on.

Riaz Haq said...

Here's an AP report on CIA mining Facebook, Twitter and other social media:

In an anonymous industrial park, CIA analysts who jokingly call themselves the "ninja librarians" are mining the mass of information people publish about themselves overseas, tracking everything from common public opinion to revolutions.

The group's effort gives the White House a daily snapshot of the world built from tweets, newspaper articles and Facebook updates.

The agency's Open Source Center sometimes looks at 5 million tweets a day. The analysts are also checking out TV news channels, local radio stations, Internet chat rooms — anything overseas that people can access and contribute to openly.

The Associated Press got an apparently unprecedented view of the center's operations, including a tour of the main facility. The AP agreed not to reveal its exact location and to withhold the identities of some who work there because much of the center's work is secret.

From Arabic to Mandarin, from an angry tweet to a thoughtful blog, the analysts gather the information, often in a native tongue. They cross-reference it with a local newspaper or a clandestinely intercepted phone conversation. From there, they build a picture sought by the highest levels at the White House. There might be a real-time peek, for example, at the mood of a region after the Navy SEAL raid that killed Osama bin Laden, or perhaps a prediction of which Mideast nation seems ripe for revolt....

The most successful open source analysts, Naquin said, are something like the heroine of the crime novel "The Girl With the Dragon Tattoo," a quirky, irreverent computer hacker who "knows how to find stuff other people don't know exists."

An analyst with a master's degree in library science and multiple languages, especially one who grew up speaking another language, makes "a powerful open source officer," Naquin said.

Riaz Haq said...

Education News reports concerns about the loss of Urdu script with growth of technology:

In SMS-happy Pakistan, many young people are writing their text messages using the Latin alphabet, rather than the traditional Urdu script. That has some concerned that the classical script will disappear.

Cell phone users in Pakistan sent an average of 128 text messages each per month in 2009, government figures show.

That was the fifth highest figure among all countries in the world. Fueled by texting, a growing number of Pakistanis are using Latin letters to write Urdu, the national language, instead of using the official Urdu script.

Though the trend is limited, it has left some Urdu purists concerned about what happens if the trend continues.

While it may sound harmless, it has unintended consequences. Because the first generations of mobile phones couldn’t send text messages using Urdu script, Pakistanis improvised and started converting Urdu phrases into the Latin alphabet. Even though Urdu-capable phones are more common now, many people have become used to the Latin script.

Shaista Parween, a math and computer studies teacher, said texting-mad students are just as comfortable writing Urdu in Latin as they are using the regular script. In fact, she said they sometimes do schoolwork using the Latin alphabet.

“I’m facing this a lot in my classes,” Parween said. “Latin Urdu is being used so much, what can we do? We can’t say it’s wrong if they are trying. It’s used so much in the media and television, that’s why.”

Officially, Urdu is written in a variation of the Arabic script. But while the use of Latin letters for Urdu has reached high levels, though it is still a minority practice, this isn’t the first time it’s been done.

European missionaries and administrators converted Urdu into the Latin script in the 18th century. And in the 1950s, military ruler Ayub Khan proposed officially writing Urdu in Latin letters, just as Mustafa Kemal Atatürk had done with Turkish decades earlier. But religious leaders said the Arabic script was an important connection to Pakistan’s Islamic identity, so Ayub abandoned the idea.

But now, tech-savvy kids are doing what a military dictator couldn’t achieve 40 years ago. And many Pakistanis aren’t happy about it.

“Trying to write a language in another script is like trying to drop off your skin and trying to have a new one,” said Rauf Parekh, an assistant professor at the University of Karachi’s Urdu Department.

He’s concerned about the impact this will make on society if people stop learning the Urdu script.

“They will be cut off from their culture, from their tradition, their history, their classical literature. How are they going to enjoy if they cannot read it in the original. So it’s a kind of deprivation on cultural and educational side. They won’t feel it perhaps now, but maybe hundred years from now they will realize what a great loss they have incurred,” he said.

While Parekh bemoans the loss of traditional Pakistani culture, a new kind of “text messaging culture” is emerging. Pakistanis use text messages for just about anything, but especially for passing on political jokes, poetry, quotes and for flirting.
One book is titled “Cool SMS,” another “Love & Love SMS.” Each joke or poem is printed in both the Urdu script and the Latin transliteration.

“It’s been about 10 years that these books have been published now,” shop owner Basharat explained. “There was a lot of demand for them initially. This is because the majority of our population is not educated, so Latin Urdu books were made so that every person can read the books and send SMSs. It made it so much easier.”..

Riaz Haq said...

Here's a report on free tweeting in Pakistan:

Whenever a country that has a history of internet censorship gains better access to one of the internet’s most important tools, it’s big news.

And that’s exactly what has happened today. Starting today, Pakistan’s largest provider of cellular services has announced that its prepaid customers can tweet away – for free.

“Data charges for accessing Twitter have been made ZERO for all Mobilink prepaid subscribers. Subscribers don’t require to subscribe to this offer since it is available for all prepaid subscribers by default,” says Mobilink.

That means that users can tweet and retweet all they want without incurring any data charges. This removes one of the impediments from Pakistani Twitter users, who have faced state censorship of Twitter in the past.

Back in May of 2012, the Pakistan Telecommunication Authority shut off Twitter access for the entire country for approximately 8 hours following the circulation of content deemed blasphemous on the network. Some speculated that the move had less to do with the specific content and more to do with a simple test as to whether a state-wide blockage was feasible.

As far as the rest of the internet goes, the Pakistani government has a history of censorship in the areas of so-called blasphemy and pornography. Recently, that censorship has moved to content that falls in the realm of political speech. In a country with this track record, free access to Twitter is a significant opportunity for its people – considering access remains open.

There are some caveats to the deal. Mainly, tweets must be sent via – not Twitter’s native apps.


“[G]oing on external links will result in data charging. Whenever a subscriber clicks on an external link, he will be shown a notification indicating that standard data charges apply to view the link. External link will be opened after subscriber’s consent only.”

But for the purposes of simply communicating (being that all-important amateur reporter), this is a great thing for Pakistani tweeters.

Riaz Haq said...

Here's a report about free Wikipedia access for Mobilink's pre-paid customers:

Mobilink has launched Wikipedia Zero with the aim of providing its customers with free access to the world’s largest general reference database. The source will be available for Mobilink’s prepaid customers who will have free access round the clock to the full mobile version of Wikipedia. Mobilink customers will also be able to view these articles in Urdu on supported handsets.
Farid Ahmad, Vice President Marketing Mobilink, commenting on the launch of Wikipedia Zero, said, “As Pakistan’s leading mobile internet provider we are proud to partner with the world’s sixth largest website to offer our customers free access to Wikipedia. We hope that our customers will enjoy browsing through Wikipedia on Pakistan’s fastest mobile data network.”
The service is available for all new and existing prepaid customers free of cost by accessing Wikipedia at OR from either their native mobile browser or through Opera Mini.

Riaz Haq said...

KARACHI: Apple implemented the Urdu language keyboard across mobile devices in its iOS 8 update in mid-September. But persistence of a few Urdu speakers has now forced the IT giant to consider adopting the native typeface of Urdu language, Nastaleeq, along with the current Naskh font.
The Cupertino-based company had launched iOS 8 on September 17 in what was termed their biggest software update to date, boasting a whole host of features for those using modern Apple mobile devices. Hidden among those upgrades was the implementation of the Urdu keyboard across the system. This enables users of Apple’s popular iPhone, iTouch and iPad devices to type in Urdu while using texts, email, social media and other interaction. This functionality was previously available only through language-specific apps such as Urdu Writer.
The downside for some in this new feature, though, is that it follows the Naskh typeface derived from Arabic Unicode, rather than Nastaleeq. This prompted the creator of Urdu Writer, Mudassir Azeemi, to start a social media campaign. In a letter, emails and tweets to Apple CEO Tim Cook, Azeemi urged, explained, even pleaded about how easy it would be to implement the Nastaleeq typeface for the Urdu keyboard in iOS 8.
The effort paid a partial dividend when, on October 13, Azeemi got a phone call from Cook’s representatives. Azeemi initially fretted that there were lawyers on the other end with a cease and desist notice over his campaign. Instead, the representative assured him that Apple would consider implementing the typeface.

The road thus far
Azeemi, a Pakistani who works out of San Francisco, says the roots of his campaign germinated well before his email to Cook on October 5, and subsequent snail-mail on October 8.
“When my daughter was growing up, I saw that she was learning alphabet through YouTube,” the app and user experience designer tells The Express Tribune. Hoping that his child could use technology to learn about her heritage and the Urdu language, Azeemi started work on building an app about Urdu.
“We started wondering if we can implement it on iOS. We then designed an entire keyboard for Urdu Writer in 2010.”
Urdu Writer met with great success as it allowed people to type in Urdu and share their writing through email, SMS, Facebook and Twitter. Most importantly, Urdu Writer allowed users to access the Nastaleeq font.

Why Nastaleeq?
Nastaleeq is a Perso-Arabic script. It is the preferred writing script for Persian, Kashmiri and Punjabi, languages which contributed to the creation of Urdu.
“We asked around and a lot of people said that they could not read Naskh,” Azeemi says before explaining that readability of Urdu is better in Nastaleeq rather than in other members of the font family, including the widely implemented Naskh, or the lesser used Kufic font.
This was echoed by Ahsan Saeed, who divides time as an Urdu localization moderator for Twitter and his day job at a digital agency. “People used to tell me that Twitter should have the Nastaleeq font.”
Saeed says that his own mother, too used to the Nastaleeq font, can’t read Urdu posts on Facebook or Twitter because they are in the Naskh typeface, but can read Urdu newspapers in the Nastaleeq font online.

Riaz Haq said...

Assange believes #Google is an extension of the US govt and an instrument of US policy. …

From Newsweek by Julian Assange of Wikileaks:

It was at this point that I realized Eric Schmidt might not have been an emissary of Google alone. Whether officially or not, he had been keeping some company that placed him very close to Washington, D.C., including a well-documented relationship with President Obama. Not only had Hillary Clinton’s people known that Eric Schmidt’s partner had visited me, but they had also elected to use her as a back channel.

While WikiLeaks had been deeply involved in publishing the inner archive of the U.S. State Department, the U.S. State Department had, in effect, snuck into the WikiLeaks command center and hit me up for a free lunch. Two years later, in the wake of his early 2013 visits to China, North Korea and Burma, it would come to be appreciated that the chairman of Google might be conducting, in one way or another, “back-channel diplomacy” for Washington. But at the time it was a novel thought.

I put it aside until February 2012, when WikiLeaks—along with over thirty of our international media partners—began publishing the Global Intelligence Files: the internal email spool from the Texas-based private intelligence firm Stratfor. One of our stronger investigative partners—the Beirut-based newspaper Al Akhbar— scoured the emails for intelligence on Jared Cohen.

The people at Stratfor, who liked to think of themselves as a sort of corporate CIA, were acutely conscious of other ventures that they perceived as making inroads into their sector. Google had turned up on their radar. In a series of colorful emails they discussed a pattern of activity conducted by Cohen under the Google Ideas aegis, suggesting what the “do” in “think/do tank” actually means.

Cohen’s directorate appeared to cross over from public relations and “corporate responsibility” work into active corporate intervention in foreign affairs at a level that is normally reserved for states. Jared Cohen could be wryly named Google’s “director of regime change.”

According to the emails, he was trying to plant his fingerprints on some of the major historical events in the contemporary Middle East. He could be placed in Egypt during the revolution, meeting with Wael Ghonim, the Google employee whose arrest and imprisonment hours later would make him a PR-friendly symbol of the uprising in the Western press. Meetings had been planned in Palestine and Turkey, both of which—claimed Stratfor emails—were killed by the senior Google leadership as too risky.

Looking for something more concrete, I began to search in WikiLeaks’ archive for information on Cohen. State Department cables released as part of Cablegate reveal that Cohen had been in Afghanistan in 2009, trying to convince the four major Afghan mobile phone companies to move their antennas onto U.S. military bases. In Lebanon, he quietly worked to establish an intellectual and clerical rival to Hezbollah, the “Higher Shia League.” And in London he offered Bollywood movie executives funds to insert anti-extremist content into their films, and promised to connect them to related networks in Hollywood.


If the future of the Internet is to be Google, that should be of serious concern to people all over the world—in Latin America, East and Southeast Asia, the Indian subcontinent, the Middle East, sub-Saharan Africa, the former Soviet Union and even in Europe—for whom the Internet embodies the promise of an alternative to U.S. cultural, economic, and strategic hegemony.

A “don’t be evil” empire is still an empire.

Extracted from When Google Met Wikileaks by Julian Assange published by OR Books. Newsweek readers can obtain a 20 percent discount on the cover price when ordering from the OR Books website and including the offer code word NEWSWEEK.

Riaz Haq said...

#Pakistan minister confirms the country is switching to #Urdu, dropping #English as official language via @TIMEWorld

Pakistan is dropping English as its official language and switching to Urdu, a popular language in the Indian subcontinent.

The long-rumored change was confirmed by Pakistani Minister of Planning, National Reforms, and Development Ahsan Iqbal in an exclusive interview with TIME.

Iqbal said the change was being made because of a court directive. The Pakistani constitution, which was passed in 1973, included a clause specifying that the government must make Urdu the national language within 15 years, but it had not been enforced.

Still, Iqbal said the country is not entirely abandoning English, which will still be taught alongside Urdu in schools.

“It means Urdu will be a second medium of language and all official business will be bilingual,” he said.

Some Pakistanis fear that the move is part of an official backlash against the younger generation, which has been more open to Western culture.

But Iqbal argued that the move would help make Pakistan more democratic, since it will “help provide greater participation to people who don’t know English, hence making the government more inclusive.”

Urdu is just one of a number of languages spoken in Pakistan, but it retains a cultural cachet as the language of movies and music as well as the Islamic religion, while English has been more popular among elites and government ministries.

According to the CIA Factbook, nearly half of Pakistanis speak Punjabi, the language of the Punjab region, while only 8% speak Urdu. Several other languages are spoken by a fraction of the population.

The decision to break away from English creates a stark contrast with Pakistan’s neighbor and longtime rival India. English was the official language of the area that now comprises both countries under British rule, which ended in 1947.

Despite a similar language clause in its constitution, India continues to use both English and Hindi as its official languages.

Riaz Haq said...

Lexicon-based sentiment analysis for Urdu language

Social media has recently become a powerful weapon that people use for online discourse, creating content, sharing it and networking with other individuals at a phenomenal frequency. With social media and user-generated content exploding the web/blogs/social networking forums, vendors/critiques/socialist and influential individuals got enthusiastic to mine this substantial data set for obvious meaning, but they soon discovered a novel challenge: knowing that someone is talking about a particular topic/service/brand or social event is far less important than knowing how they are feeling and conversing about it. This is known as sentiment analysis or opinion mining. Numbers divulge that people are extensively using social media, expressing their positive opinions or negative apprehensions online. As a result, the concept of sentiment analysis/opinion mining is broadly being acknowledged and employed by society as a whole to enhance their business/products/services or just to assess the overall prevailing environment. Acknowledged work is being done in this area converging towards exploring sentiment analysis, its definite requirement in this era, different frameworks for sentiment analysis and their comparison with other previously proposed techniques, but unfortunately the Urdu language is not considered comprehensively in this context. As Urdu is one of the prevalent languages, this paper aims at creating an application for sentiment analysis of Urdu comments on various websites. Elaborated system architecture is discussed in detail with techniques employed; experimentation procedure and proven results of 66% accuracy are also discussed. The F-measure achieved by this proposed system is 0.73. Challenges faced in sentiment analysis with respect to this neglected language are also highlighted for future considerations.
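
For readers wondering what "lexicon-based" means in practice, here is a minimal Python sketch of the general idea. It is not the paper's system. The word list, the polarity scores, the negation rule and the function names are all hypothetical, and the words are romanized only to keep the example easy to read. Each comment is split into words, each word is looked up in a small polarity dictionary, and the scores are summed to decide whether the comment is positive, negative or neutral.

```python
# Toy lexicon-based sentiment scorer.
# ASSUMPTION: this lexicon, its scores and the negation rule are illustrative
# only; they are not taken from the paper quoted above.
LEXICON = {
    "acha": 1,        # good
    "behtareen": 2,   # excellent
    "bura": -1,       # bad
    "badtareen": -2,  # worst
}
NEGATORS = {"nahin"}  # "not"


def score_comment(text):
    """Sum the polarity of known words, flipping a word that is negated."""
    tokens = text.lower().split()
    total = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        # In Urdu the negator usually follows the adjective:
        # "acha nahin" means "not good".
        if polarity and i + 1 < len(tokens) and tokens[i + 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total


def classify(text):
    """Map the summed score to a coarse sentiment label."""
    score = score_comment(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("yeh phone acha hai"))        # this phone is good
print(classify("yeh phone acha nahin hai"))  # this phone is not good
print(classify("yeh phone hai"))             # no opinion word
```

A real system would need a far larger lexicon, handling of inflected word forms and spelling variants, and support for both the Urdu script and Latin-script Urdu. That gap helps explain why reported accuracy for such approaches remains modest.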

Riaz Haq said...

#Pakistan’s #NawazSharif to use #American data firm that helped Trump win. #PMLN #PTI #PPP

The large foreign data firm Cambridge Analytica (CA) has turned to Pakistan, where its services were hired by ousted Pakistani Prime Minister Nawaz Sharif to rescue his political career and to make sure the PML-N wins the upcoming general election. CA was hired during the Panama scandal case, as part of the campaign to smear the judiciary. However, after his disqualification from public office by the Supreme Court of Pakistan, Nawaz Sharif also instructed CA to help promote his PML-N party throughout the current election cycle.

Cambridge Analytica is a data-driven campaign firm. According to its website, the firm “uses data to change audience behaviour”. The political section of the firm’s website claims, “We find your voters and move them to action. CA Political has redefined the relationship between data and campaigns. By knowing your electorate better, you can achieve greater influence while lowering overall costs. There are no longer any experts except Cambridge Analytica.”

This firm was also hired by Donald Trump for his presidential campaign. The Make America Great Again slogan was said to be invented by this firm, which played a key role in Trump's victory, as CA claims to have 5,000 data points on over 230 million American voters. With these points, CA built up a custom target audience, and then used this crucial information to engage, persuade, and motivate voters to act in favor of Trump.

Some whistle blowers pointed out that this firm also played an essential role in the Brexit event. The Kenyan newspaper The Star also reported that President Uhuru Kenyatta hired this firm to win a second term of his presidency in 2017. After Trump and Kenya, this firm turned to Pakistan at the behest of Nawaz Sharif, who was disqualified from public office by the Supreme Court of Pakistan. He is also facing trial in Pakistan’s courts on corruption charges.

The ongoing, well-organized, data-driven campaign against Pakistan’s judiciary has hinted that this firm has introduced a new online targeting campaign by controlling and manipulating social and mainstream media in Pakistan. Nawaz’s party is still ruling the country, so it is controlling mainstream media through huge government advertisements.

Meddling Media

Biased news reporting against the judiciary has hinted that under the procurement of Nawaz, CA has taken control over Pakistani news rooms. The trend of fake news in Pakistan was accelerated by this big data firm through biased reporting. They are trying to manipulate data and the people who come into contact with this data for their own purposes. Pakistani electronic media is still immature and it always toes the line of big western corporate mainstream media. There is no doubt that the Jang/Geo group is a trendsetter in Pakistan’s media while the owner of this group, Mir Shakeel Ur Rahman is considered to be the godfather of Pakistani media. He maintains a strict grip on journalist associations and unions. He also controls Pakistan’s Broadcaster Association (PBA) through his son.