Better Manage Risks Inherent in Big Data
"Big Data", as it is known, will undoubtedly deliver important scientific, technological, and medical advances. But Big Data also poses serious risks if it is misused or abused.
But having more data is no substitute for having high-quality data. For example, a recent article in Nature reports that election pollsters in the United States are struggling to obtain representative samples of the population, because they are legally permitted to call only landline telephones, whereas Americans increasingly rely on cellphones. And while one can find countless political opinions on social media, these aren't reliably representative of voters, either. In fact, a substantial share of tweets and Facebook posts about politics are computer-generated.
A Big Data program that used this search result to evaluate hiring and promotion decisions might penalize black candidates who resembled the pictures in the results for "unprofessional hairstyles," thereby perpetuating traditional social biases. And this isn't just a hypothetical possibility. Last year, a ProPublica investigation of "recidivism risk models" demonstrated that a widely used methodology to determine sentences for convicted criminals systematically overestimates the likelihood that black defendants will commit crimes in the future, and underestimates the risk that white defendants will do so.
Another hazard of Big Data is that it can be gamed. When people know that a data set is being used to make important decisions that will affect them, they have an incentive to tip the scales in their favor. For example, teachers who are judged according to their students' test scores may be more likely to "teach to the test," or even to cheat.
Similarly, college administrators who want to move their institutions up in the U.S. News & World Report rankings have made unwise decisions, such as investing in extravagant gyms at the expense of academics. Worse, they have made grotesquely unethical decisions, such as the effort by Mount Saint Mary's University to boost its "retention rate" by identifying and expelling weaker students in the first few weeks of school.
A third hazard is privacy violations, because so much of the data now available contains personal information. In recent years, enormous collections of confidential data have been stolen from commercial and government sites, and researchers have shown how people's political opinions or even sexual preferences can be accurately gleaned from seemingly innocuous online postings, such as movie reviews, even when they are published pseudonymously.
Finally, Big Data poses a challenge for accountability. Someone who feels that he or she has been treated unfairly by an algorithm's decision often has no way to appeal it, either because specific results cannot be interpreted, or because the people who have written the algorithm refuse to provide details about how it works. And while governments or corporations might intimidate anyone who objects by describing their algorithms as "mathematical" or "scientific," they, too, are often awed by their creations' behavior. The European Union recently adopted a measure guaranteeing people affected by algorithms a "right to an explanation"; but only time will tell how this will work in practice.
When people who are harmed by Big Data have no avenues for recourse, the results can be toxic and far-reaching, as data scientist Cathy O'Neil demonstrates in her recent book Weapons of Math Destruction.
The good news is that the hazards of Big Data can be largely avoided. But they won't be unless we zealously protect people's privacy, detect and correct unfairness, use algorithmic recommendations prudently, and maintain a rigorous understanding of algorithms' inner workings and the data that informs their decisions.
The author, Ernest Davis, is a professor of computer science at the Courant Institute of Mathematical Sciences, New York University.
China Aims High in Big Data Industry
China aims to more than triple the scale of its big data industry by 2020 in a bid to foster new economic drivers, according to a government plan.
The country's big data industry should increase its annual sales to 1 trillion yuan (145 billion U.S. dollars) by 2020 from an estimated 280 billion yuan in 2015, said the plan released by the Ministry of Industry and Information Technology (MIIT).
The government is targeting a compound annual growth rate of around 30 percent for the industry's sales in 2016-2020, according to the plan.
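As a rough check on these figures, the growth rate implied by the plan's two sales numbers can be worked out directly. The short Python sketch below is illustrative only: the function name `cagr` and the variable names are assumptions made for this example and do not come from the MIIT plan. It assumes five years of compounding, from the 2015 estimate to the 2020 target.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.29 means 29 percent)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in billions of yuan.
sales_2015 = 280.0    # estimated 2015 sales
target_2020 = 1000.0  # 2020 target of 1 trillion yuan

# Assumption: five compounding years, 2016 through 2020.
rate = cagr(sales_2015, target_2020, years=5)

print(f"Growth multiple: {target_2020 / sales_2015:.2f}x")  # prints 3.57x
print(f"Implied CAGR: {rate:.1%}")                          # prints 29.0%
```

The result, about 29 percent a year and a 3.57-fold increase overall, is consistent with the plan's "around 30 percent" target and with the aim to more than triple the industry's scale.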
It also set goals to create 10 world-leading big data companies by 2020 and establish 10-15 experimental zones to speed up the industry's development.
Efforts to promote big data application and make traditional industries smarter can add new momentum to China's economic transformation, the MIIT said.
The past few years have seen rapid expansion of China's information industry, laying a solid foundation for big data development in the future, it said.
China is one of the world's biggest data producers, with over 700 million Internet users and 1.3 billion mobile phone users, more than any other country on both measures.
The country's information industry saw sales reach 17.1 trillion yuan in 2015, double the level in 2010, according to the MIIT.
Alibaba Sets up World's First Big Data Anti-fake Alliance
Initiated by China's e-commerce giant Alibaba Group, the world's first alliance to fight fakes using big data was launched in Hangzhou, Zhejiang province, on Jan. 16. The first 20 members issued a joint action plan to cut down on counterfeit products.
According to Zheng Junfang, Alibaba's chief platform governance officer, the traditional way of cracking down on fake goods offline does not remove the source of copycat products.
"We have to have everybody involved and work together to do it," Zheng announced. Alibaba is willing to share its experience, skills, technology and resources with people all around the world in the anti-counterfeit battle, she said.
Only invited brands can join the alliance, and membership is limited to 20. The first batch of members includes Dulux, LV, Swarovski, Trendy Group, DAZZLE, Shiseido, Bioderma, Amway, Mars, Pernod Ricard, Huawei, SUPOR, Joyoung, Sony, Samsung, Western Digital (Western Digital and SanDisk), Canon and Ford. Alibaba has long cooperated on anti-counterfeiting initiatives. By the end of 2016, the company had worked with more than 18,000 brands to fight fake goods.
A Huawei manager pointed out that by using big data, Alibaba is playing a leading role in anti-counterfeiting efforts.
"We are fully expecting an IPR protection blueprint outlined by the big data anti-fake alliance," the manager said.
Zheng made four promises at the launch of the campaign: continue to provide data and technology support, promote cooperation on anti-counterfeiting efforts, provide priority service for alliance members, and invite alliance members to work on policy-making and amendments.
The establishment of the anti-fake alliance is only the first step. Alibaba will use big data to build anti-counterfeiting tools and consolidate social consensus, regularly releasing reports on anti-counterfeiting efforts.
Big Data Show Netizens Hooked on Entertainment News
Results of a big-data analysis revealed that 68.29 percent of Chinese netizens are followers of entertainment news.
The number was released by Toutiao, China's major news and information app for mobile devices. It recommends personalized information to individual users based on analysis of their habits.
On average, 33 entertainment-related articles on Toutiao have 1 million or more readers. Video clips on Toutiao now attract about 1.27 billion clicks a day, and 230 million of those clicks are on entertainment-related clips.
Film draws the largest following among entertainment news categories on the platform, with 56.08 percent of all Toutiao users following film news, followed by TV series (49.43 percent), reality shows (38.25 percent), and music (31.98 percent).
Some celebrities are named the "most popular" actors and singers on Toutiao based on the number of users who follow news about them.
For example, pop singer Li Yuchun became the most popular singer on Toutiao in 2016, and Hunan TV became the most popular platform for reality shows.