Commentary

Moving toward truly responsible AI development in the global AI market

Chinasa T. Okolo and Marie Tano
Marie Tano, Stanford HAI Tech Ethics Policy Fellow, Stanford University

October 24, 2024


  • Many AI applications, including large language models, rely on patterns learned from labeled datasets to generate accurate responses to new inputs.
  • Large AI companies, such as OpenAI, often outsource the labeling of these vast datasets to regions like Africa, where workers face low pay, limited benefits, and long hours, often engaging with sensitive or graphic materials.
  • To address ethical concerns about labor exploitation, the U.S. should reform laws around privacy and labor outsourcing, and companies must invest in local initiatives that prioritize the dignity and fair treatment of data workers to avoid echoes of colonialist exploitation.
Nigerian artist Malik Afegbua creates hyper-realistic pictures of African elderly people, using artificial intelligence, at his home in Lagos, Nigeria, January 25, 2023. REUTERS/Temilade Adelaja

Artificial intelligence (AI) companies' persistent efforts to obscure the vital role of human labor in data work, coupled with their lack of transparency and minimal recognition of data workers as true collaborators, highlight a troubling disregard for the very people who sustain their technologies. Beyond the engineers who design machine learning algorithms, data workers play a crucial role in processing and evaluating the datasets used to train AI models for content moderation, natural language processing, and speech recognition.

The data pipeline involves collection, preparation, labeling, and verification, requiring human intervention for nuanced contextual understanding and ethical integrity. Due to its labor-intensive and costly nature, tech companies often outsource these roles to the Global South (e.g., Africa, the Caribbean, Latin America, Southeast Asia, South Asia, and Oceania) through business process outsourcing (BPO) firms or digital labor platforms.

One area of data work poised for continued growth is the global data annotation market, a critical component of the AI industry that is expected to expand in the coming years due to the increasing demand for labeled datasets. These labeled datasets, which provide correct answers for an AI model to learn from, are crucial for training, since they enable models to identify patterns and relationships between the data and its labels. AI models then leverage these learned relationships to generate predictions and classifications on new, unseen information. However, the growth of AI, along with its data practices, bias, and climate impacts, has given rise to significant concerns and unethical practices, potentially exacerbating existing inequalities in developing countries. As AI development scales, these issues will likely become more pressing.
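The dependence described above can be made concrete with a deliberately simple sketch: a toy nearest-neighbor classifier that can only answer questions about new inputs because humans first attached labels to the training examples. The data points and labels here are invented purely for illustration, not drawn from any real annotation task.

```python
# Toy illustration of why labeled data matters: a 1-nearest-neighbor
# "model" whose only knowledge comes from human-supplied labels.

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(labeled_data, new_point):
    """Return the label of the closest labeled example."""
    nearest = min(labeled_data, key=lambda item: distance(item[0], new_point))
    return nearest[1]

# Human annotators supplied the second element of each pair: the label.
# Without those labels, the model has no way to classify anything.
labeled_data = [
    ((0.1, 0.2), "cat"),
    ((0.9, 0.8), "dog"),
    ((0.2, 0.1), "cat"),
    ((0.8, 0.9), "dog"),
]

print(predict(labeled_data, (0.15, 0.15)))  # nearest labeled examples are "cat"
print(predict(labeled_data, (0.85, 0.85)))  # nearest labeled examples are "dog"
```

Production annotation pipelines are vastly larger and the models far more complex, but the structural point is the same: the "correct answers" the model generalizes from are the product of human judgment, and mislabeled or culturally miscategorized examples propagate directly into the model's predictions.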

The data annotation market as a site of modern-day colonialism

Many researchers have drawn parallels between modern-day data annotation efforts and the colonial extraction and exploitation of resources for economic gain. Data work in developing countries is largely unregulated, and data annotators in the Global South often face low wages, precarious employment, and minimal job security. In 2023, TIME magazine exposed that Kenyan workers training the content filters for OpenAI's ChatGPT earned less than $2 an hour, labeling 150 to 250 passages of graphic text during each nine-hour shift. Similarly, a 2022 MIT Technology Review series on AI colonialism reported that data labelers in Venezuela, Kenya, and Northern Africa faced poor working conditions, including delayed payments, pay discrepancies, erratic schedules, and heavy workloads. These cases show that such workers are often exploited financially with little concern for their psychological well-being.

Many local African economies cannot support full employment, making data work an enticing opportunity for economic growth and career advancement for citizens. However, issues like cross-border micropayments, fees, verification requirements, and trust barriers present significant obstacles. Complex international financial transactions often result in delays and extra costs, further reducing the modest earnings of data workers. Meanwhile, verification requirements can exclude those without access to necessary documentation or banking services. Persistent stereotypes about African labor鈥攕uch as assumptions of substandard or digitally illiterate workforces鈥攁lso limit access to global tech opportunities for many skilled African developers and technical workers.

To overcome these biases, some workers resort to virtual private networks (VPNs) or similar tools to conceal their location, seeking equal opportunities in the global marketplace. These barriers not only reinforce existing inequalities, but also encourage a system where data workers compete against each other for these opportunities.

Yet, beyond these economic barriers, Western ideals further constrain data workers by imposing rigid guidelines on how data should be labeled and interpreted. Annotators are required to fit their work into strict taxonomies that reflect Western-centric perspectives, which disregard the unique cultural backgrounds and insights of these workers. This forced framework is evident when workers are asked to label social categories, such as race, according to U.S.-centric classifications that lack relevance for local cultural and racial complexities. Such impositions reveal that data work is not merely a mechanical task, but one that demands nuanced, context-specific decisions; even so, workers are often undervalued by Western clients who fail to recognize the deeper knowledge required for culturally sensitive data labeling.

Forward-looking analysis

Human labor in AI development is crucial for identifying potential model problems and providing nuanced contextual understanding that vast amounts of data alone cannot offer. Data workers often possess unique insights and local knowledge that can significantly enhance the quality and contextual relevance of AI systems, especially in new domains and use cases. For instance, data workers engaged in content moderation on platforms like social media are often responsible for flagging harmful content that requires cultural awareness and sensitivity to local norms, as automated systems may miss culturally specific slang, symbolism, or context. Similarly, in tasks like tone detection, data workers bring essential insights into how different cultures express sarcasm, which helps refine models that otherwise might misinterpret language subtleties. In AI development for health care, data annotation is essential for training AI models to detect diseases like diabetic retinopathy, skin cancer, and cardiovascular risks.

Increasing equity in data work

Adapting AI tools and guidelines to reflect the local context and the lived experiences of annotators is crucial for ensuring the cultural relevance and accuracy of AI outputs. Localization involves modifying AI tools and data labeling guidelines to align with the socio-cultural norms and values of the regions where data workers operate. This can help avoid the imposition of Western-centric perspectives and promote the inclusion of diverse viewpoints. By considering local contexts, AI systems can become more effective and ethically sound, ultimately leading to better outcomes for all stakeholders involved.

Enhancing data worker participation in AI development can be achieved through greater AI literacy and educational opportunities. Training programs should equip workers with skills in data collection, curation, and interdisciplinary collaboration, empowering them to engage more effectively with AI tools and understand their contributions. Initiatives like Karya, launched in 2021 by former Microsoft researchers, have made significant strides by paying rural Indians 20 times the minimum wage for data work, offering workers ownership and royalties, and providing upskilling opportunities to over 20,000 workers that strengthen both their digital skills and financial literacy.

Policy recommendations toward improving the global data annotation ecosystem

To respect the lived experiences of data workers, the implementation of robust labor regulations in Global South countries is vital for protecting data workers from exploitation, discrimination, and underpayment. Governments and regulatory bodies should establish clear standards for fair wages, job security, and working conditions in the broader data annotation industry. These regulations should also address the specific challenges faced by data workers, such as precarious employment and minimal job security. By enforcing labor standards, countries can create a more equitable environment for data workers, ensuring their rights are protected and their contributions are compensated. Consumers of AI technologies, companies, and research organizations that frequently use data annotators should support local initiatives that advocate on behalf of data workers.

Establishing and supporting data worker unions is crucial for promoting parity in wages, job security, and improved working conditions. Unions can provide a collective voice for data workers, enabling them to negotiate better terms and protect their rights. Emerging unionization initiatives demonstrate the potential for organized efforts to drive positive changes in the data annotation industry. Unionization efforts can empower data workers to address exploitation and advocate for more equitable treatment, contributing to a more just and ethical AI development process.

U.S. policy is essential for ensuring fair treatment of data workers both domestically and abroad. For instance, reforming existing labor laws to reflect the nuances of data work, and ensuring that the researchers and companies using these platforms compensate domestic workers in accordance with state and federal guidelines, is crucial. Such reforms should consider the unique nature of data work, which often involves irregular hours, varying tasks and workloads, and the use of digital platforms that may not fit neatly into traditional labor law frameworks. By updating these laws, the U.S. can better protect data workers from exploitation.

The U.S. must also amend its data privacy protection regulations to better serve data workers employed within its borders and to set standards for ethical data practices that resonate globally. Data workers often handle significant amounts of personal information while labeling and processing data for AI systems, highlighting the need for clear and transparent consent processes. Amendments to these regulations should include provisions to ensure users understand how their data is used, the ability to opt out or request deletion, and fair compensation, such as royalties, for the use of their personal data in training models. Such measures would acknowledge the value of the data that they help process, while also enhancing trust and cooperation in the AI development process.

Beyond U.S. borders, where data labeling platforms often operate, these policies can influence global standards, encouraging other countries to adopt similar protections for data workers. Given the dominance of U.S.-based AI companies, these companies should also adhere to guidelines that address human rights, civil rights, and labor concerns in all regions where they operate. By establishing ethical frameworks that protect data workers both in the U.S. and internationally, American-based companies can lead by example, promoting a more equitable and ethical global data ecosystem. This approach not only sets a precedent for responsible data use, but also ensures that workers in different countries have more consistent protections and recognition for their contributions.

Finally, U.S. companies outsourcing data work to the Global South should strive to uphold ethical labor practices that align with domestic labor and employment laws and practices, even when operating in countries with more lenient regulations. While it may be tempting for companies to take advantage of lower wages and reduced labor protections in other countries, adhering to U.S. standards for wages, working conditions, and worker rights can help mitigate the risk of exploitation and underpayment.

As AI systems evolve and see deployment in diverse and complex scenarios, the need for these human qualities will grow. Although automation can reduce the need for human intervention in some areas, tasks requiring deep understanding, ethical judgment, and cultural sensitivity will always depend on human intelligence. Therefore, companies should treat annotators as collaborators, leveraging their socio-cultural perspectives to enhance AI models. By fostering collaborative frameworks where data workers contribute to the creation and refinement of guidelines, we can ensure their experiences are valued and integrated into AI systems, promoting a more inclusive and ethical approach to AI development.

  • Acknowledgements and disclosures

    Amazon and Microsoft are general, unrestricted donors to the Brookings Institution. The findings, interpretations, and conclusions posted in this piece are solely those of the authors and are not influenced by any donation.

  • Footnotes
    1. Gebrekidan, B. F. (2024). Content Moderation: The harrowing, traumatizing job that left many African data workers with mental health issues and drug dependency. Creative Commons BY 4.0.
    2. Tubaro, P., Le Ludec, C., & Casilli, A. A. (2020). Counting 'micro-workers': societal and methodological challenges around new forms of labour. Work Organisation, Labour & Globalisation, 14(1), 67-82.
    3. Gray, M. L., & Suri, S. (2019). Ghost work: How to stop Silicon Valley from building a new global underclass. Eamon Dolan Books.
    4. Hao, K., & Hernández, P. A. (2020, April 20). How the AI industry profits from catastrophe. Technology Review.
    5. Okolo, C. T. (2023). Addressing global inequity in AI development. In Handbook of Critical Studies of Artificial Intelligence (pp. 378-389). Edward Elgar Publishing.
    6. Posada, J. (2022). The coloniality of data work: Power and inequality in outsourced data production for machine learning (Doctoral dissertation, University of Toronto (Canada)).
    7. Okolo, C. T. (2023). Addressing global inequity in AI development. In the Handbook of Critical Studies of Artificial Intelligence (pp. 378-389). Edward Elgar Publishing.
    8. Royer, A. (2021). The urgent need for regulating global ghost work.
    9. Perrigo, B. (2023). Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. Time Magazine, 18, 2023.
    10. Hao, K., & Hernández, P. A. (2020, April 20). How the AI industry profits from catastrophe. Technology Review.
    11. Ngene, G. (2022, April 5). What does the advent of cryptocurrency mean for the thousands of Kenyans completing digital work? Job Tech Alliance. https://jobtechalliance.com/what-does-the-advent-of-cryptocurrency-mean-for-the-thousands-of-kenyans-completing-digital-work/
    12. Chávez, A. (2024). The Impact of Gift Card Payments on MTurk Workers. Edited by M. Miceli, A. Dinika, K. Kauffman, C. Salim Wagner, & L. Sachenbacher. Creative Commons BY 4.0.
    13. Hastings-Spaine, N. (2021, February 5). Life as a Nigerian Developer: Overcoming Stereotypes. Built in Africa. https://www.builtinafrica.io/blog-post/overcoming-stereotypes-life-as-a-nigerian-developer
    14. Job Tech Alliance. (2023 July). Platforms for Digitally-Delivered Work in Sub-Saharan Africa: A Landscape Scan II. Job Tech Alliance. https://jobtechalliance.com/wp-content/uploads/2023/07/ii-Jobtech-Alliance_-A-Scan-of-Platforms-for-Digitally-Delivered-Work-in-Sub-Saharan-Africa-19072023.pdf
    15. Smart, A., Wang, D., Monk, E., Díaz, M., Kasirzadeh, A., Van Liemt, E., & Schmer-Galunder, S. (2024). Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation. arXiv preprint arXiv:2402.06811.
    16. Díaz, M., Kivlichan, I., Rosen, R., Baker, D., Amironesei, R., Prabhakaran, V., & Denton, E. (2022, June). Crowdworksheets: Accounting for individual and collective identities underlying crowdsourced dataset annotation. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 2342-2351).
    17. Gebrekidan, B. F. (2024). Content Moderation: The harrowing, traumatizing job that left many African data workers with mental health issues and drug dependency. Creative Commons BY 4.0.
    18. Joshi, A., Bhattacharyya, P., Carman, M., Saraswati, J., & Shukla, R. (2016, August). How do cultural differences impact the quality of sarcasm annotation?: A case study of Indian annotators and American text. In Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (pp. 95-99).
    19. Mesko, B. (2024, March 19). Data Annotators: The Unsung Heroes of Artificial Intelligence Development. The Medical Futurist. https://medicalfuturist.com/data-annotation/
    20. Sambasivan, N., Kapania, S., Highfill, H., Akrong, D., Paritosh, P., & Aroyo, L. M. (2021, May). "Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1-15).
    21. Gray, M. L., & Suri, S. (2019). Ghost work: How to stop Silicon Valley from building a new global underclass. Eamon Dolan Books.
    22. Karya. (2022, November 22). The Future is Data (Cooperatives) [Post]. LinkedIn. https://www.linkedin.com/pulse/future-data-cooperatives-karya-inc/