Light at The End of The Data Bias Tunnel

March 20, 2022

In Machine Learning and Artificial Intelligence (AI) in particular, the root cause of algorithmic bias lies in the quality of the data the machine learns from. Low-quality or unaudited datasets can easily entrench, or even amplify, human bias in the logic that Machine Learning and AI systems produce.

Research suggests that copyright law could be used to improve the data supplied to AI and, in turn, its learning process. Levendowski (2018) argues that the Fair Use Doctrine in copyright law could allow developers to use data that would otherwise be unavailable because of copyright restrictions, giving their software potentially less biased datasets. A good example would be automakers relying on Fair Use to share the datasets they collect on driving and pedestrian patterns. The dangers of teaching AI to drive cars with biased or low-quality datasets outweigh the benefit any one company gains from keeping its partial dataset to itself. Sharing this information would benefit all carmakers as an industry. Other industries exploring the use of AI could benefit as well.

An xkcd cartoon pokes fun at a scientist who, unhappy with the results of their algorithm based on flawed data, ends up training the AI to generate "better" data.

Figure 1. Flawed Data. (Source: xkcd.com, n.d.)

In the medical industry, sharing the datasets used to teach AI could save lives as well. Better AI screening of patient records, such as CT scans and other imaging, would benefit everyone, and it would not require disclosing the algorithms or computer code that individual developers want to keep protected by copyright. In essence, data sharing protected by Fair Use is a good way to reduce bias across industries. We already share information this way, including in software development, where reusing code or being “inspired” by someone else’s code posted online is very common. Many of the software tools that make computers work came out of collaboration and information sharing in the computer science community.

References:

Levendowski, A. (2018). How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem. Washington Law Review, 93, 579–630. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3024938

Flawed Data. (n.d.). [Cartoon]. Xkcd. https://xkcd.com/2494/
