The most important guiding principle is transparency. For a long time, we have worked with open source, open documentation, and an open backlog. Our core belief is that users (and the public) should have insight into our activities, the ability to reuse our code, and the opportunity to contribute to future development. While this provides a strong foundation, it falls short from a data ethics perspective, because the source code and documentation are, in practice, accessible only to other developers. Research has shown that incorporating a wider range of perspectives in the development process leads to better products and reduces the risk of discriminatory effects. A problem highlighted repeatedly in discussions about AI development is that homogeneous development environments produce products with algorithmic bias. (1) But the lack of diversity has consequences on multiple levels, as AI consultant and designer Pete Trainor points out in an article on Medium.com:
"The lack of diversity stifles innovation, perpetuates biases, and contributes to the development of technologies that reinforce societal inequalities." (2)
To mitigate these potential risks, Jobtech strives to follow the Open Data Institute's recommendations for integrating data ethics throughout the entire development process, from idea to development, release, and management. (3) To promote inclusion in our product development – ensuring that those not at the table still have a voice – we openly publish our ethical evaluations in Jobtech's Community Forum and invite the public to give feedback and share their perspectives. Working in an agile way, the development team conducts an initial ethical evaluation at the project's conceptual stage. This evaluation is then published in the forum, inviting feedback and discussion from other employees and teams, so that the development team can incorporate what it learns into its ongoing work. In the final step, the public is invited into the same discussion, and the process is repeated throughout the phases of the development project. It is important that the ethical evaluation, unlike the technical documentation, is written in non-technical language.

The questions we ask are designed to identify ethical risks at both the individual and societal levels. The goal is to reflect on what is right and wrong, good and bad, with the aim of safeguarding the human beings who are affected by the technology. What questions are asked during an ethical evaluation? Two examples: “What potential risks could the service/product pose at the individual and societal levels?” and “What adjustments, if any, are necessary to mitigate the identified risks? Alternatively, why might no adjustments be needed?” The essence of the process lies in actually addressing the ethical risks that are identified.
As a government agency, Arbetsförmedlingen already complies with the requirements for openness and transparency under the Swedish Principle of Public Access. However, with this initiative, we aim to take transparency to a new level, where “open by default” should also mean fostering understanding and co-creation for a broader audience beyond the developer community. Our target audience is citizens, both as users and as stakeholders. Another crucial aspect of data ethics is therefore ensuring that the information about open datasets and digital services is as detailed as possible. Researchers Catherine D’Ignazio and Lauren Klein highlight the importance of context for a dataset’s usability and for minimizing risk.
“Until we invest as much in providing (and maintaining) context as we do in publishing data, we will end up with public information resources that are subpar at best and dangerous at worst.” (4)
The researchers argue that data quality is essentially the same as metadata quality. They warn that the one-sided prioritization of “opening up” data, which has accompanied and driven the open data and Open Government movements, is risky. (5) This deprioritization of context has led to what is known as zombie data: data released with minimal context and an unclear purpose. (6) They assert that the notion of raw data is a misconception, because data is already shaped by “a complex set of social, political, and historical circumstances.” (7)
In line with these insights, we aim to ensure that the data we publish is as context-rich as possible. Beyond the requirements of the DCAT-AP-SE metadata standard, there is a need for clear information about each dataset's provenance, purpose, limitations, collection methods, and quality assurance. The new AI regulation (the EU AI Act) will impose similar requirements on the high-quality datasets used in AI systems within high-risk areas such as recruitment. It also requires transparency, including detailed documentation of the system, a framework for risk assessment and mitigation, and clear information for users about the system's purpose and limitations.
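To make this more concrete, here is a minimal sketch of how such contextual information might be attached to a dataset description, using Python's rdflib library together with the W3C DCAT and Dublin Core vocabularies. The dataset URI, the property choices beyond the DCAT-AP-SE baseline, and all literal values are illustrative assumptions, not Jobtech's actual metadata.

```python
# Minimal sketch: describing a dataset with contextual metadata using
# rdflib and the DCAT / Dublin Core vocabularies. All URIs and values
# below are hypothetical examples, not actual Jobtech metadata.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

dataset = URIRef("https://example.org/dataset/historical-job-ads")  # hypothetical URI
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Historical job ads", lang="en")))

# Provenance: where the data comes from and how it was produced.
# (A plain literal is used for brevity; DCAT-AP recommends pointing to a
# dcterms:ProvenanceStatement resource instead.)
g.add((dataset, DCTERMS.provenance,
       Literal("Aggregated from job advertisements published through the "
               "agency's services.", lang="en")))

# Purpose, limitations, collection method, and quality assurance expressed
# as plain-language notes, so non-developers can judge fitness for use.
g.add((dataset, DCTERMS.description,
       Literal("Purpose: labour market analysis. Limitations: free-text "
               "fields may reflect bias in how employers wrote the ads. "
               "Collected via automated harvesting; quality-assured through "
               "deduplication and schema validation.", lang="en")))

# Serialize to Turtle, the format commonly used when publishing DCAT records.
print(g.serialize(format="turtle"))
```

The point of such a record is that the contextual notes travel with the dataset description itself, rather than living only in separate developer documentation.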
Developing these contextual parameters, and working with ethics in general, is not something that happens overnight; it takes dedicated effort and time. The results, however, are worth it: reliable, high-quality datasets and products that can drive sustainable innovation and growth.
Reference List: