
Community Principles on Ethical Data Practices Cheat Sheet (DRAFT) by [deleted]

Community Principles on Ethical Data Practices

This is a draft cheat sheet. It is a work in progress and is not finished yet.


The Community Principles on Ethical Data Practices are being developed by people from the data science community in conjunction with data science organizations. These principles focus on defining ethical and responsible behaviors for sourcing, sharing and implementing data in a manner that will cause no harm and maximize positive impact. The goal of this initiative is to develop a community-driven code of ethics for data collection, sharing and utilization that provides people in the data science community with a standard set of easily digestible, recognizable principles for guiding their behaviors.


Fairness: Understand, mitigate and communicate the presence of bias in both data practice and consumption.
Benefit: Set people before data and be responsible for maximizing social benefit and minimizing harm.
Openness: Practice humility and openness. Transparent practices, community engagement, and responsible communications are an integral part of data ethics.
Reliability: Ensure that every effort is made to glean a complete understanding of what is contained within data, where it came from, and how it was created. Extend this effort for future users of all data and derivative data.


1. Consider (if not collect) informed and purposeful consent of data subjects for all projects, and discard resulting data when that consent expires.
2. Make best effort to guarantee the security of data, subjects, and algorithms to prevent unauthorized access, policy violations, tampering, or other harm or actions outside the data subjects’ consent.
3. Make best effort to protect anonymous data subjects, and any associated data, against any attempts to reverse-engineer, de-anonymize, or otherwise expose confidential information.
    This includes all intermediate results, working with individuals or companies to help them maintain the anonymity of all data and parties involved, and supporting the rights to explanation, recourse, and rectification for any data subjects impacted by data work.
4. Practice responsible transparency as the default where possible, throughout the entire data lifecycle.
    This includes providing enough context and documentation to enable other trained practitioners to understand and evaluate the use of data.

Social Responsibility

Principles Continued

5. Foster diversity by making efforts to ensure inclusion of participants, representation of viewpoints and communities, and openness. The data community should be open to, welcoming of, and inclusive of people from diverse backgrounds.
    This can be achieved by: being conscious of, and owning the results of, actions regardless of intent; promoting the voices of marginalized groups; acknowledging and self-checking privilege; accepting checks of privilege by others in good faith; and using privilege to advocate for equity.
    The data community will not remain silent when witnessing others behaving in a manner that is not accessible, open, welcoming and inclusive.
6. Acknowledge and mitigate unfair bias throughout all aspects of data work.
    This includes but is not limited to providing details and methodologies around data collection, processing and storage, and actively working to identify and disclose bias in algorithms, training data, and test data.
7. Hold up datasets with clearly established provenance as the expected norm, rather than the exception.
    As a data collector, be responsible for recording provenance; as a data publisher, be responsible for propagating provenance; as a data scientist, be responsible for reviewing, considering, and declaring what is known about data provenance.
    Provenance is a living part of data work and can evolve with the project. All reasonable efforts should be made to understand and pass on provenance work.
8. Respect the relevant tensions among all stakeholders as they relate to privacy and data ownership.
9. Take great care to communicate responsibly and accessibly.
    This includes: acknowledging and disclosing caveats and limitations to the process and outputs; considering and providing clear opportunities for feedback from all stakeholders; considering and discussing whether something should be done (not just if it can be done); and clearly communicating who may be impacted, and how they are impacted, in order to minimize any potential harm from data work.
10. Ensure that all data practitioners take responsibility for exercising ethical imagination in their work, including considering the implication of what came before and what may come after, and actively working to increase benefit and prevent harm to others.