Data Quality - Need process for assuring 'good data' on Data.gov

Idea#97

Stage: Active

Campaign: Strategic Intent

On Page 9 of the CONOP, the example of Forbes' use of Federal data to develop the list of "America's Safest Cities" brings to light a significant risk associated with providing 'raw data' for public consumption. As you are aware, much of the crime data used for that survey is drawn from the Uniform Crime Reporting effort of the FBI. As self-reported on the "Crime in the United States" website, "Figures used in this Report are submitted voluntarily by law enforcement agencies throughout the country. Individuals using these tabulations are cautioned against drawing conclusions by making direct comparisons between cities. Comparisons lead to simplistic and/or incomplete analyses that often create misleading perceptions adversely affecting communities and their residents."

Because Data.gov seeks to make raw data available to a broad set of potential users, how will Data.gov address the issue of data quality within the feeds provided through Data.gov? Currently, federal agency Annual Performance Reports required under the Government Performance and Results Act (GPRA) of 1993 require some assurance of the accuracy of the data reported; will there be a similar process for federal agency data made accessible through Data.gov? If not, what measures will be put in place to ensure that conclusions drawn from the Data.gov data sources reflect the risks associated with 'raw' data? And how will we know that the data made available through Data.gov is accurate and up-to-date?

Feedback Score

23 votes

Idea Details

Vote Activity (latest 20 votes)

  1. Agreed
  2. Agreed
  3. Agreed
  4. Agreed
  5. Agreed
  6. Agreed
  7. Agreed
  8. Agreed
  9. Agreed
  10. Disagreed
  11. Agreed
  12. Agreed
  13. Agreed
  14. Agreed
  15. Agreed
  16. Agreed
  17. Agreed
  18. Agreed
  19. Agreed
  20. Agreed

Comments

  1. Comment
    ellena.schoop

    In response to Chuck about data quality - there is a field in the metadata (data about data) that indicates quality level - because to your point, people want to know if it's good data or bad data. I also think there is a set of criteria that speaks to 'quality', so hopefully each dataset is using the same measuring stick. However, if this metadata is not complete then we're back to your point. I'm wondering if it may make sense to make that field 'required' if it's not already.

  2. Comment
    Chuck Georgo ( Idea Submitter )

    Ellena, I see the potential for at least three types of data errors:

    - errors of commission - where a lack of rigor in the collection process may have allowed skew or bias. In the crime example; agencies may have different ways of classifying certain crimes.

    - errors of omission - where the data set is incomplete or doesn't sufficiently represent the metric measured. In the crime example; agencies may not report some (or any) data for some crimes.

    - errors of analysis - where the federal agencies release statistics based on one of the other two types of errors.

    At best, some agencies may consider including a 'confidence' data element to give consumers some idea of how comfortable the federal agencies are with the quality of the data they are sharing.

    r/Chuck
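
    The 'confidence' data element Chuck describes could be sketched as a field in each dataset's metadata record. The field names, the 0.0-1.0 scale, and the error-type vocabulary below are purely illustrative assumptions, not part of any actual Data.gov schema:

    ```python
    # Hypothetical sketch of a per-dataset metadata record carrying a
    # 'confidence' element and a list of known error types, following the
    # three categories described above. Names and scale are illustrative.

    VALID_ERROR_TYPES = {"commission", "omission", "analysis"}

    def validate_quality_metadata(record):
        """Check that a dataset's quality metadata is present and sane."""
        quality = record.get("quality", {})
        confidence = quality.get("confidence")
        # Confidence must be a number in [0.0, 1.0].
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be a number between 0.0 and 1.0")
        # Any declared error types must come from the shared vocabulary,
        # so every agency uses the same measuring stick.
        unknown = set(quality.get("known_error_types", [])) - VALID_ERROR_TYPES
        if unknown:
            raise ValueError(f"unrecognized error types: {sorted(unknown)}")
        return True

    record = {
        "dataset": "ucr_crime_counts_2009",  # hypothetical dataset name
        "quality": {
            "confidence": 0.6,  # agency's self-assessed confidence
            "known_error_types": ["commission", "omission"],
            "caveat": "Voluntary self-reporting; avoid direct city comparisons.",
        },
    }

    validate_quality_metadata(record)  # raises ValueError if malformed
    ```

    Making such a field required, as Ellena suggests, would amount to running a check like this before a feed is accepted into the catalog.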

  3. Comment
    charleshoffman

    A method of expressing business rules should be available for whatever format is used to express data. The business rules format should also be a global standard, just like whatever format Data.gov decides to use for the information itself.

  4. Comment
    Chuck Georgo ( Idea Submitter )

    NCJA just published this article on crime data quality...in the spirit of openness I thought I'd share ;-) ...

    Deficiencies In Old Crime Data-Collecting Methods Still Seen Today

    Problems that existed in crime reports dating back to more than 40 years ago still exist today, according to an article in the Wall Street Journal.

    According to the Journal, President Lyndon B. Johnson’s Commission on Law Enforcement and Administration of Justice published a report in 1967 that claimed 52 percent of American men would be arrested in their lifetime. Flaws that were acknowledged in that report are still present in many of today’s reports.

    To continue reading --> http://ncja.informz.net/admin31/content/template.asp?sid=18356&brandid=3027&uid=755476241&mi=700832&ptid=55

  5. Comment
    east440

    I think the #1 priority in regard to quality is contextualizing the data presented here. That means placing links on the data set's page to the webpages of the originating studies and reporting, placing links to federal pages discussing/introducing the data, and placing links to retrospective pages which analyze the data set's limitations in a comparison of available data sets on the subject.
