Rank 21

Idea#97

This idea is active.
Strategic Intent

Data Quality - Need process for assuring 'good data' on Data.gov

On Page 9 of the CONOP, the example of Forbes' use of Federal data to develop the list of "America's Safest Cities" brings to light a significant risk associated with providing 'raw data' for public consumption. As you are aware, much of the crime data used for that survey is drawn from the Uniform Crime Reporting effort of the FBI. As self-reported on the "Crime in the United States" website, "Figures used in this Report are submitted voluntarily by law enforcement agencies throughout the country. Individuals using these tabulations are cautioned against drawing conclusions by making direct comparisons between cities. Comparisons lead to simplistic and/or incomplete analyses that often create misleading perceptions adversely affecting communities and their residents."

Because Data.gov seeks to make raw data available to a broad set of potential users, how will Data.gov address the issue of data quality within the feeds it provides? Currently, federal agency Annual Performance Reports required under the Government Performance and Results Act (GPRA) of 1993 must include some assurance of the accuracy of the data reported; will there be a similar process for federal agency data made accessible through Data.gov? If not, what measures will be put in place to ensure that conclusions drawn from Data.gov data sources reflect the risks associated with 'raw' data? And how will we know that the data made available through Data.gov is accurate and up-to-date?

Comment

Submitted 4 years ago

Comments (5)

  1. In response to Chuck about data quality - there is a field in the metadata (data about data) that indicates quality level - because to your point, people want to know if it's good data or bad data. I also think there is a set of criteria that speaks to 'quality', so hopefully each dataset is using the same measuring stick. However, if this metadata is not complete, then we're back to your point. I'm wondering if it may make sense to make that metadata 'required' if it's not already.

    4 years ago
  2. Chuck Georgo Idea Submitter

    Ellena, I see the potential for at least three types of data errors:

    - errors of commission - where a lack of rigor in the collection process may have allowed skew or bias. In the crime example, agencies may have different ways of classifying certain crimes.

    - errors of omission - where the data set is incomplete or doesn't sufficiently represent the metric measured. In the crime example, agencies may not report some (or any) data for some crimes.

    - errors of analysis - where the federal agencies release statistics based on one of the other two types of errors.

    At best, some agencies may consider including a 'confidence' data element to give consumers some idea of how comfortable the federal agencies are with the quality of the data they are sharing.

    r/Chuck

    4 years ago
  3. A method of expressing business rules should be available for whatever format is used to express the data. The business-rules format should also be a global standard, just like whatever data format Data.gov decides to use.

    4 years ago
  4. Chuck Georgo Idea Submitter

    NCJA just published this article on crime data quality...in the spirit of openness I thought I'd share ;-) ...

    Deficiencies In Old Crime Data-Collecting Methods Still Seen Today

    Problems that existed in crime reports dating back more than 40 years still exist today, according to an article in the Wall Street Journal.

    According to the Journal, President Lyndon B. Johnson’s Commission on Law Enforcement and Administration of Justice published a report in 1967 claiming that 52 percent of American men would be arrested in their lifetime. Flaws acknowledged in that report are still present in many of today’s reports.

    To continue reading --> http://ncja.informz.net/admin31/content/template.asp?sid=18356&brandid=3027&uid=755476241&mi=700832&ptid=55

    4 years ago
  5. I think the #1 priority in regard to quality is contextualizing the data presented here. That means placing links on each data set's page to the webpages of the originating studies and reporting, links to federal pages discussing and introducing the data, and links to retrospective pages that analyze the data set's limitations in comparison with other available data sets on the subject.

    4 years ago
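Pulling the thread together: the quality metadata field mentioned in comment 1, the 'confidence' data element Chuck proposes in comment 2, and the "up-to-date" question in the original idea could all be captured in a dataset's metadata record. A minimal sketch in Python follows; the field names (`quality_confidence`, `known_limitations`, and so on) are purely illustrative assumptions, not part of any actual Data.gov schema:

```python
# Hypothetical dataset metadata record. All field names here are
# illustrative assumptions, not a real Data.gov schema.
dataset_metadata = {
    "title": "Uniform Crime Reporting statistics",
    "last_updated": "2009-06-30",       # answers the "up-to-date" question
    "quality_confidence": "medium",     # agency's stated confidence level
    "known_limitations": [
        "Figures are submitted voluntarily by law enforcement agencies",
        "Direct comparisons between cities are discouraged",
    ],
    "source_url": "https://www.fbi.gov/ucr",  # link back to originating source
}

# Making quality metadata 'required', as comment 1 suggests, amounts to a
# completeness check like this one.
REQUIRED_QUALITY_FIELDS = {"quality_confidence", "known_limitations", "last_updated"}

def missing_quality_fields(metadata):
    """Return the required quality fields a metadata record fails to supply."""
    return REQUIRED_QUALITY_FIELDS - metadata.keys()

print(sorted(missing_quality_fields(dataset_metadata)))  # -> []
```

A record that omits any of the required fields would fail the check, which is one simple way a portal could refuse (or flag) feeds that ship without quality metadata.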
