Rank 4

Idea #50

This idea is active.
Category: Agency Next Steps

Reporting problems with data

The system should have a way to add "notes" about the data. These could include specific things that should be watched for (for example, the assumptions under which the data was collected, or the methodology used to collect it). It's not much use to use the dataset "tax data for 1999" in a mission-critical application if the methodology was to "ask 10 people what the tax rate was in 1999."

Likewise, there should be ways for the public to add notes about possible problems they have found with the data, and a feedback mechanism to ensure that others know about these possible problems. For example: "21 Main St." no longer exists and should be removed from the "current address" dataset.
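The annotation scheme described above could be sketched as a simple data model: agency-supplied methodology notes and public-reported issues attached to the same dataset record. This is a minimal illustrative sketch, not any actual data.gov design; all class and field names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    author: str
    kind: str   # "methodology" (agency-supplied) or "issue" (public-reported)
    text: str

@dataclass
class Dataset:
    name: str
    notes: list = field(default_factory=list)

    def add_note(self, author, kind, text):
        self.notes.append(Note(author, kind, text))

    def open_issues(self):
        # Surface public-reported problems so other users see them
        # before relying on the data.
        return [n for n in self.notes if n.kind == "issue"]

ds = Dataset("tax data for 1999")
ds.add_note("agency", "methodology", "Derived from IRS filings, not a survey.")
ds.add_note("public", "issue",
            '"21 Main St." no longer exists; remove from current-address dataset.')
print(len(ds.open_issues()))
```

The point of separating the two note kinds is that methodology notes describe how the data came to be, while issue notes flag specific records that may be wrong.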


Submitted by george_vic_bell 4 years ago

Comments (8)

  1. I like the idea of being able to capture quality issues with the data; however, in some cases the Federal agencies are just aggregators of data that is collected and contributed by state or other partners. As such, an error might be fixed at the federal level, only to be overwritten by the bad value coming from the partner agency at the next refresh cycle. I think that if we can provide high-value datasets and data services, and then get those state partners bought in and using that aggregated data in their own business processes, that will help improve quality.

    To get back to the core point about reporting issues with the data, one could potentially mine the metadata records, extract the data steward point of contact, and use that POC email as a means to communicate data issues back to the steward. In the case of aggregated data sets, hopefully we can be mindful of capturing contributor POC information in the metadata lineage as well, to be able to share that information back up the chain to its ultimate source.

    4 years ago
  2. George, I especially like your second paragraph. More momentum will be gained if community knowledge can be stored accessibly, so that each user doesn't have to reinvent the wheel. In fact, if you come up with ideas about how to store such notes to facilitate browsing them, I hope you'll post them here.

    Regarding your first paragraph, see the suggestion in the Strategic Intent category, "Post Links to All Agency ICRs Together." Assumptions and methodology are exactly the kind of information that it addresses. The information exists now and need only be put in proximity to the data sets so that it can be referenced. Again, it's about conserving energy for new exploration, rather than everybody falling into the same hole. Thanks.

    4 years ago
  3. David, you make a good point. I suppose existing data forums also could be used, in the event that point of contact information is not available, but having the POC info would be ideal so quality issues get fixed at the source. In K-12 education performance data, the EdFacts group has worked ceaselessly and well with all the state education data people for a few years now, in order to uncover and solve quality issues at the school level. It's challenging at the scale we're talking about on some of the data sets, which is why I'm thinking professional groups of data users might be of some assistance.

    4 years ago
  4. So we need a scalable approach to harvesting the learnings of data users who come through data.gov.

    We all know there are many hurdles to "fixing" data sets (including resources, regulatory and procedural requirements for modifying data, and confusion about what is "right"), so a one-size-fits-all approach will not work.

    BUT what if there were a standard way for agencies to mark up and expose the comments they receive about a data set, linked to that data resource (and its data.gov record)? This would allow all of those comments to be searched and aggregated by anybody!

    Maybe this could be an elaboration of what is in the existing Agency Plan outline in the OGI.

    4 years ago
  5. This sounds good to me.

    4 years ago
  6. There are instances where such things as server problems can cause data to be missing for a period of time, and this can have the effect of creating misleading totals for a specified span of time. Consequently, there needs to be a way of alerting users to existing data anomalies that can give a misleading impression AND alerting users of a dataset when a new anomaly is discovered.

    4 years ago
  7. It would be helpful to know the circumstances and environment in which the data was collected; however, opening up discussion to the general public on "problems" found with the data might spawn inaccurate assumptions about the data itself. For raw data, knowing its parameters should help those using it to decide whether it is relevant to their needs.

    4 years ago
  8. Sunlight Lab's "National Data Catalog" has a Community Documentation page for each entry in their catalog. This could include known weaknesses, missing elements, gaps, or other artifacts of the data.

    Here is an example:

    Comm doc page: http://nationaldatacatalog.com/data/housing-code-enforcement/docs

    Data page: http://nationaldatacatalog.com/data/housing-code-enforcement

    And Socrata has an area for comments on every data set, for instance the White House Visitors Records: http://www.socrata.com/Government/White-House-Visitor-Records-Requests/644b-gaut

    - Jon Verville, NASA/GSFC

    4 years ago
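The standard mark-up proposed in comment 4 could take the form of a machine-readable comment feed that an agency publishes alongside each data set, keyed to its data.gov record so that harvesters can search and aggregate comments across agencies. The sketch below is purely illustrative; the field names and identifier are hypothetical, not any actual data.gov schema.

```python
import json

# Hypothetical per-dataset comment feed. An aggregator could fetch
# one of these for each data.gov record and merge them into a
# searchable index of reported data issues.
feed = {
    "dataset_id": "agency-example-1234",  # data.gov record identifier (made up)
    "comments": [
        {
            "date": "2010-03-01",
            "author": "public",
            "text": "Totals for Q2 look low; possible missing records.",
            "status": "under-review",
        }
    ],
}

serialized = json.dumps(feed, indent=2)
print(serialized)
```

A plain JSON document like this keeps the bar low for agencies to publish and for anybody to consume, which is the "searched and aggregated by anybody" property the comment asks for.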
