Agency Next Steps

Reporting problems with data

The system should have a way to add "notes" about the data. These could include specific things that users should watch for, such as the assumptions under which the data was collected or the methodology used to collect it. It's not much use to use the dataset "tax data for 1999" in a mission-critical application if the methodology was to "ask 10 people what the tax rate was in 1999."

Likewise, there should be ways for the public to add notes about possible problems they have found with the data, and a feedback mechanism to ensure that others know about these possible problems. For example, "21 Main St." no longer exists and should be removed from the "current address" dataset.
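As a rough sketch of the note-and-feedback idea above: each note could carry the dataset it belongs to, a kind (methodology caveat vs. a problem reported by the public), and a status a moderator can update. All field names here are hypothetical illustrations, not any agency's actual schema:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date

@dataclass
class DatasetNote:
    """One annotation attached to a dataset record (hypothetical schema)."""
    dataset_id: str            # catalog identifier, e.g. for "tax data for 1999"
    kind: str                  # "methodology", "assumption", or "reported-problem"
    text: str
    submitted_by: str = "public"
    status: str = "open"       # moderators could move this to "confirmed" / "fixed"
    reported_on: str = field(default_factory=lambda: date.today().isoformat())

def notes_for(notes, dataset_id, kind=None):
    """Return the notes attached to one dataset, optionally filtered by kind."""
    return [n for n in notes
            if n.dataset_id == dataset_id and (kind is None or n.kind == kind)]

notes = [
    DatasetNote("tax-data-1999", "methodology",
                "Figures were gathered by asking 10 people what the tax rate was."),
    DatasetNote("current-address", "reported-problem",
                '"21 Main St." no longer exists and should be removed.'),
]

# A consumer of the catalog could pull only the reported problems for a dataset.
problems = notes_for(notes, "current-address", kind="reported-problem")
print(json.dumps([asdict(n) for n in problems], indent=2))
```

The point of the `kind` field is that a mission-critical user can check methodology caveats before adopting a dataset, while the `reported-problem` notes give the public feedback loop a place to live.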


Submitted by

Stage: Active

Feedback Score: 46 votes



  1. Comment
    David Smith

I like the idea of being able to capture quality issues with the data. However, in some cases the Federal agencies are just aggregators of data that is collected and contributed by state or other partners. As such, an error might be fixed at the federal level, only to be overwritten by the bad value coming from its partner agency at the next refresh cycle. I think that if we can provide high-value datasets and data services, and then get those state partners bought in and also using that aggregated data in their own business processes, that will help improve quality.

To get back to the core point made on reporting issues with the data, one could potentially mine the metadata records, extract the data steward point of contact, and use that POC email as a means to communicate data issues back to the steward. In the case of aggregated datasets, hopefully we can be mindful of capturing contributor POC information in the metadata lineage as well, to be able to share that information back up the chain to its ultimate source.

  2. Comment
    Kitty Wooley

    George, I especially like your second paragraph. More momentum will be gained if community knowledge can be stored accessibly, so that each user doesn't have to reinvent the wheel. In fact, if you come up with ideas about how to store such notes to facilitate browsing them, I hope you'll post them here.

Regarding your first paragraph, see the suggestion in the Strategic Intent category, "Post Links to All Agency ICRs Together." Assumptions and methodology are exactly the kind of information that it addresses. The information exists now and need only be put in proximity to the datasets so that it can be referenced. Again, it's about conserving energy for new exploration, rather than everybody falling in the same hole. Thanks.

  3. Comment
    Kitty Wooley

    David, you make a good point. I suppose existing data forums also could be used, in the event that point of contact information is not available, but having the POC info would be ideal so quality issues get fixed at the source. In K-12 education performance data, the EdFacts group has worked ceaselessly and well with all the state education data people for a few years now, in order to uncover and solve quality issues at the school level. It's challenging at the scale we're talking about on some of the data sets, which is why I'm thinking professional groups of data users might be of some assistance.

  4. Comment
    Louis Sweeny

So we need a scalable approach to harvesting the learnings of the data users who come through.

We all know there are many, many hurdles to "fixing" datasets (including resources, regulatory and procedural requirements for modifications to data, and confusion about what is "right"), so a one-size-fits-all approach will not work.

BUT what if there was a standard way for Agencies to mark up and expose the comments they are receiving about a dataset, linked to that data resource (and its record)? This would allow all of that feedback to be searched and aggregated by anybody!

Maybe this could be an elaboration of what is in the existing Agency Plan outline in the OGI.

  5. Comment

There are instances where such things as server problems can cause data to be missing for a period of time, and this can have the effect of creating misleading totals for a specified span of time. Consequently, there needs to be a way of alerting users to existing data anomalies that can give a misleading impression, AND of alerting users of a dataset when a new anomaly is discovered.

  6. Comment

It would be helpful to know the circumstances and environment in which the data was collected; however, opening up discussion to the general public on "problems" found with the data might spawn inaccurate assumptions about the data itself. For raw data, knowing its parameters should help those using it to decide if it is relevant to their needs.

  7. Comment
    Jon Verville

    Sunlight Lab's "National Data Catalog" has a Community Documentation page for each entry in their catalog. This could include known weaknesses, missing elements, gaps, or other artifacts of the data.

    Here is an example:

    Comm doc page:

    Data page:

    And Socrata has an area for comments on every data set, for instance the White House Visitors Records:

    - Jon Verville, NASA/GSFC
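The standard, machine-readable exposure of per-dataset comments suggested in comment 4 could be as simple as each agency publishing a small feed linked back to the catalog record. The field names and URL layout below are illustrative assumptions, not an existing government standard:

```python
import json

def make_feed(dataset_id, dataset_record_url, comments):
    """Build a hypothetical per-dataset feedback feed an agency might publish
    at, say, https://data.example.gov/feedback/<dataset-id>.json."""
    return {
        "dataset_id": dataset_id,
        "dataset_record": dataset_record_url,   # link back to the catalog record
        "comments": [
            {"author": c.get("author", "anonymous"),
             "text": c["text"],
             "flagged_as_data_problem": c.get("problem", False)}
            for c in comments
        ],
    }

def aggregate_problems(feeds):
    """What any third-party harvester could then do: pull every comment flagged
    as a data problem out of many agencies' feeds, grouped by dataset."""
    problems = {}
    for feed in feeds:
        for c in feed["comments"]:
            if c["flagged_as_data_problem"]:
                problems.setdefault(feed["dataset_id"], []).append(c["text"])
    return problems

feed = make_feed(
    "current-address",
    "https://data.example.gov/catalog/current-address",
    [{"author": "public user",
      "text": '"21 Main St." no longer exists.',
      "problem": True},
     {"text": "Very useful dataset, thanks."}],
)
print(json.dumps(aggregate_problems([feed]), indent=2))
```

Because every agency would expose the same small structure, the aggregation step needs no agency-specific code, which is what makes the "searched and aggregated by anybody" part plausible.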
