General

Relate Datasets

Browsing the current Data.gov holdings, one sees data that has in some instances been chunked into several files, for example large datasets that would contain too many records as a single CSV and so have been broken out by state. Similarly, some datasets are part of a time series (e.g. annual data releases for 2005, 2006, and 2007). It would improve usability to have ways to relate such datasets and treat them as groups: for example, treating an entire collection of individual state files as a unit, or navigating between related files (if I am looking at the 2005 Tennessee dataset, I might want to jump quickly to the 2006 Tennessee dataset).
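
To make the request concrete, here is a minimal sketch of how a catalog might relate chunked and time-series datasets. It is an illustration only, not a description of how Data.gov works: the `Dataset` and `Catalog` names, the `series` field, and the example.org URLs are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    series: str  # hypothetical collection name shared by all related files
    state: str   # the chunk key, e.g. "TN"
    year: int    # the release year within the time series
    url: str     # placeholder location (example.org, not a real Data.gov URL)


class Catalog:
    """Hypothetical index that relates datasets by (series, state, year)."""

    def __init__(self, datasets: List[Dataset]) -> None:
        self._by_key = {(d.series, d.state, d.year): d for d in datasets}

    def collection(self, series: str, year: int) -> List[Dataset]:
        """Return all state files of one release, sorted by state, as a unit."""
        members = [
            d for d in self._by_key.values()
            if d.series == series and d.year == year
        ]
        return sorted(members, key=lambda d: d.state)

    def next_release(self, dataset: Dataset) -> Optional[Dataset]:
        """Return the same state's file from the following year, or None."""
        key = (dataset.series, dataset.state, dataset.year + 1)
        return self._by_key.get(key)


# Made-up example data: two states in 2005, one follow-up release in 2006.
tn_2005 = Dataset("annual-survey", "TN", 2005, "https://example.org/survey/tn/2005.csv")
ky_2005 = Dataset("annual-survey", "KY", 2005, "https://example.org/survey/ky/2005.csv")
tn_2006 = Dataset("annual-survey", "TN", 2006, "https://example.org/survey/tn/2006.csv")

catalog = Catalog([tn_2005, ky_2005, tn_2006])
states_2005 = [d.state for d in catalog.collection("annual-survey", 2005)]
following = catalog.next_release(tn_2005)
```

With this structure, "all state files for 2005" is a single lookup, and jumping from the 2005 Tennessee file to the 2006 one is a single call to `next_release`.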

Stage: Active

Feedback Score

27 votes
Idea#37

Comments

  1. Comment
    michael.daconta

    Excellent! And the broader question of robust relationships between datasets is a major requirement for semantic.data.gov.

  2. Comment
    thynge.megan

    Consistent standards/naming (RESTful URIs for asset locations) could be a useful way to organize such related datasets. If data hosting were provided by Data.gov, then the system could also support some kind of naming convention, whereby each dataset had a unique identifier. Should such naming standards/conventions be organized by each Agency individually or by Data.gov?

  3. Comment
    charleshoffman

    I also agree that this is excellent. Relating datasets is key. I likewise agree with the comment about RESTful URIs. This is a lot like how the Semantic Web people think "everything should be a URI" in order to uniquely identify each resource, and like their notion of "linked data", which makes it possible to create mashups (i.e. related datasets).

  4. Comment
    Jon Verville

    Another way of relating datasets is to provide a place for narrative that qualitatively describes the relationship between them, as in the example below. The semantics are of course very important too, but sometimes you also have to give the soft relationship, which is hard or impossible to express in formal data relationships.

    Sunlight Labs' "National Data Catalog" has a Community Documentation page for each entry in its catalog. Such a page could include a narrative about the dataset and its relationships to other datasets.

    Here is an example:

    Comm doc page: http://nationaldatacatalog.com/data/housing-code-enforcement/docs

    Data page: http://nationaldatacatalog.com/data/housing-code-enforcement

    - Jon Verville, NASA/GSFC
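
The naming convention raised in the comments (a unique, RESTful identifier per dataset) could be sketched as follows. The `/dataset/<series>/<state>/<year>` path pattern and the function names are hypothetical, chosen only to show how a predictable URI makes related datasets discoverable; no such scheme is confirmed for Data.gov.

```python
import re

# Hypothetical path pattern: /dataset/<series>/<state>/<year>
URI_PATTERN = re.compile(
    r"^/dataset/(?P<series>[a-z0-9-]+)/(?P<state>[a-z]{2})/(?P<year>\d{4})$"
)


def build_uri(series: str, state: str, year: int) -> str:
    """Build the identifier for one state file of one annual release."""
    return f"/dataset/{series}/{state.lower()}/{year}"


def parse_uri(uri: str):
    """Split an identifier back into (series, state, year)."""
    match = URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a dataset URI under this convention: {uri!r}")
    return match["series"], match["state"].upper(), int(match["year"])


def sibling_year(uri: str, offset: int) -> str:
    """Derive the identifier of the same series and state in another year."""
    series, state, year = parse_uri(uri)
    return build_uri(series, state, year + offset)


tn_2005_uri = build_uri("annual-survey", "TN", 2005)
tn_2006_uri = sibling_year(tn_2005_uri, 1)
```

Because the year and state are part of the identifier, a client can compute the address of a related dataset without a separate lookup table. Whether that address resolves to a real file would still have to be checked against the catalog.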
