Browsing the current Data.gov holdings, one sees data that has in some instances been chunked: large datasets that would contain too many records as a single CSV may be broken out by state, et cetera. Similarly, some datasets are part of a time series (e.g. 2005 -> 2006 -> 2007 annual data releases). It would improve usability to have ways to relate these and treat them as groups: for example, treating an entire collection of individual state files as a unit, or navigating between related files (if I am looking at the 2005 Tennessee dataset, I might want to jump quickly to the 2006 Tennessee dataset).
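
A minimal sketch of the grouping and navigation described above, in Python. Everything here is an assumption for illustration: the `Dataset` fields, the `air-quality` series name, and the helper functions are hypothetical and do not reflect any actual Data.gov API or catalog schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    """One chunk of a larger release (hypothetical schema)."""
    series: str  # name of the overall collection, e.g. "air-quality"
    state: str   # chunking dimension, e.g. "TN"
    year: int    # position in the time series


Key = Tuple[str, str, int]


def build_index(datasets: List[Dataset]) -> Dict[Key, Dataset]:
    """Index datasets by (series, state, year) for direct lookup."""
    return {(d.series, d.state, d.year): d for d in datasets}


def sibling_year(index: Dict[Key, Dataset], current: Dataset,
                 offset: int = 1) -> Optional[Dataset]:
    """Return the same series and state, `offset` years away, if present."""
    return index.get((current.series, current.state, current.year + offset))


def collection(index: Dict[Key, Dataset], series: str,
               year: int) -> List[Dataset]:
    """Return every state file of one series and year, treated as a unit."""
    members = [d for d in index.values()
               if d.series == series and d.year == year]
    return sorted(members, key=lambda d: d.state)


# Hypothetical sample records.
tn_2005 = Dataset("air-quality", "TN", 2005)
tn_2006 = Dataset("air-quality", "TN", 2006)
ky_2005 = Dataset("air-quality", "KY", 2005)
index = build_index([tn_2005, tn_2006, ky_2005])
```

With this index, `sibling_year(index, tn_2005)` jumps from the 2005 Tennessee file to the 2006 one, and `collection(index, "air-quality", 2005)` gathers all state files for 2005 as one group. A real catalog would more likely store these relationships as metadata on each dataset record than rebuild them in memory.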


Submitted by David Smith 4 years ago

  1. Excellent! And the broader question of robust relationships between datasets is a major requirement for semantic.data.gov.

    4 years ago
  2. Consistent standards/naming (RESTful URIs for asset locations) could be a useful way to organize such related datasets. If data hosting were provided by Data.gov, then the system could also support some kind of naming convention, whereby each dataset had a unique identifier. Should such naming standards/conventions be organized by each Agency individually or by Data.gov?

    4 years ago
  3. I also agree that this is excellent. Related datasets are key. I likewise agree with the comment about RESTful URIs. This is much like the Semantic Web view that "everything should be a URI" to uniquely identify a resource, together with the notion of "linked data", which makes it possible to create mashups (i.e. related datasets).

    4 years ago
  4. Another way of relating datasets is to provide a place for narrative that qualitatively describes the relationship between them, as in the example below. Formal semantics are of course very important too, but sometimes you need to capture a soft relationship that is hard or impossible to express as a formal data relationship.

    Sunlight Labs' "National Data Catalog" has a Community Documentation page for each entry in its catalog. This could include a narrative about the dataset and its relationships to other datasets.

    Here is an example:

    Community documentation page: http://nationaldatacatalog.com/data/housing-code-enforcement/docs

    Data page: http://nationaldatacatalog.com/data/housing-code-enforcement

    - Jon Verville, NASA/GSFC

    3 years ago
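
The naming convention raised in comment 2 could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the base URL `https://example.gov/datasets`, the path order agency/series/state/year, and the "EPA" and "Air Quality" values are all hypothetical, not an existing Data.gov scheme.

```python
from urllib.parse import quote

# Hypothetical base location; not a real Data.gov endpoint.
BASE = "https://example.gov/datasets"


def slug(value: str) -> str:
    """Lowercase a label, hyphenate spaces, and percent-encode the rest."""
    return quote(value.strip().lower().replace(" ", "-"), safe="")


def dataset_uri(agency: str, series: str, state: str, year: int) -> str:
    """Build a predictable URI of the form BASE/agency/series/state/year."""
    parts = [slug(agency), slug(series), slug(state), str(year)]
    return BASE + "/" + "/".join(parts)


def with_year(uri: str, year: int) -> str:
    """Derive the URI of the same dataset for another year."""
    head, _, _ = uri.rpartition("/")
    return f"{head}/{year}"


# Hypothetical example values.
tn_2005_uri = dataset_uri("EPA", "Air Quality", "TN", 2005)
tn_2006_uri = with_year(tn_2005_uri, 2006)
```

Because the year is always the last path segment in this assumed layout, a related release can be reached by rewriting that segment alone. Whether each agency or Data.gov itself owns the leading path segments is the open question the comment raises.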