Definitions for every column

Idea #35

Stage: Active

Campaign: Solution Architecture

I just downloaded some energy data from data.gov on nuclear reactors and one of the columns is:

"NRC Unit"

I have no idea what that means. Every column or field of data should have a definition, and that definition should be available on data.gov or in a standard format with the dataset. In this case, the data dictionary field that the catalog record links to does not have the definition of this field.
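
A minimal sketch of the check this idea asks for, assuming the dataset is a CSV file and the data dictionary is available as a simple mapping from column name to definition. The function name `undefined_columns`, every sample column except "NRC Unit", and the sample definition text are hypothetical and used only for illustration.

```python
import csv
import io


def undefined_columns(csv_text, dictionary):
    """Return the header columns that have no non-empty definition."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [
        column
        for column in header
        if not dictionary.get(column, "").strip()
    ]


# Hypothetical sample: only the "NRC Unit" column name comes from the post.
sample_csv = "Plant Name,NRC Unit,Capacity\nExample Plant,1,1000\n"
sample_dictionary = {
    "Plant Name": "Name of the generating plant (illustrative definition).",
    "Capacity": "",  # present but blank, so it still counts as undefined
}

missing = undefined_columns(sample_csv, sample_dictionary)
print(missing)
```

A publisher could run a check like this before release and hold back any dataset for which the returned list is not empty.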

Feedback Score

73 votes

Idea Details

Vote Activity (latest 20 votes)

  1. Agreed
  2. Agreed
  3. Agreed
  4. Agreed
  5. Agreed
  6. Agreed
  7. Agreed
  8. Agreed
  9. Agreed
  10. Agreed
  11. Agreed
  12. Agreed
  13. Agreed
  14. Agreed
  15. Agreed
  16. Disagreed
  17. Agreed
  18. Agreed
  19. Agreed
  20. Agreed

Comments

  1. Comment
    David Smith

    Great point - key to being able to make informed decisions and fully understand the data is an understanding of the constituent elements. Here, a data dictionary should be able to provide insights, such as units of measure, valid domain and other types of information.

  2. Comment
    kznewman

    Agree with the point, but would request that the standard format be both human and machine readable. A software application might need to make use of "NRC Unit", and a normal text description, while good for people, is largely opaque to software. Look at RDF for an example at http://www.w3.org/RDF/

  3. Comment
    michael.daconta ( Idea Submitter )

    I agree that RDF would be a good format for this but we would need to either create a clear, simple schema or determine if SKOS is simple enough for this purpose.

    The barrier to entry is that agencies are not really familiar with RDF and will be quite leery of it. However, they are getting very comfortable with XML ... so, we need to clearly show them that you can create an XML Schema that is also RDF compliant which I have done before. In fact, DoD DDMS schema is an example of this.

    Have you looked into W3C SKOS for this purpose?

    We should see if we can gen up a quick example...
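
One possible shape for the quick example mentioned above: a hedged sketch that writes a single data dictionary entry as a SKOS concept in RDF/XML, using only the Python standard library. The RDF and SKOS namespace URIs are the W3C ones; the helper name `skos_entry`, the `example.org` URI, and the placeholder definition text are assumptions, since the thread does not give the real meaning of "NRC Unit".

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Use readable prefixes instead of auto-generated ns0/ns1 when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def skos_entry(uri, label, definition):
    """Build an rdf:RDF element holding one skos:Concept for a column."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(
        root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri}
    )
    ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = label
    ET.SubElement(concept, f"{{{SKOS}}}definition").text = definition
    return root


entry = skos_entry(
    "http://example.org/dictionary/nrc-unit",  # hypothetical URI
    "NRC Unit",
    "Placeholder: the publishing agency would supply the real definition.",
)
print(ET.tostring(entry, encoding="unicode"))
```

The same entry stays readable to people through the label and definition, while software can address the column by its URI.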

  4. Comment
    dbaker

    I think this problem would be addressed with data standards (referenced in another idea posted here).

  5. Comment
    jim.hendler

    I wish I could vote for this one 10 or so times -- in our work on http://data-gov.tw.rpi.edu (supporting RDF versions of data.gov datasets) we are constantly frustrated by our inability to decode entries such as zprkgplc1, cntedpre, othincm and so, so many others.

    We currently have all the properties in the datasets we have translated available on a semantic wiki if anyone feels like defining these, but obviously it would be easier to do this when the data is produced than as a retrofit.

  6. Comment
    harman.john

    Agree with David Smith's and Jim Hendler's posts - each data set should have a link back to a data dictionary so that users know how to use the data set.

  7. Comment
    Louis Sweeny

    We'd all like a machine readable data dictionary, and standards to do so abound. What's a smoothly scalable approach to growing a network of such data? What is the absolute ground level... how about just two tags/fields that say "I am a data dictionary" and "I'm in this format"? That way anybody could find and map/crawl that data dictionary at will.
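
The two-field idea above could be sketched roughly as follows, assuming catalog records are plain dictionaries. The field names `is_data_dictionary` and `dictionary_format`, the function name `find_dictionaries`, and the `example.org` URLs are all hypothetical; nothing here reflects an actual data.gov catalog schema.

```python
def find_dictionaries(records, wanted_format=None):
    """Return URLs of records flagged as data dictionaries.

    If wanted_format is given, keep only dictionaries in that format.
    """
    return [
        record["url"]
        for record in records
        if record.get("is_data_dictionary")
        and (
            wanted_format is None
            or record.get("dictionary_format") == wanted_format
        )
    ]


# Hypothetical catalog: one dataset and two flagged data dictionaries.
catalog = [
    {"url": "http://example.org/reactors.csv"},
    {
        "url": "http://example.org/reactors-dictionary.rdf",
        "is_data_dictionary": True,
        "dictionary_format": "SKOS",
    },
    {
        "url": "http://example.org/budget-dictionary.xsd",
        "is_data_dictionary": True,
        "dictionary_format": "XSD",
    },
]

print(find_dictionaries(catalog))
print(find_dictionaries(catalog, "SKOS"))
```

A crawler would need nothing beyond these two fields to discover the dictionaries and pick a parser for each format.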

  8. Comment
    michael.daconta ( Idea Submitter )

    Some of the half-hearted efforts in this area are so very frustrating!

    I downloaded a dataset from Treasury whose data dictionary was a web page with the following:

    The following abbreviations are used in the XML feeds above:

    adjustmentamount - Adjustment Amount

    adjustmentcap - Adjustment Capital

    adjustmentdate - Adjustment Date

    adjustmentfootnote - Adjustment Foot Note

    adjustmentinvestamount - Adjustment Amount

    adjustmentreason - Adjustment Reason

    amount - Amount

    city - City

    date - Date

    description - Description

    exchangedate - Exchange Date

    exchangedescript - Exchange Description

    exchangefootnote - Exchange Foot Note

    exchangeinvestmentamount - Exchange Investment Amount

    exchangemech - Exchange Pricing Mechanism

    exchangetype - Exchange Type

    footnote - footnote

    institution - Institution

    location - Location

    mech - Pricing mech

    obligor - Obligor

    proceeds - Proceeds

    remainingamount - Remaning Amount

    remaininginvestdescript - Remaining Investment Description

    seller - Seller

    state - State

    transactiontype - Transaction Type

    type - Type

    HOW USELESS IS THAT! No definitions at all!

    Who, in their right mind, would assume that you can just shove out data with no explanation of what the data means!

    This is just plain unacceptable and those agencies doing this kind of half-hearted nonsense should be ashamed of themselves.

  9. Comment
    owen.ambur

    W/re Mike's comments, AIIM's StratML Committee is trying to demonstrate good practice by including the definitions of elements in xsd:documentation within the StratML schema: http://xml.gov/stratml/references/StrategicPlan.xsd

    Doing so makes it easy for folks like Art Colman of Drybridge Technologies to automatically generate user-friendly data dictionaries like this: http://xml.gov/stratml/references/Documentation_StratMLV1R0_20090527a.doc

    In addition, we are maintaining a glossary in slightly modified SKOS format at http://xml.gov/stratml/draft/StratMLGlossary.xml using this schema (originally created by Ken Sall and Judy Newton) -- http://xml.gov/stratml/draft/StratMLGlossary.xsd -- and this stylesheet (created by Ken): http://xml.gov/stratml/draft/StratMLGlossary.xsl
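
A rough sketch of how definitions kept in xsd:documentation can be pulled out to generate a data dictionary automatically. This is not the tool mentioned above and the sample schema is invented for illustration, not taken from the StratML schema; the function name `data_dictionary` and the element names are assumptions.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Invented sample schema: one documented element and one undocumented one.
SAMPLE_SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="ExampleField" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>Illustrative definition text.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="Undocumented" type="xsd:string"/>
</xsd:schema>
"""


def data_dictionary(schema_text):
    """Map each named xsd:element to its documentation text ('' if none)."""
    root = ET.fromstring(schema_text)
    entries = {}
    for element in root.iter(f"{{{XSD}}}element"):
        name = element.get("name")
        if name is None:
            continue  # skip references such as <xsd:element ref="..."/>
        doc = element.find(f"{{{XSD}}}annotation/{{{XSD}}}documentation")
        entries[name] = (doc.text or "").strip() if doc is not None else ""
    return entries


for name, definition in data_dictionary(SAMPLE_SCHEMA).items():
    print(f"{name}: {definition or '(no definition)'}")
```

Because the definitions live in the schema itself, the generated dictionary cannot drift out of step with the element names, and empty entries are easy to spot.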

  10. Comment
    charleshoffman

    One of the features of XBRL is the ability to add additional metadata to a schema. Labels, documentation, references to other documentation and other such features of XBRL allow for the users of metadata to append the metadata with additional metadata, expanding it to be helpful to users as is necessary.

    Another way to think of this is to have the metadata be available as a wiki, letting the user community maintain it as they see fit. Clearly the folks at Data.gov should also document the metadata they provide, but allowing users to play a role in this would be beneficial in my view.

  11. Comment
    Unsubscribed User

    The new open gov. directive requires agencies to post open government plans - one of the areas they have to write to is "transparency". This conversation to me has everything to do with transparency - if the public can't understand the data, well, that is certainly a barrier to transparency. I read Jim Hendler's post which said that he wishes he could vote 10 times for this suggestion. (BTW great job on the semantic work Jim). And I'm thinking that the Federal CIO Council could launch an open project with Data.gov to transform how the federal government does data dictionaries in a networked world. Agencies could put the commitment to work on data dictionaries in their web published open government plans.

  12. Comment
    michael.daconta ( Idea Submitter )

    A bit of a rant here ... just examined some of the new OGD high-value data sets that are currently being highlighted on data.gov and I am stunned to see that on the second dataset I observed (USDA MyPyramid Food Raw Data) it did not have a data dictionary!!

    Considering that this idea has been #1 for a while now ... is anyone even reading these posts?

    When can the community expect some feedback on the implementation of the top ideas? Heck, at least the top 10???

  13. Comment
    colman

    Thanks for the new and different input on a topic that I have been involved with for over a decade. As an independent software developer I am able to move quickly into an area where there is an identified need. Owen Ambur was kind enough to mention, below, some work that I have contributed to the StratML initiative and I agree that there is appropriate relevance to this discussion.

    I agree with the comments about the difficulty in embracing RDF and have focused on facilitating the creation of user-friendly documentation derived from schema annotations. I also have found that people are becoming more comfortable with XML tools and techniques. Additionally, XML annotations can be entered either in a simple manner or in a very detailed and highly formatted manner. Start simple and then improve.

    NIEM.gov suggests some straightforward approaches to the placement of simple vs. complex documentation, which serve to insulate the typical documentation reader from formatting complexities imposed by some standards organizations (xsd:appinfo for XML-formatted documentation, leaving xsd:documentation for plain text).

    From a tool developer standpoint locating a standard for knowledge transfer that is not tightly coupled to a business standard has been a challenge. Thanks for pointing out SKOS as an approach.

  14. Comment
    michael.daconta ( Idea Submitter )

    I agree that SKOS would be a great standard (W3C) way to represent the data dictionaries.

    Maybe they can first be piloted on semantic.data.gov

  15. Comment
    owen.ambur

    Mike, w/re your rant above, I don't know who may or may not be reading these postings. However, I do know that little or nothing will come of them unless and until they become objectives in someone(s) performance plan(s) and others have the opportunity to track actual performance against those objectives. It will be interesting to see whether and, if so, how this objective is incorporated into the Open Gov "Dashboard". http://xml.gov/stratml/carmel/OGDwStyle.xml#_7d88136e-142e-11df-a454-3f207a64ea2a

  16. Comment
    michael.daconta ( Idea Submitter )

    Kudos to HUD! Here is an excellent example of both a good dataset (with one minor issue regarding quoting every field in a CSV file ... see my idea on data standards for CSV files) and an excellent and clear data dictionary!!

    See: http://www.data.gov/details/1259

    Great job HUD!!! Maybe data.gov should have a "best practices" section where we spread the word on our star performers.

    - Mike

  17. Comment
    ncbuyer

    It's called a data repository. I agree. +1
