Rank 1

Idea #35

This idea is active.
Definitions for every column

I just downloaded some energy data from data.gov on nuclear reactors and one of the columns is:

"NRC Unit"

I have no idea what that means. Every column or field of data should have a definition, and that definition should be available on data.gov or in a standard format with the dataset. In this case, the data dictionary that the catalog record links to does not define this field.
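A requirement like this can be checked mechanically. The following is only a minimal sketch, assuming the dataset is a CSV file and the data dictionary is a plain mapping from column name to definition. Apart from "NRC Unit", the column names, the definitions, and the function name `undefined_columns` are hypothetical and not taken from the actual dataset.

```python
import csv
import io


def undefined_columns(csv_text, dictionary):
    """Return the header columns that have no non-empty definition."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [col for col in header if not dictionary.get(col, "").strip()]


# Hypothetical sample data: only the "NRC Unit" column name comes from the post.
sample_csv = "Plant Name,NRC Unit,Capacity MW\nExample Plant,1,1000\n"
sample_dictionary = {
    "Plant Name": "Name of the generating station (hypothetical definition).",
    "Capacity MW": "Net capacity in megawatts (hypothetical definition).",
}

missing = undefined_columns(sample_csv, sample_dictionary)
print(missing)  # prints ['NRC Unit']
```

A check of this kind could run before a dataset is published, so that a column without a definition is caught at the source rather than by the people downloading it.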

Submitted by michael.daconta 4 years ago

Comments (19)

  1. Great point - key to being able to make informed decisions and fully understand the data is an understanding of the constituent elements. Here, a data dictionary should be able to provide insights, such as units of measure, valid domain and other types of information.

    4 years ago
  2. Agree with the point, but would request that the standard format be both human- and machine-readable. A software application might need to make use of "NRC Unit", and a plain text description, while good for people, is largely opaque to software. For an example, look at RDF at http://www.w3.org/RDF/

    4 years ago
  3. michael.daconta Idea Submitter

    I agree that RDF would be a good format for this but we would need to either create a clear, simple schema or determine if SKOS is simple enough for this purpose.

    The barrier to entry is that agencies are not really familiar with RDF and will be quite leery of it. However, they are getting very comfortable with XML, so we need to clearly show them that you can create an XML Schema that is also RDF compliant, which I have done before. In fact, the DoD DDMS schema is an example of this.

    Have you looked into W3C SKOS for this purpose?

    We should see if we can gen up a quick example...

    4 years ago
  4. I think this problem would be addressed with data standards (referenced in another idea posted here).

    4 years ago
  5. I wish I could vote for this one 10 or so times -- in our work on http://data-gov.tw.rpi.du (supporting RDF versions of data.gov datasets) we are constantly frustrated by our inability to decode entries such as zprkgplc1, cntedpre, othincm and so, so many others.

    We currently have all the properties in the datasets we have translated available on a semantic wiki if anyone feels like defining these, but obviously would be easier to do this when the data is produced than as a retrofit.

    4 years ago
  6. Agree with David Smith's and Jim Hendler's posts - each data set should have a link back to a data dictionary so that users know how to use the data set.

    4 years ago
  7. We'd all like a machine-readable data dictionary, and standards for doing so abound. What's a smoothly scalable approach to growing a network of such data? What is the absolute ground level? How about just two tags/fields that say "I am a data dictionary" and "I'm in this format"? That way anybody could find and map/crawl that data dictionary at will.

    4 years ago
  8. Jim: zprkgplc1 is the guess at how many parking spaces a home has!

    Out of curiosity I googled zprkgplc1 and, to my surprise, found your work and then http://www.eia.doe.gov/emeu/recs/recspubuse05/layoutfiles/RECS05layoutAllData.csv which gives a hint.

    This is wildly impractical as an approach, but it does show the strange power of text searching, and affirms my view that ANY link to ANY kind of data dictionary is a great place to start.

    4 years ago
  9. michael.daconta Idea Submitter

    Some of the half-hearted efforts in this area are so very frustrating!

    I downloaded a dataset from Treasury whose data dictionary was a web page with the following:

    The following abbreviations are used in the XML feeds above:

    adjustmentamount - Adjustment Amount

    adjustmentcap - Adjustment Capital

    adjustmentdate - Adjustment Date

    adjustmentfootnote - Adjustment Foot Note

    adjustmentinvestamount - Adjustment Amount

    adjustmentreason - Adjustment Reason

    amount - Amount

    city - City

    date - Date

    description - Description

    exchangedate - Exchange Date

    exchangedescript - Exchange Description

    exchangefootnote - Exchange Foot Note

    exchangeinvestmentamount - Exchange Investment Amount

    exchangemech - Exchange Pricing Mechanism

    exchangetype - Exchange Type

    footnote - footnote

    institution - Institution

    location - Location

    mech - Pricing mech

    obligor - Obligor

    proceeds - Proceeds

    remainingamount - Remaning Amount

    remaininginvestdescript - Remaining Investment Description

    seller - Seller

    state - State

    transactiontype - Transaction Type

    type - Type

    HOW USELESS IS THAT! No definitions at all!

    Who, in their right mind, would assume that you can just shove out data with no explanation of what the data means!

    This is just plain unacceptable, and the agencies doing this kind of half-hearted nonsense should be ashamed of themselves.

    4 years ago
  10. W/re Mike's comments, AIIM's StratML Committee is trying to demonstrate good practice by including the definitions of elements in xsd:documentation within the StratML schema: http://xml.gov/stratml/references/StrategicPlan.xsd

    Doing so makes it easy for folks like Art Colman of Drybridge Technologies to automatically generate user-friendly data dictionaries like this: http://xml.gov/stratml/references/Documentation_StratMLV1R0_20090527a.doc

    In addition, we are maintaining a glossary in slightly modified SKOS format at http://xml.gov/stratml/draft/StratMLGlossary.xml using this schema (originally created by Ken Sall and Judy Newton) -- http://xml.gov/stratml/draft/StratMLGlossary.xsd -- and this stylesheet (created by Ken): http://xml.gov/stratml/draft/StratMLGlossary.xsl

    4 years ago
  11. One of the features of XBRL is the ability to add additional metadata to a schema. Labels, documentation, references to other documentation and similar features of XBRL let users of the metadata extend it with metadata of their own, expanding it as necessary to make it helpful.

    Another way to think of this is to have the metadata be available as a wiki, letting the user community maintain it as they see fit. Clearly the folks at Data.gov should also document the metadata they provide, but allowing users to play a role in this would be beneficial in my view.

    4 years ago
  12. Unsubscribed User

    The new open gov. directive requires agencies to post open government plans, and one of the areas they have to address is "transparency". This conversation, to me, has everything to do with transparency: if the public can't understand the data, that is certainly a barrier to transparency. I read Jim Hendler's post, which said that he wishes he could vote 10 times for this suggestion. (BTW great job on the semantic work, Jim.) And I'm thinking that the Federal CIO Council could launch an open project with Data.gov to transform how the federal government does data dictionaries in a networked world. Agencies could put the commitment to work on data dictionaries in their web-published open government plans.

    4 years ago
  13. michael.daconta Idea Submitter

    A bit of a rant here ... I just examined some of the new OGD high-value data sets that are currently being highlighted on data.gov, and I am stunned to see that the second dataset I looked at (USDA MyPyramid Food Raw Data) did not have a data dictionary!!

    Considering that this idea has been #1 for a while now ... is anyone even reading these posts?

    When can the community expect some feedback on the implementation of the top ideas? Heck, at least the top 10???

    4 years ago
  14. Thanks for new and different input on a topic that I have been involved with for over a decade. As an independent software developer I am able to move quickly into an area where there is an identified need. Owen Ambur was kind enough to mention, below, some work that I have contributed to the StratML initiative, and I agree that it is relevant to this discussion.

    I agree with the comments about the difficulty in embracing RDF and have focused on facilitating the creation of user-friendly documentation derived from schema annotations. I also have found that people are becoming more comfortable with XML tools and techniques. Additionally, XML annotations can be entered either in a simple manner or in a very detailed and highly formatted manner. Start simple and then improve.

    NIEM.gov suggests some straightforward approaches to the placement of simple vs complex documentation, which serve to insulate your typical documentation reader from formatting complexities imposed by some standards organizations (xsd:appinfo for XML-formatted documentation, leaving xsd:documentation for plain text).

    From a tool developer standpoint, locating a standard for knowledge transfer that is not tightly coupled to a business standard has been a challenge. Thanks for pointing out SKOS as an approach.

    4 years ago
  15. michael.daconta Idea Submitter

    I agree that SKOS would be a great standard (W3C) way to represent the data dictionaries.

    Maybe they can first be piloted on semantic.data.gov

    4 years ago
  16. Mike, w/re your rant above, I don't know who may or may not be reading these postings. However, I do know that little or nothing will come of them unless and until they become objectives in someone's performance plan(s) and others have the opportunity to track actual performance against those objectives. It will be interesting to see whether and, if so, how this objective is incorporated into the Open Gov "Dashboard". http://xml.gov/stratml/carmel/OGDwStyle.xml#_7d88136e-142e-11df-a454-3f207a64ea2a

    4 years ago
  17. michael.daconta Idea Submitter

    Kudos to HUD! Here is an excellent example of both a good dataset (with one minor issue regarding quoting every field in a CSV file ... see my idea on data standards for CSV files) and an excellent and clear data dictionary!!

    See: http://www.data.gov/details/1259

    Great job HUD!!! Maybe data.gov should have a "best practices" section where we spread the word on our star performers.

    - Mike

    4 years ago
  18. I agree. Kudos to HUD for the dataset.

    http://www.trafficultimatumreviewes.com

    3 years ago
  19. It's called a data repository. I agree. +1

    3 years ago
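
The approach described in comment 10 (keeping definitions in xsd:documentation and generating a data dictionary from them) could look roughly like the sketch below. It is not the StratML tooling or anyone's actual implementation. It assumes a small XML Schema with named elements, and the sample schema, its definition text, and the function name `data_dictionary_from_xsd` are hypothetical. Only the element names `obligor` and `mech` are borrowed from the Treasury list in comment 9.

```python
import xml.etree.ElementTree as ET

# Namespace prefix for XML Schema elements in ElementTree's {uri}tag notation.
XSD = "{http://www.w3.org/2001/XMLSchema}"


def data_dictionary_from_xsd(xsd_text):
    """Map each named xsd:element to the text of its xsd:documentation."""
    root = ET.fromstring(xsd_text)
    entries = {}
    for element in root.iter(XSD + "element"):
        name = element.get("name")
        if name is None:
            continue  # skip element references (ref="...") that carry no name
        doc = element.find(XSD + "annotation/" + XSD + "documentation")
        # Collapse whitespace; an element with no documentation maps to "".
        text = " ".join(doc.text.split()) if doc is not None and doc.text else ""
        entries[name] = text
    return entries


# Hypothetical schema: the definition text is invented for illustration.
sample_xsd = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="obligor" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>
        Hypothetical definition: the party that owes the obligation.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="mech" type="xsd:string"/>
</xsd:schema>"""

dictionary = data_dictionary_from_xsd(sample_xsd)
for name, definition in dictionary.items():
    print(name, "-", definition or "(no definition)")
```

Elements that come back with an empty definition are exactly the ones the thread complains about, so the same pass can both produce the dictionary and flag its gaps.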