Databases confuse me.

There. I said it. I just don’t get them. I can’t help but think of databases as some sort of computerized version of Mary Poppins’s carpet bag. She could fit more and more and more stuff in there… endlessly. And then, magically, she could just pull out whatever it was she wanted! And until our recent class readings that’s all I had. (Honestly, I still haven’t got much more…) How does stuff get in there? (User input?) How does stuff get spun around but kept separate? (Mystical structures?) How does stuff get pulled back out again? (Keyword searching?) Seriously, this is all a mystery to me. I’m sure that this blog post will be just as disappointing as my inability to grasp the concept of a database, because my reflections will largely be me yammering on about how confusing I find all this and writing out all my questions. Granted, that is still reflection. But it’s also embarrassing.

I wanted to look at Hugh Darwen’s “The Relational Model: Beginning of an Era” and “XML Based Advanced Distributed database: Implemented on Library System” by Saikat Goswami and Chandan Kundu because they talk about two different kinds of databases, and I was hoping that by comparison I might be able to sort out a clearer picture of what databases are and how they work. My understanding (despite my previously admitted and continuously embarrassing shortcomings) is that Darwen discusses a traditional model of the relational database, which has a long history and has been proven and steadily improved over the years. Goswami and Kundu explore more modern, technologically “futuristic” options of databases that are based in XML and therefore are web-friendly.

The thing is, my ambition to understand databases by comparing two different kinds failed miserably, because apparently understanding databases is necessary for being able to tell two different kinds apart. (Who’d have thunk? Not like that’s common sense or anything. (<- sarcasm).) They really seem so similar to me, except maybe traditional ones have more limited access and XML ones are a bit more flexible. In the end, I still have a lot of questions. So, here are three:

  1. How does the database get “relational”? Both the traditional and XML databases are based on this idea of being relational, and my understanding is that this means the database is smart enough to recognize the relationship between pieces of information put in it. But how does that “smartness” come about? Do we have to physically make those connections, like when we try to create linked data? Or do we program the system to find things by scanning for keywords? It seems to me that maybe we have to program the connections into a traditional database, because Darwen talks about the information in such databases relying on a tabular structure wherein relationships are clearly defined by assigning each entry a “type” and connecting it with many different “types” all associated with the same overall concept. I suppose we would still use keywords, but Darwen seems to indicate those keywords are pre-determined values, like a controlled vocabulary. XML databases, or at least those designated as “Native XML Databases” by Goswami and Kundu, are not based on tables (although confusingly they still seem able to use tables) but rather on this concept of a “container.” So what is the difference between a container and a table? Do containers not ascribe particular entry “types” but rather gather up all the info in one big soup under a single heading?
  2. Where does “relational” end? Darwen talks about creating relations between information in a single tuple. But he also talks about how multiple tuples can be about the same subject, just with different relational information entered. Do we also have to create another layer of relations between tuples? I feel like he covers this but I just don’t understand it. Also, for the non-tabular XML databases, do we create relational structures that go container to container? Or do we not need to because we can just enter a keyword and find everything related to that keyword? That’s what we do in Google – no one went and physically connected the internet through one massive table. Right?
  3. What is the actual difference between a “container” and a “table”? What I just described seems an awful lot like what we do with tables, except tables seem to need a controlled vocabulary. I really don’t know if I understand the difference… I want to say that tables are a more granular rendition of containing information on a related topic. I want to say that all tables are containers but not all containers are tables. I want to say that a table has all the information about a broad topic, and therefore it is all related. But within that table are columns, and those columns are related to each other on a more granular level. And within that table are rows, and those rows are related to each other on a more granular level. Columns are a topic or “type” of information, and rows are all the metadata about a single item. Thus, tables are very granular. Then I want to say that containers are filled with many, many documents that are related, but they are not granular because there is nothing within that container to distinguish all the different “types” of information about that item. The problem is, this is all just how I read the articles I’m reflecting on, and I don’t know what it MEANS. Is this really how it happens? How are containers effective if there is no organization? Surely there must be item-level metadata of some kind – is it just floating around attached to the item but not really sorted out anywhere? Is that what keyword searching is for? PHEW – I’m exhausted!
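One way to poke at these three questions is to build a toy version of each kind of storage and ask both the same question. The sketch below is only an illustration, not the design described in either article: it uses Python’s built-in `sqlite3` module for the tabular side and `xml.etree.ElementTree` for an XML document standing in for a “container.” The table names, column names, XML tags, and the handful of library records are all made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# --- Tabular side: two tables whose rows are linked by a shared key value ---
# (authors/books and their columns are hypothetical names for this sketch)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(1, "P. L. Travers"), (2, "Mary Shelley")],
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        (10, "Mary Poppins", 1),
        (11, "Mary Poppins Comes Back", 1),
        (12, "Frankenstein", 2),
    ],
)

def titles_by_author(name):
    """Match rows across the two tables on author_id, at query time."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM books AS b
        JOIN authors AS a ON a.author_id = b.author_id
        WHERE a.name = ?
        ORDER BY b.title
        """,
        (name,),
    ).fetchall()
    return [title for (title,) in rows]

# --- XML side: one document holding nested, tagged records ---
# (the <catalog>/<book> layout is an assumption made for this sketch)
CATALOG_XML = """
<catalog>
  <book id="10"><title>Mary Poppins</title><author>P. L. Travers</author></book>
  <book id="11"><title>Mary Poppins Comes Back</title><author>P. L. Travers</author></book>
  <book id="12"><title>Frankenstein</title><author>Mary Shelley</author></book>
</catalog>
"""
catalog = ET.fromstring(CATALOG_XML)

def titles_by_author_xml(name):
    """Walk the tagged elements and keep the books whose <author> matches."""
    return sorted(
        book.findtext("title")
        for book in catalog.findall("book")
        if book.findtext("author") == name
    )

print(titles_by_author("P. L. Travers"))
print(titles_by_author_xml("P. L. Travers"))
```

In this toy version, nobody draws a connection between an author and a book by hand. On the tabular side, the rows simply share an `author_id` value and the query matches them up when it is asked. On the XML side, the tags inside each `<book>` play the part of item-level metadata, so the “container” is not quite a soup, and the lookup matches on a tagged field rather than on free-floating keywords. Real native XML databases have their own query languages and indexing, which this sketch does not attempt to show.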

It’s really very hard to reflect on this stuff when it’s so hard to understand. I think I understand traditional databases better than before, and better than XML ones, now – they were pretty well explained in Darwen’s article. But I don’t think I understand databases as a whole yet. I know databases are important to the profession, and based on the articles I can gather that XML is probably the direction we’re headed. Flexibility is really valuable, as is affordability and online access. Those things are hard to argue with, even if traditional databases have a long, well-established history of success. I hope I can figure databases out, though, because ultimately my reflections and observations are useless without the understanding to back them up. So please leave me comments explaining databases, because I clearly need help with that.

References:

Goswami, S., & Kundu, C. (2013). XML based advanced distributed database: Implemented on library system. International Journal of Information Management, 33(1), 28-31. Retrieved from http://www.sciencedirect.com.proxycu.wrlc.org/science/article/pii/S0268401212000655

Darwen, H. (2012). The relational model: Beginning of an era. IEEE Annals of the History of Computing, 34(4), 30-37. Retrieved from http://ieeexplore.ieee.org.proxycu.wrlc.org/xpls/icp.jsp?arnumber=6297961
