Dealing With Data in Python

March 3, 2013 at 2:06 pm

Tell me if this looks familiar.

some_dict = {
    "data" : [
        { "user" : { "name" : "Joshua Kehn" } }
    ]
}
name = some_dict["data"][0]["user"]["name"]
print("Gee, getting {name} was difficult!".format(name=name))
#=> Gee, getting Joshua Kehn was difficult!

Typically some_dict is found in a response from a REST API whose designers thought it was a fabulous idea to shrink-wrap everything in multiple objects and/or arrays [1]. In itself this isn’t a huge problem: you just remember where everything is laid out for every request and pray the service provider is consistent. The problem comes when the returned data is missing crucial stepping points. What happens if the inner array in the example above is empty?

>>> some_dict = { "data" : [] }
>>> name = some_dict["data"][0]["user"]["name"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

An exception, how nice. Right in the middle of our production application too! There are several options for handling this, some more sane than others.

  1. Trust that the data will always be provided in the correct format.
  2. Validate everything you receive.
  3. Perform validation each time you touch data.

Option 1 is just stupid: you can’t trust anything you don’t control. Option 2 is probably the most sane, but it isn’t always the easiest to do, especially when you have ad-hoc requests for more data in the middle of your application logic. This leaves us with #3, and it’s messy.

>>> some_dict = { "data" : [] }
>>> data = some_dict.get("data")
>>> if data and len(data) > 0:
...    first = data[0]
...    if first:
...        user = first.get("user")
...        if user:
...            name = user.get("name")
>>> name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'name' is not defined
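
For comparison, the same per-access check can be written with one try/except instead of nested ifs. This is only a sketch: the set of exceptions caught is an assumption based on how dicts, lists, and None fail when subscripted, and it is not the approach this post settles on. It does at least guarantee that name is always bound.

```python
some_dict = {"data": []}

try:
    name = some_dict["data"][0]["user"]["name"]
except (KeyError, IndexError, TypeError):
    # KeyError: a dict key is missing.
    # IndexError: the list is empty or too short.
    # TypeError: an intermediate value is None or otherwise not subscriptable.
    name = None
```

The drawback is that every lookup site needs its own try block, which gets just as repetitive as the nested checks.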

Here’s where I would think about monads.

Then I looked at monads in Python and thought better of it. [2]
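
For context, here is roughly what a Maybe-style wrapper could look like in Python. This is a loose imitation rather than a faithful monad, and the Maybe class and its then method are hypothetical names invented for this sketch, not taken from any library.

```python
class Maybe(object):
    """Hypothetical Maybe-style wrapper; a loose sketch, not a real library."""

    def __init__(self, value):
        self.value = value

    def then(self, func):
        # Short-circuit: once the value is None, skip every later step.
        if self.value is None:
            return self
        try:
            return Maybe(func(self.value))
        except (KeyError, IndexError, TypeError):
            # Assumption: these three cover missing keys, bad indexes,
            # and values that can't be subscripted.
            return Maybe(None)


some_dict = {"data": []}
name = (
    Maybe(some_dict)
    .then(lambda d: d["data"])
    .then(lambda d: d[0])
    .then(lambda d: d["user"])
    .then(lambda d: d["name"])
    .value
)
```

It works, but a chain of lambdas is a lot of ceremony for a single lookup.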

To solve this quickly I wrote a little function. It’s not super elegant, and it misses out on some of the great syntax of a monad solution, but it works so well I don’t miss anything better.

def tree_get(obj, *args):
    """Walk ``obj`` along ``args``; return None if any step is missing."""
    val = obj
    for arg in args:
        if val is None:
            return None
        # Allow filtering functions to be executed against ``val``.
        if callable(arg):
            val = arg(val)
        # Treat ``arg`` as a key.
        elif isinstance(val, dict):
            val = val.get(arg)
        # Treat ``arg`` as an index.
        elif isinstance(val, (list, tuple)):
            try:
                val = val[arg]
            except (IndexError, TypeError):
                # Out of range, or ``arg`` is not a valid index.
                return None
        # Treat ``arg`` as an attribute name on any other object.
        else:
            try:
                val = getattr(val, arg, None)
            except TypeError:
                # ``arg`` is not a string, so it can't name an attribute.
                return None
    return val

How about usage?

>>> some_dict = {
...     "data" : [
...         { "user" : { "name" : "Joshua Kehn" } }
...     ]
... }
>>> print(tree_get(some_dict, "data", 0, "user", "name"))
Joshua Kehn
>>> some_dict = { "data" : [] }
>>> print(tree_get(some_dict, "data", 0, "user", "name"))
None

Very straightforward to use, and it suppresses every dict/list/tuple error I can think of. As a special treat, the callable(arg) check allows lambdas or filtering functions to be stacked in place.

>>> some_dict = { "data" : [ { "user" : { "name" : "Joshua Kehn" } } ] }
>>> tree_get(some_dict, "data", 0, "user", "name", lambda n: n.split(" "), 0)
'Joshua'
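
The attribute branch isn’t exercised by the examples above, so here is a small sketch of it. The User class is hypothetical, and the snippet carries a condensed copy of tree_get (with TypeError also caught, which is an assumption about the desired behavior) so that it runs on its own.

```python
def tree_get(obj, *args):
    """Condensed copy of tree_get so this snippet runs standalone."""
    val = obj
    for arg in args:
        if val is None:
            return None
        if callable(arg):
            val = arg(val)
        elif isinstance(val, dict):
            val = val.get(arg)
        elif isinstance(val, (list, tuple)):
            try:
                val = val[arg]
            except (IndexError, TypeError):
                return None
        else:
            try:
                val = getattr(val, arg, None)
            except TypeError:
                return None
    return val


class User(object):
    """Hypothetical class, for illustration only."""

    def __init__(self, name):
        self.name = name


response = {"data": [User("Joshua Kehn")]}

# Dict key, list index, object attribute, then a callable and another index.
first_name = tree_get(response, "data", 0, "name", lambda n: n.split(" "), 0)

# A missing attribute falls through to None instead of raising AttributeError.
missing = tree_get(response, "data", 0, "email")
```

Note that exceptions raised inside a callable are not suppressed; only the lookups themselves are guarded.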

  1. I’m using the JSON vernacular as opposed to Python’s dict and list nomenclature. 

  2. Discounting monads entirely would be a mistake, but right now it’s not something I can dive into. 
