klenwell information services : PywellInputOutput

Revision [2623]

This is an old revision of PywellInputOutput made by KlenwellAdmin on 2013-06-09 11:46:20.

Pywell Lesson: File Input/Output


In its simplest form, a script accepts some input data, manipulates it, and spits some data back out. Data in, data out. So where does this data come from?

It can come from a variety of resources. One of the most common is a file, and that's where we begin. Python makes it easy to read data out of a file. Let's say we have a file in the /tmp directory.

>>> file_path = "/tmp/myfile.txt"
>>> f = open(file_path)
>>> contents = f.read()
>>> f.close()
>>> print contents

Very simple. Data in: the contents of the file /tmp/myfile.txt. Data out: the contents, unchanged. Here f is a file object, a handle that provides a number of methods for reading and manipulating files.
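A common alternative to calling close() by hand is the with statement, which closes the file automatically even if an error occurs while reading. The following is a minimal sketch rather than part of the original lesson. It is written for Python 3 (where print is a function), and the read_file helper and the temporary sample file are names assumed purely for illustration.

```python
import os
import tempfile


def read_file(file_path):
    """Return the full contents of the text file at file_path."""
    # The with block closes the file on exit, even if read() raises.
    with open(file_path) as f:
        return f.read()


# Create a throwaway file so the example does not depend on /tmp/myfile.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("data in, data out\n")
    sample_path = tmp.name

contents = read_file(sample_path)
print(contents)

# Clean up the temporary sample file.
os.remove(sample_path)
```

The data-in, data-out flow is the same as before. The only difference is that the file is closed for you when the with block ends.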

Other data resources include databases, web services, and message queues. All follow a similar pattern: open the resource, collect the data, close the resource.
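To illustrate the open, collect, close pattern with a database, here is a rough sketch using Python's built-in sqlite3 module and an in-memory database. It is written for Python 3, and the readings table and its rows are made-up sample data, not something from the lesson.

```python
import sqlite3

# Open the resource: an in-memory database filled with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (label TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("b", 4.2), ("c", 2.8)],
)

# Collect the data.
rows = conn.execute(
    "SELECT label, value FROM readings ORDER BY label"
).fetchall()

# Close the resource.
conn.close()

print(rows)
```

The three commented steps line up with the file example: connect plays the role of open, fetchall the role of read, and close is close.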

For example, here's a simple script that pulls the latest data from the USGS website's JSON feed and finds the magnitude of the largest earthquake in the last day. It requires Python's urllib and json libraries:

>>> import urllib, json
>>> usgs_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
>>> response = urllib.urlopen(usgs_url)
>>> raw_json = response.read()
>>> json_data = json.loads(raw_json)
>>> max([entry['properties']['mag'] for entry in json_data['features']])

Here's a breakdown of the steps:

1. Import the required Python libraries
2. Read the data feed
For the sake of convenience, the URL of the USGS JSON data feed is stored in the variable usgs_url.

The URL is opened using the urllib module's urlopen function (note the similarity to the file interface).

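It is then read, much like a file, into the raw_json variable, and the contents are parsed using the json module's loads function, which takes a string of JSON data and converts it to a Python dictionary. The final line collects each feature's magnitude in a list comprehension and passes the list to max.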
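The parsing and max steps can be tried without a network connection. The sketch below is written for Python 3 and uses a small hand-made JSON string in place of the live feed. The sample entries and the largest_magnitude helper are assumptions for illustration only. It also skips entries whose magnitude is null, a defensive choice since Python 3 cannot compare None with numbers.

```python
import json

# Hypothetical sample in the shape the lesson's code expects:
# a "features" list whose entries carry properties.mag.
raw_json = """
{
  "features": [
    {"properties": {"mag": 1.2, "place": "sample location A"}},
    {"properties": {"mag": 4.7, "place": "sample location B"}},
    {"properties": {"mag": null, "place": "sample location C"}}
  ]
}
"""


def largest_magnitude(raw):
    """Parse a JSON string and return the largest non-null magnitude."""
    data = json.loads(raw)  # JSON object -> Python dictionary
    magnitudes = [
        entry["properties"]["mag"]
        for entry in data["features"]
        if entry["properties"]["mag"] is not None
    ]
    return max(magnitudes)


print(largest_magnitude(raw_json))
```

To run this against the live feed in Python 3, the string would come from urllib.request.urlopen(usgs_url).read() instead of the sample above.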


Extra Credit