klenwell information services : PywellInputOutput

Revision history for PywellInputOutput


Revision [3062]

Last edited on 2016-07-09 16:54:44 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
return to [[PyWell | Tutorial Index]]
Very simple. Data in: the contents of file ##/tmp/myfile.txt##. Data out: the contents, unchanged. ##f## here is a [[http://docs.python.org/2/library/stdtypes.html#bltin-file-objects | file object]], a handler that provides a number of methods for manipulating files.
For example, here's a simple script that pulls the latest JSON data from the USGS website's [[http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson | JSON feed]] and finds the largest earthquake in the last day. It requires Python's ##urllib## and ##json## libraries:
Parse the New York Times' [[http://www.nytimes.com/services/xml/rss/nyt/pop_top.xml | XML feed of Most E-Mailed Articles]] and answer the following questions:
[[http://docs.python.org/2/library/stdtypes.html#bltin-file-objects | Python Documentation on File Objects]]
[[http://docs.python.org/2/library/json.html | Python JSON Library]]
Deletions:
return to [[PyWell Tutorial Index]]
Very simple. Data in: the contents of file ##/tmp/myfile.txt##. Data out: the contents, unchanged. ##f## here is a [[http://docs.python.org/2/library/stdtypes.html#bltin-file-objects file object]], a handler that provides a number of methods for manipulating files.
For example, here's a simple script that pulls the latest JSON data from the USGS website's [[http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson JSON feed]] and finds the largest earthquake in the last day. It requires Python's urllib and json libraries:
Parse the New York Times' [[http://www.nytimes.com/services/xml/rss/nyt/pop_top.xml XML feed of Most E-Mailed Articles]] and answer the following questions:
[[http://docs.python.org/2/library/stdtypes.html#bltin-file-objects Python Documentation on File Objects]]
[[http://docs.python.org/2/library/json.html Python JSON Library]]


Revision [2635]

Edited on 2013-07-04 22:17:28 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
[[http://docs.python.org/2/library/json.html Python JSON Library]]


Revision [2634]

Edited on 2013-07-04 22:15:06 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
[[http://docs.python.org/2/library/stdtypes.html#bltin-file-objects Python Documentation on File Objects]]


Revision [2630]

Edited on 2013-06-09 19:40:48 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
~- How many earthquakes greater than magnitude 3.0 were recorded by the USGS in the last day?
Deletions:
~- How many earthquakes were recorded by the USGS in the last day?


Revision [2629]

Edited on 2013-06-09 12:16:20 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
Parse the New York Times' [[http://www.nytimes.com/services/xml/rss/nyt/pop_top.xml XML feed of Most E-Mailed Articles]] and answer the following questions:
Deletions:
Parse the New York Times' XML feed of Most E-Mailed Articles and answer the following questions:


Revision [2628]

Edited on 2013-06-09 12:13:00 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
<type 'dict'>
Deletions:
>>> <type 'dict'>


Revision [2627]

Edited on 2013-06-09 12:12:32 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
~- Use web services with ##urllib## and ##json## modules
Deletions:
~- Use web services with ##urllib2## and ##json## modules


Revision [2626]

Edited on 2013-06-09 12:12:14 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
>>> type(json_data)
>>> <type 'dict'>
>>> json_data.keys()
[u'type', u'features', u'bbox', u'metadata']
>>> json_data['metadata']
{u'url': u'http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson', u'count': 154, u'generated': 1370802521000, u'api': u'1.0.1', u'title': u'USGS All Earthquakes, Past Day'}
>>> type(json_data['features'])
<type 'list'>
>>> json_data['features'][0].keys()
[u'geometry', u'type', u'properties', u'id']
>>> first = json_data['features'][0]
>>> first.keys()
[u'geometry', u'type', u'properties', u'id']
>>> first['properties']
{u'rms': 0.32, u'code': u'15357601', u'cdi': None, u'sources': u',ci,', u'nst': 15, u'tz': -420, u'magType': u'Ml', u'detail': u'http://earthquake.usgs.gov/earthquakes/feed/v1.0/detail/ci15357601.geojson', u'sig': 30, u'net': u'ci', u'type': u'earthquake', u'status': u'AUTOMATIC', u'updated': 1370801930386, u'felt': None, u'alert': None, u'dmin': 0.11678099, u'mag': 1.4, u'gap': 79.2, u'types': u',general-link,geoserve,nearby-cities,origin,scitech-link,', u'url': u'http://earthquake.usgs.gov/earthquakes/eventpage/ci15357601', u'ids': u',ci15357601,', u'tsunami': None, u'place': u'2km S of Brawley, California', u'time': 1370801714000, u'mmi': None}
>>> earthquakes = json_data['features']
>>> first_earthquake = earthquakes[0]
>>> first_earthquake['properties']
{u'rms': 0.32, u'code': u'15357601', u'cdi': None, u'sources': u',ci,', u'nst': 15, u'tz': -420, u'magType': u'Ml', u'detail': u'http://earthquake.usgs.gov/earthquakes/feed/v1.0/detail/ci15357601.geojson', u'sig': 30, u'net': u'ci', u'type': u'earthquake', u'status': u'AUTOMATIC', u'updated': 1370801930386, u'felt': None, u'alert': None, u'dmin': 0.11678099, u'mag': 1.4, u'gap': 79.2, u'types': u',general-link,geoserve,nearby-cities,origin,scitech-link,', u'url': u'http://earthquake.usgs.gov/earthquakes/eventpage/ci15357601', u'ids': u',ci15357601,', u'tsunami': None, u'place': u'2km S of Brawley, California', u'time': 1370801714000, u'mmi': None}
>>> first_earthquake['properties']['mag']
1.4
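A natural next step from the session above, sketched here (these lines are not part of the original transcript): pull a single property out of every feature with a list comprehension.

%%(python)
>>> places = [e['properties']['place'] for e in earthquakes]
>>> len(places)   # matches the count reported in the metadata above
154
%%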


Revision [2625]

Edited on 2013-06-09 12:05:54 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
for earthquake in earthquakes:
    earthquake_magnitudes.append(earthquake['properties']['mag'])
print len(earthquakes) # number of earthquakes in the last day
print max(earthquake_magnitudes) # biggest earthquake magnitude
Write a script that collects the USGS earthquake data and answers the following questions (a starter sketch follows the list):
~- How many earthquakes were recorded by the USGS in the last day?
~- What was the largest earthquake?
~- Where did it occur?
~- When did it occur?
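Here is a possible starting point for the first exercise (a sketch, not a full solution; variable names beyond those in the script above are my own):

%%(python)
import urllib, json

usgs_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
json_data = json.loads(urllib.urlopen(usgs_url).read())

earthquakes = json_data['features']
print len(earthquakes)   # how many earthquakes in the last day

# max() with a key function returns the whole feature record, so the
# place and time can be read off the same entry as the magnitude.
biggest = max(earthquakes, key=lambda e: e['properties']['mag'])
print biggest['properties']['mag']     # the largest magnitude
print biggest['properties']['place']   # where it occurred
print biggest['properties']['time']    # when it occurred (a Unix timestamp in milliseconds)
%%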
Parse the New York Times' XML feed of Most E-Mailed Articles and answer the following questions:
~- What is the average length of the headlines?
~- What are the most commonly used words?
~- Can you identify any other patterns in the data?
You'll need to research how to parse XML in Python. It's not quite as simple as JSON.
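If you want a starting point, here is a sketch using the standard library's ##xml.etree.ElementTree## (it assumes the feed is ordinary RSS 2.0, with each article as an ##item## under ##channel##):

%%(python)
import urllib
import xml.etree.ElementTree as ET

nyt_url = "http://www.nytimes.com/services/xml/rss/nyt/pop_top.xml"
raw_xml = urllib.urlopen(nyt_url).read()
root = ET.fromstring(raw_xml)

# Each RSS item carries its headline in a title element.
headlines = [item.find('title').text for item in root.findall('./channel/item')]

average_length = sum(len(h) for h in headlines) / float(len(headlines))
print average_length   # average headline length, in characters
%%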
Deletions:
for earthquake in earthque


Revision [2624]

Edited on 2013-06-09 11:58:41 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
~- ##import urllib, json##
~- ##usgs_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"##
The url is opened using the ##urllib## module's ##urlopen## function (note the similarity to the ##file## interface):
~- ##response = urllib.urlopen(usgs_url)##
It is then read, much like a file, into the ##raw_json## variable:
~- ##raw_json = response.read()##
**3. Parse JSON data into a dictionary**
The contents are parsed using the ##json## module's ##loads## function, which takes a string of JSON data and converts it to a dictionary.
~- ##json_data = json.loads(raw_json)##
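As a quick illustration of ##loads## (a toy JSON string, not the USGS feed):

%%(python)
>>> import json
>>> json.loads('{"mag": 1.4}')
{u'mag': 1.4}
%%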
A dictionary makes the data much easier to work with. Remember, a dictionary is a data structure that ties a key to a value, set and accessed like so:
~- ##dict[key] = value##
The value can be almost any data type, a string, an integer, a list, another dict. Once we have our ##json_data## dict, we can play with it. For instance:
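One small example of such play (the value matches the ##metadata## shown in the session above):

%%(python)
>>> json_data['metadata']['title']
u'USGS All Earthquakes, Past Day'
%%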
The data we're interested in specifically, that corresponds to earthquakes in the past day, is the ##features## key:
**4. Find the entry with the highest magnitude**
Here, for the sake of brevity, I use a list comprehension to collect the magnitude of each earthquake into a list, then use the built-in ##max## function to find the largest value:
~- ##max([entry['properties']['mag'] for entry in json_data['features']])##
If you're not familiar with list comprehensions yet, this code may make more sense:
earthquakes = json_data['features']
earthquake_magnitudes = []
for earthquake in earthque
Deletions:
~~- ##import urllib, json##
The url is opened using the ##urllib## module's ##urlopen## function (note the similarity to the ##file## interface).
It is then read, much like a file, into the ##contents## variable, and the contents are parsed using the ##json## module's ##loads## function, which takes a string of JSON data and converts it to a dictionary.


Revision [2623]

Edited on 2013-06-09 11:46:20 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
**1. Imports required Python libraries**
~~- ##import urllib, json##
**2. Read data feed**
For the sake of convenience, the URL of the USGS JSON data feed is set as the variable ##usgs_url##.
The url is opened using the ##urllib## module's ##urlopen## function (note the similarity to the ##file## interface).
It is then read, much like a file, into the ##contents## variable, and the contents are parsed using the ##json## module's ##loads## function, which takes a string of JSON data and converts it to a dictionary.
Deletions:
~- ##import urllib, json##
~~- Imports required Python libraries


Revision [2622]

Edited on 2013-06-09 11:40:45 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
For example, here's a simple script that pulls the latest JSON data from the USGS website's [[http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson JSON feed]] and finds the largest earthquake in the last day. It requires Python's urllib and json libraries:
>>> import urllib, json
>>> usgs_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
>>> response = urllib.urlopen(usgs_url)
>>> raw_json = response.read()
>>> json_data = json.loads(raw_json)
>>> max([entry['properties']['mag'] for entry in json_data['features']])
Here's a breakdown of the steps:
~- ##import urllib, json##
~~- Imports required Python libraries
Deletions:
For example, here's a simple script to pull the latest public tweets from Twitter and calculate their average length. It requires using the Python web library. But notice how similar it is to working with files:
# Open the resource
# Collect the data
# Close the resource
# Manipulate the data
# Output the data


Revision [2621]

Edited on 2013-06-05 22:15:39 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]
Additions:
In its simplest form, a script accepts some input data, manipulates it, and spits some data back out. Data in, data out. So where does this data come from?
It can come from a variety of resources. One of the most common is files, and that's where we begin here. Python makes it easy to read data out of a file. Let's say we have a file in the ##/tmp## directory.
%%(python)
>>> file_path = "/tmp/myfile.txt"
>>> f = open(file_path)
>>> contents = f.read()
>>> f.close()
>>> print contents
%%
Very simple. Data in: the contents of file ##/tmp/myfile.txt##. Data out: the contents, unchanged. ##f## here is a [[http://docs.python.org/2/library/stdtypes.html#bltin-file-objects file object]], a handler that provides a number of methods for manipulating files.
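A few of those methods, sketched as a quick illustration (the output assumes ##/tmp/myfile.txt## holds two short lines):

%%(python)
>>> f = open("/tmp/myfile.txt")
>>> f.readline()    # read a single line, newline included
'first line\n'
>>> f.readlines()   # read the remaining lines into a list
['second line\n']
>>> f.close()
>>> f.closed        # file objects know whether they have been closed
True
%%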
Other data resources include databases, web services, and message queues. All follow a similar pattern: open the resource, collect the data, close the resource.
For example, here's a simple script to pull the latest public tweets from Twitter and calculate their average length. It requires using the Python web library. But notice how similar it is to working with files:
%%(python)
# Open the resource
# Collect the data
# Close the resource
# Manipulate the data
# Output the data
%%
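To make the pattern concrete, here is a minimal sketch of those five steps using the ##urllib## module (a placeholder URL stands in, since the Twitter call itself needs API details not covered here):

%%(python)
import urllib

# Open the resource
response = urllib.urlopen("http://example.com/data.txt")

# Collect the data
contents = response.read()

# Close the resource
response.close()

# Manipulate the data
lines = contents.splitlines()
average_line_length = len(contents) / float(len(lines) or 1)

# Output the data
print average_line_length
%%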


Revision [2394]

The oldest known version of this page was created on 2013-01-20 09:58:08 by KlenwellAdmin [Replaces old-style internal links with new pipe-split links.]