Key Word(s): I/O
I/O Intro¶
- Input Files
- XML
- YAML
- JSON
- Comments on pickling
Note: Please download this notebook. Some slides do not render properly on the website.
from IPython.display import HTML
Input Files and Parsing¶
We usually want to read data into our software:
- Input parameters to the code (e.g. time step, linear algebra solvers, physical parameters, etc.)
- Input fields (e.g. fields to visualize)
- Calibration data
- $\vdots$
This data can be provided by us, or the client, or come from a database somewhere.
There are many ways of reading in and parsing data. In fact, this is often a non-trivial exercise depending on the quality of the data as well as its size.
XML Intro¶
<?xml version="1.0"?>
<ctml>
    <reactionData id="test_mechanism">
        <!-- reaction 01 -->
        <reaction reversible="yes" type="Elementary" id="reaction01">
            <equation>H + O2 [=] OH + O</equation>
            <rateCoeff>
                <Kooij>
                    <A units="cm3/mol/s">3.52e+16</A>
                    <b>-0.7</b>
                    <E units="kJ/mol">71.4</E>
                </Kooij>
            </rateCoeff>
            <reactants>H:1 O2:1</reactants>
            <products>OH:1 O:1</products>
        </reaction>
        <!-- reaction 02 -->
        <reaction reversible="yes" type="Elementary" id="reaction02">
            <equation>H2 + O [=] OH + H</equation>
            <rateCoeff>
                <Kooij>
                    <A units="cm3/mol/s">5.06e+4</A>
                    <b>2.7</b>
                    <E units="kJ/mol">26.3</E>
                </Kooij>
            </rateCoeff>
            <reactants>H2:1 O:1</reactants>
            <products>OH:1 H:1</products>
        </reaction>
    </reactionData>
</ctml>
What is XML?¶
Note: Material presented here taken from the following sources
Some basic XML comments:
- XML stands for Extensible Markup Language
- XML is just information wrapped in tags
- It doesn't do anything per se
- Its format is both machine- and human-readable
Some Basic XML Anatomy¶
<?xml version="1.0" encoding="UTF-8"?> <!-- This is the optional XML prolog; if present, it must come first -->
<!-- This is an XML comment -->
<dogshelter> <!-- This is the root element -->
    <dog id="dog1"> <!-- This is the first child element.
                         It has an `id` attribute -->
        <name> Cloe </name> <!-- First subchild element -->
        <age> 3 </age> <!-- Second subchild element -->
        <breed> Border Collie </breed>
        <playgroup> Yes </playgroup>
    </dog>
    <dog id="dog2">
        <name> Karl </name>
        <age> 7 </age>
        <breed> Beagle </breed>
        <playgroup> Yes </playgroup>
    </dog>
</dogshelter>
Note that all XML elements have a closing tag!
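A parser enforces this rule. The sketch below uses Python's standard `xml.etree.ElementTree` module on two made-up snippets; the helper name `is_well_formed` is an assumption for illustration. It shows that a document with an unclosed element is rejected rather than silently repaired.

```python
import xml.etree.ElementTree as ET

# Two hypothetical snippets: the first closes every element,
# the second never closes <name>.
well_formed = "<dog><name>Cloe</name></dog>"
malformed = "<dog><name>Cloe</dog>"


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(well_formed))
print(is_well_formed(malformed))
```

Catching `ET.ParseError` is one way to give users a readable message when an input file is broken.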
Some More Basic XML Anatomy¶
See w3schools XML tutorial for a very nice summary of the essential XML rules.
XML elements: a few things to be aware of:
- Elements can contain text, attributes, and other elements
- XML names are case sensitive and cannot contain spaces
  - Be consistent in your naming convention
XML attributes: a few things to be aware of:
- XML attributes must be in quotes
- There are no rules about when to use elements or attributes
  - You could make an attribute an element and it might read better
  - Rule of thumb: Data should be stored as elements. Metadata should be stored as attributes.
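To make the element-versus-attribute choice concrete, here is a small sketch with two hypothetical encodings of the same dog record, read with Python's standard `xml.etree.ElementTree` module. Neither layout is "the" correct one; the point is only that the access pattern differs, and that names are case sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical variants of the same record.
as_attribute = ET.fromstring('<dog id="dog1" name="Cloe"/>')
as_element = ET.fromstring('<dog id="dog1"><name>Cloe</name></dog>')

name_from_attribute = as_attribute.get("name")    # attribute lookup
name_from_element = as_element.find("name").text  # text of a child element

print(name_from_attribute, name_from_element)

# Names are case sensitive: there is no <Name> element, only <name>,
# so find() returns None.
missing = as_element.find("Name")
print(missing)
```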
Python and XML¶
We will use the `xml.etree.ElementTree` module from the Python standard library to read in and parse XML input files.
A very nice tutorial can be found in the Python ElementTree documentation.
We'll work with the shelterdogs.xml file to start.
import xml.etree.ElementTree as ET
tree = ET.parse('shelterdogs.xml')
dogshelter = tree.getroot()
print(dogshelter)
print(dogshelter.tag)
print(dogshelter.attrib)
<Element 'dogshelter' at 0x7fa98bd18cb0>
dogshelter
{}
Looping Over Child Elements¶
for child in dogshelter:
    print(child.tag, child.attrib)
dog {'id': 'dog1'}
dog {'id': 'dog2'}
Accessing Children by Index¶
print(dogshelter[0][0].text)
Cloe
print(dogshelter[1][0].text)
Karl
print(dogshelter[0][2].text)
Border Collie
The Element.iter() Method¶
From the documentation:
Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order.
for age in dogshelter.iter('age'):
    print(age.text)
3
7
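The difference between `iter()` and a plain loop over an element is easy to miss, so here is a minimal sketch. It assumes a trimmed-down copy of the dog shelter data embedded as a string (so no file is needed) and compares what each approach visits.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical copy of the shelter data.
shelter = ET.fromstring(
    "<dogshelter>"
    "<dog id='dog1'><name>Cloe</name><age>3</age></dog>"
    "<dog id='dog2'><name>Karl</name><age>7</age></dog>"
    "</dogshelter>"
)

# iter() with no argument walks the whole tree, depth first,
# starting with the element it was called on.
all_tags = [element.tag for element in shelter.iter()]

# A plain loop only visits direct children.
direct_children = [child.tag for child in shelter]

# iter('age') filters the full walk by tag name.
ages = [int(age.text) for age in shelter.iter("age")]

print(all_tags)
print(direct_children)
print(ages)
```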
The Element.findall() Method¶
From the documentation:
Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.
print(dogshelter.findall('dog'))
[<Element 'dog' at 0x7fa98bd2d4d0>, <Element 'dog' at 0x7fa98bd3d170>]
for dog in dogshelter.findall('dog'): # Iterate over each child
    print('ID: {}'.format(dog.get('id'))) # Use the get() method to get the attribute of the child
    print('----------')
    print('Name: {}'.format(dog.find('name').text)) # Use the find() method to find a specific subchild
    age = float(dog.find('age').text)
    # Assumption: ages given in months carry a (hypothetical) units="months" attribute.
    # .attrib is a dictionary, so query it by key instead of comparing it to a string.
    if dog.find('age').get('units') == 'months':
        years = age / 12.0
        print('Age: {} years'.format(years))
    else:
        print('Age: {} years'.format(age))
    print('Breed: {}'.format(dog.find('breed').text))
    if dog.find('playgroup').text.split()[0] == 'Yes':
        print('PLAYGROUP')
    else:
        print('NO PLAYGROUP')
    print('\n::::::::::::::::::::')
ID: dog1
----------
Name: Cloe
Age: 3.0 years
Breed: Border Collie
PLAYGROUP

::::::::::::::::::::
ID: dog2
----------
Name: Karl
Age: 7.0 years
Breed: Beagle
PLAYGROUP

::::::::::::::::::::
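The same calls carry over to the reaction mechanism file shown in the XML intro. The following is a minimal sketch, not a definitive parser: it embeds a copy of that XML as a string so it runs without an input file, and the helper `parse_species` and the dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the mechanism file from the XML intro, embedded as a string.
ctml_text = """<?xml version="1.0"?>
<ctml>
<reactionData id="test_mechanism">
<reaction reversible="yes" type="Elementary" id="reaction01">
<equation>H + O2 [=] OH + O</equation>
<rateCoeff>
<Kooij>
<A units="cm3/mol/s">3.52e+16</A>
<b>-0.7</b>
<E units="kJ/mol">71.4</E>
</Kooij>
</rateCoeff>
<reactants>H:1 O2:1</reactants>
<products>OH:1 O:1</products>
</reaction>
<reaction reversible="yes" type="Elementary" id="reaction02">
<equation>H2 + O [=] OH + H</equation>
<rateCoeff>
<Kooij>
<A units="cm3/mol/s">5.06e+4</A>
<b>2.7</b>
<E units="kJ/mol">26.3</E>
</Kooij>
</rateCoeff>
<reactants>H2:1 O:1</reactants>
<products>OH:1 H:1</products>
</reaction>
</reactionData>
</ctml>"""


def parse_species(text):
    """Hypothetical helper: turn 'H:1 O2:1' into {'H': 1, 'O2': 1}."""
    species = {}
    for item in text.split():
        name, count = item.split(":")
        species[name] = int(count)
    return species


root = ET.fromstring(ctml_text)
reactions = []
for reaction in root.find("reactionData").findall("reaction"):
    # A path such as 'rateCoeff/Kooij' reaches a nested element in one call.
    kooij = reaction.find("rateCoeff/Kooij")
    reactions.append({
        "id": reaction.get("id"),
        "equation": reaction.find("equation").text,
        "reversible": reaction.get("reversible") == "yes",
        "A": float(kooij.find("A").text),   # element text is always a string
        "b": float(kooij.find("b").text),
        "E": float(kooij.find("E").text),
        "E_units": kooij.find("E").get("units"),
        "reactants": parse_species(reaction.find("reactants").text),
        "products": parse_species(reaction.find("products").text),
    })

for r in reactions:
    print(r["id"], r["equation"], r["A"], r["b"], r["E"], r["E_units"])
```

Note that every numeric value has to be converted explicitly, since XML stores element text and attribute values as strings.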
What is JSON?¶
- Stands for JavaScript Object Notation
- It's actually language agnostic
- No need to learn JavaScript to use it
- Like XML, it's a human-readable format
Some Basic JSON Anatomy¶
{
"dogShelter": "MSPCA-Angell",
"dogs": [
{
"name": "Cloe",
"age": 3,
"breed": "Border Collie",
"attendPlaygroup": "Yes"
},
{
"name": "Karl",
"age": 7,
"breed": "Beagle",
"attendPlaygroup": "Yes"
}
]
}
JSON and Python¶
- Python supports JSON natively
  - Saving Python data to JSON format is called serialization
  - Loading a JSON file into Python data is called deserialization
Deserialization¶
Since we're interested in reading in some fancy input file, we'll begin by discussing deserialization.
We'll work with the shelterdogs.json file.
import json
with open("shelterdogs.json", "r") as shelterdogs_file:
    shelterdogs = json.load(shelterdogs_file)
print(shelterdogs["dogs"])
[{'name': 'Cloe', 'age': 3, 'breed': 'Border Collie', 'attendPlaygroup': 'Yes'}, {'name': 'Karl', 'age': 7, 'breed': 'Beagle', 'attendPlaygroup': 'Yes'}]
print(type(shelterdogs))
<class 'dict'>
Comments on Deserialization¶
That was pretty nice! We got a Python dictionary out. We sure know how to work with Python dictionaries.
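As a brief illustration of working with the result, the sketch below parses the JSON from the anatomy slide with `json.loads` (the string counterpart of `json.load`, used here so that no file is needed) and then applies ordinary dictionary and list operations. The variable names are made up for this example.

```python
import json

# The JSON document from the anatomy slide, embedded as a string.
shelter_text = """
{
    "dogShelter": "MSPCA-Angell",
    "dogs": [
        {"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
        {"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}
    ]
}
"""

shelter = json.loads(shelter_text)

# JSON objects become dicts and JSON arrays become lists.
names = [dog["name"] for dog in shelter["dogs"]]
oldest = max(shelter["dogs"], key=lambda dog: dog["age"])

print(names)
print(oldest["name"], type(oldest["age"]))
```

Unlike the XML example, the ages arrive as Python integers, so no explicit conversion from text is needed.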
Serialization¶
You can also write data out to JSON format. Let's just do a brief example.
somedogs = {"shelterDogs": [{"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
{"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}]}
with open("shelterdogs_write.json", "w") as write_dogs:
    json.dump(somedogs, write_dogs, indent=4)
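One thing worth knowing is that a round trip through JSON does not always return exactly what went in, because JSON has fewer types than Python. The sketch below uses a made-up record and `json.dumps`/`json.loads` (the string versions of `dump`/`load`) to show two such conversions and one type that cannot be serialized by default.

```python
import json

# Hypothetical record containing a tuple and a non-string key.
record = {"name": "Cloe", "tags": ("friendly", "fast"), 1: "integer key"}

text = json.dumps(record)
restored = json.loads(text)

# Tuples come back as lists and the integer key comes back as a string.
print(restored)

# Sets have no JSON equivalent, so the default encoder raises TypeError.
try:
    json.dumps({"seen": {1, 2}})
    set_serialized = True
except TypeError:
    set_serialized = False
print(set_serialized)
```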
What is YAML?¶
- The official website: YAML
- From the official website:
  - YAML stands for YAML Ain't Markup Language
    - Example of a recursive acronym (like GNU, "GNU's Not Unix"!)
- "What It Is: YAML is a human-friendly data serialization standard for all programming languages."
- YAML is quite friendly to use and continues to gain in popularity
YAML Anatomy¶
shelterDogs:
- {age: 3, attendPlaygroup: 'Yes', breed: Border Collie, name: Cloe}
- {age: 7, attendPlaygroup: 'Yes', breed: Beagle, name: Karl}
shelterStaff:
- {Job: dogWalker, age: 100, name: Bob}
- {Job: PlaygroupLeader, age: 47, name: Sally}
someshelter = {"shelterDogs": [{"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
{"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}],
"shelterStaff": [{"name": "Bob", "age": 100, "Job": "dogWalker"},
{"name": "Sally", "age": 47, "Job": "PlaygroupLeader"}]}
import yaml # Provided by the PyYAML package: use conda install -c anaconda pyyaml (or pip install pyyaml) if you need to install it
print(yaml.dump(someshelter))
shelterDogs:
- age: 3
  attendPlaygroup: 'Yes'
  breed: Border Collie
  name: Cloe
- age: 7
  attendPlaygroup: 'Yes'
  breed: Beagle
  name: Karl
shelterStaff:
- Job: dogWalker
  age: 100
  name: Bob
- Job: PlaygroupLeader
  age: 47
  name: Sally
Serialization¶
with open("shelter_write.yaml", "w") as write_dogs:
    yaml.dump(someshelter, write_dogs)
Deserialization¶
with open("shelterdogs.yaml", "r") as shelter_dogs:
    some_shelter = yaml.safe_load(shelter_dogs) # safe_load only builds plain Python types; yaml.load() without a Loader is deprecated and unsafe
print(some_shelter)
{'shelterDogs': [{'age': 3, 'attendPlaygroup': 'Yes', 'breed': 'Border Collie', 'name': 'Cloe'}, {'age': 7, 'attendPlaygroup': 'Yes', 'breed': 'Beagle', 'name': 'Karl'}], 'shelterStaff': [{'Job': 'dogWalker', 'age': 100, 'name': 'Bob'}, {'Job': 'PlaygroupLeader', 'age': 47, 'name': 'Sally'}]}
print(some_shelter["shelterStaff"])
[{'Job': 'dogWalker', 'age': 100, 'name': 'Bob'}, {'Job': 'PlaygroupLeader', 'age': 47, 'name': 'Sally'}]
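A detail that explains the quotes around 'Yes' in the YAML shown earlier: PyYAML follows YAML 1.1, in which an unquoted `Yes` is read as a boolean. The sketch below assumes PyYAML is installed and uses `yaml.safe_load`, which accepts a string as well as an open file, on two one-line documents made up for this example.

```python
import yaml  # PyYAML, a third-party package

# Unquoted Yes is resolved to the boolean True; quoted 'Yes' stays a string.
unquoted = yaml.safe_load("attendPlaygroup: Yes")
quoted = yaml.safe_load("attendPlaygroup: 'Yes'")

print(unquoted)
print(quoted)

# When dumping, PyYAML adds the quotes itself so that the string
# survives a round trip.
original = {"attendPlaygroup": "Yes", "age": 3}
round_trip = yaml.safe_load(yaml.safe_dump(original))
print(round_trip == original)
```

If a hand-written input file is meant to contain the text "Yes" rather than a boolean, it should be quoted.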
What is pickle?¶
- Python has its own module for loading and writing Python data
  - Part of the Python standard library
  - Fast
  - Can store arbitrarily complex Python data structures
Some caveats¶
- Python specific: no guarantee of cross-language compatibility
- Not every Python data structure can be serialized by pickle
- Older versions of Python don't support newer serialization formats
  - The latest format can handle the most Python data structures
  - Newer versions of Python can still read data written in older formats
  - Older versions of Python cannot read data written in newer formats
- Make sure to use binary mode when opening pickle files
  - Data will get corrupted otherwise
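To illustrate the second caveat, here is a small sketch that tries to pickle a plain dictionary and a generator. The helper `try_pickle` is a made-up name for this example; generators are one case among several that `pickle` cannot serialize.

```python
import pickle

picklable = {"name": "Cloe", "age": 3}
not_picklable = (n for n in range(3))  # generators cannot be pickled


def try_pickle(obj):
    """Return True if obj can be pickled in memory, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(try_pickle(picklable))
print(try_pickle(not_picklable))
```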
import pickle
someshelter = {"shelterDogs": [{"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
{"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}],
"shelterStaff": [{"name": "Bob", "age": 100, "Job": "dogWalker"},
{"name": "Sally", "age": 47, "Job": "PlaygroupLeader"}]}
with open('data.pickle', 'wb') as f:
    pickle.dump(someshelter, f, pickle.HIGHEST_PROTOCOL) # highest protocol is the most recent one
with open('data.pickle', 'rb') as f:
    data = pickle.load(f)
print(data)
{'shelterDogs': [{'name': 'Cloe', 'age': 3, 'breed': 'Border Collie', 'attendPlaygroup': 'Yes'}, {'name': 'Karl', 'age': 7, 'breed': 'Beagle', 'attendPlaygroup': 'Yes'}], 'shelterStaff': [{'name': 'Bob', 'age': 100, 'Job': 'dogWalker'}, {'name': 'Sally', 'age': 47, 'Job': 'PlaygroupLeader'}]}
%%bash
cat "data.pickle"
Border Collie��attendPlaygroup��Yes�u}�(h�Karl�hKh�Beagle�h h ue�shelterStaff�]�(}�(h�Bob�hKd�Job�� dogWalker�u}�(h�Sally�hK/h�PlaygroupLeader�ueu.
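The unreadable output above is expected, since pickle data is a stream of bytes rather than text. The sketch below works in memory with `pickle.dumps` and `pickle.loads` on a shortened, made-up version of the shelter data, and compares the oldest protocol (0) with the most recent one. The variable names are assumptions for illustration.

```python
import pickle

# Shortened, hypothetical version of the shelter data.
data = {"shelterDogs": [{"name": "Cloe", "age": 3}, {"name": "Karl", "age": 7}]}

oldest_format = pickle.dumps(data, protocol=0)
newest_format = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Both results are bytes objects, which is why files must be opened
# in binary mode ('wb' / 'rb').
print(type(oldest_format), type(newest_format))
print(len(oldest_format), len(newest_format))

# loads() detects the protocol on its own, so no version needs to be
# passed when reading.
restored_old = pickle.loads(oldest_format)
restored_new = pickle.loads(newest_format)
print(restored_old == data, restored_new == data)
```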