Key Word(s): I/O
I/O Intro
- Input Files
- XML
- YAML
- JSON
- Comments on pickling
Note: Please download this notebook. Some slides do not render properly on the website.
from IPython.display import HTML
Input Files and Parsing
We usually want to read data into our software:
- Input parameters to the code (e.g. time step, linear algebra solvers, physical parameters, etc.)
- Input fields (e.g. fields to visualize)
- Calibration data
- $\vdots$
This data may be provided by us or by a client, or it may come from a database.
There are many ways to read in and parse data, and doing so is often a non-trivial exercise, depending on the quality and the size of the data.
XML Intro
<?xml version="1.0"?>
<ctml>
<reactionData id="test_mechanism">
<!-- reaction 01 -->
<reaction reversible="yes" type="Elementary" id="reaction01">
<equation>H + O2 [=] OH + O</equation>
<rateCoeff>
<Kooij>
<A units="cm3/mol/s">3.52e+16</A>
<b>-0.7</b>
<E units="kJ/mol">71.4</E>
</Kooij>
</rateCoeff>
<reactants>H:1 O2:1</reactants>
<products>OH:1 O:1</products>
</reaction>
<!-- reaction 02 -->
<reaction reversible="yes" type="Elementary" id="reaction02">
<equation>H2 + O [=] OH + H</equation>
<rateCoeff>
<Kooij>
<A units="cm3/mol/s">5.06e+4</A>
<b>2.7</b>
<E units="kJ/mol">26.3</E>
</Kooij>
</rateCoeff>
<reactants>H2:1 O:1</reactants>
<products>OH:1 H:1</products>
</reaction>
</reactionData>
</ctml>
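As a preview of where we're headed, here is a minimal sketch that pulls the Kooij rate parameters (`A`, `b`, `E`) and the reactant/product stoichiometry out of the reaction file above, using the ElementTree tools introduced later. The XML is embedded as a string so the cell runs on its own; the helper `parse_species` and the layout of the `reactions` list are our own illustrative choices, not part of any standard format.

```python
import xml.etree.ElementTree as ET

# The reaction input file from above, embedded as a string for a self-contained demo
reaction_xml = """<?xml version="1.0"?>
<ctml>
<reactionData id="test_mechanism">
<reaction reversible="yes" type="Elementary" id="reaction01">
<equation>H + O2 [=] OH + O</equation>
<rateCoeff><Kooij>
<A units="cm3/mol/s">3.52e+16</A>
<b>-0.7</b>
<E units="kJ/mol">71.4</E>
</Kooij></rateCoeff>
<reactants>H:1 O2:1</reactants>
<products>OH:1 O:1</products>
</reaction>
<reaction reversible="yes" type="Elementary" id="reaction02">
<equation>H2 + O [=] OH + H</equation>
<rateCoeff><Kooij>
<A units="cm3/mol/s">5.06e+4</A>
<b>2.7</b>
<E units="kJ/mol">26.3</E>
</Kooij></rateCoeff>
<reactants>H2:1 O:1</reactants>
<products>OH:1 H:1</products>
</reaction>
</reactionData>
</ctml>"""

root = ET.fromstring(reaction_xml)


def parse_species(text):
    """Hypothetical helper: turn 'H:1 O2:1' into {'H': 1, 'O2': 1}."""
    return {name: int(coef) for name, coef in (item.split(':') for item in text.split())}


reactions = []
for rxn in root.iter('reaction'):          # every <reaction>, at any depth
    kooij = rxn.find('rateCoeff/Kooij')     # a path: <Kooij> inside <rateCoeff>
    reactions.append({
        'id': rxn.get('id'),                # attribute value
        'A': float(kooij.find('A').text),   # element text, converted to a number
        'A_units': kooij.find('A').get('units'),
        'b': float(kooij.find('b').text),
        'E': float(kooij.find('E').text),
        'reactants': parse_species(rxn.find('reactants').text),
        'products': parse_species(rxn.find('products').text),
    })

for r in reactions:
    print(r['id'], r['A'], r['b'], r['E'], r['reactants'], '->', r['products'])
```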
What is XML?
Note: The material presented here is taken from the following sources
Some basic XML comments:
- XML stands for Extensible Markup Language
- XML is just information wrapped in tags
- It doesn't do anything by itself; a program has to read the tags and act on them
- Its format is both machine- and human-readable
Some Basic XML Anatomy
<?xml version="1.0" encoding="UTF-8"?> <!-- This is the optional XML prolog; if present, it must come first -->
<!-- This is an XML comment -->
<dogshelter> <!-- This is the root element -->
<dog id="dog1"> <!-- This is the first child element.
It has an `id` attribute -->
<name> Cloe </name> <!-- First subchild element -->
<age> 3 </age> <!-- Second subchild element -->
<breed> Border Collie </breed>
<playgroup> Yes </playgroup>
</dog>
<dog id="dog2">
<name> Karl </name>
<age> 7 </age>
<breed> Beagle </breed>
<playgroup> Yes </playgroup>
</dog>
</dogshelter>
Note that every XML element must be closed, either with a closing tag or as a self-closing tag such as `<playgroup/>`!
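Python's XML parser enforces this rule. A minimal sketch using `ET.fromstring` on small made-up snippets: a well-formed element parses, a self-closing tag parses, and an element that is never closed raises `ET.ParseError`.

```python
import xml.etree.ElementTree as ET

good = "<dog id='dog1'><name>Cloe</name></dog>"
bad = "<dog id='dog1'><name>Cloe</dog>"  # <name> is never closed

print(ET.fromstring(good).find('name').text)  # prints Cloe

# A self-closing tag counts as closed; it simply has no text
empty = ET.fromstring("<playgroup/>")
print(empty.tag, empty.text)  # prints playgroup None

try:
    ET.fromstring(bad)
    bad_is_well_formed = True
except ET.ParseError as err:
    bad_is_well_formed = False
    print('ParseError:', err)
```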
Some More Basic XML Anatomy
See the w3schools XML tutorial for a very nice summary of the essential XML rules.
XML elements: a few things to be aware of:
- Elements can contain text, attributes, and other elements
- XML names are case sensitive and cannot contain spaces
- Be consistent in your naming convention
XML attributes: a few things to be aware of:
- XML attribute values must be in quotes (single or double)
- There are no rules about when to use elements or attributes
- You could make an attribute an element and it might read better
- Rule of thumb: data should be stored as elements, and metadata should be stored as attributes
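The element-versus-attribute choice changes how you read the data back. A small sketch with a made-up `name` attribute (the shelter file stores the name as an element) shows both access patterns, plus the case-sensitivity and quoting rules from the lists above.

```python
import xml.etree.ElementTree as ET

# The same dog stored two ways: name as an element vs. name as an attribute
as_element = ET.fromstring("<dog id='dog1'><name>Cloe</name></dog>")
as_attribute = ET.fromstring("<dog id='dog1' name='Cloe'/>")

print(as_element.find('name').text)  # element text: Cloe
print(as_attribute.get('name'))      # attribute value: Cloe

# Names are case sensitive: <name> and <Name> are different tags
print(as_element.find('Name'))       # None, there is no <Name> child

# Unquoted attribute values are not well-formed XML
try:
    ET.fromstring("<dog id=dog1/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)                   # False
```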
Python and XML
We will use the ElementTree API (the `xml.etree.ElementTree` module in the standard library) to read in and parse XML input files in Python.
A very nice tutorial can be found in the Python ElementTree documentation.
We'll work with the shelterdogs.xml file to start.
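If you are following along without the course files, the sketch below writes a `shelterdogs.xml` built from the anatomy example above with the comments stripped out. We are assuming it matches the course file; the real file may differ in details such as whitespace.

```python
import xml.etree.ElementTree as ET

# Assumed contents of shelterdogs.xml, rebuilt from the anatomy example
shelterdogs_xml = """<?xml version="1.0" encoding="UTF-8"?>
<dogshelter>
    <dog id="dog1">
        <name> Cloe </name>
        <age> 3 </age>
        <breed> Border Collie </breed>
        <playgroup> Yes </playgroup>
    </dog>
    <dog id="dog2">
        <name> Karl </name>
        <age> 7 </age>
        <breed> Beagle </breed>
        <playgroup> Yes </playgroup>
    </dog>
</dogshelter>
"""

with open('shelterdogs.xml', 'w', encoding='utf-8') as f:
    f.write(shelterdogs_xml)

# Sanity check: the file parses and has two <dog> children
check = ET.parse('shelterdogs.xml').getroot()
print(len(check), check[1][0].text.strip())  # 2 Karl
```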
import xml.etree.ElementTree as ET
tree = ET.parse('shelterdogs.xml')
dogshelter = tree.getroot()
print(dogshelter)
print(dogshelter.tag)
print(dogshelter.attrib)
<Element 'dogshelter' at 0x7fa98bd18cb0>
dogshelter
{}
Looping Over Child Elements
for child in dogshelter:
    print(child.tag, child.attrib)
dog {'id': 'dog1'}
dog {'id': 'dog2'}
Accessing Children by Index
print(dogshelter[0][0].text)
Cloe
print(dogshelter[1][0].text)
Karl
print(dogshelter[0][2].text)
Border Collie
The Element.iter() Method
From the documentation:
Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order.
for age in dogshelter.iter('age'):
    print(age.text)
3
7
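Note that `iter()` searches the whole subtree, whereas looping over an element only visits its direct children. A minimal sketch on a small stand-in tree (named `shelter` so it doesn't overwrite `dogshelter`) makes the difference and the depth-first order visible.

```python
import xml.etree.ElementTree as ET

# Small stand-in for shelterdogs.xml
shelter = ET.fromstring(
    "<dogshelter>"
    "<dog id='dog1'><name>Cloe</name><age>3</age></dog>"
    "<dog id='dog2'><name>Karl</name><age>7</age></dog>"
    "</dogshelter>"
)

# With no tag argument, iter() visits every element, root included, depth first
all_tags = [elem.tag for elem in shelter.iter()]
print(all_tags)  # ['dogshelter', 'dog', 'name', 'age', 'dog', 'name', 'age']

# Looping over the root only sees its direct children...
children = [child.tag for child in shelter]
print(children)  # ['dog', 'dog']

# ...while iter('name') reaches the grandchildren
names = [elem.text for elem in shelter.iter('name')]
print(names)  # ['Cloe', 'Karl']
```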
The Element.findall() Method
From the documentation:
Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.
print(dogshelter.findall('dog'))
[<Element 'dog' at 0x7fa98bd2d4d0>, <Element 'dog' at 0x7fa98bd3d170>]
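`find()` and `findall()` also accept a limited XPath syntax (see the "XPath support" section of the ElementTree documentation). A sketch on a small stand-in tree, showing a child path, an attribute predicate, a `.//` search at any depth, and the empty-list result when nothing matches.

```python
import xml.etree.ElementTree as ET

# Small stand-in for shelterdogs.xml
shelter = ET.fromstring(
    "<dogshelter>"
    "<dog id='dog1'><name>Cloe</name><age>3</age></dog>"
    "<dog id='dog2'><name>Karl</name><age>7</age></dog>"
    "</dogshelter>"
)

# A path selects grandchildren directly: every <name> inside a <dog>
names = [n.text for n in shelter.findall('dog/name')]
print(names)  # ['Cloe', 'Karl']

# An attribute predicate picks out one dog; find() returns the first match
karl = shelter.find("dog[@id='dog2']")
print(karl.find('name').text)  # Karl

# './/' searches at any depth below the current element
ages = [int(a.text) for a in shelter.findall('.//age')]
print(ages)  # [3, 7]

# findall() returns an empty list, not None, when nothing matches
print(shelter.findall('cat'))  # []
```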
for dog in dogshelter.findall('dog'):  # Iterate over each child
    print('ID: {}'.format(dog.get('id')))  # Use the get() method to get the attribute of the child
    print('----------')
    print('Name: {}'.format(dog.find('name').text.strip()))  # Use the find() method to find a specific subchild
    age_elem = dog.find('age')
    age = float(age_elem.text)
    # Convert to years if <age> carries a units="months" attribute
    # ('units' is an assumed attribute name; shelterdogs.xml does not use it)
    if age_elem.get('units') == 'months':
        years = age / 12.0
        print('Age: {} years'.format(years))
    else:
        print('Age: {} years'.format(age))
    print('Breed: {}'.format(dog.find('breed').text.strip()))
    if dog.find('playgroup').text.split()[0] == 'Yes':  # split() drops the surrounding whitespace
        print('PLAYGROUP')
    else:
        print('NO PLAYGROUP')
    print('\n::::::::::::::::::::')
ID: dog1
----------
Name: Cloe
Age: 3.0 years
Breed: Border Collie
PLAYGROUP

::::::::::::::::::::
ID: dog2
----------
Name: Karl
Age: 7.0 years
Breed: Beagle
PLAYGROUP

::::::::::::::::::::
What is JSON?
- Stands for JavaScript Object Notation
- It's actually language agnostic
- No need to learn JavaScript to use it
- Like XML, it's a human-readable format
Some Basic JSON Anatomy
{
"dogShelter": "MSPCA-Angell",
"dogs": [
{
"name": "Cloe",
"age": 3,
"breed": "Border Collie",
"attendPlaygroup": "Yes"
},
{
"name": "Karl",
"age": 7,
"breed": "Beagle",
"attendPlaygroup": "Yes"
}
]
}
JSON and Python
- Python supports JSON natively through the standard-library `json` module
- Saving Python data to JSON format is called serialization
- Loading a JSON file into Python data is called deserialization
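To make the two directions concrete, here is a small round trip through `json.dumps` (serialize to a string) and `json.loads` (deserialize from a string). The record and its fields (`weight_kg`, `toys`, `microchip`) are made up for illustration; they show how Python types map to JSON and back, including the fact that a tuple comes back as a list.

```python
import json

# Hypothetical record mixing several Python types
record = {"name": "Cloe", "age": 3, "weight_kg": 18.5, "vaccinated": True,
          "toys": ("ball", "rope"), "microchip": None}

text = json.dumps(record)  # serialization: Python object -> JSON string
print(text)
# {"name": "Cloe", "age": 3, "weight_kg": 18.5, "vaccinated": true, "toys": ["ball", "rope"], "microchip": null}

back = json.loads(text)  # deserialization: JSON string -> Python object
print(back["toys"], back["vaccinated"], back["microchip"])  # ['ball', 'rope'] True None
print(back == record)  # False, because the tuple came back as a list
```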
Deserialization
Since we're interested in reading in input files, we'll begin with deserialization.
We'll work with the shelterdogs.json file.
import json
with open("shelterdogs.json", "r") as shelterdogs_file:
    shelterdogs = json.load(shelterdogs_file)
print(shelterdogs["dogs"])
[{'name': 'Cloe', 'age': 3, 'breed': 'Border Collie', 'attendPlaygroup': 'Yes'}, {'name': 'Karl', 'age': 7, 'breed': 'Beagle', 'attendPlaygroup': 'Yes'}]
print(type(shelterdogs))
<class 'dict'>
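Comments on Deserialization
That was pretty nice! We got a Python dictionary out, and we already know how to work with Python dictionaries. In general, JSON objects become Python `dict`s, JSON arrays become `list`s, and JSON numbers become `int`s or `float`s.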
Serialization
You can also write data out to JSON format. Let's do a brief example.
somedogs = {"shelterDogs": [{"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
{"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}]}
with open("shelterdogs_write.json", "w") as write_dogs:
    json.dump(somedogs, write_dogs, indent=4)  # indent=4 pretty-prints the output
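A quick way to check the serialization is to read the file back and compare it with the original. This sketch repeats the write from above so it runs on its own, then shows `json.dumps` with `sort_keys=True`, which returns a string instead of writing a file.

```python
import json

somedogs = {"shelterDogs": [{"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
                            {"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}]}

with open("shelterdogs_write.json", "w") as write_dogs:
    json.dump(somedogs, write_dogs, indent=4)

# Deserialize what we just wrote and compare with the original
with open("shelterdogs_write.json", "r") as read_dogs:
    reloaded = json.load(read_dogs)
print(reloaded == somedogs)  # True

# json.dumps returns a string; sort_keys=True orders the keys alphabetically
print(json.dumps(somedogs["shelterDogs"][0], sort_keys=True))
# {"age": 3, "attendPlaygroup": "Yes", "breed": "Border Collie", "name": "Cloe"}
```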
Some JSON References
What is YAML?
- The official website: YAML
- From the official website: YAML stands for "YAML Ain't Markup Language"
- This is an example of a recursive acronym (like GNU, which stands for "GNU's Not Unix"!)
- "What It Is: YAML is a human-friendly data serialization standard for all programming languages."
- YAML is quite friendly to use and continues to gain in popularity
YAML Anatomy
shelterDogs:
- {age: 3, attendPlaygroup: 'Yes', breed: Border Collie, name: Cloe}
- {age: 7, attendPlaygroup: 'Yes', breed: Beagle, name: Karl}
shelterStaff:
- {Job: dogWalker, age: 100, name: Bob}
- {Job: PlaygroupLeader, age: 47, name: Sally}
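The YAML above uses flow style (`{...}`) for each entry; block style with one key per line is equivalent. A minimal sketch with PyYAML's `yaml.safe_load` confirms this, and also shows why `'Yes'` is quoted: PyYAML follows YAML 1.1, where an unquoted `Yes` is read as the boolean `True`.

```python
import yaml  # provided by the PyYAML package

flow_style = """
shelterDogs:
- {age: 3, attendPlaygroup: 'Yes', breed: Border Collie, name: Cloe}
"""

block_style = """
shelterDogs:
- age: 3
  attendPlaygroup: 'Yes'
  breed: Border Collie
  name: Cloe
"""

# Both styles load to the same Python dictionary
print(yaml.safe_load(flow_style) == yaml.safe_load(block_style))  # True

# PyYAML follows YAML 1.1, where an unquoted Yes is a boolean
print(yaml.safe_load("attendPlaygroup: Yes"))    # {'attendPlaygroup': True}
print(yaml.safe_load("attendPlaygroup: 'Yes'"))  # {'attendPlaygroup': 'Yes'}
```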
someshelter = {"shelterDogs": [{"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
{"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}],
"shelterStaff": [{"name": "Bob", "age": 100, "Job": "dogWalker"},
{"name": "Sally", "age": 47, "Job": "PlaygroupLeader"}]}
import yaml  # Provided by the PyYAML package: pip install pyyaml (or conda install pyyaml)
print(yaml.dump(someshelter))
shelterDogs:
- age: 3
  attendPlaygroup: 'Yes'
  breed: Border Collie
  name: Cloe
- age: 7
  attendPlaygroup: 'Yes'
  breed: Beagle
  name: Karl
shelterStaff:
- Job: dogWalker
  age: 100
  name: Bob
- Job: PlaygroupLeader
  age: 47
  name: Sally
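Two things stand out in the output above: the keys come out sorted (`Job` before `age`, since uppercase letters sort first), and each entry is in block style. A sketch of the relevant `yaml.dump` options, assuming PyYAML 5.1 or newer: `sort_keys=False` keeps insertion order, and `default_flow_style=None` writes collections of plain scalars in flow style, as in the anatomy example.

```python
import yaml  # PyYAML 5.1+ is assumed for sort_keys

staff = {"shelterStaff": [{"name": "Bob", "age": 100, "Job": "dogWalker"},
                          {"name": "Sally", "age": 47, "Job": "PlaygroupLeader"}]}

sorted_out = yaml.dump(staff)                         # default: keys sorted
ordered_out = yaml.dump(staff, sort_keys=False)       # keys in insertion order
flow_out = yaml.dump(staff, default_flow_style=None)  # scalar-only mappings in flow style

print(sorted_out)
print(ordered_out)
print(flow_out)

# The formatting differs, but the data loads back identically
print(yaml.safe_load(ordered_out) == yaml.safe_load(flow_out) == staff)  # True
```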
Serialization
with open("shelter_write.yaml", "w") as write_dogs:
    yaml.dump(someshelter, write_dogs)
Deserialization
with open("shelterdogs.yaml", "r") as shelter_dogs:
    # safe_load only builds plain Python objects; calling yaml.load() without an
    # explicit Loader is deprecated as unsafe (and is an error in PyYAML 6)
    some_shelter = yaml.safe_load(shelter_dogs)
print(some_shelter)
{'shelterDogs': [{'age': 3, 'attendPlaygroup': 'Yes', 'breed': 'Border Collie', 'name': 'Cloe'}, {'age': 7, 'attendPlaygroup': 'Yes', 'breed': 'Beagle', 'name': 'Karl'}], 'shelterStaff': [{'Job': 'dogWalker', 'age': 100, 'name': 'Bob'}, {'Job': 'PlaygroupLeader', 'age': 47, 'name': 'Sally'}]}
print(some_shelter["shelterStaff"])
[{'Job': 'dogWalker', 'age': 100, 'name': 'Bob'}, {'Job': 'PlaygroupLeader', 'age': 47, 'name': 'Sally'}]
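Why `safe_load`? A YAML document can carry Python-specific tags that ask the loader to construct arbitrary objects or call functions. A minimal sketch: `safe_load` refuses such a tag (here a harmless call to `os.getcwd`) and raises an error instead of running it.

```python
import yaml  # PyYAML

# A YAML document that asks the loader to call a Python function
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    refused = False
except yaml.YAMLError as err:  # safe_load raises a ConstructorError, a YAMLError subclass
    refused = True
    print("safe_load refused:", type(err).__name__)
```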
What is pickle?
Python has its own module, `pickle`, for loading and writing Python data:
- Part of the Python standard library
- Fast
- Can store complex Python data structures, including instances of user-defined classes
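Some caveats
- Python-specific: there is no guarantee of cross-language compatibility
- Not every Python data structure can be serialized by pickle (e.g. lambdas and open file objects cannot)
- Older versions of Python don't support newer serialization formats (called protocols)
- The latest protocol can handle the most Python data structures
- Newer versions of Python can still read data written with older protocols
- Older versions of Python cannot read data written with newer protocols
- Make sure to use binary mode ('wb' and 'rb') when opening pickle files; pickle data is bytes, so text mode will raise errors or corrupt the data
- Never unpickle data from an untrusted source: loading a pickle can execute arbitrary code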
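A few of these caveats can be checked directly. This sketch prints the default and highest protocols for your Python, inspects the protocol byte at the start of a pickle, and shows that a text stream and a lambda are both rejected. The `data` dictionary is a made-up example.

```python
import io
import pickle

data = {"name": "Cloe", "age": 3}

# Protocol used when none is given, and the newest one this Python supports
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Protocol 2 and later pickles start with the PROTO opcode (0x80) followed by the protocol number
blob = pickle.dumps(data, protocol=2)
print(blob[0] == 0x80, blob[1])  # True 2

# Pickles are bytes, so a text stream (like a file opened with 'w') rejects them
try:
    pickle.dump(data, io.StringIO())
    text_mode_ok = True
except TypeError:
    text_mode_ok = False
print(text_mode_ok)  # False

# Some objects cannot be pickled at all, e.g. a lambda
try:
    pickle.dumps(lambda x: x)
    lambda_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_ok = False
print(lambda_ok)  # False
```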
import pickle
someshelter = {"shelterDogs": [{"name": "Cloe", "age": 3, "breed": "Border Collie", "attendPlaygroup": "Yes"},
{"name": "Karl", "age": 7, "breed": "Beagle", "attendPlaygroup": "Yes"}],
"shelterStaff": [{"name": "Bob", "age": 100, "Job": "dogWalker"},
{"name": "Sally", "age": 47, "Job": "PlaygroupLeader"}]}
with open('data.pickle', 'wb') as f:
    pickle.dump(someshelter, f, pickle.HIGHEST_PROTOCOL)  # highest protocol is the most recent one
with open('data.pickle', 'rb') as f:
    data = pickle.load(f)
print(data)
{'shelterDogs': [{'name': 'Cloe', 'age': 3, 'breed': 'Border Collie', 'attendPlaygroup': 'Yes'}, {'name': 'Karl', 'age': 7, 'breed': 'Beagle', 'attendPlaygroup': 'Yes'}], 'shelterStaff': [{'name': 'Bob', 'age': 100, 'Job': 'dogWalker'}, {'name': 'Sally', 'age': 47, 'Job': 'PlaygroupLeader'}]}
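Pickle can also round-trip objects that JSON cannot represent. A sketch with a hypothetical `Dog` dataclass (not part of the course code): `pickle` restores an equal object, while `json.dumps` raises `TypeError`. The class must be defined at module level so pickle can find it again when loading.

```python
import json
import pickle
from dataclasses import dataclass


@dataclass
class Dog:  # hypothetical class for illustration; must live at module level for pickle
    name: str
    age: int
    breed: str


cloe = Dog("Cloe", 3, "Border Collie")

# pickle round-trips the instance, class and all
restored = pickle.loads(pickle.dumps(cloe, pickle.HIGHEST_PROTOCOL))
print(restored)          # Dog(name='Cloe', age=3, breed='Border Collie')
print(restored == cloe)  # True

# json only knows dicts, lists, strings, numbers, booleans and None
try:
    json.dumps(cloe)
    json_ok = True
except TypeError:
    json_ok = False
print(json_ok)  # False
```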
%%bash
cat "data.pickle"
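[Binary output, mostly unprintable bytes. Fragments such as `Border Collie`, `attendPlaygroup`, `Karl`, `Beagle`, `shelterStaff`, `dogWalker`, `Sally`, and `PlaygroupLeader` are visible, but unlike JSON or YAML, the pickle file is not human-readable.]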