CL8: Scientific Computing
Welcome to the eighth coding lab!
This coding lab focuses on getting comfortable working with data using pandas, writing code that adheres to good code style principles, and testing your code.
Part I: Setup
Data wrangling often requires functionality beyond what’s included in Python by default. For this, we’ll import functionality from helpful packages.
Import the following packages using their common shortened names, shown in parentheses:
numpy (np)
pandas (pd)
### BEGIN SOLUTION
import numpy as np
import pandas as pd
### END SOLUTION
assert pd
assert np
Part II: Module Files
First, in the notebook below, do the following:
Write at least one new class.
Make sure at least one method in the class returns or prints something.
Write two new functions that do something with instances of your new class(es). For example, take a list of custom objects and call a method on each one.
Classes here:
### BEGIN SOLUTION
# Classes here
class Noodles:
    """A bowl of noodles with a size and a to-go availability flag."""
    def __init__(self, size='large', available='Yes'):
        self.size = size
        self.available = available
    def order_to_go(self):
        """Return a message saying whether a to-go order can be placed."""
        if self.available == 'Yes':
            out = "I'll have your order ready in 20 minutes."
        else:
            # Any other value (None, 'No', ...) lands here, so `out` is always defined
            out = "I'm sorry, but we aren't taking to-go orders right now."
        return out
### END SOLUTION
Functions here:
### BEGIN SOLUTION
# new functions here
def eating(noodles):
    """Describe eating a Noodles instance, depending on its availability."""
    if noodles.available == 'Yes':
        print('Itadakimasu')  # means "I will receive"; something you say before eating
        out = 'Slurp slurp slurp, delicious!'
    else:
        out = 'Quarantine can be rough. It be like that sometimes.'
    return out
def soup(noodles):
    """Decide whether to drink the soup, depending on the bowl size."""
    question = "\n" + 'Should I drink the soup?' + "\n\t"
    if noodles.size == 'large':
        out = question + "I really shouldn't... There's too much sodium."
    else:
        out = question + "You only live once! Slurp slurp slurp!"
    return out
### END SOLUTION
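The instructions suggest a function that takes a list of custom objects and calls a method on each one. A minimal sketch of that idea follows. It repeats a compact copy of the Noodles class so it runs on its own, and the helper name order_all is hypothetical (it is not part of the lab's solution).

```python
class Noodles:
    """Compact copy of the lab's Noodles class, so this example is self-contained."""

    def __init__(self, size='large', available='Yes'):
        self.size = size
        self.available = available

    def order_to_go(self):
        if self.available == 'Yes':
            return "I'll have your order ready in 20 minutes."
        return "I'm sorry, but we aren't taking to-go orders right now."


def order_all(bowls):
    """Hypothetical helper: call order_to_go() on every Noodles object in a list."""
    return [bowl.order_to_go() for bowl in bowls]


bowls = [Noodles(), Noodles(size='small', available='No')]
for message in order_all(bowls):
    print(message)
```

A list comprehension keeps the helper short; a plain for loop that appends each message to a list would work just as well.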
Next, move the classes and functions you wrote into an external file: a Python module file.
Open a new text file from the Jupyter server page.
Copy your classes and functions into that file.
Save that file in the same folder as this notebook, giving it a name of your choice with a ‘.py’ extension. Note that the filename should NOT contain any spaces (underscores are fine), because the name without ‘.py’ is what you will import.
Now, import the classes and functions from that file into the notebook, and check that you can use them.
### BEGIN SOLUTION
# the classes and functions above are saved in cravings.py, next to this notebook
import cravings as crave
tonkotsu = crave.Noodles()
print(crave.eating(tonkotsu))
print(crave.soup(tonkotsu))
### END SOLUTION
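Two details often trip people up when working with module files from a notebook: Python only finds modules in the directories listed in sys.path, and editing the .py file does not change a module that has already been imported. The sketch below illustrates both, assuming a hypothetical module named cravings_demo written into a temporary folder that stands in for the notebook's folder; importlib.reload() is the standard-library way to pick up edits without restarting the kernel.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for your own module file (e.g. cravings.py)
MODULE_SOURCE = '''
class Noodles:
    def __init__(self, size='large', available='Yes'):
        self.size = size
        self.available = available

def soup(noodles):
    return 'Too much sodium.' if noodles.size == 'large' else 'Slurp!'
'''

# Write the module into a temporary folder, playing the role of the notebook's folder
folder = Path(tempfile.mkdtemp())
module_path = folder / 'cravings_demo.py'
module_path.write_text(MODULE_SOURCE)

# A notebook's own folder is normally searched already; the temp folder must be added by hand
sys.path.insert(0, str(folder))
importlib.invalidate_caches()  # recommended when a module file is created at runtime

import cravings_demo as crave       # import the whole module under an alias
from cravings_demo import Noodles   # or import specific names directly

print(crave.soup(Noodles(size='small')))  # Slurp!

# Editing the file does not affect the imported module until it is reloaded
module_path.write_text(MODULE_SOURCE.replace("'Slurp!'", "'Slurp slurp slurp!'"))
crave = importlib.reload(crave)
print(crave.soup(crave.Noodles(size='small')))  # Slurp slurp slurp!
```

Note that a name imported with from ... import (Noodles above) still points at the old definition after a reload, so access things through the module (crave.Noodles) when you expect to reload.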
Part III: pandas
pandas: data
FiveThirtyEight makes much of its data publicly available. To get some practice using pandas, read a dataframe directly from the web using the pandas function read_csv(). The input to this function should be the following URL, as a string: https://raw.githubusercontent.com/fivethirtyeight/data/master/voter-registration/new-voter-registrations.csv. Store the result in an object named df.
### BEGIN SOLUTION
df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/voter-registration/new-voter-registrations.csv')
### END SOLUTION
assert isinstance(df, pd.DataFrame) # correct object type
assert df.shape == (106, 4) # correct no. rows & columns
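read_csv() is not limited to URLs: it also accepts a local file path or any file-like object. As a sketch for experimenting offline, the example below parses a few rows copied from the start of this dataset (the rows df.head() displays) out of an in-memory string via io.StringIO; it only mimics the file's layout and is not the full dataset.

```python
import io

import pandas as pd

# A few rows from the start of the FiveThirtyEight file, written out as CSV text
csv_text = """Jurisdiction,Year,Month,New registered voters
Arizona,2016,Jan,25852
Arizona,2016,Feb,51155
Arizona,2016,Mar,48614
"""

# read_csv() accepts a path, a URL, or any file-like object such as io.StringIO
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (3, 4)
print(list(sample.columns))  # ['Jurisdiction', 'Year', 'Month', 'New registered voters']
```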
pandas: attributes
pandas DataFrame objects have a number of attributes. In each cell below, look up the information described in the text above that cell.
Note: each cell should contain code that looks like df.attribute, where attribute is replaced by the attribute that returns the requested information.
Look up the number of rows and columns:
### BEGIN SOLUTION
df.shape
### END SOLUTION
(106, 4)
Column names:
### BEGIN SOLUTION
df.columns
### END SOLUTION
Index(['Jurisdiction', 'Year', 'Month', 'New registered voters'], dtype='object')
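To see several attributes side by side, the sketch below builds a small DataFrame from the first rows of this dataset and prints a few common ones. shape and columns are the two used in this lab; dtypes, index, and size are further standard DataFrame attributes included for comparison.

```python
import pandas as pd

# Small DataFrame built from the first rows of the voter-registration data
sample = pd.DataFrame({
    'Jurisdiction': ['Arizona', 'Arizona', 'Arizona'],
    'Year': [2016, 2016, 2016],
    'Month': ['Jan', 'Feb', 'Mar'],
    'New registered voters': [25852, 51155, 48614],
})

# Attributes are accessed without parentheses
print(sample.shape)    # (rows, columns): (3, 4)
print(sample.columns)  # the column labels, as an Index
print(sample.dtypes)   # the data type of each column
print(sample.index)    # the row labels; here a RangeIndex from 0 to 2
print(sample.size)     # total number of cells, rows * columns: 12
```

Writing sample.shape() with parentheses raises a TypeError, because shape is a tuple rather than a method.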
pandas: methods
pandas DataFrame objects also have a number of helpful methods. In each cell below, find the information described in the text above that cell.
Each cell should contain code that looks like df.method(), where method() is replaced by the method that returns the requested information.
Note that some methods also need information inside the parentheses following the method name (e.g. df.method(arg=val)), and others must be called on the specific column, or Series, you want to operate on (e.g. df['column'].method()).
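The three call patterns described above can be sketched on a small DataFrame built from this dataset's first rows: a method called with its defaults, a method given arguments, and a method called on a single column. sort_values() and max() are standard pandas methods used here only for illustration; the lab itself does not ask for them.

```python
import pandas as pd

sample = pd.DataFrame({
    'Jurisdiction': ['Arizona', 'Arizona', 'Arizona', 'Arizona'],
    'Year': [2016, 2016, 2016, 2016],
    'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
    'New registered voters': [25852, 51155, 48614, 30668],
})

# A DataFrame method with no arguments (head() shows 5 rows by default)
print(sample.head())

# A DataFrame method given an argument: only the first 2 rows
first_two = sample.head(n=2)

# A DataFrame method with keyword arguments: sort rows by a column, largest first
by_voters = sample.sort_values(by='New registered voters', ascending=False)

# A Series method: select one column with df['column'], then call the method on it
largest = sample['New registered voters'].max()
print(first_two, by_voters, largest, sep='\n\n')
```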
See the first 5 rows of df:
### BEGIN SOLUTION
df.head()
### END SOLUTION
 | Jurisdiction | Year | Month | New registered voters
---|---|---|---|---
0 | Arizona | 2016 | Jan | 25852
1 | Arizona | 2016 | Feb | 51155
2 | Arizona | 2016 | Mar | 48614
3 | Arizona | 2016 | Apr | 30668
4 | Arizona | 2020 | Jan | 33229
Determine which distinct months are included in df:
### BEGIN SOLUTION
df['Month'].unique()
### END SOLUTION
array(['Jan', 'Feb', 'Mar', 'Apr', 'May'], dtype=object)
Determine how many distinct months are included in df:
### BEGIN SOLUTION
df['Month'].nunique()
### END SOLUTION
5
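unique() and nunique() usually agree (here, 5 months from each), but they treat missing values differently: unique() keeps NaN in its result, while nunique() skips it unless you pass dropna=False. The small Series below is made up purely to show that difference.

```python
import numpy as np
import pandas as pd

# Made-up Series with a repeated value and one missing value
months = pd.Series(['Jan', 'Feb', 'Jan', np.nan, 'Mar'])

print(months.unique())               # values in order of first appearance; NaN is included
print(months.nunique())              # 3: NaN is excluded by default (dropna=True)
print(months.nunique(dropna=False))  # 4: NaN is counted as its own value
```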
Determine how many times each month shows up in df:
### BEGIN SOLUTION
df['Month'].value_counts()
### END SOLUTION
Month
Jan 24
Feb 24
Mar 24
Apr 24
May 10
Name: count, dtype: int64
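value_counts() sorts its result from most to least common, and normalize=True turns the counts into proportions. In pandas 2.0 and later the result is a Series named count (or proportion when normalized), which is where the "Name: count" line in the output comes from. The Series below is made up for illustration.

```python
import pandas as pd

# Made-up Series of month labels
months = pd.Series(['Jan', 'Feb', 'Jan', 'Mar', 'Jan', 'Feb'])

counts = months.value_counts()                # Jan 3, Feb 2, Mar 1 (most common first)
shares = months.value_counts(normalize=True)  # proportions that sum to 1

print(counts)
print(shares)
```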
Calculate basic summary statistics for the New registered voters column:
### BEGIN SOLUTION
df['New registered voters'].describe()
### END SOLUTION
count 106.000000
mean 48223.462264
std 48596.080089
min 589.000000
25% 19137.500000
50% 33301.500000
75% 55257.500000
max 238281.000000
Name: New registered voters, dtype: float64
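Each row of describe() corresponds to a Series method you can also call on its own, which is handy when you need just one statistic. The sketch below uses the four Arizona 2016 values from the first rows of df; std() defaults to the sample standard deviation (ddof=1), matching the std row of describe(), and the percentile rows come from quantile().

```python
import pandas as pd

# The four Arizona 2016 values from the first rows of df
voters = pd.Series([25852, 51155, 48614, 30668], name='New registered voters')

summary = voters.describe()

# The same statistics, one method at a time
print(voters.count(), voters.mean(), voters.std(), voters.min(), voters.max())
print(voters.quantile([0.25, 0.5, 0.75]))  # the 25%, 50% and 75% rows
print(summary)
```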
The End!
This is the end of this Coding Lab!
Be sure you’ve made a concerted effort to complete all the tasks specified in this lab. Then, go ahead and submit on datahub!
This “blank” cell is included intentionally. Do not do anything here (it’s being used in grading).