CL9: Code Projects
Welcome to the ninth coding lab!
This coding lab focuses on documentation, writing code that adheres to good code style principles, and code testing.
Part 0: Setup
To test the first question, we’ll use the same dataset as in the previous lab, which you’ll read in using pandas. Run the cell below before proceeding. A local copy of the file is included in the lab’s folder, to prevent too many connections to the server during grading. The original source of the file is https://raw.githubusercontent.com/fivethirtyeight/data/master/voter-registration/new-voter-registrations.csv, but you do not need to download it yourself.
import pandas as pd
df = pd.read_csv('new-voter-registrations.csv')
Part I: Code Style & Documentation
For this first question… a function get_most_common is provided:
# edit code style for this function
def get_most_common(d,c):
    MC=d[c].value_counts()
    a=MC.idxmax();b=MC.max()
    return a,b
For this question:
Take a look at the code and examples provided in the cells to understand what the function is doing.
Consider the code style guidelines discussed in class and edit the code in the cell below to make the code more readable.
Add a numpy-style docstring to the function.
Note: the name and the functionality of the code will not change, but the style (parameter/variable naming, spacing, indentation, capitalization, etc.) should be improved.
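If it is not obvious what the provided function does, it can help to run its three steps on a tiny DataFrame first. The sketch below uses made-up data (the `toy` DataFrame and its values are hypothetical, not taken from the voter registration file) to show what `value_counts`, `idxmax`, and `max` each return.

```python
import pandas as pd

# Hypothetical toy data, NOT the voter registration file
toy = pd.DataFrame({'Month': ['Jan', 'Feb', 'Jan', 'Mar', 'Jan']})

# Series mapping each distinct value to how many times it occurs
counts = toy['Month'].value_counts()

# Index label (here, the month) that has the largest count
label = counts.idxmax()

# The largest count itself
count = counts.max()

print(label, count)  # Jan 3
```

The function in this question returns exactly this pair, just for whichever column name is passed in.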
### BEGIN SOLUTION
def get_most_common(df, variable):  # better parameter names
    """
    Identify the most common category that shows up in the specified column
    of the DataFrame, as well as how many times that value shows up.

    Parameters
    ----------
    df : DataFrame
        The pandas DataFrame containing the column to be summarized
    variable : str
        The column in the pandas DataFrame to be summarized

    Returns
    -------
    max_label : str
        The category/label that shows up most frequently in the column interrogated
    max_value : int
        The number of times the `max_label` shows up
    """
    # fix indentation; better spacing around operators; improve variable names
    most_common = df[variable].value_counts()
    max_label = most_common.idxmax()
    max_value = most_common.max()  # separate out onto two lines
    return max_label, max_value
### END SOLUTION
# execute function
get_most_common(df, 'Jurisdiction')
('District of Columbia', 10)
# execute function
# store output in two separate variables
month_common, month_val = get_most_common(df, 'Month')
print(month_common, month_val)
Jan 24
Provided is a function you’ve seen before in A3:
# edit code style for this function
def end_chat(i):
    if 'quit' in i:o='Bye';c=False
    else:o=None;c=True
    return o, c
…but here, it’s got particularly terrible code style. As above, edit the function for code style and add a numpy-style docstring. Its functionality will not change, but its readability and documentation will.
### BEGIN SOLUTION
def end_chat(input_string):  # better parameter name
    """
    Determine if the word 'quit' is in the function's input

    Parameters
    ----------
    input_string : str
        The string to be analyzed to see if it contains the word 'quit'

    Returns
    -------
    output : str or None
        Function returns the string 'Bye' if 'quit' is in the input_string and None otherwise
    chat : bool
        Function returns False if 'quit' is in the input_string and True otherwise; controls if chat should continue
    """
    # fix indentation; better spacing around operators; improve variable names
    if 'quit' in input_string:
        output = 'Bye'
        chat = False  # separate out onto two lines
    else:
        output = None
        chat = True
    return output, chat
### END SOLUTION
# execute function
end_chat('I want to quit')
('Bye', False)
# execute function
# store output in two separate variables
output, chat = end_chat('I want to quit')
print(output, chat)
Bye False
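To see why the second return value matters, here is a minimal sketch of how `chat` might control a loop. The scripted `messages` list and the `'...'` placeholder reply are assumptions made for illustration; they stand in for real user input and a real chatbot response, and are not part of A3.

```python
def end_chat(input_string):
    """Return ('Bye', False) if 'quit' is in the input, else (None, True)."""
    if 'quit' in input_string:
        return 'Bye', False
    return None, True


# Hypothetical scripted messages standing in for user input
messages = ['hello', 'how are you?', 'I want to quit', 'never reached']

chat = True
replies = []
for message in messages:
    output, chat = end_chat(message)
    if not chat:
        # 'quit' was found: record the goodbye and stop the loop
        replies.append(output)
        break
    replies.append('...')  # placeholder reply (assumption)

print(replies)  # ['...', '...', 'Bye']
```

The loop stops as soon as `chat` becomes False, so the last message is never processed.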
Part II: Code Testing
After editing the function above, write a test function test_get_most_common() that will test the functionality of the get_most_common function above.
Note: you’ll likely need to create a dummy dataframe within the function
### BEGIN SOLUTION
def test_get_most_common():
    # create data frame for testing
    test_df = pd.DataFrame({'col1': ['a', 'b', 'b'],
                            'col2': [3, 4, 5]})
    # execute function
    out = get_most_common(test_df, 'col1')
    # add assert statements that test function
    assert callable(get_most_common)
    assert isinstance(out, tuple)
    assert out == ('b', 2)
### END SOLUTION
# should pass silently
# when you execute test
test_get_most_common()
# call test function
out = test_get_most_common()
# out should store None if the test passes silently
assert out is None
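The same testing pattern carries over to the other function from Part I. This is only a sketch: the lab does not ask for `test_end_chat`, so that name and the specific inputs are assumptions, and `end_chat` is redefined in condensed form here so the example runs on its own.

```python
def end_chat(input_string):
    """Return ('Bye', False) if 'quit' is in the input, else (None, True)."""
    if 'quit' in input_string:
        return 'Bye', False
    return None, True


def test_end_chat():  # hypothetical test, not required by the lab
    assert callable(end_chat)

    # input that contains 'quit'
    out = end_chat('I want to quit')
    assert isinstance(out, tuple)
    assert out == ('Bye', False)

    # input that does not contain 'quit'
    assert end_chat('hello there') == (None, True)

    # the check is a substring match, so 'quitting' also ends the chat
    assert end_chat('quitting time') == ('Bye', False)


# returns None if every assert passes
result = test_end_chat()
```

Testing both branches of the `if`/`else` is the main point; the substring case documents behavior you might otherwise not expect.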
Part III: pytest (optional, but highly recommended)
Let’s try out pytest!
Copy the function get_most_common to a functions.py file (in the same directory as this notebook).
Copy the test function test_get_most_common to a test file test_functions.py. Be sure to add the necessary import statement to the top of this file.
Execute pytest in the cell below.
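The steps above can be sketched from Python itself. This is an illustration under stated assumptions, not the required workflow: a temporary directory stands in for the notebook's folder, the docstring is shortened, and the test is called directly after import rather than through the `pytest` command. What it does show is which `import` lines `test_functions.py` needs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of functions.py (docstring shortened for this sketch)
FUNCTIONS_SOURCE = '''\
def get_most_common(df, variable):
    """Return the most frequent value in df[variable] and its count."""
    most_common = df[variable].value_counts()
    return most_common.idxmax(), most_common.max()
'''

# Contents of test_functions.py; note the two imports at the top
TEST_SOURCE = '''\
import pandas as pd

from functions import get_most_common


def test_get_most_common():
    test_df = pd.DataFrame({'col1': ['a', 'b', 'b'], 'col2': [3, 4, 5]})
    assert get_most_common(test_df, 'col1') == ('b', 2)
'''

# Assumption: a temporary directory stands in for the lab folder
lab_dir = Path(tempfile.mkdtemp())
(lab_dir / 'functions.py').write_text(FUNCTIONS_SOURCE)
(lab_dir / 'test_functions.py').write_text(TEST_SOURCE)

# Make the folder importable, load the test module, and call the test
sys.path.insert(0, str(lab_dir))
test_module = importlib.import_module('test_functions')
result = test_module.test_get_most_common()  # None if the assert passes
```

In the lab itself you would create the two files by hand in the notebook's directory and run `!pytest test_functions.py`, which finds and runs any function whose name starts with `test_`.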
### BEGIN SOLUTION
# executing this should show passing tests
# once the above steps are carried out
# assumes test_functions.py is in same directory
# as this lab
!pytest test_functions.py
### END SOLUTION
=================================================== test session starts ===================================================
platform linux -- Python 3.11.9, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/grader-cogs18-04/source/CL9-Testing
plugins: anyio-4.3.0
collecting ...
collected 0 items
================================================== no tests ran in 0.01s ==================================================
ERROR: file or directory not found: test_functions.py
The End!
This is the end of this Coding Lab!
Be sure you’ve made a concerted effort to complete all the tasks specified in this lab. Then, go ahead and submit on datahub!
This “blank” cell included intentionally. Do not do anything here. (It’s being used in grading.)