Lo-Fi Python

Mar 21, 2020

Automating pytest on Windows with a .bat file, Python, Task Scheduler and Box

Automatic pytest Testing and Failure Alert Monitoring

This is my solution to replace manually running pytest each day in command prompt. I want to automate running pytest every day, test if my automated python scripts ran smoothly and get notified if any tests fail.

Installing pytest, a Python testing library:

python -m pip install pytest

A Few Words on pytest

It is a unit test framework in python. pytest expects you to write each test as a self-contained function. One python file can contain many different test functions.

Writing a Test

Let's use test_file_date.py as our test, which uses the glob module and os.getmtime to get the csv with the most recent modification dateon my desktop. Then it tests if that date is today, in my case, for an expected daily file drop.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
from datetime import datetime, date
import glob
import os
import getpass


def test_csv_date_equals_today():
    """The match format is for a Windows path with Unix style pattern matching."""
    match = f"C:Users/{getpass.getuser()}/Desktop/*.csv"
    # Get the most recent csv from a folder.
    csv = sorted(glob.iglob(match), key=os.path.getmtime)[-1]
    csv_timestamp = os.path.getmtime(csv)
    csv_date = datetime.fromtimestamp(csv_timestamp)
    print(csv_date.day)
    print(date.today().day)
    assert csv_date.day == date.today().day

Here's the pytest text output when the test is passing:

============================= test session starts =============================
platform win32 -- Python 3.8.1, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: C:\
collected 1 item

..\..\Users\erick\Desktop\test_file_date.py . [ 14%]
                                                                   [100%]

============================== 1 passed in 0.28s ==============================

Creating a Task with Windows Task Scheduler

If you haven't used python with Windows Task Scheduler before, my previous post on creating a task may help you. We'll create two of them for our testing system.

Adding Your Task Scheduler Program: a Windows .bat file

Add your username to the text below and adjust the paths to your system. Then save a Windows .bat file with this text, which points to your pytest.exe file:

cmd /c "C:\Users\your_username\Desktop\sandbox\Scripts\pytest.exe --capture=sys" ^
C:\Users\your_username\Desktop\test_file_date.py > C:\Users\your_username\Desktop\sandbox\Test_Results\Test_Results.txt

This example is referencing an .exe within a hypothetical "sandbox" virtual environment, located on my Desktop. If you have a virtualenv or venv, check the Scripts folder. (Bin on Linux.)

Breaking this out, there are five .bat files parts:

cmd /c "C:\Users\your_username\Desktop\sandbox\Scripts\pytest.exe --capture=sys"

Windows' cmd command takes a program, so we're passing pytest. The --capture=sys argument tells pytest to capture the test results. Note: switching cmd /c to cmd /k forces the terminal to stay open when you are testing your bat file. You can double-click your .bat file to test run it.



^

circumflex represents a line continuation in Windows batch files for better readability



C:\Users\your_username\Desktop\test_file_date.py

Next we're passing our python file as an argument to pytest, testing our file's modified date matches today's date.

>

This is a Windows redirect. It redirects the pytest output from sys to a text file, which is the last argument in our .bat file:

C:\Users\your_username\Desktop\sandbox\Test_Results\Test_Results.txt

Browse to select your .bat file for your Windows Task Scheduler task:

bat_task

Reading the Tests and Triggering Alerts

Passing tests signal your scripts are running successfully. When things don't work, email alerts of the failure help us respond quickly.

Let's set another task scheduler job to run read_test_results.py, to run a few minutes after the first job each day. See this example of running Python with Task Scheduler if you haven't triggered a python script from Task Scheduler before.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
from datetime import date
import getpass
import logging
import os

"""Automate pytest with Windows Task Scheduler. Use Task Scheduler run a batch file.
The batch file runs pytest and captures our pytest function results to sys.
If a text file contains a failure or error, write the test contents into a folder.
"""
logging.basicConfig(
    filename="Automated_Testing_Alerts.log",
    level=logging.INFO,
    format="%(asctime)s - %(message)s",
    datefmt="%d-%b-%y %H:%M:%S",
)
logging.info("Checking for failed tests...")
directory = f"C:/Users/{getpass.getuser()}/Desktop/test_results/"
test_results = [fname for fname in os.listdir(directory) if ".txt" in fname]
for txt_file in test_results:
    file_path = directory + txt_file
    with open(file_path) as f:
        text = f.read()
    if "FAILURES" in text:
        directory = f"C:/Users/{getpass.getuser()}/Desktop/send_failure_alert/"
        name = f"{directory}{txt_file}_Failed_Results_{date.today()}.txt"
        with open(name, "w+") as f:
            f.write(name)
            f.write(text)
    else:
        print("No failed tests found in file:")
        print(txt_file)

Setting up Email Alert Notifications on a Box Folder

The above script checks the test results and creates a file with any failed tests in a different folder. I edited the folder's settings to email me when a new file is added, thanks to Box notifications. We use Box for secure file sharing at my day current day job.

Alternatively for those without Box, you could use 'ole trusty smtplib to send the failure alerts with python. I chose the easier, ready to go option. Remember, "Simple is better than complex."

Also, the pywin32 module has an interface to Outlook that is very handy. For an example of sending a Windows Outlook email, check the very end of this post I wrote on "Scripting Windows Like a Pro".

We now have email monitoring alerts if our csv file date test fails, thanks to:

  • Windows Task Scheduler (two tasks running each day in succession)
  • python/pytest
  • a Windows .bat file
  • Box (or smtplib or pywin32) email alerts

In Summation

  1. The first task runs a .bat file to run pytest and create a text file with daily automation test results.
  2. The second task runs a python file. Setting in motion:
  3. Checking the test result text files, looking for failed tests
  4. Creating a text file with any failing tests in a Box folder, if found
  5. Then Box emails me if any test fails.

This was the first time I successfully deployed a Windows batch file. It took me many tries and googling to properly format of the .bat file. They are worth understanding and seem to open up many possibilities on Windows. In this case it was a "glue" piece that allowed me to accomplish my goal, automated testing and alerts for my python scripts.

What we learn along the way shapes us. Learning to use shell commands with Python scripts extends their abilities to help you get stuff done. I've benefitted from improving both my Windows and Ubuntu shell knowledge, which can sometimes be a handy substitute or complement to python. Now, time to write more tests. Thanks for reading!

See also:

pytest-csv: write test results to a csv with this plugin

Read more about software testing in my post here.

Jan 28, 2020

Lessons Learned from "Lost in Space" on Netflix

Being a developer has similarities to the life of a family surviving in space. Sure, the stakes are lower when writing code versus being chased by alien life forms, but the core idea is the same. One is faced with a problem and it must be solved, with varying degrees of consequences for failure.

I enjoyed watching the Robinsons solve problem after problem to stay alive in alien worlds and the vacuum of space. I believe you can learn from anything. So in that spirit, here are some lessons from the show for us problem solving humans:

  1. tayrobDon't panic when you're in trouble. A cool head is vitally important to your survival chances when you're in a bind. Judy demonstrates this quality in evading the raptor-like creatures chasing her.
  2. Problems solved often lead to more problems to solve. We would be wise to anticipate what might happen if we do succeed.
  3. Be cautious when venturing into unfamiliar territories. You never know what could be there to trip you up. Which leads me to my next point...
  4. Respect nature's ability to f*ck you up. The environment you're in, other organisms, and bacteria can wreak havoc. Despite our perceived self-importance, nature is our god. We are but specks in its creation. Cases in point: the metal-dissolving virus that destroys Judy's chariot and the poisonous algae Don accidentally discovers that immobilizes him.
  5. Know when to go for broke, or when to re-group and wait to make your next move. The Robinsons spent 7 months stocking up at their new beach home. Until they suffered a setback...beach
  6. Adapt your plan when new information becomes available or unexpected events happen. They will happen.
  7. Think of unconventional ways to use the resources you do have to fix the problem at hand. Who would have thought a spaceship could be retrofitted into a boat?
  8. Things often work 83% of the way you expect them to work. Sometimes that's enough to move on. Or maybe it buys you enough time to figure out your next move. The electrical field stalled the fleet of robots long enough for the Robinsons to realize the children should "Lord of the Flies" their way to Alpha Centauri.
  9. Question your own motivations, and be open to changing course if you are wrong. Adler realized he should help Will save Scarecrow's life after first trying to stop him. Scarecrow then exhibits this same quality when he turns against his robot comrades to save the children.
  10. Ethics and morals may conflict with each other. Was Maureen Robinson wrong for fraudulently saving her son Will? Ethically, yes. But morally, as a mother saving her son? No.
  11. Make time for your family and friends. Maureen was so busy saving the family that she didn't read Penny's book. Stop and smell the roses every once in a while. Or read your daughter's damn book.
  12. Humans are capable of both great compassion and horrible cruelty. We're running from alien robots, but sometimes we're the monsters? Cough cough Smith and Hastings... cough. Excuse me. (smh 🤦)
  13. robotI'll spare you any parallels about modern artificial intelligence. The best, most grounded definition I've heard of modern AI is "automated cognition".
  14. Self-sacrifice is an honorable death to save the lives of others, but can be frustrating to watch on TV. "Did you really have to kill yourself there?" - me, several times during the show 😆

Jan 25, 2020

"Git" The Basics: A Version Control Cheat Sheet

I am finally starting to understand git version control! It makes developing a project on different computers easy. Some of these notes were picked up from the super informative Reproducible Data Analysis in Jupyter video series by Jake VanderPlas, author of the Python Data Science Handbook.

First, go here to download and install git if you haven't yet. Alright, here are my go-to steps and commands for git:

Beginning a New Project With Git

  1. Create new repository on GitHub
  2. Add a README
  3. Add a Python gitignore
  4. Add a license. Jake V. used an MIT license.

Clone Any Repository from GitHub

  1. Visit your new project, or any GitHub project, click the green "Clone or download" button and copy the link.
  2. In your terminal or command prompt, navigate to the directory where you want to clone your project.
  3. In terminal, enter: git clone REPO_URL
  4. Now cd into your project folder.

Push Your Local Computer Changes to the Remote Repository

Let's say you did some work on your computer and want to push the changes to GitHub. Enter these commands in terminal:

1
2
3
git add .
git commit -m "Add your commit note here"
git push origin main

Above: "git add ." stages all files in project directory for main. Add your commit message, then push your changes to the remote repo.

Fetch Changes From Main Branch to Your Local Computer

You might want to update your local computer with any main branch changes before beginning work on it. Enter these commands in terminal:

1
2
git fetch
git pull origin main

Review Merge Conflicts

Sometimes, your code may conflict with changes in the main branch. You'll find out if you try to push or pull changes and the auto-merge fails. Use "git status" to locate the files with the conflicts. Enter in your terminal:

1
git status

Then follow these instructions to review the merge conflicts.

Or maybe you want to discard any local changes, then merge:

1
2
3
git fetch
git checkout
git pull origin main

Recovering from a Corrupted Repository

1
git fsck --full --no-dangling

Additional Reading

Revert local repo to a previous commit, then push your repo to the remote.

1
2
git reset --hard rollback_commit_id
git push origin main

Supplementary Reading

CS Visualized: Useful Git Commands

8 Git Commands I Use Every Day

On Commit Messages

Pandas Library Git Workflows

Dec 26, 2019

Comparing Text Editors on Ubuntu: Atom, Emacs, Sublime, Vim & VS Code

The text editor is a core tool for writing software. I've always used Atom. Lately, I've noticed my Atom text editor was bogging down on my Chromebook running Ubuntu 16.04. Keystrokes and mouse movements were lagging and slowing my coding down. I'm also getting low on disk space, which may be a related issue.

You'll want to choose a text editor based on how it suits your own needs. In this case, I want a light-weight, responsive editor with no lag. Bells and whistles are less important. I'm also looking to minimize disk space required.

I decided to compare the apt installed package size of some popular editors. First, I installed Atom, Emacs, Sublime, VS Code and Vim using the Ubuntu 16.04 terminal. You could also consider using IDLE, python's built-in text editor as an alternative that requires no additional software.

Then I found the below command to list all installed apt packages by package size on Ubuntu:

dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n

Here's the terminal output with relevant packages in Megabytes (including Firefox for comparative size):

136    emacsen-common
366 vim-common
1071    vim-tiny
2400    vim
21648   emacs26
26870   vim-runtime
34033   sublime-text
70307   emacs26-common
193694  firefox
236965  code
607932  atom

Atom is by far the largest package I downloaded. It is nearly three times the size of VS Code, my second largest package. The next largest was my web browser, Firefox. Most of the other packages I downloaded were considerably smaller.

Side note: I also found out Ubuntu 16.04 ships with a stripped down version of Vim called vim-tiny.

Total Installed Package Sizes in Megabytes (Smallest to Largest)

Some of these editors have multiple packages they are dependent on.

Text Editor Total Installed Packages Size Packages Installed
Vim 29,636 MB vim, vim-common, vim-runtime
Sublime 34,033 MB sublime-text
Emacs 91,955 MB emacs26-common, emacs26, emacsen-common
VS Code 236,965 MB code
Atom 607,932 MB atom

Results: Vim and Sublime win for smallest installed package size.

  • Vim is the lightest-weight package of these 5 popular text editors, with Sublime not far behind. Emacs is comparable to them for usability and relatively small.
  • Atom is nearly 3x the size of VS Code and 20x the size of Vim. Atom and VS Code are larger than the Firefox browser package, the third largest of any downloaded package on my system.
  • I tested out all of the editors by opening the same Python file and making some edits. I found Vim, Emacs and Sublime were much more responsive than Atom and VS Code.

Conclusion: All Editors Are Not Created Equal

These are fine editors when paired with the right machine and developer needs. In my case, a Chromebook running Linux installed with Crouton, a few editors are performing faster and taking up less space. I've chosen Sublime or Emacs as my editor for this computer. One slightly annoying feature of Sublime is being prompted to buy the paid version from the trial version. Atom, Emacs, Vim and VS Code are free. Vim might be a good option if I ever decide to conquer its notoriously high learning curve.

[Bonus] Find the size of all packages matching with "vim" in their name:

dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n | grep vim

Terminal output from above command:

366 vim-common
1071 vim-tiny
2400 vim
26870 vim-runtime

Additional Reading

What is the best lightweight text editor?

Atom as an Editor is Too Big

Dec 21, 2019

Copying a pandas Dataframe to Google Sheets with pygsheets

Disclaimer: This endeavor was before I discovered AppScript, which may be an alternative solution to using pygsheets or other python libraries. pygsheets is interesting, but it could be a stretch to justify using it for something that could be done with AppScript. Both are ways to solve a problem by automating Google Sheet operations.

This was done on the Windows 7 OS. First, install libraries with pip. Enter in command prompt or terminal:

python -m pip install pandas
python -m pip install numpy
python -m pip install pygsheets

After installing necessary libraries, follow the steps documented by pygsheets:

  1. Create a Google Developer Account at console.developers.google.com
  2. Enable Sheets API to account
  3. Enable Drive API to account. Same as last step, but search for Drive.
  4. Create a Client Secret json file. Select "Credentials" tab, and "Create Credentials". Select Client Secret from options. Export from console and place in same directory as your .py file.
  5. Create a Service Account json file by selecting it instead of "Client Secret".
  6. Authorize pygsheets with your json files. See below.
  7. Copy spreadsheet to Google Sheet with pandas and pygsheets. See below.

After completing the first 5 steps, import pygsheets and authorize your account with the client secret json file:

1
2
import pygsheets
gc = pygsheets.authorize(client_secret='path/to/client_secret[...].json')

You will be prompted by the terminal to go to a hyperlink in a browser, get your authorization code, and enter that authorization code into the terminal.

Now, import both libraries needed and switch to authorize with your service json file. Then, load the csv to a dataframe with pandas. Finally, copy it to an existing Google Sheet with pygsheets:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import pygsheets
import pandas as pd

"""Select worksheets by id, index, or title."""
gc = pygsheets.authorize(service_file='path/to/service_account_credentials.json')
sh = gc.open('add_google_sheet_name_here')
wks = sh.worksheet_by_title('add_sheet_tab_name_here')

"""Set a pandas dataframe to google sheet, starting at 1st row, 1st column"""
df = pd.read_csv('Example_Data.csv')
wks.set_dataframe(df,(1,1))

[Example] Split and upload a sheet with 40 columns

Google Sheets limits importing to 26 columns and 1,000 rows at a time. So you'll have to load the sheets in chunks if you have more than that. This approach uses numpy's array_split:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
import pygsheets
import pandas as pd
import numpy as np

gc = pygsheets.authorize(client_secret='path/to/client_secret[...].json')
sh = gc.open('add_google_sheet_name_here')
wks = sh.worksheet_by_title('add_sheet_tab_name_here')
df = pd.read_csv('Data_to_GSheets.csv')

# split columns into two dataframes with numpy and pandas
first_half_cols, second_half_cols = np.array_split(df.columns, 2)
first_half = df[first_half_cols]
second_half = df[second_half_cols]

# set both dataframes side-by-side in Google sheet
wks.set_dataframe(first_half,(1,1))
start_column = first_half.shape[1]
wks.set_dataframe(second_half,(1, start_column))

Conclusion

I found the terminal error messages from pygsheets to be very helpful while debugging the above. This module offers many other nifty spreadsheet operations. Solid library. You can now create and edit Google Sheets with Python.

AppsScript should probably be the default tool when working with Google Sheets because it is built in, but Python does have tools available to work with Google Sheets.

Resources

pygsheets Github

pygsheets Documentation

Google Sheets Documentation

pandas Documentation

Dec 05, 2019

A Collection of Software Testing Opinions for Python and Beyond

I am a beginner to testing my code. I wanted to write about testing to better understand it. While shaping this link fest masquerading as an essay, I collected great ideas from people who are way more experienced than me. You'll find a few of my thoughts, a Pytest example I use to monitor files, ideas for unit testing, property testing, test driven development and many other commonly used software tests.

An Introductory Rant on Testing

Over several years as a programmer, I've slowly grasped the landscape of testing in software development. After moving beyond my first few tutorials and projects, it seemed very noisy to sort out. Examples provided are usually simple assertions that seem tough to relate to a real use case. It might be easy test the wrong things. Plus, some don't do it at all! The quality of the tests is more important than the quantity. But what makes a quality test? Where's the balance between testing every minute detail of a program and not at all?

Yeah. And the worst thing that happens is that you get people that just stop thinking about what they’re doing. “This is the principle, to always write unit tests, so I’m always going to write unit tests,” and then they’re just not thinking about how they’re spending their time, and they wind up wasting a lot of it.

Joel Spolsky, Stack Overflow Podcast #38

Implementing software tests is a best practice for maintaining code, but seems ambiguous to someone who has not tested any code before. I guess the best way is to read open source projects with test suites, but those can be tough to find. How do you know a good test suite when you see it? Maybe the maintainers went rogue and off the deep end with tests. Online, everybody says you should test your code, is the emperor wearing any clothes?

As a beginner stumbling across articles on testing, these questions were tough to answer. As with most things in programming, figuring out the right question to ask is a challenge in itself. Codeacademy and Coursera never mentioned anything about writing tests. On the other hand, Django includes testing in its tutorial and documentation. Also, most languages come with built-in testing tools. Python has the unittest library.

Why test at all? First, some solid benefits of software testing:

  • With tests on your code in place, you can implement changes and have confidence the code still works if the tests pass. This gives developers more confidence to iterate and improve an application.
  • Detect problems faster. Passing tests are a good indicator that your programs are actually doing what they're supposed to do. If they don't pass, you likely found a bug you might have missed otherwise.
  • When you find a bug, you either need to amend your code, or your tests. write a test for that bug and then fix it. Either that, or you need to be amend your tests. You've just improved the quality of your test suite.
  • Automation. If you are writing tests, those tests can be automated. If you are manually checking the results of your program, you're missing a chance to automate those checks away. I haven't applied it yet, but have heard the Tox library may be useful to automate tests related to Python packaging. For more on automating tests, see this PyCon talk, Three Excellent Python Tools to Automate Repetitive Tasks.
  • Test Driven Development can decrease the time spent debugging code. This claim sometimes lacks empirical evidence, supporting evidence tends to be anecdotal.

A good unit test, therefore, is one that helps enforce the contract to which the function is committed.

If a good unit test breaks, the contract is violated and should be either explicitly amended (by changing the documentation and tests), or fixed (by fixing the code and leaving the tests as is).

A good unit test is also strict. It does its best to ensure the output is valid. This helps it catch more bugs.

Pytest and Unit Testing in Python

This is where the Python hits the pavement. Unit tests are generally liked, although some prefer property tests or integration tests because they think the scope of unit tests is too narrow. The unittest library is Python's default testing framework. However nowadays, pytest seems to be the preferred unit testing framework for Python. Hypothesis is another popular framework I've read about.

Pytest Testing

Tests start to lose signal when Mock becomes routine instead of a reluctant workaround. - Brandon Rhodes, When Python Practices Go Wrong

Testing in Python \ General Unit Testing Ideas

I think hypothesis is probably underrated—some libraries are hesitant to incorporate it into their testing frameworks, but I think the property-based testing has real potential to catch scenarios humans would have a hard time anticipating, or at least that would take a long time to properly plan for. I find that hypothesis almost always adds a few useful test cases I hadn’t thought of that will require special error handling, for example.

Tyler Reddy, SciPy Release Manager

Integration \ Property Tests

Traditional, or example-based, testing specifies the behavior of your software by writing examples of it—each test sets up a single concrete scenario and asserts how the software should behave in that scenario. Property-based tests take these concrete scenarios and generalize them by focusing on which features of the scenario are essential and which are allowed to vary. This results in cleaner tests that better specify the software’s behavior—and that better uncover bugs missed by traditional testing.

Assertions

Assertions are generally accepted as welcome additions to your code.

In reality, the safety and restraints that these code carabiners provide actually give you more freedom to take risks in your coding. If you want to try out some risky feature, refactoring, or external library, you know something is wrong as soon as one of your assertions or tests fail and can undo back to an earlier working state.

Phillip J. Guo, Code Carabiners, (Link Broken)

Test Driven Development

Eventually, you'll discover the evangelists preaching Test Driven Development. There are certain discussions which polarize us in the software development world, such as the appropriate scenarios to deploy this system of development.

Opinions vary widely on the merits and appropriate application of TDD. I'm admittedly skeptical but do see the merits of TDD. But which flavor? Where do unit tests and integration tests fit in? How many tests should I write? What exactly should I be testing? This essay claims anyone pair programming software with an expected life of 3 or more years should use Test Driven Development.

"Test Driven Development is a tool for continuously evaluating hypotheses."

General Testing Ideas and Principles

Other common tests types:

Unit test: when it fails, it tells you what piece of your code needs to be fixed.

Integration test: when it fails, it tells you that the pieces of your application are not working together as expected.

Acceptance test: when it fails, it tells you that the application is not doing what the customer expects it to do.

Regression test: when it fails, it tells you that the application no longer behaves the way it used to.

Testing maturity level progression:

  1. No tests
  2. Occasional, slow, unreliable tests
  3. Semi-comprehensive integration tests
  4. Fast, comprehensive unit tests comprise the bulk of testing
    • Dependency injection
    • Composable subsystem design
  5. Real-time test feedback (ideally integrated into the editor)
  6. Tests are extremely reliable or guaranteed reliable by the type system
    • With tooling that tracks the reliability of tests and provides that feedback to authors.
  7. Fuzzing, statistically automated microbenchmarking, rich testing frameworks for every language and every platform, and a company culture of writing the appropriate number of unit tests and high-value integration tests.

I recently wrote my first unit tests with pytest. Below is a script named test_file_date.py. It tests if the day of month of the most recently changed file in a directory matches today's day. To install pytest, enter into command prompt or terminal:

python -m pip install pytest

test_file_date.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import glob
import os
from datetime import datetime, date

# The dir_query format is for a Windows path with Unix style pattern matching.
def test_csv_date_equals_today():
    dir_query = 'C:\\Users\\your_username\\Desktop\\*.csv' # specify csv extension and folder
    file_path = sorted(glob.iglob(dir_query), key=os.path.getmtime)[-1] # get most recent file
    file_timestamp = os.path.getmtime(file_path)
    file_date = datetime.fromtimestamp(file_timestamp)
    print(file_date.day)
    print(date.today().day)
    assert file_date.day == date.today().day

Run the test with pytest by entering:

pytest test_file_date.py

Conclusion I write programs for personal productivity and to automate processes. The scope of problems my code solves has grown with my programming ability. I'm now reaching the point where I can apply tests to my advantage. However, sometimes I'll write a quick-hitter script for which I can't justify writing tests. Beyond those cases, testing can help if you pick the right style for your project. More so for recurring, automated processes.

It feels pretty cool when your tests run and you know with more certainty whether a part of your program is getting the job done or not. After setting up my first test with pytest, I have leveled up to novice tester, instead of blissfully not knowing what I don't know about testing. That's a step in the right direction.

Nov 30, 2019

Inserting New Rows Into A Microsoft Access Database With Python and pyodbc

I recently automated the loading of data into a Microsoft Access database with pyodbc, a Python library for connecting to databases. ODBC stands for Open Database Connectivity. It can be used for a variety of Database Management Systems outside of Access also.

First, install libraries with pip. Enter in terminal or command prompt:

python -m pip install pyodbc
python -m pip install pandas

Next, check available Microsoft Access drivers on your computer. Enter the below statements into the Python interpreter:

1
2
3
python
>>> import pyodbc
>>> [x for x in pyodbc.drivers() if x.startswith('Microsoft Access Driver')]

Drivers for Access & Many Other Data Sources

The driver is the engine that allows you to connect to a specific type of database. The drivers available vary depending on your machine.

The two most common drivers for Access are Microsoft Access Driver (.mdb) and Microsoft Access Driver (.mdb, *.accdb). My computer only had *.mdb, which has been deprecated. My Access database was a .mdb file, so I was able to use this driver as shown below. Read more on Access drivers here.

"Drivers exist for all major DBMSs, many other data sources like address book systems and Microsoft Excel, and even for text or comma-separated values (CSV) files."  - Wikipedia

Database Data Types

I set all of the field data types to "Short Text" because I'm passing strings as SQL parameters below. Uploading as other data types may require additional formatting. To edit the data types of your table, open the table and select "Design View" under the "Home" tab. It got the job done for me!

Inserting new rows into a Microsoft Access Database:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import pandas as pd
import pyodbc


def df_to_row_tuples(df):
    """Use list comprehension to format df rows as a list of tuples:
    rows = [('Garfield','Orange','Eat'),('Meowth','White','Scratch')]
    """
    df = df.fillna('')
    rows = [tuple(cell) for cell in df.values]
    return rows

"""Rows are not added to DB until they are committed.
Pass each row tuple as a SQL parameter (?,?,?).
cursor.execute docs: https://www.mcobject.com/docs/Content/Programming/Python/Classes/Cursor/execute.htm
"""
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\Add_Path\To_DB\Here\Your_DB.mdb;')
cursor = conn.cursor()
sql = ''' INSERT INTO Cats (Name, Color, Move)
          VALUES(?,?,?) '''
df = pd.read_csv('Cat Data.csv')
rows = df_to_row_tuples(df)
for row in rows:
    cursor.execute(sql, row)
conn.commit()

Conclusion

Running the above in command prompt uses pyodbc and SQL to add dataframe rows to a Microsoft Access DB table named "Cats". Passing each row as a SQL parameter has two benefits:

  1. It handles strings with single quotes (') and loads them to the DB.
  2. It protects against SQL injection attacks.

Access Limitation Disclaimer

Access topped out just shy of 10 million rows in my use case, when records stopped getting added to my database. So keep that in mind if you're thinking about using Access to store your data.

Supplementary Resources

Insert Values into MS Access Table using Python

pyodbc documentation

Microsoft Documentation pyodbc example

The Python Cursor Class

Psycopg Cursor Class Documentation

Apr 25, 2019

Installing Debian 9 Stretch Linux OS on a Dell Inspiron Laptop and Configuring the Wifi Network

Yesterday, I converted an 11-year old Dell Inspiron E1505 from Windows XP to Debian 9 Stretch. I may have overwrote my Windows XP OS. I do not care if I lost it since it's a vulnerable and outdated OS, which is no longer supported by Microsoft. I encountered difficulty with getting the wifi to work on Debian, but was able to find a solution using Wicd. Here are the steps I followed to do it all.

Using The Debian Installer-Loader

  1. First, back up your Windows computer files. Then download the Debian-Installer Loader Windows executable from the Debian wiki.
  2. Click the downloaded executable and IMG_20190423_184816999follow instructions. I followed the default settings all the way through.
  3. You may need to choose your own partitioning settings to ensure Windows is preserved if desired.
  4. During installation, choose your Linux collection. I chose Xfce because it seems to be highest ranked among Linux users and "not just helpful for older computers where few system resources are available, but also simply for those who want to get the most out of their systems." Gnome and KDE are other popular options.
IMG_20190423_184023738

After completing installation, restart your computer and select your new OS on boot-up. The following error codes displayed for me while starting up, signaling missing wifi firmware.

ERROR Failed to load firmware!
b43ssb0:0: firmware: failed to load b43/ucode5.fw (-2)
b43ssb0:0: firmware: failed to load b43-open/ucode5.fw (-2)
b43-phy0 ERROR: You must go to https://wireless.wiki.kernel.org/en/users/drivers/b43#devicefirmware and download the correct firmware for this driver version.

It took me a few boot-ups before I realized what this error message meant. In the rest of this post I am trying to figure out and fix what is wrong before I saw the error message. I enjoyed learning how to introspect Linux networks, but if I were trying to fix this problem again, I'd go to directly to *`this page <http://linuxwireless.sipsolutions.net/en/users/Drivers/b43/>`__*, which is linked to from the link in the error message, and try the solution there first.

Post Installation Setup

Open up the terminal once you're into your new Desktop OS and enter the below commands.

su -
apt-get install sudo -y usermod -aG sudo yourusername

1) Enable yourself as root user.

2) Install sudo.

3) Give yourself sudo user permission.

Optional: Replacing Network-Manager With Wicd

This Debian 9 package ships with Network-Manager. After logging in, I wasn't sure why wifi was not working, so I decided to remove Network-Manager and install Wicd. (This was before I realized what the error code displayed on boot-up meant.) Wicd is a Linux network managing alternative and it's built with Python, by the way. I followed these instructions to execute the below commands.

uninstall nm
sudo apt-get install -d --reinstall network-manager network-manager-gnome
install wicd
sudo apt-get install wicd-gtk

After installing Wicd, my Ethernet connection was not working. This fixed it for me:

sudo ifconfig eth0 up

Troubleshooting Linux Wifi & Inspecting Your System

Now, let's check for enabled network interfaces. "wlan0" is usually the name of the wireless interface. Does wlan0 show when you enter this command? If not, then you may need to update your wifi firmware. This was the case for me. Below is an output where wlan0 is correctly configured.

sudo ifconfig
eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 00:25:a5:cf:38:7d txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 17
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1 (Local Loopback)
RX packets 4 bytes 240 (240.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4 bytes 240 (240.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 191.142.1.297 netmask 255.255.255.0 broadcast 182.138.5.255
inet6 2601:241:8c00:50ea:21a:92ff:fe0d:7531 prefixlen 64 scopeid 0x0
inet6 fe80::22a:42tf:fe0d:7531 prefixlen 64 scopeid 0x20 ether 00:2a:92:2d:45:51 txqueuelen 1000 (Ethernet)
RX packets 8509 bytes 4639778 (4.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6206 bytes 923792 (902.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Check wlan0 is not blocked on kill list.

sudo rfkill list all

Some computers have a "kill switch" for wifi. This command lists any blocked interfaces. If it is blocked, this thread might be useful. If nothing shows when you run this command, or if you see wlan0 is not blocked, carry on.

Check which wifi controller you have.

This thread provides more info on what this means.

lspci -nn | grep -e 0200 -e 0280
03:00.0 Ethernet controller [0200]: Broadcom Limited BCM4401-B0 100Base-TX [14e4:170c] (rev 02)
0b:00.0 Network controller [0280]: Broadcom Limited BCM4311 802.11b/g WLAN [14e4:4311] (rev 01)

Find your system architecture. This determines which firmware you should download in the next step.

sudo dpkg --print-architecture

First, read here to determine the right packages for your Linux system. Then download the appropriate missing wifi firmware. For Debian, I downloaded the two packages below.

  1. b43-fwcutter 2) b43-installer

"cd" into the directory with .deb files. Run the below commands to install the new firmware, then reboot your computer. The last two commands are adapted from this thread.

sudo dpkg -i firmware-b43-installer_019-3_all.deb
sudo dpkg -i firmware-b43-fwcutter_019-3_i386.deb
sudo modprobe -r b43
sudo modprobe b43

Edit Wicd preferences to set wlan0 as the wireless interface if needed.

change_wicd_settings

Success! Wireless networks are now showing.

wicd_success

Wrapping Up

I'd like to thank the awesome people who contributed to the Debian Installer-loader and all the help in Linux forums that enabled me figure this out. I'm new to the world of Linux but already enjoying diving into this operating system. Its ability to do just about anything from the command line are a lot of fun. I am now running two Linux systems, one on my Dell and another running Ubuntu that I installed on a Chromebook with Crouton. Both have been relatively painless to set up. It this case, it turned a sluggish laptop into a very capable machine. They should call it Lit-nux :)

Full Disclosure

This worked on my computer for a few days before the keyboard stopped working correctly on my computer. Typing became impossible because the keys didn't work or entered the wrong letters when pressed. I'm not sure what the cause of it was, but consider that before attempting this on a machine. Be prepared to lose it. If you really need the machine to be functional, it may not be a great idea to try this. This was attempted on an old beat up computer. I would try this method of porting a Windows machine to Linux again as a salvage project or on a low-risk Windows machine if I had one lying around.

Feb 12, 2019

Lightning Scripts ⚡

You may or may not be familiar with PyCon's Lightning Talks, a session of quick hitter talks given each year on a variety of topics. These "Lightning Scripts" are ten of my favorite Python scripts that have served me well as of late to perform a variety of tasks.

Highlights include the glob, os, calendar, webbrowser and pandas modules. Plus, how to convert a dataframe into a pdf and how to take a snapshot of a website with pdfkit. The webpage pdf was not the greatest snapshot in some cases, but did capture text and many of a live website's visual elements.

If you have Python installed, paste the script into a text editor and run your .py file from the command line or however you like to run your scripts. You may have to change file paths to work on your operating system.

The first 5 scripts use modules that ship with Python.

(1) Uppercase all of the file names in the current working directory.

1
2
3
4
import os
filenames = os.listdir(os.getcwd())
for filename in filenames:
    os.rename(filename, filename.upper())

os module documentation

(2) Get all filenames in the current directory and write to a text file.

1
2
3
4
5
import os
folder_contents = os.listdir(os.getcwd())
with open("list_of_dir_files.txt","w") as fhand:
    for item in folder_contents:
        fhand.write('f{item}\n')

(3) Check what day of the week is today.

1
2
3
4
5
6
7
import calendar
from datetime import date
today = date.today()
if calendar.day_name[today.weekday()] == 'Friday':
    print("Today is Friday.")
else:
    print("Today is not Friday.")

calendar documentation

(4) Get the two most recent file names from a directory. This is a Windows file path example. I have escaped the backslashes below to make it work in my Atom text editor.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import glob
import getpass
import os
# getctime == by created date, getmtime == by date modified
# get all files, regardless of extension
folder = f'C:/Users/{getpass.getuser()}/Desktop/'
list_of_files = glob.glob(folder)
newest_file = sorted(glob.iglob('C:/Users/your_username/Desktop/*.csv'), key=os.path.getmtime)[-1]
second_newest_file = sorted(glob.iglob('C:/Users/your_username/Desktop/*.csv'), key=os.path.getmtime)[-2]
print(newest_file, second_newest_file)

glob module documentation: https://docs.python.org/3/library/glob.html#glob.iglob

(5) Auto-Hamilton Ticket Lottery

1
2
import webbrowser
webbrowser.open('https://www.luckyseat.com/hamilton-chi/', new=0)

webbrowser PMOTW: https://pymotw.com/3/webbrowser

To Run The Next 3 Scripts, install pandas:

python -m pip install pandas

(6) Split a csv file into smaller excel files.

1
2
3
4
5
6
7
import pandas as pd
file_name = 'big_file.csv'
# slice off extension
src_file_minus_ext = file_name.split('.')[:-1]
for i, df in enumerate(pd.read_csv(file_name, chunksize=250000)):
    csv = f'{src_file_minus_ext}{str(i)}.csv'
    df.to_csv(csv, index=False, header=df.columns)

pandas beginner tutorial: https://www.youtube.com/watch?v=5JnMutdy6Fw

(7) Convert a .xlsx to .csv

1
2
3
import pandas as pd
df = pd.read_excel("input.xlsx")
df.to_csv("output.csv", index=False)

(8) Convert a .xlsx to .html

1
2
3
4
import pandas as pd
file_name = "data.xlsx"
df = pd.read_excel(file_name)
df.to_html("data.html")

To Run The Last Two Scripts, install pdfkit:

pdfkit relies on another library, wkhtmltopdf, which adds an extra wrinkle to setting it up for Windows users. I've laid out the differences between using pdfkit for Windows vs. Ubuntu below.

python -m pip install pdfkit

installing wkhtmltopdf on Windows

To use pdfkit on Windows, go to the link, choose your version to download. This installs a Windows executable that pdfkit needs to find to work.

https://wkhtmltopdf.org/downloads.html

installing wkhtmltopdf on Ubuntu

sudo apt-get install wkhtmltopdf


**For Windows, **\ `download wkhtmltopdf <https://wkhtmltopdf.org/downloads.html>`__\ ** and add this config patch for pdfkit:**
1
2
3
4
import pdfkit
url = 'https://www.hollywoodreporter.com/news/grammys-alicia-keys-best-moments-as-host-1185013'
config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')
pdfkit.from_url(url, 'webpage.pdf', configuration=config)

4 Convert a .html file to .pdf. (Add config patch above for Windows.)

1
2
import pdfkit
pdfkit.from_file('data.html', 'report.pdf')

5 Create a pdf snapshot of a webpage on Ubuntu below. (Add config patch above for Windows.)

1
2
3
import pdfkit
url = 'https://www.hollywoodreporter.com/news/grammys-alicia-keys-best-moments-as-host-1185013'
pdfkit.from_url(url, 'alicia_keys.pdf')

pdfkit documentation: https://pypi.org/project/pdfkit/

Final Thoughts

I am very pleased with pdfkit. I am also toying around with Python's PyFPDF: https://pyfpdf.readthedocs.io/en/latest/index.html and PyPDF2: https://pythonhosted.org/PyPDF2 libraries. Together, these three offer some unique abilities for creating and manipulating pdf files. And the pandas library is amazing for reading and writing tabular data sheets, can't say enough great things about pandas. Worth the time. Hope you enjoyed these Lightning Scripts :)

Nov 25, 2018

Automated Python With Windows Task Scheduler

So you want to run your Python scripts automatically, but how?

I had heard of several popular scheduling libraries in Python like celery, Invoke, and schedule. One of my requirements is to run the python file "in the background", not in command prompt or an open window.

Enter Windows Task Scheduler, the de facto scheduler on Windows 7 computers. I have  scheduled a few scripts and it is working like a charm. In this post, I will schedule an example script to clean up my desktop at the beginning of each day. I have a habit of accumulating many Excel files there throughout the workday. This example automatically moves them into a folder.

Other Windows scheduling alternatives worth mentioning include creating a Windows service, or using schtasks if you prefer the command line.

Here's how to schedule a Python script to run:

  1. Search for Windows Task Scheduler in the start menu. Then select "Task Scheduler Library" to see all of the tasks Windows is running automatically.
  2. In the right toolbar, select "Create Basic Task" and give it a name and description. Note: I selected "Configure for: Windows 7, Windows Server 2008 R2".
general
  1. Set the time and frequency that the program will run in the "Triggers" tab.
  2. Under the "Actions" tab, select "Start a Program" from the dropdown. Under "Program/Script", enter the path to your Python.exe file. I set mine to a Python executable located within my virtual environment, but yours might be found wherever you have Python installed.
C:\Users\your_username\Desktop\36env\Scripts\python.exe
  1. Under "Add arguments (optional)", add the path to your .py script, within quotes:
"C:\Users\your_username\Desktop\36env\clean_desktop_excels.py"
actions
  1. Select additional conditions and settings as desired, such as "Wake the computer to run this task" and "Run with highest privileges".

I am enjoying this simple, easy and convenient scheduling manager for Windows. I figured most of this out thanks to this blog. Below is my script to clean my desktop each morning by moving my Excel files into a folder, using Python's stock shutil and os libraries. Set it and forget it, ya know what i mean? :D

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
from shutil import move
import getpass
import os

# Desktop Spreadsheet File Cleaner: get all Desktop files and folders
src = f"C:/Users/{getpass.getuser()}/Desktop"
dir_items = os.listdir(src)
excel_files = [item for item in dir_items if ".csv" in item or ".xls" in item]
dst = f"C:/Users/{getpass.getuser()}/Desktop/Excels"
os.makedirs(dst, exist_ok=True)
for xl in excel_files:
    path_to_file = src + xl
    move(path_to_file, dst)

Additional Reading

Troubleshooting Windows Task Scheduler - Windows Documentation

← Previous Next → Page 11 of 14