Lo-Fi Python

Apr 06, 2021

Aggregating A Python Error Summary from Log Files

Follow these steps to maintain more reliable scripts and catch more of your traceback errors:

  1. Automate your scripts to run daily, weekly, monthly, etc.
  2. Log your traceback errors with the logging module. I tend to dump all of my logs into a single folder.
  3. Automate aggregating the logs and parsing the tracebacks.
  4. Start a feedback loop of fixing the tracebacks until 0 tracebacks remain.
  5. Re-run the script and confirm the tracebacks have disappeared.
import itertools
import os

def parse_errors(log):
    """Look in a log file, line by line, for Python error keywords."""
    errors = []
    with open(log, 'r') as f:
        for line in f:
            if 'Traceback' in line or 'Error' in line:
                # strip commas so the line is safe for a csv
                line = line.strip().replace(',', '')
                errors.append([log, line])
    return errors

# Parse traceback errors from logs in the working directory, then write them to a csv file.
logs = [f for f in os.listdir(os.getcwd()) if f.lower().endswith('.log')]
tracebacks = [parse_errors(log) for log in logs]
# drop consecutive duplicate lists with itertools.groupby + list comprehension
tracebacks = [t for t, _ in itertools.groupby(tracebacks)]
with open('Log Traceback Errors.csv', 'w') as fhand:
    fhand.write('Log,Traceback')  # csv header row
    for t in tracebacks:
        for error in t:
            fhand.write(f"\n{','.join(error)}")

This pure Python script lets me home in on potential problem areas in my scheduled Python scripts. It doesn't capture the entire traceback; rather, it writes the error type and the name of the log file containing that error to a csv. I use this log aggregation script, along with pytest tests, to monitor my daily and weekly scheduled Python scripts.

Noteworthy gains from aggregating my logs:

  • less fear of missing mistakes
  • more freedom to improve the code
  • catch the mistakes faster

See also: Python Documentation - Basic Logging Tutorial

Nov 30, 2019

Inserting New Rows Into A Microsoft Access Database With Python and pyodbc

I recently automated the loading of data into a Microsoft Access database with pyodbc, a Python library for connecting to databases. ODBC stands for Open Database Connectivity. It can also be used with a variety of database management systems besides Access.

First, install libraries with pip. Enter in terminal or command prompt:

python -m pip install pyodbc
python -m pip install pandas

Next, check available Microsoft Access drivers on your computer. Enter the below statements into the Python interpreter:

python
>>> import pyodbc
>>> [x for x in pyodbc.drivers() if x.startswith('Microsoft Access Driver')]

Drivers for Access & Many Other Data Sources

The driver is the engine that allows you to connect to a specific type of database. The drivers available vary depending on your machine.

The two most common drivers for Access are Microsoft Access Driver (*.mdb) and Microsoft Access Driver (*.mdb, *.accdb). My computer only had the *.mdb driver, which has been deprecated. My Access database was a .mdb file, so I was able to use this driver as shown below. Read more on Access drivers here.

"Drivers exist for all major DBMSs, many other data sources like address book systems and Microsoft Excel, and even for text or comma-separated values (CSV) files."  - Wikipedia

Database Data Types

I set all of the field data types to "Short Text" because I'm passing strings as SQL parameters below. Uploading other data types may require additional formatting. To edit your table's data types, open the table and select "Design View" under the "Home" tab. Short Text got the job done for me!
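Since every field is Short Text, it may help to cast the dataframe's columns to strings before building the row tuples. This is a sketch under that assumption, with a hypothetical frame standing in for Cat Data.csv:

```python
import pandas as pd

# hypothetical frame matching the Cats table used below
df = pd.DataFrame({'Name': ['Garfield'], 'Color': ['Orange'], 'Move': [7]})

# cast every column to str so each value matches a Short Text field
df = df.astype(str)
```

After the cast, even the numeric Move column holds the string '7', so no per-column formatting is needed.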

Inserting new rows into a Microsoft Access Database:

import pandas as pd
import pyodbc


def df_to_row_tuples(df):
    """Use a list comprehension to format df rows as a list of tuples:
    rows = [('Garfield','Orange','Eat'),('Meowth','White','Scratch')]
    """
    df = df.fillna('')
    rows = [tuple(row) for row in df.values]
    return rows

"""Rows are not added to the DB until they are committed.
Pass each row tuple as a SQL parameter (?,?,?).
cursor.execute docs: https://www.mcobject.com/docs/Content/Programming/Python/Classes/Cursor/execute.htm
"""
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\Add_Path\To_DB\Here\Your_DB.mdb;')
cursor = conn.cursor()
sql = ''' INSERT INTO Cats (Name, Color, Move)
          VALUES(?,?,?) '''
df = pd.read_csv('Cat Data.csv')
rows = df_to_row_tuples(df)
for row in rows:
    cursor.execute(sql, row)
conn.commit()
conn.close()

Conclusion

Running the above in command prompt uses pyodbc and SQL to add dataframe rows to a Microsoft Access DB table named "Cats". Passing each row as a SQL parameter has two benefits:

  1. It handles strings with single quotes (') and loads them to the DB.
  2. It protects against SQL injection attacks.
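Both benefits can be demonstrated with the standard library's sqlite3 module (not pyodbc itself, but it uses the same ? parameter style), inserting a name that contains a single quote:

```python
import sqlite3

# in-memory stand-in for the Access table above
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Cats (Name TEXT, Color TEXT, Move TEXT)')

# a name with a single quote would break naive string formatting,
# but passes through safely as a SQL parameter
cur.execute('INSERT INTO Cats VALUES (?,?,?)',
            ("Mr. O'Malley", 'Orange', 'Strut'))
conn.commit()
```

Building the same statement with string concatenation would raise a syntax error on the quote, and worse, would leave the statement open to injection.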

Access Limitation Disclaimer

In my use case, Access topped out just shy of 10 million rows; at that point, records stopped getting added to my database. Keep that limit in mind if you're thinking about using Access to store your data.

Supplementary Resources

Insert Values into MS Access Table using Python

pyodbc documentation

Microsoft Documentation pyodbc example

The Python Cursor Class

Psycopg Cursor Class Documentation