Lo-Fi Python

Jan 08, 2023

pymarketer: an HTTP + Spreadsheet Wrangling Python package

Typically, this blog reviews other libraries in Python's vast ecosystem. This time it's a package of my own, pymarketer, which I made for fun. It was created in a single day and can be installed from the Github repo. Have a look at my most-read post if you need help with pip.

Install with pip from the source Github repo:

python -m pip install git+https://github.com/erickbytes/pymarketer.git

The pymarketer package helps you do things like:

  • merge all the tabs of an Excel file into one CSV
  • generate HTTP code
  • make a word cloud image
  • split a CSV
  • merge CSVs
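pymarketer's exact API isn't shown here, but the CSV splitting and merging ideas can be sketched in plain pandas. The function names split_csv and merge_csvs below are hypothetical illustrations, not the package's own:

```python
import pandas as pd

def split_csv(csv_path, rows_per_file):
    """Split one CSV into several numbered CSVs of rows_per_file rows each."""
    df = pd.read_csv(csv_path)
    paths = []
    for i, start in enumerate(range(0, len(df), rows_per_file), start=1):
        part_path = f"{csv_path.rsplit('.', 1)[0]}_part{i}.csv"
        df.iloc[start:start + rows_per_file].to_csv(part_path, index=False)
        paths.append(part_path)
    return paths

def merge_csvs(csv_paths, out_path):
    """Stack several CSVs with the same columns into one file."""
    combined = pd.concat([pd.read_csv(p) for p in csv_paths], ignore_index=True)
    combined.to_csv(out_path, index=False)
    return combined
```

The real package may handle edge cases (encodings, mismatched columns) differently; check its __init__.py for the actual implementations.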

Generating a Word Cloud with the pymarketer Package via wordcloud

import pandas as pd
import pymarketer as pm

xl = "Chicago Breweries.xlsx"
df = pd.read_excel(xl)
# Make a wordcloud from a pandas dataframe.
wordcloud = pm.word_cloud(df)
wordcloud.to_file("Text Word Cloud Visualization.jpg")
Python wordcloud example

This package relied on several Python libraries, including pandas and wordcloud.

I'll likely expand on this in the future. Anyone who wrangles data may be able to put this package to good use. At minimum, you might find it interesting to look at the project's __init__.py to see how some of the functions are implemented.


Nov 30, 2019

Inserting New Rows Into A Microsoft Access Database With Python and pyodbc

I recently automated loading data into a Microsoft Access database with pyodbc, a Python library for connecting to databases. ODBC stands for Open Database Connectivity. It can also be used with a variety of database management systems besides Access.

First, install libraries with pip. Enter in terminal or command prompt:

python -m pip install pyodbc
python -m pip install pandas

Next, check available Microsoft Access drivers on your computer. Enter the below statements into the Python interpreter:

python
>>> import pyodbc
>>> [x for x in pyodbc.drivers() if x.startswith('Microsoft Access Driver')]

Drivers for Access & Many Other Data Sources

The driver is the engine that allows you to connect to a specific type of database. The drivers available vary depending on your machine.

The two most common drivers for Access are Microsoft Access Driver (*.mdb) and Microsoft Access Driver (*.mdb, *.accdb). My computer only had *.mdb, which has been deprecated. My Access database was a .mdb file, so I was able to use this driver as shown below. Read more on Access drivers here.

"Drivers exist for all major DBMSs, many other data sources like address book systems and Microsoft Excel, and even for text or comma-separated values (CSV) files."  - Wikipedia

Database Data Types

I set all of the field data types to "Short Text" because I'm passing strings as SQL parameters below. Uploading as other data types may require additional formatting. To edit the data types of your table, open the table and select "Design View" under the "Home" tab. It got the job done for me!

Inserting new rows into a Microsoft Access Database:

import pandas as pd
import pyodbc


def df_to_row_tuples(df):
    """Use a list comprehension to format df rows as a list of tuples:
    rows = [('Garfield','Orange','Eat'),('Meowth','White','Scratch')]
    """
    df = df.fillna('')
    rows = [tuple(row) for row in df.values]
    return rows

"""Rows are not added to DB until they are committed.
Pass each row tuple as a SQL parameter (?,?,?).
cursor.execute docs: https://www.mcobject.com/docs/Content/Programming/Python/Classes/Cursor/execute.htm
"""
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\Add_Path\To_DB\Here\Your_DB.mdb;')
cursor = conn.cursor()
sql = ''' INSERT INTO Cats (Name, Color, Move)
          VALUES(?,?,?) '''
df = pd.read_csv('Cat Data.csv')
rows = df_to_row_tuples(df)
for row in rows:
    cursor.execute(sql, row)
conn.commit()

Conclusion

Running the above at the command prompt uses pyodbc and SQL to add the dataframe rows to a Microsoft Access table named "Cats". Passing each row as a SQL parameter has two benefits:

  1. It handles strings with single quotes (') and loads them to the DB.
  2. It protects against SQL injection attacks.
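To see the single-quote benefit in action without an Access database, here's a minimal sketch using Python's built-in sqlite3 module; the (?, ?, ?) placeholder style is the same one the pyodbc cursor uses:

```python
import sqlite3

# In-memory database so the example runs anywhere.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE Cats (Name TEXT, Color TEXT, Move TEXT)")

# A value containing a single quote would break naive string formatting,
# but passes through cleanly as a parameter:
row = ("Mr. O'Malley", "Orange", "Pounce")
cursor.execute("INSERT INTO Cats (Name, Color, Move) VALUES (?, ?, ?)", row)
conn.commit()

print(cursor.execute("SELECT Name FROM Cats").fetchone()[0])  # Mr. O'Malley
```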

Access Limitation Disclaimer

Access topped out just shy of 10 million rows in my use case; at that point, records stopped getting added to my database. Keep that in mind if you're thinking about using Access to store your data.

Supplementary Resources

Insert Values into MS Access Table using Python

pyodbc documentation

Microsoft Documentation pyodbc example

The Python Cursor Class

Psycopg Cursor Class Documentation

Nov 20, 2018

Packaging Python as a Windows App via Pyinstaller

My research found that the front-running Python library for creating a single-file Windows executable from a .py file is Pyinstaller. It worked on my Windows 7 PC. My program used a Gooey GUI, but many of the popular Python GUI libraries work as well.

Installation:

To install, enter this into your command prompt or terminal:

python -m pip install pyinstaller

At the time of this article, this installed Pyinstaller version 3.3.1 using Python version 3.6.5. Go here for a refresher on setting up pip if you need it.

Using the build.spec file and starting Pyinstaller:

Most examples I found used a build.spec file. Think of this as the blueprint by which your app will be constructed. It provides the compiling settings, images and any other data necessary for assembling your app. The format for passing the .spec file to Pyinstaller in pseudo-code:

pyinstaller (run mode options) (.spec file)

Basic compiler command to start a build from build.spec:

pyinstaller build.spec

Establishing a debugging loop with Pyinstaller

Debug mode can be set from the command line:

pyinstaller --debug build.spec

OR by passing debug=True to the EXE() function of the build.spec. I used the second option. See my full build.spec file at the bottom of this post. Pyinstaller displayed a lot of error messages while compiling my app, but it still compiled into a working .exe.

To see your app's error messages, run the resulting your_app.exe from the command line. You can find it in the 'dist' folder that pyinstaller creates when you pass it the build.spec file. Set the dist folder as your working directory, type your_app.exe and hit enter. Once you're reading and fixing error messages, you're well on the way to creating your own desktop app.

Flushing sys.stdout/Python printing for Windows:

Python needs a little extra code to play nice with Windows when it prints a statement. Simply add this to your .py file. I used write mode 'w'. What worked for me: don't pass 0 to fdopen(), which was contrary to Gooey's instructions.

import os
import sys
nonbuffered_stdout = os.fdopen(sys.stdout.fileno(), 'w')
sys.stdout = nonbuffered_stdout

Fetching the local user's system information:

In order to run on any user's system, we need to grab their local file paths. I accomplished this by referencing the sys._MEIPASS via the below code I found from a Stack Overflow post.

def resource_path(relative_path):
    """ Get absolute path to resource, works for dev and for PyInstaller"""
    base_path = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, relative_path)

temp_folder_path = getattr(sys, '_MEIPASS', os.getcwd())
relative_path = resource_path(temp_folder_path)

Adding library-specific patches:

  1. Below is the fix I found for the Pandas library, which I added to my build.spec:
hiddenimports=['pandas._libs.tslibs.timedeltas','pandas._libs.tslibs.nattype',
'pandas._libs.tslibs.np_datetime','pandas._libs.skiplist']

2. The Gooey library needs some special code added to the build.spec for its images and languages data. More details are found in this blog post, written by the author of the Gooey library.


Last, but not least: don't panic.

Compiling Python to Windows binary code sounds like a daunting task, but it wasn't nearly as complex as I feared. The folks behind Pyinstaller have done a great job of making it possible and, dare I say, simple. Stay calm, drink some coffee, dig in and welcome the challenge with a trial-and-error mentality. I was able to figure it out over the span of one Saturday. Good luck.

Useful Resources:

  1. Pyinstaller Github - If Things Go Wrong
  2. Pyinstaller Documentation:Using Pyinstaller Arguments
  3. Gooey Pyinstaller Instructions
  4. Pandas hiddenimports Discussion

Caveats:

  1. You should compile the program on the operating system it will be run on. There are options for creating a multi-os compatible package, but I did not attempt them.
  2. Windows 7 is proven to work with Pyinstaller, per the documentation. It's also what I am running on. Other Windows systems older than Windows 10 may be less reliable.
  3. I experienced trouble when passing arguments from the command line to pyinstaller and have so far been unable to get the console window to hide. Neither the -w, --windowed, --noconsole arguments seemed to work. I will update if I am able to find a solution.
  4. Now that I am testing my compiled app, I am seeing 10x performance slowdowns when running as the final .exe vs. the original .py file. But at least the program runs correctly and still does so relatively quickly.
  5. I also received the error: "Fatal error: Pyinstaller does not include a pre-compiled bootloader for your platform." I fixed this by upgrading to the latest version of Pyinstaller:
pip3 install --upgrade pyinstaller

My full build.spec file, modified from here:

# -*- mode: python -*-
import os

import gooey
gooey_root = os.path.dirname(gooey.__file__)
gooey_languages = Tree(os.path.join(gooey_root, 'languages'), prefix = 'gooey/languages')
gooey_images = Tree(os.path.join(gooey_root, 'images'), prefix = 'gooey/images')

a = Analysis(['your_program.py'],
             pathex=['C:\\Python36\\Lib\\site-packages\\your_appdir'],
             hiddenimports=['pandas._libs.tslibs.timedeltas', 'pandas._libs.tslibs.np_datetime', 'pandas._libs.tslibs.nattype', 'pandas._libs.skiplist'],
             hookspath=None,
             runtime_hooks=None,
             )
options = [('u', None, 'OPTION')]
a.datas += [('program_icon.ico', 'C:\\Python36\\Lib\\site-packages\\your_appdir\\program_icon.ico',  'DATA'),
            ('config_icon.png', 'C:\\Python36\\Lib\\site-packages\\your_appdir\\config_icon.png','DATA')]

pyz = PYZ(a.pure)
exe = EXE(pyz,
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,
          options,
          gooey_languages,
          gooey_images,
          name='ADD YOUR APP NAME HERE',
          debug=True,
          strip=False,
          upx=True,
          console=True,
          icon=os.path.join('program_icon.ico'))

Jul 15, 2018

Findstr, RegEx File Searches for Windows

Findstr is the Windows alternative to grep, which runs on Unix operating systems. Findstr searches files with regular expressions and is useful for string matching within files and directories. It is one of over 280 command prompt commands. Here's the official Windows documentation and some Linux vs. Windows examples.

Update: Microsoft announced that grep and several other Unix command line tools will be added to Windows 10. This is a new alternative to findstr.

This findstr command returns all lines containing an '@' in a text file.

findstr @ test.txt

I was happy to see Findstr's convenient help menu:

findstr -?

Regular expressions are so powerful. It's nice to have this utility within the command prompt. I am hoping to get to know some of the other 280 command prompt commands.

I've previously explored regex with Python. This Python regex example finds all words in a text file containing '@' symbols:

import re

# read the file to string + regex email search
with open('test.txt', 'r') as fhand:
    string = fhand.read()
    # this regex returns a python list of emails:
    emails = re.findall(r'(\S*@\S+)', string)
    print(emails)

For more command prompt nuggets, check out my more recent post: Exploring Windows Command Line Tools, Batch Files and Remote Desktop Connection.

May 12, 2018

A Stroll Through Pandas 1.0, Python’s Tabular Data Powerhouse

Introduction

Thanks to pandas, I have automated some data cleaning and file reading processes at my job. Here are some terms and code that have been useful or interesting to me after 2 years of exploration. I also checked out "Python for Data Analysis" from the Chicago Public Library.

If I could suggest anything to be successful with pandas, it is repetition. I use it nearly every day at work. Dive into its API documentation; there are tons of useful tools there, laid out with meticulous detail and examples. I began learning pandas with this PyCon 2015 tutorial from Brandon Rhodes; it's informative and entertaining! (It's a little dated now, but I still recommend it.) The Reproducible Data Analysis in Jupyter video series by Jake VanderPlas is also a great example of pandas-related workflows.

Table of Contents

  1. Python + pandas Installation and Version Compatibility
  2. Welcome to pandas 1.0
  3. Data Wrangling, Exploration and Broadcasting
    • Series.str & Series.dt accessors
    • apply, applymap, lambda and map
    • featuring pandas.to_markdown()
    • SQL operations with df.merge() and pandas.read_sql()
    • pandas.read_clipboard()
    • converting between Series and DataFrame
  4. Turning json API responses into a dataframe with pandas.json_normalize()
  5. Plotting Visualizations with matplotlib
  6. Supplementary Resources and Links

(1) Python + pandas Installation and Version Compatibility

Python 3.6 and higher can install pandas 1.0.

Installing Python 3.8 on Windows

For Windows installation, see the python docs for an installer, "Using Python on Windows".

Installing Python 3.8 on Ubuntu

Follow these steps to download and install Python 3.8 in the Ubuntu terminal. To upgrade to pandas 1.0, I installed Python 3.8, the latest stable release, "from source" on Ubuntu 16.04.

If you intend to use pandas.to_markdown() on Ubuntu, it might save you trouble to pre-emptively install the '_bz2' library before you build your Python from source.

On Ubuntu, I ran into ModuleNotFoundError: No module named '_bz2' and fixed by entering in the terminal:

sudo apt-get install libbz2-dev

I also saw this message when completing install:

The necessary bits to build these optional modules were not found. To find the necessary bits, look in setup.py in detect-modules() for the module's name.

If you need to re-build Python on Ubuntu, enter:

cd /home/erick/Python-3.8.0/
./configure --enable-loadable-sqlite-extensions && make && sudo make install

I installed the missing _bz2 and _sqlite3 modules, then re-built Python with these commands.

Installing Older pandas Versions on Ubuntu

The version downloaded with this command is about 6 months behind the current version. For me, this installed pandas 0.17 on Ubuntu:

sudo apt-get install python3-pandas

As of February 2020, this command installs pandas version 0.24 with pip when used with Python 3.5 on Linux Ubuntu 16.04:

python3.5 -m pip install pandas

If pandas is already installed, you can upgrade with pip:

python -m pip install --upgrade pandas

To check if pip is installed and list your installed packages:

python -m pip list

Best Practice: Virtual Environments

Create a virtual environment with your new Python version. venv wasn't included in my Python 3.8 installation on Ubuntu 16.04, so I installed virtualenv:

python -m pip install --user virtualenv

Let's create a new virtual environment. Enter in terminal or command prompt:

virtualenv -p python3.8 add_env_name_here

Now, activate your new virtual environment on Linux:

source add_env_name_here/bin/activate

Or activate environment on Windows:

cd add_env_name_here\scripts & activate

"ImportError: Missing optional dependency 'tabulate'. Use pip or conda to install tabulate:" To use pd.to_markdown(), install Tabulate:

python -m pip install tabulate

To use pd.read_clipboard() on Linux, install xclip or xsel:

sudo apt-get install xclip

I also saw a prompt to install pyperclip:

python -m pip install pyperclip

Now install pandas 1.0 and matplotlib in your virtual environment for visualizations.

python3.8 -m pip install pandas
python -m pip install -U matplotlib

(2) Welcome to pandas 1.0

You did it! Welcome to the good life. The basis of pandas is the "dataframe", commonly abbreviated as df, which is similar to a spreadsheet. Another core pandas object is the pandas.Series object, which is similar to a Python list or numpy array. When imported, pandas is aliased as "pd". The pd object allows you to access many useful pandas functions. I'll use it interchangeably with pandas in this post.
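For example, a quick look at both objects:

```python
import pandas as pd

# A Series is one labeled column of values; a DataFrame is a table of
# Series sharing a common index.
s = pd.Series([1, 2, 3], name="count")
df = pd.DataFrame({"fruit": ["apple", "pear", "plum"], "count": [1, 2, 3]})

print(type(s))            # <class 'pandas.core.series.Series'>
print(type(df))           # <class 'pandas.core.frame.DataFrame'>
print(type(df["count"]))  # each column of a DataFrame is itself a Series
```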

The library’s name derives from panel data, a common term for multidimensional data sets encountered in statistics and econometrics.

pandas: a Foundational Python Library for Data Analysis and Statistics

  • Wes McKinney

(3) Data Wrangling, Exploration and Broadcasting

Data is commonly read in from a file with pd.read_csv():

import pandas as pd
file_name = 'my_bank_statement.csv'
# you may sometimes need to specify an alternate encoding: encoding = "ISO-8859-1"
df = pd.read_csv(file_name, encoding='utf-8')
print(df.head())
print(df.shape) # returns a tuple: (# of rows, # of columns)
print(df.dtypes)
print(df.info())

Create a dataframe from a list of Python lists, named movies below, with pd.DataFrame:

import pandas as pd

column_names = ["Title", "Release Date", "Character", "Actor", "Movie Budget", "Worldwide Gross"]
movies = [["Ocean's 11", "12/7/2001", "Danny Ocean", "George Clooney","$85,000,000"," $450,728,529"],
["Ocean's 11", "12/7/2001", "Tess Ocean", "Julia Roberts","$85,000,000"," $450,728,529"],
["Runaway Bride", "6/30/1999", "Ike Graham", "Richard Gere","$70,000,000","$309,457,509"],
["Runaway Bride", "6/30/1999", "Maggy Carpenter", "Julia Roberts","$70,000,000","$309,457,509"],
["Bonnie and Clyde", "9/1/1967", "Clyde Barrow", "Warren Beaty","$2,500,000", "$70,000,000"],
["Bonnie and Clyde", "9/1/1967", "Bonnie Parker", "Faye Dunaway","$2,500,000", "$70,000,000"]]

df = pd.DataFrame(movies, columns=column_names)
df = df[["Title","Character", "Actor", "Movie Budget", "Worldwide Gross"]]
print(df.to_markdown(showindex=False, tablefmt="simple"))

Let's print the table to our terminal with pd.to_markdown(), new in pandas version 1.0.0:


Slicing and sorting a dataframe, removing duplicates, and working with datetime objects

  1. Let's create a new dataframe slice with only two columns
  2. Drop duplicate movies
  3. Convert the dates to datetime objects
  4. Get the year from an array of datetime objects
  5. Set the year as the dataframe index
df = pd.DataFrame(movies, columns=column_names)
date_df = df[['Title', 'Release Date']].drop_duplicates(subset=['Title'])
date_df['Release Date'] = pd.to_datetime(date_df['Release Date'])
# create year column using the pd.Series.dt datetime accessor:
date_df['Release Year'] = date_df['Release Date'].dt.year
date_df = date_df.sort_values(by=['Release Date'])
date_df = date_df.set_index('Release Year')
print(date_df.to_markdown(showindex=False, tablefmt='simple'))


Applying Broadcasting in pandas

Broadcasting means mapping a function or an arithmetic calculation over an array (using apply or map) or a dataframe (using applymap).

"Summing up, apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series."

Applying a function to a pandas column

  • Convert columns to int and calculate the difference between two columns.
  • Let's format those integers back to dollars with python's lambda and pandas' applymap for extra jazz.
def format_dollars_as_int(dollars):
    """Accepts a dollar formatted string, returns an int."""
    number = dollars.replace('$','').replace(',','')
    return int(number)

df = pd.DataFrame(movies, columns=column_names)
df = df.drop_duplicates(subset=['Title'])
df[['Movie Budget','Worldwide Gross']] = df[['Movie Budget','Worldwide Gross']].astype(str).applymap(format_dollars_as_int)
df['Movie Net Income'] = df['Worldwide Gross'] - df['Movie Budget']
money_columns = ['Movie Budget', 'Worldwide Gross','Movie Net Income']
df[money_columns] = df[money_columns].applymap(lambda x:'${:,}'.format(x))

Creating a new column and writing to a .csv file

  • Then add the IMDB ratings of our three films in a new column.
  • Finally, write the result to markdown and a csv file.
# create a new column with the three movies' IMDB ratings
df['IMDB Rating'] = [7.8, 5.5, 7.8]
print(df.to_markdown(showindex=False, tablefmt='simple'))
df.to_csv('Movies.csv', index=False)
print(df.Actor.value_counts().to_markdown(tablefmt="github"))

See also: pandas.Series.value_counts() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html)


Notice for column names without spaces, you can use dot notation instead of brackets:

# both valid ways to access column by name
df.Actor
df['Actor']

Lowercase column names with Python's map function:

df.columns = map(str.lower, df.columns)

Strip whitespace from a column of strings with the pandas.Series.str accessor:

df['Character'] = df['Character'].astype(str).str.strip()

Fix pesky leading zero zip codes with str.zfill():

log_df['zip'] = log_df['zip'].astype(str).str.zfill(5)

Get a row by index number with pandas.DataFrame.loc[]:

first_row = df.loc[0, df.columns]
third_row = df.loc[2, df.columns]

Filter the df to get rows where the actor is 'Julia Roberts'.

julia_roberts_movies = df[df.Actor=='Julia Roberts'].reset_index(drop=True)
print(julia_roberts_movies.head())

"Get" an item from a column of lists with str.get().

# returns first item in each cell's list into new column
df['first_item'] = df['items'].str.get(0)

Execute SQL-like operations between dataframes with df.merge().

First, use df.copy() to create a new dataframe copy of our actors table above.  By default, df.merge() uses an inner join to merge two dfs on a common column. Let's add each film's release year from our date_df to our original actors table, with an inner join based on 'Title':

actors = df.copy(deep=True)
# slice only the columns we want to merge:
date_df = date_df[['Title','Release Year']]
joined_df = actors.merge(date_df, on='Title', how='inner')
# You can pass the number of rows to see to head. It defaults to 5.
print(joined_df.head(10))

Execute database queries with pd.read_sql().

When the chunksize argument is passed, pd.read_sql() returns an iterator. We can use this to iterate through a database with lots of rows. When combined with DB connection libraries like pyodbc or SQLAlchemy, you can process a database in chunks. In this example, it's an Access DB connection via pyodbc to process 500,000 rows per chunk. Pyodbc works on a wide range of other databases also.

The example uses pd.Series.isin() to check if each email is in the DB.
import pandas as pd
import pyodbc

emails = ['[email protected]', '[email protected]', '[email protected]']
connection_string = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\path_to_db\emails_database.accdb;'
print(connection_string)
conn = pyodbc.connect(connection_string)
query = """
    SELECT *
    FROM   ADD_TABLE_NAME
    """
dfs = list()
for i, db_chunk in enumerate(pd.read_sql(query, conn, chunksize=500000)):
    emails_in_db = db_chunk[db_chunk.Email.isin(emails)]
    dfs.append(emails_in_db)
    print(i)
emails_in_db = pd.concat(dfs)
emails_in_db.to_csv('DB_Email_Query_Results.csv', index=False)
In case you are wondering, enumerate is a Python built-in for counting an iterable (e.g. a list or generator) as you iterate through it.
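For example:

```python
# enumerate pairs a running count with each item of an iterable:
for i, color in enumerate(["red", "green", "blue"]):
    print(i, color)
# 0 red
# 1 green
# 2 blue
```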

Using pd.read_clipboard():

import pandas as pd
clipboard_contents = pd.read_clipboard()
print(clipboard_contents)

Use pd.to_clipboard() to store a dataframe as clipboard text:

import pandas as pd
truths = ['pandas is great','I love pandas','pandas changed my life']
df = pd.DataFrame([truths], columns=['Truths'])
df.to_clipboard(index=False, sep='|')
clipboard_contents = input('Press ctrl-v ')
print(clipboard_contents)

Convert the clipboard contents to a df with pd.DataFrame() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html):

import pandas as pd
clipboard_contents = list(input('Press ctrl-v '))
df = pd.DataFrame([clipboard_contents])
print(df.head())

If the clipboard dataframe has one column, you could squeeze the clipboard contents into a pd.Series object:

import pandas as pd
clipboard_text = pd.read_clipboard()
clipboard_contents = list(clipboard_text)
df = pd.DataFrame([clipboard_contents], columns=['Clipboard Data'])
clipboard_series = df.squeeze(axis='columns')
print(type(clipboard_series))

Inversely, consider using pandas.Series.to_frame() to convert a Series to a dataframe:

import pandas as pd
clipboard_contents = pd.Series(input('Press ctrl-v '))
df = clipboard_contents.to_frame()
print(df.head())

(4) Turning json API responses into a dataframe with pd.json_normalize()

Update: beginning in pandas 1.0, json_normalize became a top-level pandas function, pd.json_normalize():

import pandas as pd
import requests
url = 'https://pseudo_API.com/endpoint/'
parameters = {'page_size': 100, 'format': 'json', 'api_type': 'contact_sync'}
response = requests.get(url=url, params=parameters)
data = response.json() # decode response into json
df = pd.json_normalize(data['any_key'])

pandas.json_normalize() is now exposed in the top-level namespace. Usage of json_normalize as pandas.io.json.json_normalize is now deprecated and it is recommended to use json_normalize as pandas.json_normalize() instead (GH27586).

What's new in pandas 1.0.0

(5) Plotting Visualizations with matplotlib

Make a bar plot of the movie release year counts using pandas and matplotlib formatting.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

column_names = ["Title", "Release Date", "Character", "Actor"]
rows = [["Ocean's 11", "12/7/2001", "Danny Ocean", "George Clooney"],
["Ocean's 11", "12/7/2001", "Tess Ocean", "Julia Roberts"],
["Runaway Bride", "6/30/1999", "Ike Graham", "Richard Gere"],
["Runaway Bride", "6/30/1999", "Maggy Carpenter", "Julia Roberts"],
["Bonnie and Clyde", "9/1/1967", "Clyde Barrow", "Richard Gere"],
["Bonnie and Clyde", "9/1/1967", "Bonnie Parker", "Julia Roberts"]]
df = pd.DataFrame(rows, columns=column_names)
# derive a Year column from the release dates before counting:
df['Year'] = pd.to_datetime(df['Release Date']).dt.year
ax = df.Year.value_counts().plot(ylim=0, kind='bar', title='Release Year of Movies', rot=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
fig = ax.get_figure()
fig.tight_layout()
fig.savefig('images/Movie_Plot.png')

Use Jupyter Notebook to show the plot, and/or download the saved plot image from the command line.

Plot George Clooney's movies over time in a line graph. [Source]

import matplotlib.pyplot as plt
df = df[df.Actor=='George Clooney']
ax = df.groupby(['Year']).size().plot(ylim=0)
fig = ax.get_figure()
fig.savefig('figure.pdf')

Feb 29, 2016

Tkinter and Python Libraries

Python's set of libraries is fun to explore. They allow Python to work with many other coding languages, let you do cool stuff, and often require minimal knowledge of the underlying code. You can read the complete list of standard libraries here. There are also others available that you can download and install for unique challenges or platforms such as Android.

Currently, my favorite library is Tkinter. Tkinter is a module that replicates Tk/TCL within Python and allows you to create a quick Graphical User Interface (GUI) for your programs. It's great for putting together a prototype with basic control over the aesthetics of buttons, entry boxes, text and the other visual elements that make up the front end of a program.

+Pro Tip: When you're trying to use a new module, you can read how to use it in the module's documentation. Here's Tkinter's documentation.

++ Today's find: Free guide called "Automate the Boring Stuff with Python"