Nov 20, 2018
My research found that for creating a single-file Windows executable from a .py file, the front-running Python library is Pyinstaller.
It worked on my Windows 7 PC. My program used a Gooey GUI, but many of the popular
Python GUI libraries work as well.
Installation:
To install, enter this in your command prompt or terminal:
python -m pip install pyinstaller
At the time of this article, this installed Pyinstaller version 3.3.1 using Python version 3.6.5.
Go here for a refresher on setting up pip if you need it.
Using the build.spec file and starting Pyinstaller:
Most examples I found used a build.spec file. Think of this as the blueprint by which your app will be constructed. It provides the compiling settings, images and any other data necessary for assembling your app. The format for passing the .spec file to Pyinstaller in pseudo-code:
pyinstaller (run mode options) (.spec file)
Basic start compiler command to build.spec:
pyinstaller build.spec
Establishing a debugging loop with Pyinstaller
Debug mode can be set from the command line:
pyinstaller --debug build.spec
OR
by passing debug=True to the EXE() function of the build.spec. I used the second option. See my full build.spec file at the bottom of this post. Pyinstaller displayed a lot of error messages while compiling my app, but it still compiled into a working .exe.
To see your app's error messages, run the resulting your_app.exe from the command line.
You can find it in the 'dist' folder that pyinstaller creates when you pass it the build.spec file. Set the dist folder as your working directory, type your_app.exe and hit enter. Once you are reading and fixing error messages, you're well on the way to creating your own desktop app.
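For example, assuming your compiled app is named your_app.exe (substitute your own app name):
cd dist
your_app.exe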
Flushing sys.stdout/Python printing for Windows:
Python's output buffering needs a small tweak to play nice with Windows when it prints. Simply add the snippet below to your .py file. I used write mode 'w'. What worked for me: don't pass 0 to fdopen().
This was contrary to Gooey's instructions.
| import os
import sys
nonbuffered_stdout = os.fdopen(sys.stdout.fileno(), 'w')
sys.stdout = nonbuffered_stdout
|
Fetching the local user's system information:
In order to run on any user's system, we need to grab their local file paths. I accomplished this by referencing sys._MEIPASS via the code below, which I found in a Stack Overflow post.
| def resource_path(relative_path):
    """Get absolute path to resource, works for dev and for PyInstaller"""
    base_path = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, relative_path)
temp_folder_path = getattr(sys, '_MEIPASS', os.getcwd())
relative_path = resource_path(temp_folder_path)
|
Adding library-specific patches:
1. Below is the fix I found for the Pandas library, which I added to my build.spec:
| hiddenimports=['pandas._libs.tslibs.timedeltas','pandas._libs.tslibs.nattype',
'pandas._libs.tslibs.np_datetime','pandas._libs.skiplist']
|
2. The Gooey library needs some special code added to the build.spec for its images and languages data.
More details are found in this blog post,
written by the author of the Gooey library.
Last, but not least: don't panic.
Compiling Python to a Windows binary sounds like a daunting task, but it wasn't nearly as complex as I feared. The folks behind Pyinstaller have done a great job of making it possible and, dare I say, simple. Stay calm, drink some coffee, dig in and welcome the challenge with a trial-and-error mentality. I was able to figure it out over the span of one Saturday. Good luck.
Useful Resources:
- Pyinstaller Github - If Things Go Wrong
- Pyinstaller Documentation: Using Pyinstaller Arguments
- Gooey Pyinstaller Instructions
- Pandas hiddenimports Discussion
Caveats:
- You should compile the program on the operating system it will be run on. There are options for creating a multi-os compatible package, but I did not attempt them.
- Windows 7 is proven to work with Pyinstaller, per the documentation. It's also what I am running on. Other Windows systems older than Windows 10 may be less reliable.
- I experienced trouble when passing arguments from the command line to pyinstaller and have so far been unable to get the console window to hide. None of the -w, --windowed or --noconsole arguments seemed to work. I will update if I am able to find a solution.
- Now that I am testing my compiled app, I am seeing 10x performance slowdowns when running as the final .exe vs. the original .py file. But at least the program runs correctly and still does so relatively quickly.
- I also received the error: "Fatal error: Pyinstaller does not include a pre-compiled bootloader for your platform." I fixed this by upgrading to the latest version of Pyinstaller:
pip3 install --upgrade pyinstaller
My full build.spec file, modified from here :
| # -*- mode: python -*-
import os
import gooey
gooey_root = os.path.dirname(gooey.__file__)
gooey_languages = Tree(os.path.join(gooey_root, 'languages'), prefix = 'gooey/languages')
gooey_images = Tree(os.path.join(gooey_root, 'images'), prefix = 'gooey/images')
a = Analysis(['your_program.py'],
pathex=['C:\\Python36\\Lib\\site-packages\\your_appdir'],
hiddenimports=['pandas._libs.tslibs.timedeltas', 'pandas._libs.tslibs.np_datetime', 'pandas._libs.tslibs.nattype', 'pandas._libs.skiplist'],
hookspath=None,
runtime_hooks=None,
)
options = [('u', None, 'OPTION')]
a.datas += [('program_icon.ico', 'C:\\Python36\\Lib\\site-packages\\your_appdir\\program_icon.ico', 'DATA'),
('config_icon.png', 'C:\\Python36\\Lib\\site-packages\\your_appdir\\config_icon.png','DATA')]
pyz = PYZ(a.pure)
exe = EXE(pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
options,
gooey_languages,
gooey_images,
name='ADD YOUR APP NAME HERE',
debug=True,
strip=False,
upx=True,
console=True,
icon=os.path.join('program_icon.ico'))
|
Sep 14, 2018
Goal: recreate my resume in the dark Atom text editor theme
(background and fonts). Sub-goal: find a color eyedropper to grab
the actual color values of the Atom layout.
Approach #1: find an Atom eyedropper package to grab the colors. My
first thought was to find the easiest solution, within the packages of
my Atom text editor. After searching Atom's packages, the two best
potential solutions were "an-color-eyedropper" and "color picker". The an-color-eyedropper
description sounds perfect: "A simple "real" color picker. By "real" I
mean it's able to pick colors anywhere on any screen."
Unfortunately it failed to install and displayed the error, "Unable to
download 400 Bad Request Repository inaccessible". It seems to rely on
the "python" Atom package which is now deprecated. I was unable to find
a repo anywhere by googling.
Color picker has
easy-to-follow instructions and installed with no problem. It allows you
to quickly select any color visually with sliders. Then the RGB or
Hexadecimal values of your color are added as text in the editor in
proper format. However, we are looking for a color grabber to pull
colors from a screen object. This is more of a productivity enhancing
and color exploration tool for programmers. On to Python options.
Approach #2: Use the python tkcolorpicker package to grab the colors.
The first thing I found on Google was tkcolorpicker,
a package that uses the tkinter
library. I couldn't tell exactly what it was, so let's find out. First,
install via pip install:
python -m pip install tkcolorpicker
Then run the script below. Cool gadget for sure, but not quite
what I was looking for. It allows selection of a color with
sliders or input values, similar to Atom's color picker, but for user
input rather than color picking. Nice little tool. :D
| import tkinter as tk
import tkinter.ttk as ttk
from tkcolorpicker import askcolor
root = tk.Tk()
style = ttk.Style(root)
style.theme_use("clam")
hex_code, RGB_code = askcolor((255, 255, 0), root)
print(hex_code, RGB_code)
root.mainloop()
|
askcolor() returns a tuple with both the RGB and hex codes selected by
the user. Above, we are unpacking that tuple into the hex_code and
RGB_code variables.
Approach #3: Use the Python eyedropper package to grab the colors.
I then found eyedropper
for Windows, which has a minimalist repository and offers a simple
approach to desktop eyedropper functionality. Install eyedropper via
pip:
python -m pip install eyedropper
Hover your mouse over the object you want to grab the color from (in my
case, the Atom text editor background). Alternatively, I was able to run
eyedropper from the command line by entering:
py -m eyedropper
Mission possible. Then I hit ctrl+v in a text file and there was the
hex code for my Atom background. Some of the colors that eyedropper
grabbed were nearly identical to those in the Atom text editor dark
theme. Others were not quite the same. I made slight eyeball adjustments
to the colors for some of the fonts.
Microsoft Word uses RGB codes but eyedropper gave us hex. To convert, I
found this website practical and
quick. Alternatively, you could convert a hex code to RGB with python:
| hex_code = input("Enter hex: ").lstrip("#")
RGB_code = tuple(int(hex_code[i : i + 2], 16) for i in (0, 2, 4))
print("RGB =", RGB_code)
|
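Going the other direction, here's a quick sketch (not from the original post) that formats an RGB tuple back into a hex string:
| RGB_code = (40, 44, 52)
hex_code = "#{:02x}{:02x}{:02x}".format(*RGB_code)
print("Hex =", hex_code)
|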
Bonus: use pd.read_clipboard() (see the pandas docs) to get the hex codes.
Once eyedropper sends the color values to your system's clipboard, there
are multiple ways to access them. This alternative uses pandas.
Installing pandas and pyperclip with pip:
| python -m pip install pandas
python -m pip install pyperclip
|
On Linux, install xclip or xsel
sudo apt-get install xclip
To get the clipboard contents with pandas:
| import pandas as pd
hex_code_df = pd.read_clipboard()
print(hex_code_df.head())
|
Aug 25, 2018
GUI stands for "Graphical User Interface", aka the part of a program designed for human interaction. Adding a GUI to a Python
script allows anyone to run it without having to code or use the command line.
There are several GUI libraries in Python. A few I have heard of are Tkinter (comes in the standard library), wxPython, PyQT,
easygui, DearPyGui and PySimpleGUI. I explored Tkinter
back when I first got into Python. It was more intricate, offered more control over the look of your app, and took longer to pick up. Gooey is more of a pre-packaged GUI library.
The Gooey Github page was most useful to me and helped me to do what I needed.
The script posted in this blog helped as well. I needed to enable a human to
supply three files and enter a number. Gooey was a good match for this. The library has two branches:
- some basic widgets piggyback off the argparse library
- another part of the library uses a function called the GooeyParser. The GooeyParser offers more advanced widgets, like a file chooser. This was exactly what I was looking to use to pull in files for my script.
Installing Gooey
Argparse comes stock with Python. You can install Gooey via the pip installer. Open command prompt or terminal and enter:
python -m pip install Gooey
Below is a basic argparse/Gooey combination script. The argparse version offers a handful of widgets such as checkboxes and dropdowns, but I had trouble getting them to work with the GooeyParser (used in the second script).
| from argparse import ArgumentParser
from gooey import Gooey
@Gooey(program_name='Report Generator', default_size=(575, 600))
def get_args():
    """Demonstrating python's vars built-in to store arguments in a python dict."""
    parser = ArgumentParser(description='A simple argument parser', epilog='This is where you might put example usage')
    parser.add_argument('Name', action='store', help='Help text for option X')
    parser.add_argument('Email', help='Help text for option Y', default=False)
    parser.add_argument('Campaign Number', help='Help text for option Z', type=int)
    parser.add_argument('Campaign Segment', choices=['A', 'B', 'All'], default='A', nargs='?')
    user_inputs = vars(parser.parse_args())
    print(user_inputs)
    name = user_inputs['Name']
    campaign_number = user_inputs['Campaign Number']
    return parser.parse_args()
if __name__ == '__main__':
    get_args()
|
Side note: Check out Python's vars() built-in function above.
It returns the input data as a dictionary called user_inputs. Then we can get the values via the dictionary's keys. Pretty nifty!
The @Gooey() part of the code is an advanced function known as a decorator in Python.
Put simply, decorators are functions that modify the function to which they are attached.
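As a generic illustration (not Gooey's actual internals), here's a tiny decorator that modifies the function it wraps:
| def shout(func):
    # wrap func and change what it returns
    def wrapper():
        return func().upper() + '!'
    return wrapper
@shout
def greet():
    return 'hello from my gui app'
print(greet())  # prints: HELLO FROM MY GUI APP!
|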
Below is my script that uses the more advanced GooeyParser for its "FileChooser" widget. Gooey allows you to group widgets together
and set how many widgets per line with the gooey_options={} parameter.
| from gooey import Gooey, GooeyParser
@Gooey(program_name='Email Campaign Reporting Generator', default_size=(575, 600))
def get_args():
    """Adding two argument groups, each accepting two arguments. Using gooey_options to set layout."""
    parser = GooeyParser(description='Export campaign report spreadsheets and upload below.')
    top_group = parser.add_argument_group(gooey_options={'show_border': False, 'columns': 1})
    top_group.add_argument('Contact List', help='Upload Send List (.xlsx)', widget='FileChooser')
    top_group.add_argument('Opens List', help='Upload Opens List (.xlsx)', widget='FileChooser')
    top_group.add_argument('Unsubscribe List', help='Upload Unsubscribe List (.xlsx)', widget='FileChooser')
    bottom_group = parser.add_argument_group(gooey_options={'show_border': False, 'columns': 1, 'required': False})
    bottom_group.add_argument('Campaign ID', action='store', help="Number found in the Campaign 'Reports' tab")
    bottom_group.add_argument('Campaign Segment', action='store', help='Enter A, B, or All. All lists supplied must match segment.')
    user_inputs = vars(parser.parse_args())
    campaign_id = user_inputs['Campaign ID']
    return parser.parse_args()
if __name__ == '__main__':
    get_args()
|
Overall, Gooey knows what it wants to be: an easy-to-use GUI framework for Python, and it does it well. Here's a screenshot of my program's shiny GUI:
Now that I have a GUI on top of my program and it delivers the expected output file, I'm hoping to take it one step further by packaging it up as a Windows .exe file.
This would allow it to run as a desktop app on any Windows computer without the need to install Python or library dependencies.
I've only begun exploring options to do this but a few libraries I've heard of are pyinstaller, cx_Freeze and Py2Exe. Updates coming if I figure it out. Cheers :D
Update: I did figure out how to compile my Gooey app to a Windows application with Pyinstaller. You can read more on how I did it here.
Jul 15, 2018
Findstr is the Windows alternative to grep, which runs on
Unix-like operating systems.
Findstr searches files with regular expressions
and seems useful for string matching within files and directories. It is one of over 280 command prompt commands.
Here's the official Windows Documentation
and some Linux vs. Windows Examples.
Update: Windows announced that Grep and several other Unix command line tools will be added to Windows 10.
This is a new alternative to findstr.
This findstr command returns all lines containing an '@' in a text file.
findstr @ test.txt
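Findstr also takes switches; for example, /s searches subdirectories, /i ignores case and /n prints line numbers:
findstr /s /i /n "@" *.txt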
I was happy to see Findstr's convenient help menu:
findstr -?
Regular expressions are so powerful. It's nice to have this utility within the command prompt.
I am hoping to get to know some of the other 280 command prompt commands.
I've previously explored regex with Python. This Python regex example finds all words in a text file containing '@' symbols:
| import re
# read the file to string + regex email search
with open('test.txt', 'r') as fhand:
string = fhand.read()
# this regex returns a python list of emails:
emails = re.findall(r'(\S*@\S+)', string)
print(emails)
|
For more command prompt nuggets, check out my more recent post:
Exploring Windows Command Line Tools, Batch Files and Remote Desktop Connection.
May 12, 2018
Introduction
Thanks to pandas, I have automated some data cleaning and file reading processes at my job. Here are some terms and code that have been useful or interesting to me after 2 years of exploration. I also checked out "Python for Data Analysis" from the Chicago Public Library.
If I could suggest anything to be successful with pandas, it is repetition. I use it nearly every day at work. Dive into its API documentation. There are tons of useful tools there, laid out with meticulous detail and examples. I began learning pandas with this PyCon 2015 tutorial from Brandon Rhodes, it's informative and entertaining! (It's a little dated now but I still recommend it.) The Reproducible Data Analysis in Jupyter video series by Jake VanderPlas is also a great example of pandas-related workflows.
Table of Contents
- Python + pandas Installation and Version Compatibility
- Welcome to pandas 1.0
- Data Wrangling, Exploration and Broadcasting
- Series.str & Series.dt accessors
- apply, applymap, lambda and map
- featuring pandas.to_markdown()
- SQL operations with df.merge() and pandas.read_sql()
- pandas.read_clipboard()
- converting between Series and DataFrame
- Turning json API responses into a dataframe with pandas.json_normalize()
- Plotting Visualizations with matplotlib
- Supplementary Resources and Links
(1) Python + pandas Installation and Version Compatibility
Python 3.6 and higher can install pandas 1.0.
Installing Python 3.8 on Windows
For Windows installation, see the python docs for an installer, "Using Python on Windows".
Installing Python 3.8 on Ubuntu
Follow these steps to download and install Python 3.8 in the Ubuntu terminal. To upgrade to pandas 1.0, I installed Python 3.8, the latest stable release, "from source" on Ubuntu 16.04.
If you intend to use pandas.to_markdown() on Ubuntu, it might save you trouble to pre-emptively install the '_bz2' library before you build your Python from source.
On Ubuntu, I ran into ModuleNotFoundError: No module named '_bz2' and fixed by entering in the terminal:
sudo apt-get install libbz2-dev
I also saw this message when completing install:
The necessary bits to build these optional modules were not found. To find the necessary bits, look in setup.py in detect_modules() for the module's name.
If you need to re-build Python on Ubuntu, enter:
cd /home/erick/Python-3.8.0/
./configure --enable-loadable-sqlite-extensions && make && sudo make install
I installed the missing _bz2 and _sqlite3 modules, then re-built with these commands.
Installing Older pandas Versions on Ubuntu
The version downloaded with this command is about 6 months behind the current version. For me, this installed pandas 0.17 on Ubuntu:
sudo apt-get install python3-pandas
As of February 2020, this command installs pandas version 0.24 with pip when used with Python 3.5 on Linux Ubuntu 16.04:
python3.5 -m pip install pandas
If pandas is already installed, you can upgrade with pip:

python -m pip install --upgrade pandas
To check if pip is installed:
python -m pip list
Best Practice: Virtual Environments
Create a virtual environment with your new Python version. venv wasn't included in my Python 3.8 installation on Ubuntu 16.04, so I installed virtualenv:
python -m pip install --user virtualenv
Let's create a new virtual environment. Enter in terminal or command prompt:
virtualenv -p python3.8 add_env_name_here
Now, activate your new virtual environment on Linux:
source add_env_name_here/bin/activate
Or activate environment on Windows:
cd add_env_name_here\scripts & activate
If you see "ImportError: Missing optional dependency 'tabulate'. Use pip or conda to install tabulate", install tabulate to use pd.to_markdown():
python -m pip install tabulate
To use pd.read_clipboard() on Linux, install xclip or xsel:
sudo apt-get install xclip
I also saw a prompt to install pyperclip:
python -m pip install pyperclip
Now install pandas 1.0 and matplotlib in your virtual environment for visualizations.
python3.8 -m pip install pandas
python -m pip install -U matplotlib
(2) Welcome to pandas 1.0
You did it! Welcome to the good life. The basis of pandas is the "dataframe", commonly abbreviated as df, which is similar to a spreadsheet. Another core pandas object is the pandas.Series object, which is similar to a Python list or numpy array. When imported, pandas is aliased as "pd". The pd object allows you to access many useful pandas functions. I'll use it interchangeably with pandas in this post.
The library’s name derives from panel data, a common term for multidimensional data sets encountered in statistics and econometrics.
pandas: a Foundational Python Library for Data Analysis and Statistics
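Here's a minimal sketch of those two core objects (the values are made up for illustration):
| import pandas as pd
# a Series is a single labeled column of values
budgets = pd.Series([85000000, 70000000, 2500000], name='Movie Budget')
# a DataFrame is a table built from one or more columns
df = pd.DataFrame({'Title': ["Ocean's 11", 'Runaway Bride', 'Bonnie and Clyde'],
                   'Movie Budget': budgets})
print(type(budgets), type(df))
|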
(3) Data Wrangling, Exploration and Broadcasting
Data is commonly read in from file with pd.read_csv()
| import pandas as pd
file_name = 'my_bank_statement.csv'
# you may sometimes need to specify an alternate encoding: encoding = "ISO-8859-1"
df = pd.read_csv(file_name, encoding='utf-8')
print(df.head())
print(df.shape) # returns a tuple: (# of rows, # of columns)
print(df.dtypes)
print(df.info())
|
Create a dataframe from a list of Python lists, named movies below, with pd.DataFrame:
| import pandas as pd
column_names = ["Title", "Release Date", "Character", "Actor", "Movie Budget", "Worldwide Gross"]
movies = [["Ocean's 11", "12/7/2001", "Danny Ocean", "George Clooney","$85,000,000"," $450,728,529"],
["Ocean's 11", "12/7/2001", "Tess Ocean", "Julia Roberts","$85,000,000"," $450,728,529"],
["Runaway Bride", "6/30/1999", "Ike Graham", "Richard Gere","$70,000,000","$309,457,509"],
["Runaway Bride", "6/30/1999", "Maggy Carpenter", "Julia Roberts","$70,000,000","$309,457,509"],
["Bonnie and Clyde", "9/1/1967", "Clyde Barrow", "Warren Beaty","$2,500,000", "$70,000,000"],
["Bonnie and Clyde", "9/1/1967", "Bonnie Parker", "Faye Dunaway","$2,500,000", "$70,000,000"]]
df = pd.DataFrame(movies, columns=column_names)
df = df[["Title","Character", "Actor", "Movie Budget", "Worldwide Gross"]]
print(df.to_markdown(showindex=False, tablefmt="simple"))
|
The last line prints the table to the terminal with df.to_markdown(), new in pandas version 1.0.0.
Slicing and sorting a dataframe, removing duplicates, and working with datetime objects
- Let's create a new dataframe slice with only two columns
- Drop duplicate movies
- Convert the dates to datetime objects
- Get the year from an array of datetime objects
- Set the year as the dataframe index
| df = pd.DataFrame(movies, columns=column_names)
date_df = df[['Title', 'Release Date']].drop_duplicates(subset=['Title'])
date_df['Release Date'] = pd.to_datetime(date_df['Release Date'])
# create year column using the pd.Series.dt datetime accessor:
date_df['Release Year'] = date_df['Release Date'].dt.year
date_df = date_df.sort_values(by=['Release Date'])
date_df = date_df.set_index('Release Year')
print(date_df.to_markdown(showindex=False, tablefmt='simple'))
|

Applying Broadcasting in pandas
Broadcasting means mapping a function or an arithmetic calculation over an array (using apply or map) or a dataframe (applymap).
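Here's a tiny sketch of the difference on a throwaway dataframe (the column names are invented for illustration):
| import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df.apply(sum))                   # apply: one result per column
print(df.applymap(lambda x: x * 10))   # applymap: every element of the DataFrame
print(df['a'].map(lambda x: x + 1))    # map: every element of a Series
|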
"Summing up, apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series."
Applying a function to a pandas column
- Convert columns to int and calculate the difference between two columns.
- Let's format those integers back to dollars with python's lambda and pandas' applymap for extra jazz.
| def format_dollars_as_int(dollars):
"""Accepts a dollar formatted string, returns an int."""
number = dollars.replace('$','').replace(',','')
return int(number)
df = pd.DataFrame(movies, columns=column_names)
df = df.drop_duplicates(subset=['Title'])
df[['Movie Budget','Worldwide Gross']] = df[['Movie Budget','Worldwide Gross']].astype(str).applymap(format_dollars_as_int)
df['Movie Net Income'] = df['Worldwide Gross'] - df['Movie Budget']
money_columns = ['Movie Budget', 'Worldwide Gross','Movie Net Income']
df[money_columns] = df[money_columns].applymap(lambda x:'${:,}'.format(x))
|
Creating a new column and writing to a .csv file
- Then add the IMDB ratings of our three films in a new column.
- Finally, write the result to markdown and a csv file.
| # create a new column with the three movies' IMDB ratings
df['IMDB Rating'] = [7.8, 5.5, 7.8]
print(df.to_markdown(showindex=False, tablefmt='simple'))
df.to_csv('Movies.csv', index=False)
print(df.Actor.value_counts().to_markdown(tablefmt="github"))
|
See also: pandas.Series.value_counts() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html
Notice for column names without spaces, you can use dot notation instead of brackets:
| # both valid ways to access column by name
df.Actor
df['Actor']
|
Lowercase column names with Python's map function:
| df.columns = map(str.lower, df.columns)
|
Strip whitespace from a column of strings with the pandas.Series.str accessor:
| df['Character'] = df['Character'].astype(str).str.strip()
|
Fix pesky leading zero zip codes with str.zfill():
| log_df['zip'] = log_df['zip'].astype(str).str.zfill(5)
|
Get a row by index number pandas.DataFrame.loc[]:
| first_row = df.loc[0, df.columns]
third_row = df.loc[2, df.columns]
|
Filter the df to get rows where the actor is 'Julia Roberts'.
| julia_roberts_movies = df[df.Actor=='Julia Roberts'].reset_index(drop=True)
print(julia_roberts_movies.head())
|
"Get" an item from a column of lists with str.get().
| # returns first item in each cell's list into new column
df['first_item'] = df['items'].str.get(0)
|
Execute SQL-like operations between dataframes with df.merge().
First, use df.copy() to create a new dataframe copy of our actors table above. By default, df.merge() uses an inner join to merge two dfs on a common column. Let's add each film's release year from our date_df to our original actors table, with an inner join based on 'Title':
| actors = df.copy(deep=True)
# slice only the columns we want to merge:
date_df = date_df[['Title','Release Year']]
joined_df = actors.merge(date_df, on='Title', how='inner')
# You can pass the number of rows to see to head. It defaults to 5.
print(joined_df.head(10))
|
Execute database queries with pd.read_sql().
When the chunksize argument is passed, pd.read_sql() returns an iterator. We can use this to iterate through a database with lots of rows.
When combined with DB connection libraries like pyodbc
or SQLAlchemy, you can process a database in chunks. In this example, an Access DB connection via pyodbc processes 500,000 rows per chunk.
Pyodbc works on a wide range of other databases as well. The script below
uses pd.Series.isin() to check if each email is in the DB.
| import pandas as pd
import pyodbc
emails = ['user1@example.com', 'user2@example.com', 'user3@example.com']
connection_string = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\path_to_db\emails_database.accdb;'
print(connection_string)
conn = pyodbc.connect(connection_string)
query = """
SELECT *
FROM ADD_TABLE_NAME
"""
dfs = list()
for i, db_chunk in enumerate(pd.read_sql(query, conn, chunksize=500000)):
emails_in_db = db_chunk[db_chunk.Email.isin(emails)]
dfs.append(emails_in_db)
print(i)
emails_in_db = pd.concat(dfs)
emails_in_db.to_csv('DB_Email_Query_Results.csv', index=False)
|
In case you are wondering, enumerate is a python built-in for enumerating, or counting an iterable, e.g. list or generator, as you iterate through it.
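A quick example:
| emails = ['user1@example.com', 'user2@example.com']
for i, email in enumerate(emails):
    print(i, email)
# 0 user1@example.com
# 1 user2@example.com
|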
Using pd.read_clipboard():
| import pandas as pd
clipboard_contents = pd.read_clipboard()
print(clipboard_contents)
|
Use df.to_clipboard() to store a dataframe as clipboard text:
| import pandas as pd
truths = ['pandas is great','I love pandas','pandas changed my life']
df = pd.DataFrame([truths], columns=['Truths'])
df.to_clipboard(index=False, sep='|')
clipboard_contents = input('Press ctrl-v ')
print(clipboard_contents)
|
Convert the clipboard contents to df with pd.DataFrame() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html:
| import pandas as pd
clipboard_contents = list(input('Press ctrl-v '))
df = pd.DataFrame([clipboard_contents])
print(df.head())
|
If the clipboard dataframe has one column, you could squeeze the clipboard contents into a pd.Series object:
| import pandas as pd
clipboard_text = pd.read_clipboard()
clipboard_contents = list(clipboard_text)
df = pd.DataFrame([clipboard_contents], columns=['Clipboard Data'])
clipboard_series = df.squeeze(axis='columns')
print(type(clipboard_series))
|
Inversely, consider using pandas.Series.to_frame() to convert a Series to a dataframe:
| import pandas as pd
clipboard_contents = pd.Series(input('Press ctrl-v '))
df = clipboard_contents.to_frame()
print(df.head())
|
(4) Turning json API responses into a dataframe with pd.json_normalize()
Update: beginning in pandas 1.0, json_normalize is exposed in the top-level pandas namespace; older versions used pandas.io.json.json_normalize instead.
| import pandas as pd
import requests
url = 'pseudo_API.com/endpoint/'
parameters = {'page_size': 100, 'format': 'json', 'api_type': 'contact_sync'}
response = requests.get(url=url, params=parameters)
data = response.json() # decode response into json
df = pd.json_normalize(data['any_key'])
|
pandas.json_normalize() is now exposed in the top-level namespace. Usage of json_normalize as pandas.io.json.json_normalize is now deprecated and it is recommended to use json_normalize as pandas.json_normalize() instead (GH27586).
What's new in pandas 1.0.0
(5) Plotting Visualizations with matplotlib
Make a bar plot of the movie release year counts using pandas and matplotlib formatting.
| import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import matplotlib.ticker as ticker
column_names = ["Title", "Release Date", "Character", "Actor"]
rows = [["Ocean's 11", "12/7/2001", "Danny Ocean", "George Clooney"],
["Ocean's 11", "12/7/2001", "Tess Ocean", "Julia Roberts"],
["Runaway Bride", "6/30/1999", "Ike Graham", "Richard Gere"],
["Runaway Bride", "6/30/1999", "Maggy Carpenter", "Julia Roberts"],
["Bonnie and Clyde", "9/1/1967", "Clyde Barrow", "Richard Gere"],
["Bonnie and Clyde", "9/1/1967", "Bonnie Parker", "Julia Roberts"]]
df = pd.DataFrame(rows, columns=column_names)
df['Year'] = pd.to_datetime(df['Release Date']).dt.year
ax = df.Year.value_counts().plot(ylim=0, kind='bar', title='Release Year of Movies', rot=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
fig = ax.get_figure()
fig.tight_layout()
fig.savefig('images/Movie_Plot.png')
|
Use a Jupyter Notebook to display the plot inline, or run the script from the command line and open the saved image.
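Alternatively, in a plain script you can pop up an interactive window with the rendered figure (a minimal sketch):
| import matplotlib.pyplot as plt
# after building the plot above, display it in a window:
plt.show()
|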
Plot George Clooney's movies over time in a line graph. [Source]
| import matplotlib.pyplot as plt
df = df[df.Actor=='George Clooney']
ax = df.groupby(['Year']).size().plot(ylim=0)
fig = ax.get_figure()
fig.savefig('figure.pdf')
|
(6) Supplementary Resources and Links
Mar 24, 2018
- pytudes: "Python programs for perfecting programming skills."
- Building a video synthesizer in Python
- Running micropython on a microcontroller
- Pandas: Super awesome excel and data analysis library. The talk
Pandas from the Ground Up by Brandon Rhodes is great!
- Fuzzy Wuzzy: fuzzy string matching Python library, written by SeatGeek
- Talk Python to Me, a Python podcast
- Free Programming Books from O'Reilly
Jan 20, 2018
Python comes with a bunch of standard modules. My favorites are
shutil, glob, datetime, time, os (operating system), re (regular
expressions) and webbrowser. The standard library is loaded.
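As a quick taste of the standard library, here's a small sketch using glob and shutil (the file names are hypothetical):
| import glob
import shutil
# list every .txt file in the current folder, then back the first one up
text_files = glob.glob('*.txt')
print(text_files)
if text_files:
    shutil.copy(text_files[0], 'backup_' + text_files[0])
|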
Inevitably, you'll want to install new libraries from Python's rich
ecosystem of external modules. Enter pip, Python's handy package manager
and people's champion.
This post will teach you some Python history, show how to install
pandas, and help you troubleshoot problems if it's not working. You'll
find Windows and Linux commands for venv setup (recommended). With pip,
you'll feel like Neo when installing new modules. Any skill is at your
fingertips. It's like learning kung fu. There's probably a library for
that!
First, Some Python Version Caveats + History
Python 2 reached end of life on January 1st, 2020.
Python 2 has officially been sunset.
Python comes with pip now, no setup is required. But certain versions
such as Python 3.2 or the Python 2.7 that came stock on my improbably
still functioning 2008 black Macbook, for example, may not have it
installed.
In December 2021, Python 3.6 reached its "end of life" phase and is "now effectively frozen". Read more in PEP 494.
TLDR: use Python 3.7 to 3.11. This blog endorses using the lightning
fast Python version 3.11, released in October 2022.
Enter This in Your Terminal
python -m pip install pandas
Pandas is a super useful library for wrangling spreadsheet data, AKA
"tabular" data. If successful, you should see activity that looks
similar to the below screenshot, where I am installing openpyxl, an additional Python
Excel library you'll likely want. You are good to go! This is the part
where you get to feel like Neo! See Installing Python Modules in the Python
Documentation for more detailed instructions.
To view all your installed libraries, enter:
pip list
Write a "requirements.txt" of installed libraries:
pip freeze > requirements.txt
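To recreate that environment somewhere else, install everything listed in the file:
pip install -r requirements.txt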
You can list your outdated packages with the --outdated argument:
pip list --outdated
Use pip's -h help command line argument:
pip -h
View your system and user pip config settings:
pip config debug
Supplementary Resources
Congrats on figuring out how to install packages with pip, have fun!
Having issues? Try upgrading your pip version.
python -m pip install --upgrade pip
Try the ensurepip command.
This command will install and upgrade pip to the newest version. New in
Python 3.4:
python -m ensurepip --upgrade
"The ensurepip
package provides support for bootstrapping the pip installer into an
existing Python installation or virtual environment. This
bootstrapping approach reflects the fact that pip is an independent
project with its own release cycle, and the latest available stable
version is bundled with maintenance and feature releases of the
CPython reference interpreter."
- ensurepip Python Documentation
You should follow best practice and create a virtual environment before installing
libraries. venv or virtualenv. To create with venv:
python3 -m venv add_env_name_here
After your environment is created, activate it with the first command
below, then install a library on Ubuntu Linux:
source add_env_path_here/bin/activate
python -m pip install pandas
Alternatively, on Windows computers:
cd add_env_path_here\scripts & activate
python -m pip install pandas
Getting the prefix right can be tricky.
In the install command, the prefix is a reference to your Python
executable. You may just need to alter your prefix to call it correctly.
Here are some to try in place of "python". Observe what happens when you
run these command variations. Good luck!
python3 -m pip install pandas
python3.11 -m pip install pandas
py -m pip install pandas
pip3 install pandas
How to Manually Enable the pip Installer
The rest of this post may be useful to you if you are:
- Working on legacy Python 2 or < 3.3 for which pip is not installed.
- Seeking to fix a faulty pip install that is not working properly.
- Curious to know how to manually set up pip.
Assumes Python is already installed. If you're running Windows 10, I
found it easy to install Python from the Windows store.
Download the get-pip.py file.
Go to the link, right click the page and "Save As" a .py file to
download. Then place the file where you want to access it. I placed mine
in C:\Python27\Lib\site-packages
You could also download the file with curl:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
If you are not sure where your site-packages folder is, enter the following into command prompt for file path ideas:
python -m site
Run the get-pip.py file.
Using command prompt's cd command with a Windows "&" operator to run the
Python file in a Windows command prompt:
cd c:\Python27\Lib\site-packages & python get-pip.py
Or Linux terminal:
cd /Python27/Lib/site-packages && python get-pip.py
You should see some activity in command prompt that shows
installation/updating of "setup" and "wheel". When it finishes, you have
installed pip.
Type into command prompt at the same location:
python -m pip install requests
This installs the Requests module into your Python libraries.
Requests is an http module which is highly regarded almost universally
by the Python community.
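As a quick sanity check that the install worked, here's a minimal sketch hitting a public endpoint (any URL would do):
| import requests
response = requests.get('https://api.github.com')
print(response.status_code)  # 200 means the request succeeded
|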
Thanks for reading!
Check out these other posts with pip installed library examples:
- fix Grammar and Spelling with language_tool_python and textblob
- static site generation with pelican
- text mojibake mash fixing with ftfy
- a guide to making HTTP requests
- simple GUI for scripts with gooey
Jan 14, 2018
The basis of many great programs revolve around a simple set of operations:
- Open a file.
- Do something with the file contents.
- Save the new file for the user.
Python is nice and simple for this. Paste the below lines into a text editor and save as a .py file. You need to have Python 3 installed. In the same folder as your .py file, save a .txt file with some words in it. Alright, let's write some code:
| file_name = input("Enter your file name. e.g. words.txt")
file_handle = open(file_name, "r")
lines = file_handle.readlines()
print(lines)
file_handle.close()
|
In line 1, we ask the user to enter their file name with Python's input function. When the program runs, the user enters their text file name with extension. This line stores the name in a variable called file_name.
In line 2, we open your text file and store it in a variable I have named file_handle. Think of the file handle as a bridge between your code and the text file. Quick point about the 'r' above: that tells the program to open the file in "Read" mode. There are several different file modes in programming. Some modes are just for reading an existing file, some are just for writing a new file, and some are capable of both. This Stack Overflow post is well written and details the differences between file modes. Once established, the file handle allows you to read the file's contents or write new contents to the file.
In line 3, we are calling the .readlines() method on our file handle. This method takes the file contents and stores them, line by line, into a list named "lines". An alternative method is .read(), which opens the file and stores its contents as one string. Try switching this out in place of .readlines() to check out the difference.
In line 4, we are printing the stored lines to show them to the user. We now have the file contents, ready to be used however we please.
In line 5, we are closing the file.
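For comparison, here's a quick sketch of the .read() variant mentioned above:
| file_name = input("Enter your file name. e.g. words.txt")
file_handle = open(file_name, "r")
contents = file_handle.read()  # one big string instead of a list of lines
print(contents)
file_handle.close()
|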
Below, we are going to write a new file using the with statement, which is generally accepted as the best way to read or write a file:
| with open("Notes.txt", "w") as fhand:
    fhand.write("Did you know whales can live up to 90 years?")
|
In line 1, we call the open function again, but notice the "w": this opens (and creates) Notes.txt in "write" mode, and the with statement stores the file handle in a variable named fhand.
In line 2, we call the .write() method on fhand, passing it the text to be saved in our new file.
When the with block ends, Python closes the file automatically, completing the creation of our new file in the same folder as our .py program file.
Your program is now ready to be run. Double-click your .py file to execute it.
Before learning Python, file operations were a mystery to me. It took me a while to understand this clearly, and I wanted to share. Once you master these basic file operations, programming gets to be a lot more fun. Do try it out for yourself :D
Aug 09, 2017
Here is what I have deduced is the fastest way to get an app up and
running with Node.js. This requires some familiarity with using the
command line. I completed the Codeacademy course "Learn The Command
Line"
before beginning with Node.js. I think it helped me better understand
what the commands are and what they do.
Download and install Node.js, then open the Node.js
command prompt. This was done on a Windows machine.
First, create a folder for your app(s):
mkdir node_apps
Change the command prompt directory to your app's folder:
cd node_apps
Create a package.json file for your app. Fill out the applicable info or just hit enter until the file is created:
npm init
Install express.js module in node.js:
npm install express
Install express-generator module in node.js:
npm install express-generator -g
Create the structure for your app and all necessary folders. (views,
css, Javascript, routing, etc.)
express app_name
Ensure all app module dependencies are installed:
npm install
Start your server and web app:
npm start
Go to http://localhost:3000 in a browser. Port 3000 is the default port
of Express. Your app is live in dev environment.
Notes
- I learned most of this from this great blog post.
- The above does not include a database integration. I integrated with
a MongoDB database by following a blog post that has since been removed from the internet.
- This YouTube video
was also very helpful to me for figuring out MongoDB and Node.js integration.
- An HTML shorthand language called jade (aka pug) comes stock within
Express.js. Here's further reading
on the pros and cons.
- All of the above has been from my own studies. I do not claim
anything listed as the most efficient or best way to use Node.js.
This is what has worked for me over the past two days.
- It feels good to whip up a nimble app environment that is capable of
producing and supporting world changing software; Node.js is used by
Netflix, PayPal, Microsoft and Uber.
Aug 05, 2017
I'm finally coming around in my understanding of Javascript. Here are a few quick explanations to help you if you are new to it.
Javascript
The language of the web. Most commonly used as a complement to HTML and CSS to create an interactive website.
JQuery
A popular Javascript library with many powerful commands that are quick and easy to call.
Node.JS
Software that allows you to run Javascript from the command line without being connected to the internet.
Express.JS
A popular Node.JS framework.
Angular
A popular front-end Javascript web framework. There are many out there but this seems to be the one I've heard of the most.
I've begun to see a pattern with programming languages:
- Learn to execute the core building blocks like using variables, loops and functions.
- Learn more advanced libraries, documentation and uses.
- Consider using and learning web frameworks depending on what you're trying to do with the language.
- Practice to hone your knowledge. Build things you like.
I've also noticed that Javascript has been easier to learn than Python was for me, because it's not my first programming language.
The concepts are the same. The syntax can trip me up at times, but I'm currently flying through
Codeacademy's Javascript courses. Sometimes it even seems fun!