Nov 20, 2018
My research found that for creating a single-file Windows executable from a .py file, the front-running Python library is Pyinstaller.
It worked on my Windows 7 PC. My program used a Gooey GUI, but many of the popular
Python GUI libraries work as well.
Installation:
To install, enter this in your command prompt or terminal:
python -m pip install pyinstaller
At the time of this article, this installed Pyinstaller version 3.3.1 using Python version 3.6.5.
Go here for a refresher on setting up pip if you need it.
Using the build.spec file and starting Pyinstaller:
Most examples I found used a build.spec file. Think of this as the blueprint by which your app will be constructed. It provides the compiling settings, images and any other data necessary for assembling your app. The format for passing the .spec file to Pyinstaller in pseudo-code:
pyinstaller (run mode options) (.spec file)
Basic start compiler command to build.spec:
pyinstaller build.spec
Establishing a debugging loop with Pyinstaller
Debug mode can be set from the command line:
pyinstaller --debug build.spec
OR
by passing debug=True to the EXE() function of the build.spec. I used the second option. See my full build.spec file at the bottom of this post. Pyinstaller displayed a lot of error messages while compiling my app, but it still compiled into a working .exe.
To see your app's error messages, run the resulting your_app.exe from the command line.
You can find it in the 'dist' folder that pyinstaller creates when you pass it the build.spec file. Set the dist folder as your working directory, type your_app.exe and hit enter. Once you are reading and fixing error messages, you're well on the way to creating your own desktop app.
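For example, assuming your compiled app is named your_app.exe (substitute your own app name):
cd dist
your_app.exe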
Flushing sys.stdout/Python printing for Windows:
Python's output buffering needs a small tweak to play nice with Windows when it prints. Simply add the snippet below to your .py file. I used write mode 'w'. What worked for me: don't pass 0 to fdopen().
This was contrary to Gooey's instructions.
| import os
import sys
nonbuffered_stdout = os.fdopen(sys.stdout.fileno(), 'w')
sys.stdout = nonbuffered_stdout
|
Fetching the local user's system information:
In order to run on any user's system, we need to grab their local file paths. I accomplished this by referencing sys._MEIPASS via the code below, which I found in a Stack Overflow post.
| def resource_path(relative_path):
    """Get absolute path to resource, works for dev and for PyInstaller"""
    base_path = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, relative_path)
temp_folder_path = getattr(sys, '_MEIPASS', os.getcwd())
relative_path = resource_path(temp_folder_path)
|
Adding library-specific patches:
1. Below is the fix I found for the Pandas library, which I added to my build.spec:
| hiddenimports=['pandas._libs.tslibs.timedeltas','pandas._libs.tslibs.nattype',
'pandas._libs.tslibs.np_datetime','pandas._libs.skiplist']
|
2. The Gooey library needs some special code added to the build.spec for its images and languages data.
More details are found in this blog post,
written by the author of the Gooey library.
Last, but not least: don't panic.
Compiling Python to a Windows binary sounds like a daunting task, but it wasn't nearly as complex as I feared. The folks behind Pyinstaller have done a great job of making it possible and, dare I say, simple. Stay calm, drink some coffee, dig in and welcome the challenge with a trial-and-error mentality. I was able to figure it out over the span of one Saturday. Good luck.
Useful Resources:
- Pyinstaller Github - If Things Go Wrong
- Pyinstaller Documentation: Using Pyinstaller Arguments
- Gooey Pyinstaller Instructions
- Pandas hiddenimports Discussion
Caveats:
- You should compile the program on the operating system it will be run on. There are options for creating a multi-os compatible package, but I did not attempt them.
- Windows 7 is proven to work with Pyinstaller, per the documentation. It's also what I am running on. Other Windows systems older than Windows 10 may be less reliable.
- I experienced trouble when passing arguments from the command line to pyinstaller and have so far been unable to get the console window to hide. None of the -w, --windowed or --noconsole arguments seemed to work. I will update if I am able to find a solution.
- Now that I am testing my compiled app, I am seeing 10x performance slowdowns when running as the final .exe vs. the original .py file. But at least the program runs correctly and still does so relatively quickly.
- I also received the error: "Fatal error: Pyinstaller does not include a pre-compiled bootloader for your platform." I fixed this by upgrading to the latest version of Pyinstaller:
pip3 install --upgrade pyinstaller
My full build.spec file, modified from here :
| # -*- mode: python -*-
import os
import gooey
gooey_root = os.path.dirname(gooey.__file__)
gooey_languages = Tree(os.path.join(gooey_root, 'languages'), prefix = 'gooey/languages')
gooey_images = Tree(os.path.join(gooey_root, 'images'), prefix = 'gooey/images')
a = Analysis(['your_program.py'],
pathex=['C:\\Python36\\Lib\\site-packages\\your_appdir'],
hiddenimports=['pandas._libs.tslibs.timedeltas', 'pandas._libs.tslibs.np_datetime', 'pandas._libs.tslibs.nattype', 'pandas._libs.skiplist'],
hookspath=None,
runtime_hooks=None,
)
options = [('u', None, 'OPTION')]
a.datas += [('program_icon.ico', 'C:\\Python36\\Lib\\site-packages\\your_appdir\\program_icon.ico', 'DATA'),
('config_icon.png', 'C:\\Python36\\Lib\\site-packages\\your_appdir\\config_icon.png','DATA')]
pyz = PYZ(a.pure)
exe = EXE(pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
options,
gooey_languages,
gooey_images,
name='ADD YOUR APP NAME HERE',
debug=True,
strip=False,
upx=True,
console=True,
icon=os.path.join('program_icon.ico'))
|
Sep 14, 2018
Goal: recreate my resume in the dark Atom text editor theme
(background and fonts). Sub-goal: find a color eyedropper to grab
the actual color values of the Atom layout.
Approach #1: find an Atom eyedropper package to grab the colors. My
first thought was to find the easiest solution, within the packages of
my Atom text editor. After searching Atom's packages, the two best
potential solutions were "an-color-eyedropper" and "color picker". The an-color-eyedropper
description sounds perfect: "A simple "real" color picker. By "real" I
mean it's able to pick colors anywhere on any screen."
Unfortunately it failed to install and displayed the error, "Unable to
download 400 Bad Request Repository inaccessible". It seems to rely on
the "python" Atom package which is now deprecated. I was unable to find
a repo anywhere by googling.
Color picker has
easy-to-follow instructions and installed with no problem. It allows you
to quickly select any color visually with sliders. Then the RGB or
Hexadecimal values of your color are added as text in the editor in
proper format. However, we are looking for a color grabber to pull
colors from a screen object. This is more of a productivity enhancing
and color exploration tool for programmers. On to Python options.
Approach #2: Use the python tkcolorpicker package to grab the colors.
The first thing I found on Google was tkcolorpicker,
a package that uses the tkinter
library. I couldn't tell exactly what it was, so let's find out. First,
install via pip install:
python -m pip install tkcolorpicker
Then run the script below. Cool gadget for sure, but not quite
what I was looking for. It allows selection of a color with
sliders or input values, similar to Atom's color picker, but for user
input rather than color picking. Nice little tool. :D
| import tkinter as tk
import tkinter.ttk as ttk
from tkcolorpicker import askcolor
root = tk.Tk()
style = ttk.Style(root)
style.theme_use("clam")
hex_code, RGB_code = askcolor((255, 255, 0), root)
print(hex_code, RGB_code)
root.mainloop()
|
askcolor() returns a tuple with both the RGB and hex codes selected by
the user. Above, we are unpacking that tuple into the hex_code and
RGB_code variables.
Approach #3: Use the Python eyedropper package to grab the colors.
I then found eyedropper
for Windows, which has a minimalist repository and offers a simple
approach to desktop eyedropper functionality. Install eyedropper via
pip:
python -m pip install eyedropper
Hover your mouse over the object you want to grab the color from (in my
case, the Atom text editor background). Alternatively, I was able to run
eyedropper from the command line by entering:
py -m eyedropper
Mission possible. Then I hit ctrl+v in a text file and there was the
hex code for my Atom background. Some of the colors that eyedropper
grabbed were nearly identical to those in the Atom text editor dark
theme. Others were not quite the same. I made slight eyeball adjustments
to the colors for some of the fonts.
Microsoft Word uses RGB codes but eyedropper gave us hex. To convert, I
found this website practical and
quick. Alternatively, you could convert a hex code to RGB with python:
| hex_code = input("Enter hex: ").lstrip("#")
RGB_code = tuple(int(hex_code[i : i + 2], 16) for i in (0, 2, 4))
print("RGB =", RGB_code)
|
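Going the other direction, here's a quick sketch (not from the original post) that formats an RGB tuple back into a hex string:
| RGB_code = (40, 44, 52)
hex_code = "#{:02x}{:02x}{:02x}".format(*RGB_code)
print("Hex =", hex_code)
|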
Bonus: use pd.read_clipboard() (see the pandas docs) to get the hex codes.
Once eyedropper sends the color values to your system's clipboard, there
are multiple ways to access them. This alternative uses pandas.
Installing pandas and pyperclip with pip:
| python -m pip install pandas
python -m pip install pyperclip
|
On Linux, install xclip or xsel
sudo apt-get install xclip
To get the clipboard contents with pandas:
| import pandas as pd
hex_code_df = pd.read_clipboard()
print(hex_code_df.head())
|
Aug 25, 2018
GUI stands for "Graphical User Interface", aka the part of a program designed for human interaction. Adding a GUI to a Python
script allows anyone to run it without having to code or use the command line.
There are several GUI libraries in Python. A few I have heard of are Tkinter (comes in the standard library), wxPython, PyQT,
easygui, DearPyGui and PySimpleGUI. I explored Tkinter
back when I first got into Python. It was more intricate, offered more control over the look of your app, and took longer to pick up. Gooey is more of a pre-packaged GUI library.
The Gooey Github page was most useful to me and helped me to do what I needed.
The script posted in this blog helped as well. I needed to enable a human to
supply three files and enter a number. Gooey was a good match for this. The library has two branches:
- some basic widgets piggyback off the argparse library
- another part of the library uses a function called the GooeyParser. The GooeyParser offers more advanced widgets, like a file chooser. This was exactly what I was looking to use to pull in files for my script.
Installing Gooey
Argparse comes stock with Python. You can install Gooey via the pip installer. Open command prompt or terminal and enter:
python -m pip install Gooey
Below is a basic argparse/Gooey combination script. The argparse version offers a handful of widgets such as checkboxes and dropdowns, but I had trouble getting them to work with the GooeyParser (used in the second script).
| from argparse import ArgumentParser
from gooey import Gooey
@Gooey(program_name='Report Generator', default_size=(575, 600))
def get_args():
    """Demonstrating python's vars built-in to store arguments in a python dict."""
    parser = ArgumentParser(description='A simple argument parser', epilog='This is where you might put example usage')
    parser.add_argument('Name', action='store', help='Help text for option X')
    parser.add_argument('Email', help='Help text for option Y', default=False)
    parser.add_argument('Campaign Number', help='Help text for option Z', type=int)
    parser.add_argument('Campaign Segment', choices=['A', 'B', 'All'], default='A', nargs='?')
    user_inputs = vars(parser.parse_args())
    print(user_inputs)
    name = user_inputs['Name']
    campaign_number = user_inputs['Campaign Number']
    return parser.parse_args()
if __name__ == '__main__':
    get_args()
|
Side note: Check out Python's vars() built-in function above.
It returns the input data as a dictionary called user_inputs. Then we can get the values via the dictionary's keys. Pretty nifty!
The @Gooey() part of the code is an advanced function known as a decorator in Python.
Put simply, decorators are functions that modify the function to which they are attached.
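As a generic illustration (not Gooey's actual internals), here's a tiny decorator that modifies the function it wraps:
| def shout(func):
    # wrap func and change what it returns
    def wrapper():
        return func().upper() + '!'
    return wrapper
@shout
def greet():
    return 'hello from my gui app'
print(greet())  # prints: HELLO FROM MY GUI APP!
|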
Below is my script that uses the more advanced GooeyParser for its "FileChooser" widget. Gooey allows you to group widgets together
and set how many widgets per line with the gooey_options={} parameter.
| from gooey import Gooey, GooeyParser
@Gooey(program_name='Email Campaign Reporting Generator', default_size=(575, 600))
def get_args():
    """Adding two argument groups, each accepting two arguments. Using gooey_options to set layout."""
    parser = GooeyParser(description='Export campaign report spreadsheets and upload below.')
    top_group = parser.add_argument_group(gooey_options={'show_border': False, 'columns': 1})
    top_group.add_argument('Contact List', help='Upload Send List (.xlsx)', widget='FileChooser')
    top_group.add_argument('Opens List', help='Upload Opens List (.xlsx)', widget='FileChooser')
    top_group.add_argument('Unsubscribe List', help='Upload Unsubscribe List (.xlsx)', widget='FileChooser')
    bottom_group = parser.add_argument_group(gooey_options={'show_border': False, 'columns': 1, 'required': False})
    bottom_group.add_argument('Campaign ID', action='store', help="Number found in the Campaign 'Reports' tab")
    bottom_group.add_argument('Campaign Segment', action='store', help='Enter A, B, or All. All lists supplied must match segment.')
    user_inputs = vars(parser.parse_args())
    campaign_id = user_inputs['Campaign ID']
    return parser.parse_args()
if __name__ == '__main__':
    get_args()
|
Overall, Gooey knows what it wants to be: an easy-to-use GUI framework for Python, and it does it well. Here's a screenshot of my program's shiny GUI:
Now that I have a GUI on top of my program and it delivers the expected output file, I'm hoping to take it one step further by packaging it up as a Windows .exe file.
This would allow it to run as a desktop app on any Windows computer without the need to install Python or library dependencies.
I've only begun exploring options to do this but a few libraries I've heard of are pyinstaller, cx_Freeze and Py2Exe. Updates coming if I figure it out. Cheers :D
Update: I did figure out how to compile my Gooey app to a Windows application with Pyinstaller. You can read more on how I did it here.
Jul 15, 2018
Findstr is the Windows alternative to grep, which runs on
Unix-like operating systems.
Findstr searches files with regular expressions
and seems useful for string matching within files and directories. It is one of over 280 command prompt commands.
Here's the official Windows Documentation
and some Linux vs. Windows Examples.
Update: Windows announced that Grep and several other Unix command line tools will be added to Windows 10.
This is a new alternative to findstr.
This findstr command returns all lines containing an '@' in a text file.
findstr @ test.txt
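Findstr also takes switches; for example, /s searches subdirectories, /i ignores case and /n prints line numbers:
findstr /s /i /n "@" *.txt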
I was happy to see Findstr's convenient help menu:
findstr -?
Regular expressions are so powerful. It's nice to have this utility within the command prompt.
I am hoping to get to know some of the other 280 command prompt commands.
I've previously explored regex with Python. This Python regex example finds all words in a text file containing '@' symbols:
| import re
# read the file to string + regex email search
with open('test.txt', 'r') as fhand:
string = fhand.read()
# this regex returns a python list of emails:
emails = re.findall(r'(\S*@\S+)', string)
print(emails)
|
For more command prompt nuggets, check out my more recent post:
Exploring Windows Command Line Tools, Batch Files and Remote Desktop Connection.
May 12, 2018
Introduction
Thanks to pandas, I have automated some data cleaning and file reading processes at my job. Here are some terms and code that have been useful or interesting to me after 2 years of exploration. I also checked out "Python for Data Analysis" from the Chicago Public Library.
If I could suggest anything to be successful with pandas, it is repetition. I use it nearly every day at work. Dive into its API documentation. There are tons of useful tools there, laid out with meticulous detail and examples. I began learning pandas with this PyCon 2015 tutorial from Brandon Rhodes, it's informative and entertaining! (It's a little dated now but I still recommend it.) The Reproducible Data Analysis in Jupyter video series by Jake VanderPlas is also a great example of pandas-related workflows.
Table of Contents
- Python + pandas Installation and Version Compatibility
- Welcome to pandas 1.0
- Data Wrangling, Exploration and Broadcasting
- Series.str & Series.dt accessors
- apply, applymap, lambda and map
- featuring pandas.to_markdown()
- SQL operations with df.merge() and pandas.read_sql()
- pandas.read_clipboard()
- converting between Series and DataFrame
- Turning json API responses into a dataframe with pandas.json_normalize()
- Plotting Visualizations with matplotlib
- Supplementary Resources and Links
(1) Python + pandas Installation and Version Compatibility
Python 3.6 and higher can install pandas 1.0.
Installing Python 3.8 on Windows
For Windows installation, see the python docs for an installer, "Using Python on Windows".
Installing Python 3.8 on Ubuntu
Follow these steps to download and install Python 3.8 in the Ubuntu terminal. To upgrade to pandas 1.0, I installed Python 3.8, the latest stable release, "from source" on Ubuntu 16.04.
If you intend to use pandas.to_markdown() on Ubuntu, it might save you trouble to pre-emptively install the '_bz2' library before you build your Python from source.
On Ubuntu, I ran into ModuleNotFoundError: No module named '_bz2' and fixed by entering in the terminal:
sudo apt-get install libbz2-dev
I also saw this message when completing install:
The necessary bits to build these optional modules were not found. To find the necessary bits, look in setup.py in detect_modules() for the module's name.
If you need to re-build Python on Ubuntu, enter:
cd /home/erick/Python-3.8.0/
./configure --enable-loadable-sqlite-extensions && make && sudo make install
I installed the missing _bz2 and _sqlite3 modules, then re-built with these commands.
Installing Older pandas Versions on Ubuntu
The version downloaded with this command is about 6 months behind the current version. For me, this installed pandas 0.17 on Ubuntu:
sudo apt-get install python3-pandas
As of February 2020, this command installs pandas version 0.24 with pip when used with Python 3.5 on Linux Ubuntu 16.04:
python3.5 -m pip install pandas
If pandas is already installed, you can upgrade with pip:

python -m pip install --upgrade pandas
To check if pip is installed:
python -m pip list
Best Practice: Virtual Environments
Create a virtual environment with your new Python version. venv wasn't included in my Python 3.8 installation on Ubuntu 16.04, so I installed virtualenv:
python -m pip install --user virtualenv
Let's create a new virtual environment. Enter in terminal or command prompt:
virtualenv -p python3.8 add_env_name_here
Now, activate your new virtual environment on Linux:
source add_env_name_here/bin/activate
Or activate environment on Windows:
cd add_env_name_here\scripts & activate
If you see "ImportError: Missing optional dependency 'tabulate'. Use pip or conda to install tabulate", install tabulate to use pd.to_markdown():
python -m pip install tabulate
To use pd.read_clipboard() on Linux, install xclip or xsel:
sudo apt-get install xclip
I also saw a prompt to install pyperclip:
python -m pip install pyperclip
Now install pandas 1.0 and matplotlib in your virtual environment for visualizations.
python3.8 -m pip install pandas
python -m pip install -U matplotlib
(2) Welcome to pandas 1.0
You did it! Welcome to the good life. The basis of pandas is the "dataframe", commonly abbreviated as df, which is similar to a spreadsheet. Another core pandas object is the pandas.Series object, which is similar to a Python list or numpy array. When imported, pandas is aliased as "pd". The pd object allows you to access many useful pandas functions. I'll use it interchangeably with pandas in this post.
The library’s name derives from panel data, a common term for multidimensional data sets encountered in statistics and econometrics.
pandas: a Foundational Python Library for Data Analysis and Statistics
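Here's a minimal sketch of those two core objects (the values are made up for illustration):
| import pandas as pd
# a Series is a single labeled column of values
budgets = pd.Series([85000000, 70000000, 2500000], name='Movie Budget')
# a DataFrame is a table built from one or more columns
df = pd.DataFrame({'Title': ["Ocean's 11", 'Runaway Bride', 'Bonnie and Clyde'],
                   'Movie Budget': budgets})
print(type(budgets), type(df))
|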
(3) Data Wrangling, Exploration and Broadcasting
Data is commonly read in from file with pd.read_csv()
| import pandas as pd
file_name = 'my_bank_statement.csv'
# you may sometimes need to specify an alternate encoding: encoding = "ISO-8859-1"
df = pd.read_csv(file_name, encoding='utf-8')
print(df.head())
print(df.shape) # returns a tuple: (# of rows, # of columns)
print(df.dtypes)
print(df.info())
|
Create a dataframe from a list of Python lists, named movies below, with pd.DataFrame:
| import pandas as pd
column_names = ["Title", "Release Date", "Character", "Actor", "Movie Budget", "Worldwide Gross"]
movies = [["Ocean's 11", "12/7/2001", "Danny Ocean", "George Clooney","$85,000,000"," $450,728,529"],
["Ocean's 11", "12/7/2001", "Tess Ocean", "Julia Roberts","$85,000,000"," $450,728,529"],
["Runaway Bride", "6/30/1999", "Ike Graham", "Richard Gere","$70,000,000","$309,457,509"],
["Runaway Bride", "6/30/1999", "Maggy Carpenter", "Julia Roberts","$70,000,000","$309,457,509"],
["Bonnie and Clyde", "9/1/1967", "Clyde Barrow", "Warren Beaty","$2,500,000", "$70,000,000"],
["Bonnie and Clyde", "9/1/1967", "Bonnie Parker", "Faye Dunaway","$2,500,000", "$70,000,000"]]
df = pd.DataFrame(movies, columns=column_names)
df = df[["Title","Character", "Actor", "Movie Budget", "Worldwide Gross"]]
print(df.to_markdown(showindex=False, tablefmt="simple"))
|
The last line prints the table to the terminal with df.to_markdown(), new in pandas version 1.0.0.
Slicing and sorting a dataframe, removing duplicates, and working with datetime objects
- Let's create a new dataframe slice with only two columns
- Drop duplicate movies
- Convert the dates to datetime objects
- Get the year from an array of datetime objects
- Set the year as the dataframe index
| df = pd.DataFrame(movies, columns=column_names)
date_df = df[['Title', 'Release Date']].drop_duplicates(subset=['Title'])
date_df['Release Date'] = pd.to_datetime(date_df['Release Date'])
# create year column using the pd.Series.dt datetime accessor:
date_df['Release Year'] = date_df['Release Date'].dt.year
date_df = date_df.sort_values(by=['Release Date'])
date_df = date_df.set_index('Release Year')
print(date_df.to_markdown(showindex=False, tablefmt='simple'))
|

Applying Broadcasting in pandas
Broadcasting means mapping a function or an arithmetic calculation over an array (using apply or map) or a dataframe (applymap).
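Here's a tiny sketch of the difference on a throwaway dataframe (the column names are invented for illustration):
| import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df.apply(sum))                   # apply: one result per column
print(df.applymap(lambda x: x * 10))   # applymap: every element of the DataFrame
print(df['a'].map(lambda x: x + 1))    # map: every element of a Series
|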
"Summing up, apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series."
Applying a function to a pandas column
- Convert columns to int and calculate the difference between two columns.
- Let's format those integers back to dollars with python's lambda and pandas' applymap for extra jazz.
| def format_dollars_as_int(dollars):
"""Accepts a dollar formatted string, returns an int."""
number = dollars.replace('$','').replace(',','')
return int(number)
df = pd.DataFrame(movies, columns=column_names)
df = df.drop_duplicates(subset=['Title'])
df[['Movie Budget','Worldwide Gross']] = df[['Movie Budget','Worldwide Gross']].astype(str).applymap(format_dollars_as_int)
df['Movie Net Income'] = df['Worldwide Gross'] - df['Movie Budget']
money_columns = ['Movie Budget', 'Worldwide Gross','Movie Net Income']
df[money_columns] = df[money_columns].applymap(lambda x:'${:,}'.format(x))
|
Creating a new column and writing to a .csv file
- Then add the IMDB ratings of our three films in a new column.
- Finally, write the result to markdown and a csv file.
| # create a new column with the three movies' IMDB ratings
df['IMDB Rating'] = [7.8, 5.5, 7.8]
print(df.to_markdown(showindex=False, tablefmt='simple'))
df.to_csv('Movies.csv', index=False)
print(df.Actor.value_counts().to_markdown(tablefmt="github"))
|
See also: pandas.Series.value_counts() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html
Notice for column names without spaces, you can use dot notation instead of brackets:
| # both valid ways to access column by name
df.Actor
df['Actor']
|
Lowercase column names with Python's map function:
| df.columns = map(str.lower, df.columns)
|
Strip whitespace from a column of strings with the pandas.Series.str accessor:
| df['Character'] = df['Character'].astype(str).str.strip()
|
Fix pesky leading zero zip codes with str.zfill():
| log_df['zip'] = log_df['zip'].astype(str).str.zfill(5)
|
Get a row by index number pandas.DataFrame.loc[]:
| first_row = df.loc[0, df.columns]
third_row = df.loc[2, df.columns]
|
Filter the df to get rows where the actor is 'Julia Roberts'.
| julia_roberts_movies = df[df.Actor=='Julia Roberts'].reset_index(drop=True)
print(julia_roberts_movies.head())
|
"Get" an item from a column of lists with str.get().
| # returns first item in each cell's list into new column
df['first_item'] = df['items'].str.get(0)
|
Execute SQL-like operations between dataframes with df.merge().
First, use df.copy() to create a new dataframe copy of our actors table above. By default, df.merge() uses an inner join to merge two dfs on a common column. Let's add each film's release year from our date_df to our original actors table, with an inner join based on 'Title':
| actors = df.copy(deep=True)
# slice only the columns we want to merge:
date_df = date_df[['Title','Release Year']]
joined_df = actors.merge(date_df, on='Title', how='inner')
# You can pass the number of rows to see to head. It defaults to 5.
print(joined_df.head(10))
|
Execute database queries with pd.read_sql().
When the chunksize argument is passed, pd.read_sql() returns an iterator. We can use this to iterate through a database with lots of rows.
When combined with DB connection libraries like pyodbc
or SQLAlchemy, you can process a database in chunks. In this example, an Access DB connection via pyodbc processes 500,000 rows per chunk.
Pyodbc works on a wide range of other databases as well. The script below
uses pd.Series.isin() to check if each email is in the DB.
| import pandas as pd
import pyodbc
emails = ['user1@example.com', 'user2@example.com', 'user3@example.com']
connection_string = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\path_to_db\emails_database.accdb;'
print(connection_string)
conn = pyodbc.connect(connection_string)
query = """
SELECT *
FROM ADD_TABLE_NAME
"""
dfs = list()
for i, db_chunk in enumerate(pd.read_sql(query, conn, chunksize=500000)):
emails_in_db = db_chunk[db_chunk.Email.isin(emails)]
dfs.append(emails_in_db)
print(i)
emails_in_db = pd.concat(dfs)
emails_in_db.to_csv('DB_Email_Query_Results.csv', index=False)
|
In case you are wondering, enumerate is a python built-in for enumerating, or counting an iterable, e.g. list or generator, as you iterate through it.
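A quick example:
| emails = ['user1@example.com', 'user2@example.com']
for i, email in enumerate(emails):
    print(i, email)
# 0 user1@example.com
# 1 user2@example.com
|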
Using pd.read_clipboard():
| import pandas as pd
clipboard_contents = pd.read_clipboard()
print(clipboard_contents)
|
Use df.to_clipboard() to store a dataframe as clipboard text:
| import pandas as pd
truths = ['pandas is great','I love pandas','pandas changed my life']
df = pd.DataFrame([truths], columns=['Truths'])
df.to_clipboard(index=False, sep='|')
clipboard_contents = input('Press ctrl-v ')
print(clipboard_contents)
|
Convert the clipboard contents to df with pd.DataFrame() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html:
| import pandas as pd
clipboard_contents = list(input('Press ctrl-v '))
df = pd.DataFrame([clipboard_contents])
print(df.head())
|
If the clipboard dataframe has one column, you could squeeze the clipboard contents into a pd.Series object:
| import pandas as pd
clipboard_text = pd.read_clipboard()
clipboard_contents = list(clipboard_text)
df = pd.DataFrame([clipboard_contents], columns=['Clipboard Data'])
clipboard_series = df.squeeze(axis='columns')
print(type(clipboard_series))
|
Inversely, consider using pandas.Series.to_frame() to convert a Series to a dataframe:
| import pandas as pd
clipboard_contents = pd.Series(input('Press ctrl-v '))
df = clipboard_contents.to_frame()
print(df.head())
|
(4) Turning json API responses into a dataframe with pd.json_normalize()
Update: beginning in pandas 1.0, json_normalize is exposed in the top-level pandas namespace; older versions used pandas.io.json.json_normalize instead.
| import pandas as pd
import requests
url = 'pseudo_API.com/endpoint/'
parameters = {'page_size': 100, 'format': 'json', 'api_type': 'contact_sync'}
response = requests.get(url=url, params=parameters)
data = response.json() # decode response into json
df = pd.json_normalize(data['any_key'])
|
pandas.json_normalize() is now exposed in the top-level namespace. Usage of json_normalize as pandas.io.json.json_normalize is now deprecated and it is recommended to use json_normalize as pandas.json_normalize() instead (GH27586).
What's new in pandas 1.0.0
(5) Plotting Visualizations with matplotlib
Make a bar plot of the movie release year counts using pandas and matplotlib formatting.
| import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import matplotlib.ticker as ticker
column_names = ["Title", "Release Date", "Character", "Actor"]
rows = [["Ocean's 11", "12/7/2001", "Danny Ocean", "George Clooney"],
["Ocean's 11", "12/7/2001", "Tess Ocean", "Julia Roberts"],
["Runaway Bride", "6/30/1999", "Ike Graham", "Richard Gere"],
["Runaway Bride", "6/30/1999", "Maggy Carpenter", "Julia Roberts"],
["Bonnie and Clyde", "9/1/1967", "Clyde Barrow", "Richard Gere"],
["Bonnie and Clyde", "9/1/1967", "Bonnie Parker", "Julia Roberts"]]
df = pd.DataFrame(rows, columns=column_names)
df['Year'] = pd.to_datetime(df['Release Date']).dt.year
ax = df.Year.value_counts().plot(ylim=0, kind='bar', title='Release Year of Movies', rot=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
fig = ax.get_figure()
fig.tight_layout()
fig.savefig('images/Movie_Plot.png')
|
Use a Jupyter Notebook to display the plot inline, or run the script from the command line and open the saved image.
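Alternatively, in a plain script you can pop up an interactive window with the rendered figure (a minimal sketch):
| import matplotlib.pyplot as plt
# after building the plot above, display it in a window:
plt.show()
|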
Plot George Clooney's movies over time in a line graph. [Source]
| import matplotlib.pyplot as plt
df = df[df.Actor=='George Clooney']
ax = df.groupby(['Year']).size().plot(ylim=0)
fig = ax.get_figure()
fig.savefig('figure.pdf')
|
(6) Supplementary Resources and Links
Mar 24, 2018
- pytudes: "Python programs for perfecting programming skills."
- Building a video synthesizer in Python
- Running micropython on a microcontroller
- Pandas: Super awesome excel and data analysis library. The talk
Pandas from the Ground Up by Brandon Rhodes is great!
- Fuzzy Wuzzy: fuzzy string matching Python library, written by SeatGeek
- Talk Python to Me, a Python podcast
- Free Programming Books from O'Reilly
Jan 20, 2018
Python comes with a bunch of standard modules. My favorites are
shutil, glob, datetime, time, os (operating system), re (regular
expressions) and webbrowser. The standard library is loaded.
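As a quick taste of the standard library, here's a small sketch using glob and shutil (the file names are hypothetical):
| import glob
import shutil
# list every .txt file in the current folder, then back the first one up
text_files = glob.glob('*.txt')
print(text_files)
if text_files:
    shutil.copy(text_files[0], 'backup_' + text_files[0])
|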
Inevitably, you'll want to install new libraries from Python's rich
ecosystem of external modules. Enter pip, Python's handy package manager
and people's champion.
This post will teach you some Python history, show how to install
pandas, and help you troubleshoot problems if it's not working. You'll
find Windows and Linux commands for venv setup (recommended). With pip,
you'll feel like Neo when installing new modules. Any skill is at your
fingertips. It's like learning kung fu. There's probably a library for
that!
First, Some Python Version Caveats + History
Python 2 reached end of life on January 1st, 2020.
Python 2 has officially been sunset.
Python comes with pip now, no setup is required. But certain versions
such as Python 3.2 or the Python 2.7 that came stock on my improbably
still functioning 2008 black Macbook, for example, may not have it
installed.
In December 2021, Python 3.6 reached its "end of life" phase and is "now effectively frozen". Read more in PEP 494.
TLDR: use Python 3.7 to 3.11. This blog endorses using the lightning
fast Python version 3.11, released in October 2022.
Enter This in Your Terminal
python -m pip install pandas
Pandas is a super useful library for wrangling spreadsheet data, AKA
"tabular" data. If successful, you should see activity that looks
similar to the below screenshot, where I am installing openpyxl, an additional Python
Excel library you'll likely want. You are good to go! This is the part
where you get to feel like Neo! See Installing Python Modules in the Python
Documentation for more detailed instructions.
To view all your installed libraries, enter:
pip list
Write a "requirements.txt" of installed libraries:
pip freeze > requirements.txt
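To recreate that environment somewhere else, install everything listed in the file:
pip install -r requirements.txt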
You can list your outdated packages with the --outdated argument:
pip list --outdated
Use pip's -h help command line argument:
pip -h
View your system and user pip config settings:
pip config debug
Supplementary Resources
Congrats on figuring out how to install packages with pip, have fun!
Having issues? Try upgrading your pip version.
python -m pip install --upgrade pip
Try the ensurepip command.
This command will install and upgrade pip to the newest version. New in
Python 3.4:
python -m ensurepip --upgrade
"The ensurepip
package provides support for bootstrapping the pip installer into an
existing Python installation or virtual environment. This
bootstrapping approach reflects the fact that pip is an independent
project with its own release cycle, and the latest available stable
version is bundled with maintenance and feature releases of the
CPython reference interpreter."
- ensurepip Python Documentation
You should follow best practice and create a virtual environment before installing
libraries. venv or virtualenv. To create with venv:
python3 -m venv add_env_name_here
After your environment is created, activate it with the first command
below, then install a library on Ubuntu Linux:
source add_env_path_here/bin/activate
python -m pip install pandas
Alternatively, on Windows computers:
cd add_env_path_here\scripts & activate
python -m pip install pandas
Getting the prefix right can be tricky.
In the install command, the prefix is a reference to your Python
executable. You may just need to alter your prefix to call it correctly.
Here are some to try in place of "python". Observe what happens when you
run these command variations. Good luck!
python3 -m pip install pandas
python3.11 -m pip install pandas
py -m pip install pandas
pip3 install pandas
How to Manually Enable the pip Installer
The rest of this post may be useful to you if you are:
- Working on legacy Python 2 or < 3.3 for which pip is not installed.
- Seeking to fix a faulty pip install that is not working properly.
- Curious to know how to manually set up pip.
Assumes Python is already installed. If you're running Windows 10, I
found it easy to install Python from the Windows store.
Download the get-pip.py file.
Go to the link, right click the page and "Save As" a .py file to
download. Then place the file where you want to access it. I placed mine
in C:\Python27\Lib\site-packages
You could also download the file with curl:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
If you are not sure where your site-packages folder is, enter the following into command prompt for file path ideas:
python -m site
Run the get-pip.py file.
Using command prompt's cd command with a Windows "&" operator to run the
Python file in a Windows command prompt:
cd c:\Python27\Lib\site-packages & python get-pip.py
Or Linux terminal:
cd /Python27/Lib/site-packages && python get-pip.py
You should see some activity in command prompt that shows
installation/updating of "setup" and "wheel". When it finishes, you have
installed pip.
Type into command prompt at the same location:
python -m pip install requests
This installs the Requests module into your Python libraries.
Requests is an http module which is highly regarded almost universally
by the Python community.
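As a quick sanity check that the install worked, here's a minimal sketch hitting a public endpoint (any URL would do):
| import requests
response = requests.get('https://api.github.com')
print(response.status_code)  # 200 means the request succeeded
|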
Thanks for reading!
Check out these other posts with pip installed library examples:
- fix Grammar and Spelling with language_tool_python and textblob
- static site generation with pelican
- text mojibake mash fixing with ftfy
- a guide to making HTTP requests
- simple GUI for scripts with gooey
Jan 14, 2018
The basis of many great programs revolve around a simple set of operations:
- Open a file.
- Do something with the file contents.
- Save the new file for the user.
Python is nice and simple for this. Paste the below lines into a text editor and save as a .py file. You need to have Python 3 installed. In the same folder as your .py file, save a .txt file with some words in it. Alright, let's write some code:
| file_name = input("Enter your file name. e.g. words.txt")
file_handle = open(file_name, "r")
lines = file_handle.readlines()
print(lines)
file_handle.close()
|
In line 1, we ask the user to enter their file name with Python's input function. When the program runs, the user enters their text file name with extension. This line stores the name in a variable called file_name.
In line 2, we open your text file and store it in a variable I have named file_handle. Think of the file handle as a bridge between your code and the text file. Quick point about the 'r' above: that tells the program to open the file in "Read" mode. There are several different file modes in programming. Some modes are just for reading an existing file, some are just for writing a new file, and some are capable of both. This Stack Overflow post is well written and details the differences between file modes. Once established, the file handle allows you to read the file's contents or write new contents to the file.
In line 3, we are calling the .readlines() method on our file handle. This method takes the file contents and stores them, line by line, into a list named "lines". An alternative method is .read(), which opens the file and stores its contents as one string. Try switching this out in place of .readlines() to check out the difference.
In line 4, we are printing the stored lines to show them to the user. We now have the file contents, ready to be used however we please.
In line 5, we are closing the file.
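For comparison, here's a quick sketch of the .read() variant mentioned above:
| file_name = input("Enter your file name. e.g. words.txt")
file_handle = open(file_name, "r")
contents = file_handle.read()  # one big string instead of a list of lines
print(contents)
file_handle.close()
|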
Below, we are going to write a new file using the with statement, which is generally accepted as the best way to read or write a file:
| with open("Notes.txt", "w") as fhand:
    fhand.write("Did you know whales can live up to 90 years?")
|
In line 1, we call the open function again, but notice the "w": this opens (and creates) Notes.txt in "write" mode, and the with statement stores the file handle in a variable named fhand.
In line 2, we call the .write() method on fhand, passing it the text to be saved in our new file.
When the with block ends, Python closes the file automatically, completing the creation of our new file in the same folder as our .py program file.
Your program is now ready to be run. Double-click your .py file to execute it.
Before learning Python, file operations were a mystery to me. It took me a while to understand this clearly, and I wanted to share. Once you master these basic file operations, programming gets to be a lot more fun. Do try it out for yourself :D
Aug 09, 2017
Here is what I have deduced is the fastest way to get an app up and
running with Node.js. This requires some familiarity with using the
command line. I completed the Codeacademy course "Learn The Command
Line"
before beginning with Node.js. I think it helped me better understand
what the commands are and what they do.
Download and install Node.js, then open the Node.js
command prompt. This was done on a Windows machine.
First, create a folder for your app(s):
mkdir node_apps
Change the command prompt directory to your app's folder:
cd node_apps
Create a package.json file for your app. Fill out the applicable info or just hit enter until the file is created:
npm init
Install express.js module in node.js:
npm install express
Install express-generator module in node.js:
npm install express-generator -g
Create the structure for your app and all necessary folders. (views,
css, Javascript, routing, etc.)
express app_name
Ensure all app module dependencies are installed:
npm install
Start your server and web app:
npm start
Go to http://localhost:3000 in a browser. Port 3000 is the default port
of Express. Your app is live in dev environment.
Notes
- I learned most of this from this great blog post.
- The above does not include a database integration. I integrated with
a MongoDB database by following a blog post that has since been removed from the internet.
- This YouTube video
was also very helpful to me for figuring out MongoDB and Node.js integration.
- An HTML shorthand language called jade (aka pug) comes stock within
Express.js. Here's further reading
on the pros and cons.
- All of the above has been from my own studies. I do not claim
anything listed as the most efficient or best way to use Node.js.
This is what has worked for me over the past two days.
- It feels good to whip up a nimble app environment that is capable of
producing and supporting world changing software; Node.js is used by
Netflix, PayPal, Microsoft and Uber.
Aug 05, 2017
I'm finally coming around in my understanding of Javascript. Here are a few quick explanations to help you if you are new to it.
Javascript
The language of the web. Most commonly used as a complement to HTML and CSS to create an interactive website.
JQuery
A popular Javascript library with many powerful commands that are quick and easy to call.
Node.JS
Software that allows you to run Javascript from the command line without being connected to the internet.
Express.JS
A popular Node.JS framework.
Angular
A popular front-end Javascript web framework. There are many out there but this seems to be the one I've heard of the most.
I've begun to see a pattern with programming languages:
- Learn to execute the core building blocks like using variables, loops and functions.
- Learn more advanced libraries, documentation and uses.
- Consider using and learning web frameworks depending on what you're trying to do with the language.
- Practice to hone your knowledge. Build things you like.
I've also noticed that Javascript has been easier to learn than Python was for me, because it's not my first programming language.
The concepts are the same. The syntax can trip me up at times, but I'm currently flying through
Codeacademy's Javascript courses. Sometimes it even seems fun!