Lo-Fi Python

Dec 31, 2021

Phone Number Cleaning Regex + pandas Series Example

This is a solution I worked out recently to strip phone numbers into a uniform format. To install pandas with pip, enter this at the command prompt:

python -m pip install pandas

The pandas library has regex built in and it's pretty neat! Behold the power of pandas and a regular expression to do trivial telephone tidying:

strip phone formatting with Python
import pandas as pd
s = pd.Series(data=["(010) 001-1010"], name="Phone", dtype="str")
# remove parentheses, hyphens and spaces with pandas + regex
# raw string (r"...") keeps Python from treating "\(" as an invalid escape sequence
s = s.str.replace(pat=r"\(|\)|-| ", repl="", regex=True)
print(s)
# resulting number: "0100011010"

Regex is cool.

Grasping the intricacies of what this code is doing feels elegant when you connect the dots... or pipes. The replace is done via the pandas str accessor. In the pat string, the parentheses are escaped with backslashes, because unescaped parentheses would be read as regex grouping characters, and the alternatives are separated by pipes "|". The pipes act as an "or" operator, succinctly chaining multiple characters together for matching and, in this case, replacing each match with nothing. Pretty nifty. If you read the pandas docs, you'll find regex is accessible in different parts of the API. Dive in, it's some of my favorite documentation to snoop. There is so much you can do with pandas. This example demonstrates how its flexible functions get the job done in a single line.
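If you want to play with the pattern a bit, here is a minimal sketch of two alternative spellings, plus a quick sanity check using another regex-aware method on the str accessor. The sample numbers and the variable names (phones, by_class, digits_only, looks_valid) are made up for illustration, and the 10-digit check is just an assumption about what a "clean" number looks like in this example:

```python
import pandas as pd

# Hypothetical sample data: the same number in three different formats
phones = pd.Series(
    ["(010) 001-1010", "010-001-1010", "010 001 1010"],
    name="Phone",
    dtype="str",
)

# A character class matches any one of "(", ")", "-" or a space,
# which does the same job as chaining them with pipes
by_class = phones.str.replace(pat=r"[()\- ]", repl="", regex=True)

# \D matches any non-digit character, so this strips everything but digits
digits_only = phones.str.replace(pat=r"\D", repl="", regex=True)

# str.fullmatch also accepts a regex: True where the whole string is exactly 10 digits
looks_valid = digits_only.str.fullmatch(r"\d{10}")

print(by_class.tolist())
print(digits_only.tolist())
print(looks_valid.tolist())
```

The character class is handy when you know exactly which characters to drop. The \D version is more aggressive: it also removes dots, plus signs and letters, which may or may not be what you want for your data.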

Further Reading:

pandas.Series documentation

pandas str.replace documentation

Source of the famous “Now you have two problems” quote