This is a solution I worked out recently to strip phone numbers down to a uniform format. To install pandas with pip, enter this in the command prompt:
python -m pip install pandas
The pandas library has regex built in and it's pretty neat! Behold the power of pandas and a regular expression to do trivial telephone tidying:
```python
import pandas as pd

s = pd.Series(data=["(010) 001-1010"], name="Phone", dtype="str")
# remove parentheses, hyphens and spaces with pandas + regex
# (a raw string keeps Python from reading "\(" as a string escape)
s = s.str.replace(pat=r"\(|\)|-| ", repl="", regex=True)
print(s)
# resulting number: "0100011010"
```
Regex is cool.
Grasping the intricacies of what this code is doing feels elegant when you connect the dots... or pipes. The replace is done via the pandas str accessor. In the pat string, the parentheses are escaped with backslashes, and each character to remove is separated from the next by a pipe "|". The pipes act as an OR operator, succinctly chaining multiple characters together for matching; in this case, every match is replaced with nothing. Pretty nifty. If you read the pandas docs, you'll find regex is accessible in different parts of the API. Dive in; it's some of my favorite documentation to snoop. There is so much you can do with pandas, and this example shows how its flexible functions get the job done efficiently.
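For a pattern this simple, re.sub from Python's standard library behaves the same way on a single string, which makes it a handy place to experiment with the pipes. The sketch below compares the pipe-separated alternation from the example with two alternatives of my own choosing, a character class and \D (any non-digit). The second phone number is a hypothetical sample in a different format.

```python
import re

raw = "(010) 001-1010"
other = "+1 010.001.1010"  # hypothetical number in a different format

# Alternation: the pipes separate four single-character alternatives: "(", ")", "-" and " "
alternation = re.sub(r"\(|\)|-| ", "", raw)

# Character class: the same four characters written as one bracketed set
char_class = re.sub(r"[()\- ]", "", raw)

# \D matches any character that is not a digit, so only digits survive
digits_only = re.sub(r"\D", "", raw)

print(alternation)  # 0100011010
print(char_class)   # 0100011010
print(digits_only)  # 0100011010

# The alternation only removes the characters it lists, so "+" and "." slip through
print(re.sub(r"\(|\)|-| ", "", other))  # +1010.001.1010
print(re.sub(r"\D", "", other))         # 10100011010
```

The alternation is explicit about what it removes, while \D is more forgiving of formats you didn't anticipate; which one fits depends on how messy your input is.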
Further Reading:
pandas.Series documentation
pandas str.replace documentation
Source of the famous “Now you have two problems” quote
Findstr is the Windows alternative to grep, the text search utility found on Unix and Unix-like systems such as Linux.
Findstr searches files with regular expressions, which makes it useful for string matching within
files and directories. It is one of over 280 command prompt commands.
Here's the official Windows documentation and
some Linux vs. Windows examples.
Update: Microsoft announced that grep and several other Unix command-line tools will be added to Windows 10, offering a new alternative to findstr.
This findstr command prints every line in a text file that contains an '@':
findstr @ test.txt
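To see what that command is doing, here is a rough Python sketch that prints each line of a file matching a pattern. The function name find_lines and the sample file contents are hypothetical, and this is only an analogue: it does not reproduce findstr's own regex dialect or its command-line switches.

```python
import os
import re
import tempfile


def find_lines(path, pattern):
    """Return the lines of a text file that match a regular expression.

    A rough analogue of `findstr @ test.txt`, not a reimplementation of findstr.
    """
    regex = re.compile(pattern)
    with open(path, "r", encoding="utf-8") as fhand:
        # Keep each matching line, minus its trailing newline
        return [line.rstrip("\n") for line in fhand if regex.search(line)]


if __name__ == "__main__":
    # Hypothetical sample contents, written to a temporary file so the demo is self-contained
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "test.txt")
        with open(path, "w", encoding="utf-8") as fhand:
            fhand.write("alice@example.com\nno at sign here\nreply to bob@example.org\n")
        for line in find_lines(path, "@"):
            print(line)
```

Like findstr, it reports whole lines rather than just the matching text.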
I was happy to see Findstr's convenient help menu:
findstr -?
Regular expressions are so powerful. It's nice to have this utility within the command prompt. I am hoping to get to know some of the other 280-plus command prompt commands.
I've previously explored regex with Python. This Python regex example finds every word in a text file that contains an '@' symbol:
```python
import re

# read the file to string + regex email search
with open('test.txt', 'r') as fhand:
    string = fhand.read()
# this regex returns a python list of emails
# (a raw string keeps Python from reading "\S" as a string escape)
emails = re.findall(r'(\S*@\S+)', string)
print(emails)
```
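Because \S matches any non-whitespace character, the pattern above also picks up punctuation sitting next to an address. The sketch below runs it on a hypothetical sample string standing in for test.txt, next to a stricter pattern that is my own assumption (not a full RFC 5322 email validator): name characters, an "@", then a domain with at least one dot-separated part.

```python
import re

# Hypothetical sample text standing in for the contents of test.txt
text = "Contact alice@example.com, or write to bob@example.org."

# The pattern from the post: any run of non-space characters around an "@"
loose = re.findall(r"\S*@\S+", text)

# Stricter pattern (an assumption for illustration): word characters, dots, plus
# and hyphen before the "@", then a domain such as "example.org"
strict = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

print(loose)   # ['alice@example.com,', 'bob@example.org.']
print(strict)  # ['alice@example.com', 'bob@example.org']
```

The loose pattern is a fine first pass for eyeballing a file; the stricter one trades simplicity for cleaner results.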
For more command prompt nuggets, check out my more recent post: Exploring Windows Command Line Tools, Batch Files and Remote Desktop Connection.