Strings and Dates
By completing this module you will be able to:
- Manipulate strings in advanced ways
- Use regular expressions to search and transform text
- Work with dates and times using the datetime module
- Perform operations with time intervals
Why Master Text and Dates?
A commonly cited estimate is that around 80% of the world’s data is unstructured, and a large share of that is text: names, addresses, emails, messages, social media posts, documents… As a programmer, you’re going to spend a lot of time working with text.
Think about the everyday tasks of any application:
- Validate that an email has the correct format before sending it
- Clean the data a user types in a form (extra spaces, random capitalization…)
- Extract information from messy text (prices from a webpage, dates from a document)
- Format dates so they display in the user’s language
This chapter will give you the tools to do all of this elegantly and efficiently.
String Manipulation
Imagine you receive data from a web form. The user typed " JOHN SMITH " (with extra spaces and all caps). You need to save it as “John Smith”. Strings in Python come with an arsenal of methods for exactly these situations.
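As a quick preview, the form example above could be handled with a short chain of string methods. This is only a minimal sketch: the helper name clean_name is made up for illustration, and title() is a simplification that mishandles names such as “McDonald”.

```python
def clean_name(raw):
    """Trim, collapse repeated spaces, and capitalize each word."""
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    words = raw.split()
    return " ".join(words).title()


cleaned = clean_name("  JOHN   SMITH  ")
print(cleaned)  # John Smith
```

The methods used here (split, join, title) are covered one by one in the sections that follow.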
Search Methods
It’s like using Ctrl+F in your text editor, but with superpowers. You can find where a word is, check whether the text starts or ends a certain way, count how many times something appears…
```python
text = "Python is a very popular programming language"

# Find substring
print(text.find("programming"))  # 25 (index where it starts)
print(text.find("Java"))         # -1 (not found)

# Check start/end
print(text.startswith("Python"))  # True
print(text.endswith("language"))  # True

# Count occurrences
print(text.count("a"))  # 5
```
Transformation Methods
This is where you clean and transform text. Going back to the form example: you need to remove spaces, convert to uppercase/lowercase, replace characters… Python has a method for each situation.
```python
text = "hello world"

# Change case
print(text.upper())       # HELLO WORLD
print(text.lower())       # hello world
print(text.capitalize())  # Hello world
print(text.title())       # Hello World
print(text.swapcase())    # HELLO WORLD (every letter's case is flipped)
```

```python
text = "   spaces   "

# Remove surrounding whitespace
print(text.strip())   # "spaces" (removes both sides)
print(text.lstrip())  # "spaces   " (left only)
print(text.rstrip())  # "   spaces" (right only)

# Padding
number = "42"
print(number.zfill(5))    # "00042"
print(number.center(10))  # "    42    "
print(number.ljust(10))   # "42        "
print(number.rjust(10))   # "        42"
```

```python
text = "I love Python. Python is great."

# Replace all occurrences
new = text.replace("Python", "Java")
print(new)  # I love Java. Java is great.

# Replace only N occurrences
new = text.replace("Python", "Java", 1)
print(new)  # I love Java. Python is great.
```

```python
# Split a string
text = "one,two,three,four"
parts = text.split(",")
print(parts)  # ['one', 'two', 'three', 'four']

# Split with limit
parts = text.split(",", 2)
print(parts)  # ['one', 'two', 'three,four']

# Split lines
multiline = "line1\nline2\nline3"
lines = multiline.splitlines()
print(lines)  # ['line1', 'line2', 'line3']

# Join a list into a string
words = ['Python', 'is', 'great']
sentence = ' '.join(words)
print(sentence)  # Python is great

path = '/'.join(['home', 'user', 'documents'])
print(path)  # home/user/documents
```
String Formatting
Imagine you want to display a message like “Hello Ana, your order of $19.99 is ready”. You could do it by concatenating strings with +:
```python
name = "Ana"
price = 19.99

# The ugly and error-prone way
message = "Hello " + name + ", your order of $" + str(price) + " is ready"
print(message)  # Hello Ana, your order of $19.99 is ready
```
But this is tedious and easy to mess up. Python has a much more elegant way: f-strings (formatted string literals). Simply put an f before the opening quote and put variables inside curly braces:
```python
name = "Ana"
age = 25
price = 19.99

# F-strings (recommended, Python 3.6+)
print(f"Hello {name}, you are {age} years old")
print(f"Price: ${price:.2f}")  # 2 decimals

# Alignment in f-strings
print(f"{name:<10}")  # "Ana       " (left)
print(f"{name:>10}")  # "       Ana" (right)
print(f"{name:^10}")  # "   Ana    " (center)

# Numbers
number = 1234567
print(f"{number:,}")  # 1,234,567
print(f"{number:_}")  # 1_234_567

# Binary, hexadecimal
x = 255
print(f"{x:b}")   # 11111111 (binary)
print(f"{x:x}")   # ff (hexadecimal)
print(f"{x:#x}")  # 0xff (with prefix)
```
Regular Expressions
Regular expressions (regex) have a reputation for being cryptic and incomprehensible. But in reality, they’re simply advanced search patterns. Think of them as a “metal detector” for text: you describe what shape you’re looking for, and it finds all the matches.
When do you need them? When basic string methods fall short:
- Validate that an email has the format user@domain.com
- Extract all prices from a text (numbers preceded by $)
- Find dates in any format (12/06/2024, 2024-12-06, December 6th…)
- Replace complex patterns, not just literal text
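To see where string methods run out, take the date case from the list above: find() can only look for one literal string, while a single pattern can describe a whole family of them. A minimal sketch, using a made-up sample sentence and covering only two numeric layouts:

```python
import re

text = "Invoice issued 12/06/2024, due 2024-12-20."

# str.find needs the exact text, so it cannot search for "any date".
position = text.find("12/06/2024")
print(position)  # 15

# One pattern covers both layouts: dd/mm/yyyy or yyyy-mm-dd.
pattern = r"\d{2}/\d{2}/\d{4}|\d{4}-\d{2}-\d{2}"
dates = re.findall(pattern, text)
print(dates)  # ['12/06/2024', '2024-12-20']
```

The pattern only checks the shape of the text. It would also accept something like 99/99/2024, so real validation still needs datetime.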
```python
import re
```
Main Functions
The re module gives you several tools depending on what you need to do:
Does this pattern exist? Use search to look anywhere in the text, or match to look only at the beginning.
```python
import re

text = "My email is user@example.com"  # placeholder address

# search: looks anywhere in the text
result = re.search(r'\w+@\w+\.\w+', text)
if result:
    print(result.group())  # user@example.com

# match: only at the beginning
text2 = "123-456-789"
if re.match(r'\d{3}', text2):
    print("Starts with 3 digits")
```
How many matches are there? Use findall to get a list of all matches, or finditer if you need more information (position, etc.).
```python
import re

text = "Prices: $10, $25.50, $100"

# findall: list with all matches
prices = re.findall(r'\d+\.?\d*', text)
print(prices)  # ['10', '25.50', '100']

# finditer: iterator of Match objects
for match in re.finditer(r'\d+\.?\d*', text):
    print(f"Found: {match.group()} at position {match.start()}")
# Found: 10 at position 9
# Found: 25.50 at position 14
# Found: 100 at position 22
```
Want to replace or split? Use sub to search and replace with patterns, or split to divide by complex patterns.
```python
import re

# sub: replace with pattern (\1, \2, \3 refer to the captured groups)
text = "Today is 12-06-2024"
new = re.sub(r'(\d{2})-(\d{2})-(\d{4})', r'\3/\2/\1', text)
print(new)  # Today is 2024/06/12

# Replacement with a function: it receives each Match and returns the new text
def double(match):
    return str(int(match.group()) * 2)

numbers = "The numbers: 5, 10, 15"
print(re.sub(r'\d+', double, numbers))
# The numbers: 10, 20, 30

# split with regex
text = "one;two,three:four"
parts = re.split(r'[;,:]', text)
print(parts)  # ['one', 'two', 'three', 'four']
```
Regular Expression Syntax
You don’t need to memorize all of this. Use it as reference when building your patterns:
| Pattern | In plain English | Example |
|---|---|---|
| . | “any character” | a.c finds “abc”, “a1c”, “a-c” |
| \d | “a digit” (0-9) | \d{3} finds “123” |
| \w | “a letter, number or _” | \w+ finds “hello_123” |
| \s | “a whitespace character” (space, tab, newline) | \s+ finds a run of spaces such as "   " |
| ^ | “starts with…” | ^Hello only if text starts with “Hello” |
| $ | “ends with…” | bye$ only if ends with “bye” |
| * | “zero or more times” | ab* finds “a”, “ab”, “abbb” |
| + | “one or more times” | ab+ finds “ab”, “abbb” (not “a” alone) |
| ? | “optional” (0 or 1 time) | colou?r finds “color” and “colour” |
| {n} | “exactly n times” | \d{4} finds years like “2024” |
| {n,m} | “between n and m times” | \d{2,4} finds “12”, “123”, “1234” |
| [abc] | “any of these” | [aeiou] finds any vowel |
| [^abc] | “any except these” | [^0-9] finds everything except digits |
| (…) | “group and capture” | (\w+)@(\w+) captures user and domain |
| \| | “this or that” | cat\|dog finds “cat” or “dog” |
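A few rows of the table can be checked directly in the interpreter. This is just a sketch with made-up sample strings; re.findall returns every non-overlapping match, and the variable names are arbitrary:

```python
import re

# ? makes the preceding character optional
spellings = re.findall(r"colou?r", "color or colour")
print(spellings)  # ['color', 'colour']

# | means "this or that"
animals = re.findall(r"cat|dog", "a cat and a dog")
print(animals)  # ['cat', 'dog']

# [^0-9] matches any single character that is not a digit
non_digits = re.findall(r"[^0-9]", "a1b2")
print(non_digits)  # ['a', 'b']

# {n} repeats exactly n times; ^ and $ anchor the start and the end
print(bool(re.search(r"^\d{4}$", "2024")))   # True
print(bool(re.search(r"^\d{4}$", "20245")))  # False

# Parentheses capture parts of the match
found = re.search(r"(\w+)@(\w+)", "user@domain")
print(found.groups())  # ('user', 'domain')
```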
Use raw strings (r'...') for regular expressions so that you don’t have to escape backslashes. For example, you can write r'\d+' instead of '\\d+'.
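The difference is easy to verify. In the sketch below, both spellings of \d+ produce the same pattern, and the \b case shows how forgetting the r prefix can silently change the meaning (the sample text is made up):

```python
import re

# Both literals contain the same three characters: backslash, d, plus
same = r"\d+" == "\\d+"
print(same)          # True
print(len(r"\d+"))   # 3

# Without the r prefix, "\b" is a single backspace character,
# not the two-character regex token for a word boundary.
print(len("\b"), len(r"\b"))  # 1 2

text = "cat concatenate"
with_raw = re.findall(r"\bcat\b", text)
without_raw = re.findall("\bcat\b", text)
print(with_raw)     # ['cat']
print(without_raw)  # []
```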
Practical Example: Validate Email
```python
import re

def validate_email(email):
    # Allowed characters, an @, a domain, a dot, and an ending of 2+ letters
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

# Tests (placeholder addresses)
print(validate_email("user@example.com"))        # True
print(validate_email("invalid@"))                # False
print(validate_email("first.last@example.org"))  # True
```
Dates and Times
How many days until your birthday? How long ago was that event? What day of the week will August 15th be? Dates are everywhere, and working with them can be surprisingly complicated.
The problem is that dates are deceptive. It seems simple: year, month, day. But then time zones appear, leap years, months with different numbers of days, differences between formats (is 06/12 June 12th or December 6th?)…
Python has the datetime module to handle all this complexity for you.
```python
from datetime import datetime, date, time, timedelta
```
Creating Dates and Times
You can get the current date/time, or create a specific date. It’s like looking at the calendar or pointing to a specific day.
```python
from datetime import datetime, date, time

# Current date
today = date.today()
print(today)  # e.g. 2024-12-06

# Current date and time
now = datetime.now()
print(now)  # e.g. 2024-12-06 17:30:45.123456

# Create a specific date
birthday = date(1995, 8, 15)
print(birthday)  # 1995-08-15

# Create a specific datetime
event = datetime(2024, 12, 31, 23, 59, 59)
print(event)  # 2024-12-31 23:59:59

# Time only
hour = time(14, 30, 0)
print(hour)  # 14:30:00
```
Accessing Components
```python
from datetime import datetime

now = datetime.now()

# Example values assume now is Friday, 2024-12-06 17:30:45
print(now.year)       # 2024
print(now.month)      # 12
print(now.day)        # 6
print(now.hour)       # 17
print(now.minute)     # 30
print(now.second)     # 45
print(now.weekday())  # 4 (Friday, 0=Monday)
```
Formatting Dates
Here comes one of the classic headaches: date formats. In some countries they write “06/12/2024” (day/month/year), but in the US it would be “12/06/2024” (month/day/year). If your application has international users, you need to control how dates are displayed.
Python uses format codes (like %d for day, %m for month) to convert between dates and text:
'12/25/2024' (text) → strptime → datetime (object) → strftime → 'December 25, 2024' (text)

```python
from datetime import datetime

now = datetime.now()

# strftime: from datetime to string
# (example output assumes now is Friday, 2024-12-06 17:30:45)
print(now.strftime("%d/%m/%Y"))      # 06/12/2024
print(now.strftime("%H:%M:%S"))      # 17:30:45
print(now.strftime("%A, %d of %B"))  # Friday, 06 of December

# strptime: from string to datetime
text = "2024-12-25 10:30:00"
christmas = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
print(christmas)  # 2024-12-25 10:30:00
```
| Code | Meaning | Example |
|---|---|---|
| %Y | Year (4 digits) | 2024 |
| %m | Month (01-12) | 12 |
| %d | Day (01-31) | 06 |
| %H | Hour (00-23) | 17 |
| %M | Minutes (00-59) | 30 |
| %S | Seconds (00-59) | 45 |
| %A | Day of the week | Friday |
| %B | Month name | December |
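Putting the two directions together, the sketch below follows the round trip from '12/25/2024' to 'December 25, 2024' and then shows why the format string matters for ambiguous input such as 06/12/2024. The month name assumes Python’s default (English) locale, and the variable names are arbitrary.

```python
from datetime import datetime

# Text -> datetime -> text
parsed = datetime.strptime("12/25/2024", "%m/%d/%Y")
pretty = parsed.strftime("%B %d, %Y")
print(parsed)  # 2024-12-25 00:00:00
print(pretty)  # December 25, 2024

# The same text means different dates depending on the format you declare.
ambiguous = "06/12/2024"
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y")
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y")
print(as_day_first.date())    # 2024-12-06
print(as_month_first.date())  # 2024-06-12

# Text that does not fit the format raises ValueError.
try:
    datetime.strptime("25/12/2024", "%m/%d/%Y")
    parse_failed = False
except ValueError as error:
    parse_failed = True
    print(f"Could not parse: {error}")
```

Because strptime cannot guess which convention the text follows, the format has to come from what you know about the data source.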
Date Operations
The most powerful thing about datetime is that you can add and subtract time. What date will it be in 30 days? How many days have passed since your last birthday? For this you use timedelta, which represents a time interval.
```python
from datetime import datetime, timedelta

now = datetime.now()

# Add/subtract time
tomorrow = now + timedelta(days=1)
a_week_ago = now - timedelta(weeks=1)
in_2_hours = now + timedelta(hours=2)

print(f"Tomorrow: {tomorrow}")
print(f"A week ago: {a_week_ago}")
print(f"In 2 hours: {in_2_hours}")

# Difference between dates
date1 = datetime(2024, 1, 1)
date2 = datetime(2024, 12, 31)
difference = date2 - date1  # the result is a timedelta

# 365 days separate the two dates (2024 itself, a leap year, has 366 days)
print(f"Days between the dates: {difference.days}")    # 365
print(f"Total seconds: {difference.total_seconds()}")  # 31536000.0
```
Comparing Dates
Dates can be compared directly with <, >, ==, etc. This is very useful for checking whether something has already happened, whether a date falls within a period, or which of two dates is more recent.
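The “within a period” and “more recent” checks can be sketched with a chained comparison and the built-in max(). The dates and the helper name is_within are made up for illustration:

```python
from datetime import date


def is_within(day, start, end):
    """Return True if day falls between start and end, inclusive."""
    return start <= day <= end


sale_start = date(2024, 11, 25)
sale_end = date(2024, 12, 2)

print(is_within(date(2024, 11, 29), sale_start, sale_end))  # True
print(is_within(date(2024, 12, 25), sale_start, sale_end))  # False

# max() and min() work on dates because dates support comparison.
visits = [date(2024, 3, 1), date(2024, 9, 15), date(2023, 12, 31)]
most_recent = max(visits)
oldest = min(visits)
print(most_recent)  # 2024-09-15
print(oldest)       # 2023-12-31
```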
```python
from datetime import date

date1 = date(2024, 6, 15)
date2 = date(2024, 12, 25)

print(date1 < date2)   # True
print(date1 == date2)  # False

# Check if a date has passed
today = date.today()
if date1 < today:
    print("The date has passed")
```
When to Use Each Tool?
After seeing so many options, here’s a quick guide to choose:
| You need to… | Use… | Example |
|---|---|---|
| Search/replace literal text | String methods (find, replace) | Change “Python” to “Java” |
| Search for complex patterns | Regular expressions (re) | Find all emails |
| Validate format | Regex with match | Is it a valid phone number? |
| Clean user text | String methods (strip, lower) | Remove spaces, normalize |
| Current date/time | datetime.now() | Timestamp for a log |
| Specific date | datetime(year, month, day) | Birth date |
| Add/subtract time | timedelta | “In 30 days” |
| Difference between dates | Subtract datetimes | “X days have passed” |
| Convert text to date | strptime | “12/25/2024” → datetime |
| Convert date to text | strftime | datetime → “December 25” |
Start with string methods for simple operations. Only use regex when you need complex patterns. For dates, always use datetime instead of manipulating strings manually.
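The last piece of advice is easy to demonstrate. In this sketch (the sample dates are made up), comparing date strings gives the wrong answer because strings are compared character by character, while parsing them first gives the right one:

```python
from datetime import datetime

first = "9/15/2024"   # September 15
second = "10/2/2024"  # October 2

# String comparison looks at characters: "9" sorts after "1".
as_text = first < second
print(as_text)  # False, although September 15 comes first

fmt = "%m/%d/%Y"
first_date = datetime.strptime(first, fmt)
second_date = datetime.strptime(second, fmt)
as_dates = first_date < second_date
days_apart = (second_date - first_date).days
print(as_dates)    # True
print(days_apart)  # 17
```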
Practical Exercises
Write a function that extracts all phone numbers from a text. Expected format: 3 digits - 3 digits - 3 digits (e.g.: 612-345-678)
Create a function that calculates the exact age (years, months, days) given a birth date.
Create a function that receives messy text and normalizes it:
- Remove multiple spaces
- Capitalize sentences
- Remove special characters except basic punctuation