Strings and Dates

Module Objectives

By completing this module you will be able to:

  • Manipulate strings in advanced ways
  • Use regular expressions to search and transform text
  • Work with dates and times using the datetime module
  • Perform operations with time intervals

Why Master Text and Dates?

A widely repeated estimate holds that around 80% of the world’s data is unstructured, and much of that is text: names, addresses, emails, messages, social media posts, documents… As a programmer, you’re going to spend a lot of time working with text.

Think about the everyday tasks of any application:

  • Validate that an email has the correct format before sending it
  • Clean the data a user types in a form (extra spaces, random capitalization…)
  • Extract information from messy text (prices from a webpage, dates from a document)
  • Format dates so they display in the user’s language

This chapter will give you the tools to do all of this elegantly and efficiently.


String Manipulation

Imagine you receive data from a web form. The user typed " JOHN SMITH " (with extra spaces and all caps). You need to save it as “John Smith”. Strings in Python come with an arsenal of methods for exactly these situations.
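As a minimal sketch of the form example above, the helper below collapses the extra spaces and fixes the capitalization using only built-in string methods. The name clean_name is hypothetical and chosen for illustration; it is not part of Python.

```python
def clean_name(raw):
    """Collapse whitespace and capitalize each word of a name."""
    # split() with no arguments ignores leading/trailing whitespace
    # and treats any run of whitespace as a single separator
    words = raw.split()
    return " ".join(words).title()


print(clean_name("  JOHN   SMITH "))  # John Smith
```

Keep in mind that title() simply capitalizes the first letter of every word, so names such as “McDonald” or “van der Berg” would need extra handling.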

Search Methods

It’s like using Ctrl+F in your text editor, but with superpowers. You can find where a word is, whether the text starts or ends a certain way, how many times something appears…

text = "Python is a very popular programming language"

# Find substring
print(text.find("programming"))    # 25 (index where it starts, counting from 0)
print(text.find("Java"))           # -1 (not found)

# Check start/end
print(text.startswith("Python"))   # True
print(text.endswith("language"))   # True

# Count occurrences
print(text.count("a"))             # 5
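A few related tools are worth knowing alongside find. The sketch below reuses the same sample sentence and shows the in operator, rfind and index, all of which are standard string features.

```python
text = "Python is a very popular programming language"

# 'in' answers yes/no without caring about the position
print("popular" in text)   # True

# rfind searches from the right and returns the last occurrence
print(text.rfind("a"))     # 42

# index works like find, but raises ValueError instead of returning -1
try:
    text.index("Java")
except ValueError:
    print("'Java' is not in the text")
```

Prefer in when you only need a yes/no answer, and find when you need the position and want to handle “not found” without an exception.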

Transformation Methods

This is where you clean and transform text. Going back to the form example: you need to remove spaces, convert to uppercase/lowercase, replace characters… Python has a method for each situation.

text = "hello world"

print(text.upper())      # HELLO WORLD
print(text.lower())      # hello world
print(text.capitalize()) # Hello world
print(text.title())      # Hello World
print(text.swapcase())   # HELLO WORLD (every letter's case is flipped)

text = "   spaces   "

print(text.strip())      # "spaces" (removes both sides)
print(text.lstrip())     # "spaces   " (left only)
print(text.rstrip())     # "   spaces" (right only)

# Padding
number = "42"
print(number.zfill(5))    # "00042"
print(number.center(10))  # "    42    "
print(number.ljust(10))   # "42        "
print(number.rjust(10))   # "        42"

text = "I love Python. Python is great."

# Replace all occurrences
new = text.replace("Python", "Java")
print(new)  # I love Java. Java is great.

# Replace only N occurrences
new = text.replace("Python", "Java", 1)
print(new)  # I love Java. Python is great.

# Split a string
text = "one,two,three,four"
parts = text.split(",")
print(parts)  # ['one', 'two', 'three', 'four']

# Split with limit
parts = text.split(",", 2)
print(parts)  # ['one', 'two', 'three,four']

# Split lines
multiline = "line1\nline2\nline3"
lines = multiline.splitlines()
print(lines)  # ['line1', 'line2', 'line3']

# Join a list into string
words = ['Python', 'is', 'great']
sentence = ' '.join(words)
print(sentence)  # Python is great

path = '/'.join(['home', 'user', 'documents'])
print(path)  # home/user/documents
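These methods are most useful in combination. As an illustrative sketch, the hypothetical helper normalize_tags below cleans a comma-separated list typed by a user by chaining split, strip, lower and join.

```python
def normalize_tags(raw):
    """Turn ' Python, DATA ,, regex ' into 'python, data, regex'."""
    # Split on commas, then clean each piece individually
    pieces = [piece.strip().lower() for piece in raw.split(",")]
    # Drop empty pieces left behind by doubled or trailing commas
    pieces = [piece for piece in pieces if piece]
    return ", ".join(pieces)


print(normalize_tags(" Python, DATA ,, regex "))  # python, data, regex
```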

String Formatting

Imagine you want to display a message like “Hello Ana, your order of $19.99 is ready”. You could do it by concatenating strings with +:

# The ugly and error-prone way
name = "Ana"
price = 19.99
message = "Hello " + name + ", your order of $" + str(price) + " is ready"

But this is tedious and easy to mess up. Python has a much more elegant way: f-strings (format strings). Simply put an f before the quotes and put variables inside curly braces:

name = "Ana"
age = 25
price = 19.99

# F-strings (recommended, Python 3.6+)
print(f"Hello {name}, you are {age} years old")
print(f"Price: ${price:.2f}")  # 2 decimals

# Alignment in f-strings
print(f"{name:<10}")  # "Ana       " (left)
print(f"{name:>10}")  # "       Ana" (right)
print(f"{name:^10}")  # "   Ana    " (center)

# Numbers
number = 1234567
print(f"{number:,}")      # 1,234,567
print(f"{number:_}")      # 1_234_567

# Binary, hexadecimal
x = 255
print(f"{x:b}")   # 11111111 (binary)
print(f"{x:x}")   # ff (hexadecimal)
print(f"{x:#x}")  # 0xff (with prefix)
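Format specifiers can be combined in a single placeholder. The sketch below lines up the columns of a small receipt; the item data and the helper name format_row are made up for illustration.

```python
items = [("Keyboard", 2, 49.9), ("Monitor", 1, 1234.5)]


def format_row(name, quantity, price):
    # <12     left-align the name in 12 columns
    # >3      right-align the quantity in 3 columns
    # >10,.2f right-align in 10 columns, thousands separator, 2 decimals
    return f"{name:<12}{quantity:>3}{price:>10,.2f}"


for item in items:
    print(format_row(*item))
# Keyboard      2     49.90
# Monitor       1  1,234.50
```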

Regular Expressions

Regular expressions (regex) have a reputation for being cryptic and incomprehensible. But in reality, they’re simply advanced search patterns. Think of them as a “metal detector” for text: you describe what shape you’re looking for, and it finds all the matches.

When do you need them? When basic string methods fall short:

  • Validate that an email has a format like user@example.com
  • Extract all prices from a text (numbers preceded by $)
  • Find dates in any format (12/06/2024, 2024-12-06, December 6th…)
  • Replace complex patterns, not just literal text

import re

Main Functions

The re module gives you several tools depending on what you need to do:

Does this pattern exist? Use search to look anywhere in the text, or match to look only at the beginning.

import re

text = "My email is user@example.com"

# search: looks anywhere
result = re.search(r'\w+@\w+\.\w+', text)
if result:
    print(result.group())  # user@example.com

# match: only at the beginning
text2 = "123-456-789"
if re.match(r'\d{3}', text2):
    print("Starts with 3 digits")
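To make the difference concrete, the sketch below runs the same pattern through search, match and also fullmatch, a third function in the re module that only succeeds when the entire text fits the pattern.

```python
import re

pattern = r"\d{3}"
code = "123-456-789"

print(bool(re.search(pattern, "ID: 123")))  # True: found in the middle
print(bool(re.match(pattern, "ID: 123")))   # False: the text does not start with digits
print(bool(re.match(pattern, code)))        # True: the first 3 characters match
print(bool(re.fullmatch(pattern, code)))    # False: the text is longer than 3 digits
print(bool(re.fullmatch(pattern, "123")))   # True: the whole text is 3 digits
```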

How many matches are there? Use findall to get a list with all matches, or finditer if you need more information (position, etc.).

import re

text = "Prices: $10, $25.50, $100"

# findall: list with all matches
prices = re.findall(r'\d+\.?\d*', text)
print(prices)  # ['10', '25.50', '100']

# finditer: iterator of Match objects
for match in re.finditer(r'\d+\.?\d*', text):
    print(f"Found: {match.group()} at position {match.start()}")
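The pattern above picks up every number in the text, not only prices. One possible refinement, sketched here with a slightly extended sample text, is to require the dollar sign and capture only the digits. When a pattern contains one capturing group, findall returns just that group; (?:...) groups without capturing.

```python
import re

text = "Prices: $10, $25.50, $100 (ref 2024)"

# \$ requires a literal dollar sign; the outer parentheses mark
# the part that findall should return
amounts = re.findall(r"\$(\d+(?:\.\d+)?)", text)
print(amounts)  # ['10', '25.50', '100']

total = sum(float(amount) for amount in amounts)
print(f"Total: ${total:.2f}")  # Total: $135.50
```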

Want to replace or split? Use sub to search and replace with patterns, or split to divide by complex patterns.

import re

# sub: replace with pattern
text = "Today is 12-06-2024"
new = re.sub(r'(\d{2})-(\d{2})-(\d{4})', r'\3/\2/\1', text)
print(new)  # Today is 2024/06/12

# Replacement with function
def double(match):
    return str(int(match.group()) * 2)

numbers = "The numbers: 5, 10, 15"
print(re.sub(r'\d+', double, numbers))
# The numbers: 10, 20, 30

# split with regex
text = "one;two,three:four"
parts = re.split(r'[;,:]', text)
print(parts)  # ['one', 'two', 'three', 'four']

Regular Expression Syntax

You don’t need to memorize all of this. Use it as reference when building your patterns:

Pattern  In plain English                    Example
.        “any character”                     a.c finds “abc”, “a1c”, “a-c”
\d       “a digit” (0-9)                     \d{3} finds “123”
\w       “a letter, number or _”             \w+ finds “hello_123”
\s       “whitespace” (space, tab, newline)  \s+ finds " "
^        “starts with…”                      ^Hello only if text starts with “Hello”
$        “ends with…”                        bye$ only if text ends with “bye”
*        “zero or more times”                ab* finds “a”, “ab”, “abbb”
+        “one or more times”                 ab+ finds “ab”, “abbb” (not “a” alone)
?        “optional” (0 or 1 time)            colou?r finds “color” and “colour”
{n}      “exactly n times”                   \d{4} finds years like “2024”
{n,m}    “between n and m times”             \d{2,4} finds “12”, “123”, “1234”
[abc]    “any of these”                      [aeiou] finds any vowel
[^abc]   “any except these”                  [^0-9] finds everything except digits
(…)      “group and capture”                 (\w+)@(\w+) captures user and domain
|        “this or that”                      cat|dog finds “cat” or “dog”

Raw strings

Use raw strings (r'...') for regular expressions so you don’t have to escape backslashes: you can write r'\d+' instead of '\\d+'.
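The short check below illustrates why this matters. It uses \b, the regex token for a word boundary (not listed in the table above), because in a normal Python string "\b" means the backspace character instead.

```python
import re

# Both literals contain the same three characters: backslash, d, plus
print(r"\d+" == "\\d+")  # True
print(len(r"\d+"))       # 3

# Without the r prefix, "\b" is a single backspace character
print(len("\b"), len(r"\b"))  # 1 2

print(re.findall(r"\bcat\b", "cat concat cat"))  # ['cat', 'cat']
print(re.findall("\bcat\b", "cat concat cat"))   # []
```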

Practical Example: Validate Email

import re

def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

# Tests
print(validate_email("user@example.com"))        # True
print(validate_email("invalid@"))                # False
print(validate_email("first.last@example.org"))  # True
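One subtlety: $ also matches just before a newline at the very end of the text, so a pattern anchored with ^ and $ accepts an address followed by a line break. A possible stricter variant, sketched below under the hypothetical name validate_email_strict, uses re.fullmatch with the same character classes.

```python
import re

EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def validate_email_strict(email):
    # fullmatch requires the entire string to match, so ^ and $ are not needed
    return EMAIL_PATTERN.fullmatch(email) is not None


print(validate_email_strict("user@example.com"))    # True
print(validate_email_strict("user@example.com\n"))  # False

# With re.match and $, the trailing newline slips through
print(bool(re.match(r"^\w+@\w+\.\w+$", "user@example.com\n")))  # True
```

Either version only checks the shape of the address; it cannot tell whether the mailbox actually exists.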

Dates and Times

How many days until your birthday? How long ago was that event? What day of the week will August 15th be? Dates are everywhere, and working with them can be surprisingly complicated.

The problem is that dates are deceptive. It seems simple: year, month, day. But then time zones appear, leap years, months with different numbers of days, differences between formats (is 06/12 June 12th or December 6th?)…

Python has the datetime module to handle all this complexity for you.

from datetime import datetime, date, time, timedelta

Creating Dates and Times

You can get the current date/time, or create a specific date. It’s like looking at the calendar or pointing to a specific day.

from datetime import datetime, date, time

# Current date
today = date.today()
print(today)  # e.g. 2024-12-06

# Current date and time
now = datetime.now()
print(now)  # e.g. 2024-12-06 17:30:45.123456

# Create specific date
birthday = date(1995, 8, 15)
print(birthday)  # 1995-08-15

# Create specific datetime
event = datetime(2024, 12, 31, 23, 59, 59)
print(event)  # 2024-12-31 23:59:59

# Time only
hour = time(14, 30, 0)
print(hour)  # 14:30:00
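Two more ways to build these objects may be handy: date.fromisoformat (available since Python 3.7) and datetime.combine. The sketch below also shows that impossible dates are rejected with a ValueError; the variable names are illustrative.

```python
from datetime import date, datetime, time

# Parse an ISO 8601 date string (YYYY-MM-DD) without a format string
release = date.fromisoformat("2024-12-31")
print(release)  # 2024-12-31

# Combine a date and a time into a single datetime
party = datetime.combine(release, time(23, 59, 59))
print(party)  # 2024-12-31 23:59:59

# Split a datetime back into its two halves
print(party.date(), party.time())  # 2024-12-31 23:59:59

# Impossible dates raise ValueError (2023 was not a leap year)
try:
    date(2023, 2, 29)
except ValueError as error:
    print("Rejected:", error)
```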

Accessing Components

from datetime import datetime

now = datetime.now()

print(now.year)       # 2024
print(now.month)      # 12
print(now.day)        # 6
print(now.hour)       # 17
print(now.minute)     # 30
print(now.second)     # 45
print(now.weekday())  # 4 (Friday, 0=Monday)
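Since weekday() returns a number, you often want the name as well. The sketch below uses a fixed date so the output is reproducible, and shows isoweekday() and calendar.day_name from the standard library.

```python
import calendar
from datetime import date

day = date(2024, 12, 6)

print(day.weekday())     # 4 (Monday is 0)
print(day.isoweekday())  # 5 (Monday is 1)

# Both lines below print the weekday name in the current locale,
# which is "Friday" for an English locale
print(calendar.day_name[day.weekday()])
print(day.strftime("%A"))
```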

Formatting Dates

Here comes one of the classic headaches: date formats. In some countries they write “06/12/2024” (day/month/year), but in the US it would be “12/06/2024” (month/day/year). If your application has international users, you need to control how dates are displayed.

Python uses format codes (like %d for day, %m for month) to convert between dates and text:

'12/25/2024' (text) --strptime--> datetime (object) --strftime--> 'December 25, 2024' (text)

from datetime import datetime

now = datetime.now()

# strftime: from datetime to string (format → text)
print(now.strftime("%d/%m/%Y"))        # 06/12/2024
print(now.strftime("%H:%M:%S"))        # 17:30:45
print(now.strftime("%A, %d of %B"))    # Friday, 06 of December

# strptime: from string to datetime
text = "2024-12-25 10:30:00"
christmas = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
print(christmas)  # 2024-12-25 10:30:00

Code  Meaning            Example
%Y    Year (4 digits)    2024
%m    Month (01-12)      12
%d    Day (01-31)        06
%H    Hour (00-23)       17
%M    Minutes (00-59)    30
%S    Seconds (00-59)    45
%A    Day of the week    Friday
%B    Month name         December
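The format string is what resolves the day/month ambiguity mentioned earlier. As a small sketch, the same text parsed with two different formats gives two different dates, and text that does not fit the format raises ValueError.

```python
from datetime import datetime

raw = "06/12/2024"

day_first = datetime.strptime(raw, "%d/%m/%Y")
month_first = datetime.strptime(raw, "%m/%d/%Y")

print(day_first.date())    # 2024-12-06
print(month_first.date())  # 2024-06-12

# There is no month 25, so this text does not fit the format
try:
    datetime.strptime("25/12/2024", "%m/%d/%Y")
except ValueError:
    print("Not a valid month/day/year date")
```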

Date Operations

The most powerful thing about datetime is that you can add and subtract time. What date will it be in 30 days? How many days have passed since your last birthday? For this you use timedelta, which represents a time interval.

from datetime import datetime, timedelta

now = datetime.now()

# Add/subtract time
tomorrow = now + timedelta(days=1)
a_week_ago = now - timedelta(weeks=1)
in_2_hours = now + timedelta(hours=2)

print(f"Tomorrow: {tomorrow}")
print(f"A week ago: {a_week_ago}")

# Difference between dates
date1 = datetime(2024, 1, 1)
date2 = datetime(2024, 12, 31)
difference = date2 - date1

# 2024 is a leap year with 366 days, so Jan 1 and Dec 31 are 365 days apart
print(f"Days between: {difference.days}")  # 365
print(f"Total seconds: {difference.total_seconds()}")  # 31536000.0
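As a sketch of the “how many days until your birthday?” question, the hypothetical function below takes today as a parameter so the result is reproducible. It assumes the birthday is not February 29; replace() raises ValueError when that day does not exist in the target year.

```python
from datetime import date


def days_until_birthday(birthday, today):
    """Days from `today` until the next occurrence of the birthday."""
    # Assumption: birthday is not Feb 29 (replace() would fail in non-leap years)
    next_birthday = birthday.replace(year=today.year)
    if next_birthday < today:
        # Already celebrated this year, so look at next year
        next_birthday = birthday.replace(year=today.year + 1)
    return (next_birthday - today).days


print(days_until_birthday(date(1995, 8, 15), date(2024, 12, 6)))  # 252
```

In real use you would pass date.today() as the second argument.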

Comparing Dates

Dates can be compared directly with <, >, ==, etc. Very useful for checking if something already happened, if we’re within a period, or which of two dates is more recent.

from datetime import date

date1 = date(2024, 6, 15)
date2 = date(2024, 12, 25)

print(date1 < date2)   # True
print(date1 == date2)  # False

# Check if a date has passed
today = date.today()
if date1 < today:
    print("The date has passed")
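The “within a period” check can be sketched with a chained comparison. The helper name is_within and the sample dates are illustrative.

```python
from datetime import date


def is_within(day, start, end):
    # Comparison operators can be chained, just like with numbers
    return start <= day <= end


sale_start = date(2024, 11, 29)
sale_end = date(2024, 12, 2)

print(is_within(date(2024, 12, 1), sale_start, sale_end))   # True
print(is_within(date(2024, 12, 25), sale_start, sale_end))  # False

# Because dates are comparable, min, max and sorted work on them too
deadlines = [date(2024, 12, 25), date(2024, 6, 15), date(2025, 1, 1)]
print(min(deadlines))         # 2024-06-15
print(sorted(deadlines)[-1])  # 2025-01-01
```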

When to Use Each Tool?

After seeing so many options, here’s a quick guide to choose:

You need to…                 Use…                            Example
Search/replace literal text  String methods (find, replace)  Change “Python” to “Java”
Search for complex patterns  Regular expressions (re)        Find all emails
Validate format              Regex with match                Is it a valid phone number?
Clean user text              String methods (strip, lower)   Remove spaces, normalize
Current date/time            datetime.now()                  Timestamp for a log
Specific date                datetime(year, month, day)      Birth date
Add/subtract time            timedelta                       “In 30 days”
Difference between dates     Subtract datetimes              “X days have passed”
Convert text to date         strptime                        “12/25/2024” → datetime
Convert date to text         strftime                        datetime → “December 25”

General Rule

Start with string methods for simple operations. Only use regex when you need complex patterns. For dates, always use datetime instead of manipulating strings manually.


Practical Exercises

Exercise 1: Extract Information from Text

Write a function that extracts all phone numbers from a text. Expected format: 3 digits - 3 digits - 3 digits (e.g.: 612-345-678)

import re

def extract_phones(text):
    pattern = r'\d{3}-\d{3}-\d{3}'
    return re.findall(pattern, text)

text = """
Contacts:
- John: 612-345-678
- Mary: 698-123-456
- Office: 91-123-4567 (invalid)
- Peter: 666-777-888
"""

phones = extract_phones(text)
print(phones)  # ['612-345-678', '698-123-456', '666-777-888']

Exercise 2: Age Calculator

Create a function that calculates the exact age (years, months, days) given a birth date.

import calendar
from datetime import date

def calculate_age(birth_date):
    today = date.today()

    years = today.year - birth_date.year
    months = today.month - birth_date.month
    days = today.day - birth_date.day

    # Adjust if the day hasn't come yet this month
    if days < 0:
        months -= 1
        # Real length of the previous month (leap years included)
        prev_month = today.month - 1 if today.month > 1 else 12
        prev_year = today.year if today.month > 1 else today.year - 1
        days_in_month = calendar.monthrange(prev_year, prev_month)[1]
        # Days left in the previous month after the birth day, plus days
        # elapsed this month; max() covers birth days past a short month's end
        days = today.day + max(days_in_month - birth_date.day, 0)

    # Adjust if the month hasn't come yet this year
    if months < 0:
        years -= 1
        months += 12

    return years, months, days

# Example
birthday = date(1995, 8, 15)
years, months, days = calculate_age(birthday)
print(f"Age: {years} years, {months} months and {days} days")

Exercise 3: Clean and Normalize Text

Create a function that receives messy text and normalizes it:

  • Remove multiple spaces
  • Capitalize sentences
  • Remove special characters except basic punctuation

import re

def normalize_text(text):
    # Remove special characters (keeping letters, numbers, spaces and punctuation)
    text = re.sub(r'[^\w\s.,!?]', '', text)

    # Reduce multiple spaces to one
    text = re.sub(r'\s+', ' ', text)

    # Remove spaces at start and end
    text = text.strip()

    # Capitalize after . ! ?
    def capitalize_match(match):
        return match.group(1) + match.group(2).upper()

    text = re.sub(r'([.!?]\s*)(\w)', capitalize_match, text)

    # Capitalize the start
    if text:
        text = text[0].upper() + text[1:]

    return text

# Test
messy_text = "   hello!!!   this    is   A messy** text.  needs   CLEANING   "
print(normalize_text(messy_text))
# Hello!!! This is A messy text. Needs CLEANING
# (repeated punctuation such as "!!!" is kept as typed)
