第一章:列表与字典
1.1 列表的工程级应用
1.1.1 动态数组实现机制
Python列表底层采用动态数组结构,初始分配8个元素空间,当空间不足时按0,4,8,16,25,35...的公式扩容,每次扩容增加约12.5%的容量
通过sys模块可验证扩容过程:
import sys
lst = []
prev_size = 0
for i in range(100):
current_size = sys.getsizeof(lst)
if current_size != prev_size:
print(f"元素数量:{len(lst):<3} 内存占用:{current_size}字节")
prev_size = current_size
lst.append(None)
1.1.2 多维数据处理
嵌套列表实现矩阵运算:
matrix = [
[1,2,3],
[4,5,6],
[7,8,9]
]
# 矩阵转置
transposed = [[row[i] for row in matrix] for i in range(3)]
print(transposed) # [[1,4,7], [2,5,8], [3,6,9]]
# 使用numpy优化
import numpy as np
np_matrix = np.array(matrix)
print(np_matrix.T)
1.1.3 生产级列表操作
# 深拷贝避免引用问题
import copy
original = [[1,2], [3,4]]
copied = copy.deepcopy(original)
# 内存视图优化
data = bytearray(1000)
view = memoryview(data)
partial_view = view[10:20]
partial_view[0] = 0xFF
1.2 字典的架构设计
1.2.1 哈希表实现原理
字典采用开放寻址法解决哈希冲突,当装载因子超过2/3时自动扩容。哈希表结构包含以下字段
struct PyDictKeyEntry {
Py_hash_t me_hash;
PyObject *me_key;
PyObject *me_value;
};
struct PyDictObject {
PyObject_HEAD
Py_ssize_t ma_used;
Py_ssize_t ma_version_tag;
PyDictKeysObject *ma_keys;
PyObject **ma_values;
};
1.2.2 高级字典模式
# 默认值处理
from collections import defaultdict
word_count = defaultdict(int)
for word in document:
word_count[word] += 1
# 有序字典
from collections import OrderedDict
od = OrderedDict()
od['z'] = 1
od['a'] = 2
print(list(od.keys())) # ['z', 'a']
1.2.3 字典视图对象
inventory = {'apple': 100, 'orange': 200}
keys_view = inventory.keys()
values_view = inventory.values()
items_view = inventory.items()
# 动态更新特性
inventory['banana'] = 50
print(list(keys_view)) # ['apple', 'orange', 'banana']
1.3 推导式工程实践
1.3.1 多层嵌套推导式
# 生成式构建三维坐标
coordinates = [(x, y, z)
for x in range(5)
for y in range(5)
for z in range(5)
if x + y > z]
# 字典推导式构建索引
documents = ["hello world", "python programming", "data science"]
index = {word: [i for i, doc in enumerate(documents) if word in doc]
for doc in documents
for word in set(doc.split())}
1.3.2 生成器表达式
# 大文件处理
def process_large_file(file_path):
with open(file_path) as f:
return (process_line(line) for line in f if line.strip())
# 内存优化
sum_of_squares = sum(x**2 for x in range(1000000))
第二章:元组与集合
2.1 元组的系统级应用
2.1.1 内存优化分析
元组在CPython中的内存结构比列表少16字节(存储指针数组),比较相同元素列表:
import sys
lst = [1,2,3]
tup = (1,2,3)
print(sys.getsizeof(lst)) # 88
print(sys.getsizeof(tup)) # 72
2.1.2 不可变数据结构
# 防御性编程
CONFIG = (
('host', '127.0.0.1'),
('port', 3306),
('charset', 'utf8mb4')
)
# 作为字典键
cache = {}
params = (('page',1), ('size',20))
cache[params] = fetch_data(params)
2.1.3 具名元组进阶
from typing import NamedTuple
class Employee(NamedTuple):
id: int
name: str
department: str = 'IT'
emp = Employee(1001, 'Alice')
print(emp._asdict()) # {'id': 1001, 'name': 'Alice', 'department': 'IT'}
2.2 集合的算法实现
2.2.1 哈希集合原理
集合底层采用与字典相同的哈希表实现,但不存储值对象。装载因子阈值0.7,扩容策略与字典不同
# 哈希冲突演示
class BadHash:
def __hash__(self):
return 1
a, b = BadHash(), BadHash()
s = {a, b} # 成功存储两个对象
2.2.2 高性能集合运算
# 千万级数据处理
set1 = set(range(10_000_000))
set2 = set(range(5_000_000, 15_000_000))
# 并行计算优化
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor() as executor:
future1 = executor.submit(lambda: set1 & set2)
future2 = executor.submit(lambda: set1 | set2)
intersection = future1.result()
union = future2.result()
2.2.3 冻结集合应用
# 不可变集合
from types import MappingProxyType
d = {'a': 1}
d_proxy = MappingProxyType(d)
# d_proxy['b'] = 2 # TypeError
# 公式验证
formula = frozenset(['x', 'y'])
variables = {'x': 10, 'y': 20}
eval('x+y', {}, variables)
2.3 数据结构综合案例
2.3.1 社交网络分析
# 用户关系图谱
users = {
1: {'name': 'Alice', 'friends': {2,3}},
2: {'name': 'Bob', 'friends': {1,4}},
3: {'name': 'Charlie', 'friends': {1,5}},
4: {'name': 'David', 'friends': {2}},
5: {'name': 'Eve', 'friends': {3}}
}
# 共同好友分析
def mutual_friends(user_a, user_b):
return users[user_a]['friends'] & users[user_b]['friends']
# 推荐系统
def recommend_friends(user_id):
friends = users[user_id]['friends']
return {f for friend in friends
for f in users[friend]['friends']} - friends - {user_id}
2.3.2 实时交易监控
from collections import deque
class FraudDetection:
def __init__(self, window_size=60):
self.transactions = deque(maxlen=window_size)
self.amount_set = set()
def process_transaction(self, tx):
if tx['amount'] in self.amount_set:
raise FraudAlert("Duplicate amount detected")
if len(self.transactions) >= self.transactions.maxlen:
expired = self.transactions.popleft()
self.amount_set.remove(expired['amount'])
self.transactions.append(tx)
self.amount_set.add(tx['amount'])