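I. Strings
1. Defining strings
In Python, a string (str) is a data type that stores a sequence of characters (text). A string can contain letters, digits, punctuation, and special characters.
Python strings can be defined in the following ways: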
1-1. Using single quotes (')
s1 = 'Hello, my Python World!'
print(type(s1)) # Output: <class 'str'>
1-2. Using double quotes (")
s1 = "Hello, my Python World!"
print(type(s1)) # Output: <class 'str'>
1-3. Using triple quotes (''' or """)
# Using three single quotes: '''
s1 = '''Hello, my Python World!'''
print(type(s1)) # Output: <class 'str'>
# Using three double quotes: """
s1 = """Hello, my Python World!"""
print(type(s1)) # Output: <class 'str'>
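The one-line examples above do not show what sets triple quotes apart: a triple-quoted literal may span several lines and may contain unescaped single or double quotes. A minimal sketch (the variable names poem and quote are illustrative only):

```python
# A triple-quoted literal may span several lines; each line break
# is stored in the string as a '\n' character.
poem = """Roses are red,
Violets are blue."""

lines = poem.splitlines()
print(len(lines))    # 2
print("\n" in poem)  # True

# Single and double quotes can appear inside without escaping.
quote = '''She said "it's fine".'''
print(quote)         # She said "it's fine".
```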
1-4. Raw strings (r'str' or R'str')
# Using r'str'
path1 = r'C:\Users\Username\Documents\file.txt'
print(type(path1)) # Output: <class 'str'>
# Using r"str"
path2 = r"C:\Users\Username\Documents\file.txt"
print(type(path2)) # Output: <class 'str'>
# Using r'''str'''
path3 = r'''C:\Users\Username\Documents\file.txt'''
print(type(path3)) # Output: <class 'str'>
# Using r"""str"""
path4 = r"""C:\Users\Username\Documents\file.txt"""
print(type(path4)) # Output: <class 'str'>
# Using R'str'
path5 = R'C:\Users\Username\Documents\file.txt'
print(type(path5)) # Output: <class 'str'>
# Using R"str"
path6 = R"C:\Users\Username\Documents\file.txt"
print(type(path6)) # Output: <class 'str'>
# Using R'''str'''
path7 = R'''C:\Users\Username\Documents\file.txt'''
print(type(path7)) # Output: <class 'str'>
# Using R"""str"""
path8 = R"""C:\Users\Username\Documents\file.txt"""
print(type(path8)) # Output: <class 'str'>
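All eight forms above produce an ordinary str, so type() alone does not reveal what the r/R prefix changes. The sketch below compares a raw literal with a normal one; the variable names are chosen for illustration only.

```python
# In a normal literal, "\n" is a single newline character.
normal = "a\nb"
# In a raw literal, the backslash and the "n" are kept as two characters.
raw = r"a\nb"

print(len(normal))  # 3
print(len(raw))     # 4

# A raw literal equals a normal literal with every backslash doubled.
same_path = r"C:\new\table.txt" == "C:\\new\\table.txt"
print(same_path)    # True

# The r and R prefixes behave identically.
same_prefix = r"\d+" == R"\d+"
print(same_prefix)  # True
```

One limitation worth knowing: a raw literal cannot end with a single backslash, so a path with a trailing backslash still needs a normal literal such as "C:\\dir\\".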
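2. String syntax
A Python string literal is a sequence of characters enclosed in single quotes ('), double quotes ("), or triple quotes (''' or """). In Python 3, str holds Unicode text, so a string may contain ASCII characters, other Unicode characters, or both.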
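To make the Unicode point concrete, here is a small sketch showing that a str is measured in characters (code points) rather than bytes. The sample text is arbitrary.

```python
text = "Python 世界"

# len() counts Unicode code points, not bytes.
char_count = len(text)
byte_count = len(text.encode("utf-8"))
print(char_count)  # 9
print(byte_count)  # 13

# ASCII characters are simply the Unicode code points below 128.
print(ord("A"))           # 65
print("\u0041" == "A")    # True

# ord() and chr() convert between a character and its code point.
round_trip = all(chr(ord(ch)) == ch for ch in text)
print(round_trip)         # True
```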
3. Getting the attributes and methods of str
Use the dir() function to list all attributes and methods of str:
print(dir(str))
# ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
# '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__',
# '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
# '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold',
# 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha',
# 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper',
# 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex',
# 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
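The full dir() listing mixes special (double-underscore) names with the everyday methods. As a rough sketch, the public methods can be filtered out like this; the exact count depends on the Python version, so it is printed without an expected value.

```python
# Keep only the public names (those without a leading underscore).
public_names = [name for name in dir(str) if not name.startswith("_")]

print(len(public_names))
# dir() returns names in sorted order, so the first few are stable.
print(public_names[:5])  # ['capitalize', 'casefold', 'center', 'count', 'encode']
```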
4. Getting help for str
Call help(str) to view the built-in help text for str; the output looks like this:
Help on class str in module builtins:
class str(object)
| str(object='') -> str
| str(bytes_or_buffer[, encoding[, errors]]) -> str
|
| Create a new string object from the given object. If encoding or
| errors is specified, then the object must expose a data buffer
| that will be decoded using the given encoding and error handler.
| Otherwise, returns the result of object.__str__() (if defined)
| or repr(object).
| encoding defaults to sys.getdefaultencoding().
| errors defaults to 'strict'.
|
| Methods defined here:
|
| __add__(self, value, /)
| Return self+value.
|
| __contains__(self, key, /)
| Return key in self.
|
| __eq__(self, value, /)
| Return self==value.
|
| __format__(self, format_spec, /)
| Return a formatted version of the string as described by format_spec.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getitem__(self, key, /)
| Return self[key].
|
| __getnewargs__(...)
|
| __gt__(self, value, /)
| Return self>value.
|
| __hash__(self, /)
| Return hash(self).
|
| __iter__(self, /)
| Implement iter(self).
|
| __le__(self, value, /)
| Return self<=value.
|
| __len__(self, /)
| Return len(self).
|
| __lt__(self, value, /)
| Return self<value.
|
| __mod__(self, value, /)
| Return self%value.
|
| __mul__(self, value, /)
| Return self*value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __repr__(self, /)
| Return repr(self).
|
| __rmod__(self, value, /)
| Return value%self.
|
| __rmul__(self, value, /)
| Return value*self.
|
| __sizeof__(self, /)
| Return the size of the string in memory, in bytes.
|
| __str__(self, /)
| Return str(self).
|
| capitalize(self, /)
| Return a capitalized version of the string.
|
| More specifically, make the first character have upper case and the rest lower
| case.
|
| casefold(self, /)
| Return a version of the string suitable for caseless comparisons.
|
| center(self, width, fillchar=' ', /)
| Return a centered string of length width.
|
| Padding is done using the specified fill character (default is a space).
|
| count(...)
| S.count(sub[, start[, end]]) -> int
|
| Return the number of non-overlapping occurrences of substring sub in
| string S[start:end]. Optional arguments start and end are
| interpreted as in slice notation.
|
| encode(self, /, encoding='utf-8', errors='strict')
| Encode the string using the codec registered for encoding.
|
| encoding
| The encoding in which to encode the string.
| errors
| The error handling scheme to use for encoding errors.
| The default is 'strict' meaning that encoding errors raise a
| UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
| 'xmlcharrefreplace' as well as any other name registered with
| codecs.register_error that can handle UnicodeEncodeErrors.
|
| endswith(...)
| S.endswith(suffix[, start[, end]]) -> bool
|
| Return True if S ends with the specified suffix, False otherwise.
| With optional start, test S beginning at that position.
| With optional end, stop comparing S at that position.
| suffix can also be a tuple of strings to try.
|
| expandtabs(self, /, tabsize=8)
| Return a copy where all tab characters are expanded using spaces.
|
| If tabsize is not given, a tab size of 8 characters is assumed.
|
| find(...)
| S.find(sub[, start[, end]]) -> int
|
| Return the lowest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Return -1 on failure.
|
| format(...)
| S.format(*args, **kwargs) -> str
|
| Return a formatted version of S, using substitutions from args and kwargs.
| The substitutions are identified by braces ('{' and '}').
|
| format_map(...)
| S.format_map(mapping) -> str
|
| Return a formatted version of S, using substitutions from mapping.
| The substitutions are identified by braces ('{' and '}').
|
| index(...)
| S.index(sub[, start[, end]]) -> int
|
| Return the lowest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Raises ValueError when the substring is not found.
|
| isalnum(self, /)
| Return True if the string is an alpha-numeric string, False otherwise.
|
| A string is alpha-numeric if all characters in the string are alpha-numeric and
| there is at least one character in the string.
|
| isalpha(self, /)
| Return True if the string is an alphabetic string, False otherwise.
|
| A string is alphabetic if all characters in the string are alphabetic and there
| is at least one character in the string.
|
| isascii(self, /)
| Return True if all characters in the string are ASCII, False otherwise.
|
| ASCII characters have code points in the range U+0000-U+007F.
| Empty string is ASCII too.
|
| isdecimal(self, /)
| Return True if the string is a decimal string, False otherwise.
|
| A string is a decimal string if all characters in the string are decimal and
| there is at least one character in the string.
|
| isdigit(self, /)
| Return True if the string is a digit string, False otherwise.
|
| A string is a digit string if all characters in the string are digits and there
| is at least one character in the string.
|
| isidentifier(self, /)
| Return True if the string is a valid Python identifier, False otherwise.
|
| Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
| such as "def" or "class".
|
| islower(self, /)
| Return True if the string is a lowercase string, False otherwise.
|
| A string is lowercase if all cased characters in the string are lowercase and
| there is at least one cased character in the string.
|
| isnumeric(self, /)
| Return True if the string is a numeric string, False otherwise.
|
| A string is numeric if all characters in the string are numeric and there is at
| least one character in the string.
|
| isprintable(self, /)
| Return True if the string is printable, False otherwise.
|
| A string is printable if all of its characters are considered printable in
| repr() or if it is empty.
|
| isspace(self, /)
| Return True if the string is a whitespace string, False otherwise.
|
| A string is whitespace if all characters in the string are whitespace and there
| is at least one character in the string.
|
| istitle(self, /)
| Return True if the string is a title-cased string, False otherwise.
|
| In a title-cased string, upper- and title-case characters may only
| follow uncased characters and lowercase characters only cased ones.
|
| isupper(self, /)
| Return True if the string is an uppercase string, False otherwise.
|
| A string is uppercase if all cased characters in the string are uppercase and
| there is at least one cased character in the string.
|
| join(self, iterable, /)
| Concatenate any number of strings.
|
| The string whose method is called is inserted in between each given string.
| The result is returned as a new string.
|
| Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
|
| ljust(self, width, fillchar=' ', /)
| Return a left-justified string of length width.
|
| Padding is done using the specified fill character (default is a space).
|
| lower(self, /)
| Return a copy of the string converted to lowercase.
|
| lstrip(self, chars=None, /)
| Return a copy of the string with leading whitespace removed.
|
| If chars is given and not None, remove characters in chars instead.
|
| partition(self, sep, /)
| Partition the string into three parts using the given separator.
|
| This will search for the separator in the string. If the separator is found,
| returns a 3-tuple containing the part before the separator, the separator
| itself, and the part after it.
|
| If the separator is not found, returns a 3-tuple containing the original string
| and two empty strings.
|
| removeprefix(self, prefix, /)
| Return a str with the given prefix string removed if present.
|
| If the string starts with the prefix string, return string[len(prefix):].
| Otherwise, return a copy of the original string.
|
| removesuffix(self, suffix, /)
| Return a str with the given suffix string removed if present.
|
| If the string ends with the suffix string and that suffix is not empty,
| return string[:-len(suffix)]. Otherwise, return a copy of the original
| string.
|
| replace(self, old, new, count=-1, /)
| Return a copy with all occurrences of substring old replaced by new.
|
| count
| Maximum number of occurrences to replace.
| -1 (the default value) means replace all occurrences.
|
| If the optional argument count is given, only the first count occurrences are
| replaced.
|
| rfind(...)
| S.rfind(sub[, start[, end]]) -> int
|
| Return the highest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Return -1 on failure.
|
| rindex(...)
| S.rindex(sub[, start[, end]]) -> int
|
| Return the highest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Raises ValueError when the substring is not found.
|
| rjust(self, width, fillchar=' ', /)
| Return a right-justified string of length width.
|
| Padding is done using the specified fill character (default is a space).
|
| rpartition(self, sep, /)
| Partition the string into three parts using the given separator.
|
| This will search for the separator in the string, starting at the end. If
| the separator is found, returns a 3-tuple containing the part before the
| separator, the separator itself, and the part after it.
|
| If the separator is not found, returns a 3-tuple containing two empty strings
| and the original string.
|
| rsplit(self, /, sep=None, maxsplit=-1)
| Return a list of the substrings in the string, using sep as the separator string.
|
| sep
| The separator used to split the string.
|
| When set to None (the default value), will split on any whitespace
| character (including \n \r \t \f and spaces) and will discard
| empty strings from the result.
| maxsplit
| Maximum number of splits.
| -1 (the default value) means no limit.
|
| Splitting starts at the end of the string and works to the front.
|
| rstrip(self, chars=None, /)
| Return a copy of the string with trailing whitespace removed.
|
| If chars is given and not None, remove characters in chars instead.
|
| split(self, /, sep=None, maxsplit=-1)
| Return a list of the substrings in the string, using sep as the separator string.
|
| sep
| The separator used to split the string.
|
| When set to None (the default value), will split on any whitespace
| character (including \n \r \t \f and spaces) and will discard
| empty strings from the result.
| maxsplit
| Maximum number of splits.
| -1 (the default value) means no limit.
|
| Splitting starts at the front of the string and works to the end.
|
| Note, str.split() is mainly useful for data that has been intentionally
| delimited. With natural text that includes punctuation, consider using
| the regular expression module.
|
| splitlines(self, /, keepends=False)
| Return a list of the lines in the string, breaking at line boundaries.
|
| Line breaks are not included in the resulting list unless keepends is given and
| true.
|
| startswith(...)
| S.startswith(prefix[, start[, end]]) -> bool
|
| Return True if S starts with the specified prefix, False otherwise.
| With optional start, test S beginning at that position.
| With optional end, stop comparing S at that position.
| prefix can also be a tuple of strings to try.
|
| strip(self, chars=None, /)
| Return a copy of the string with leading and trailing whitespace removed.
|
| If chars is given and not None, remove characters in chars instead.
|
| swapcase(self, /)
| Convert uppercase characters to lowercase and lowercase characters to uppercase.
|
| title(self, /)
| Return a version of the string where each word is titlecased.
|
| More specifically, words start with uppercased characters and all remaining
| cased characters have lower case.
|
| translate(self, table, /)
| Replace each character in the string using the given translation table.
|
| table
| Translation table, which must be a mapping of Unicode ordinals to
| Unicode ordinals, strings, or None.
|
| The table must implement lookup/indexing via __getitem__, for instance a
| dictionary or list. If this operation raises LookupError, the character is
| left untouched. Characters mapped to None are deleted.
|
| upper(self, /)
| Return a copy of the string converted to uppercase.
|
| zfill(self, width, /)
| Pad a numeric string with zeros on the left, to fill a field of the given width.
|
| The string is never truncated.
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| maketrans(...)
| Return a translation table usable for str.translate().
|
| If there is only one argument, it must be a dictionary mapping Unicode
| ordinals (integers) or characters to Unicode ordinals, strings or None.
| Character keys will be then converted to ordinals.
| If there are two arguments, they must be strings of equal length, and
| in the resulting dictionary, each character in x will be mapped to the
| character at the same position in y. If there is a third argument, it
| must be a string, whose characters will be mapped to None in the result.
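5. String usage
5-1. capitalize() method
# capitalize(): returns a copy with the first character upper-cased and all remaining characters lower-cased; the original string is not modified
s = "hello, my Python World!"
s_capitalized = s.capitalize()
print(s_capitalized) # Output: Hello, my python world!
print(s) # Output: hello, my Python World!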
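capitalize() is easy to confuse with title(), and its effect on the rest of the string is easy to overlook. The following sketch contrasts the two on the same input and shows two edge cases.

```python
s = "hello, my Python World!"

capitalized = s.capitalize()
titled = s.title()
print(capitalized)  # Hello, my python world!
print(titled)       # Hello, My Python World!

# Edge cases: an empty string stays empty, and a leading non-letter
# is left alone while the rest is still lower-cased.
print("".capitalize() == "")   # True
print("123ABC".capitalize())   # 123abc
```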
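5-2. casefold() method
# casefold(): returns a copy with all characters converted to lowercase, plus extra folding mappings intended for case-insensitive comparison across languages.
# It is similar to lower(), but folds more aggressively, so strings that differ only in case are more likely to compare equal
# The original string is not modified
s = "Hello, my Python World!"
s_casefolded = s.casefold()
print(s_casefolded) # Output: hello, my python world!
print(s) # Output: Hello, my Python World!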
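For plain ASCII text, casefold() and lower() give the same result, so the example above cannot show the difference. A commonly cited case is the German letter "ß", sketched below.

```python
a = "Straße"
b = "STRASSE"

# lower() keeps "ß" as it is, so the two spellings still differ.
equal_with_lower = a.lower() == b.lower()
print(equal_with_lower)     # False

# casefold() maps "ß" to "ss", so the comparison succeeds.
equal_with_casefold = a.casefold() == b.casefold()
print(equal_with_casefold)  # True
print(a.casefold())         # strasse
```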
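5-3. center() method
# 1. Method: str.center
# 2. Syntax: str.center(width[, fillchar])
# 3. Parameters:
# 3-1. width (required): an integer giving the total width of the new string
# 3-2. fillchar (optional): a single character used for padding; defaults to a space
# 4. Purpose: centers the string and pads both sides with the given character (a space by default)
# 5. Return value: a new string
# 6. Notes:
# 6-1. If width is less than or equal to the length of the original string, center() returns the string unchanged
# 6-2. The original string is not modified
# 7. Example:
# Original string
s = "myelsa"
# center() with a total width of 10 and the default fill character (a space)
s_centered = s.center(10)
print(repr(s_centered)) # Output: '  myelsa  '
# center() with a total width of 10 and '*' as the fill character
s_centered_with_star = s.center(10, '*')
print(s_centered_with_star) # Output: **myelsa**
# Note that the original string s is unchanged
print(s) # Output: myelsa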
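A typical use of center() is drawing a simple banner line. The sketch below also exercises the two rules from the notes: a width that is too small leaves the string as it is, and fillchar must be exactly one character. The names title and width are illustrative.

```python
title = "Report"
width = 20

banner = title.center(width, "=")
print(banner)       # =======Report=======
print(len(banner))  # 20

# A width no larger than the string returns the text unchanged.
print(title.center(3))  # Report

# fillchar must be exactly one character long.
try:
    title.center(width, "=-")
except TypeError as exc:
    print(f"TypeError: {exc}")
```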
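5-4. count() method
# 1. Method: str.count
# 2. Syntax: str.count(sub[, start[, end]])
# 3. Parameters:
# 3-1. sub: the substring to search for
# 3-2. start (optional): index at which the search starts; defaults to 0, the beginning of the string
# 3-3. end (optional): index at which the search stops (exclusive); defaults to the end of the string
# 4. Purpose: counts the non-overlapping occurrences of a substring in the string
# 5. Return value: a non-negative integer
# 6. Notes:
# 6-1. If the substring does not occur in the string, 0 is returned
# 6-2. With start and end, count() only counts occurrences inside the half-open range [start, end)
# 7. Example:
# Original string
s = "hello world, hello everyone"
# Count occurrences of the substring "hello"
count_hello = s.count("hello")
# Print the result
print(count_hello) # Output: 2
# Count occurrences of the substring "world"
count_world = s.count("world")
# Print the result
print(count_world) # Output: 1
# Count occurrences of the substring "python" (not present)
count_python = s.count("python")
# Print the result
print(count_python) # Output: 0
# Using the start and end parameters
count_hello_start_end = s.count("hello", 7, 20) # Search only indices 7 to 20 (20 excluded)
# Print the result
print(count_hello_start_end) # Output: 1, because the second "hello" (indices 13-17) lies in this range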
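Two details of count() tend to surprise people: matches are non-overlapping, and the search is case-sensitive. The sketch below shows both, plus a small helper for overlapping matches. The count_overlapping function is hypothetical and written only for this illustration; it is not part of str.

```python
# Matches do not overlap: after "aa" is found at index 0,
# the search resumes at index 2.
print("aaaa".count("aa"))          # 2

# The search is case-sensitive.
print("Hello hello".count("hello"))  # 1


def count_overlapping(text, sub):
    """Hypothetical helper: count occurrences of sub, allowing overlaps."""
    total = 0
    start = text.find(sub)
    while start != -1:
        total += 1
        # Move forward by one character only, so overlaps are seen.
        start = text.find(sub, start + 1)
    return total


print(count_overlapping("aaaa", "aa"))  # 3
```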
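5-5. encode() method
# 1. Method: str.encode
# 2. Syntax: str.encode(encoding='utf-8', errors='strict')
# 3. Parameters:
# 3-1. encoding: name of the character encoding; defaults to 'utf-8'. Python supports many encodings, but 'utf-8' is the most common and can represent any Unicode character
# 3-2. errors: how encoding errors are handled. The default is 'strict', which raises UnicodeEncodeError when a character cannot be encoded;
# other possible values include 'ignore' (drop characters that cannot be encoded), 'replace' (substitute a question mark ?), and 'xmlcharrefreplace' (substitute an XML character reference)
# 4. Purpose: converts a string to a byte string (bytes)
# 5. Return value: a bytes object
# 6. Notes: the original string is not modified
# 7. Example:
# Original string
s = "Hello, World!"
# Encode as a UTF-8 byte string
b_utf8 = s.encode('utf-8')
# Print the byte string
print(b_utf8) # Output: b'Hello, World!'
# Encode with Latin-1, which cannot represent every Unicode character
try:
    b_latin1 = s.encode('latin1') # Succeeds because the string contains only Latin-1 characters
    print(b_latin1) # Output: b'Hello, World!'
except UnicodeEncodeError as e:
    print(f"Encoding error: {e}")
# Handle encoding errors with 'replace'
s_with_non_ascii = "Hello, 世界!"
b_with_replacement = s_with_non_ascii.encode('ascii', 'replace')
# Print the byte string containing the replacement characters
print(b_with_replacement) # Output: b'Hello, ??!'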
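The example above only demonstrates 'replace'. As a complementary sketch, the snippet below runs the same text through the other error handlers listed in the parameter description and shows that UTF-8 encoding round-trips through bytes.decode().

```python
s = "Hello, 世界!"

# UTF-8 uses three bytes for each of the two Chinese characters.
data = s.encode("utf-8")
print(len(s), len(data))         # 10 14
print(data.decode("utf-8") == s)  # True

# 'ignore' silently drops what cannot be encoded.
ignored = s.encode("ascii", "ignore")
print(ignored)                    # b'Hello, !'

# 'xmlcharrefreplace' writes decimal XML character references.
xml_refs = s.encode("ascii", "xmlcharrefreplace")
print(xml_refs)                   # b'Hello, &#19990;&#30028;!'

# 'strict' (the default) raises UnicodeEncodeError instead.
try:
    s.encode("ascii")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print(f"Encoding error: {exc}")
```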
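5-6. endswith() method
# 1. Method: str.endswith
# 2. Syntax: str.endswith(suffix[, start[, end]])
# 3. Parameters:
# 3-1. suffix (required): the suffix to check for; may also be a tuple of suffixes
# 3-2. start (optional): index at which the check starts; defaults to 0, the beginning of the string
# 3-3. end (optional): index at which the check stops (exclusive); defaults to the length of the string
# 4. Purpose: checks whether the string ends with the given suffix
# 5. Return value: a Boolean
# 6. Notes: returns True if the string ends with the given suffix, otherwise False
# 7. Example:
# Original string
s = "Hello, World!"
# Check whether the string ends with "World!"
is_endswith_world = s.endswith("World!")
print(is_endswith_world) # Output: True
# Check whether the string ends with "Hello" (it does not)
is_endswith_hello = s.endswith("Hello")
print(is_endswith_hello) # Output: False
# Check whether the part from index 7 to the end ends with "World!" (it does)
is_endswith_world_from_index7 = s.endswith("World!", 7) # Note: index 7 is the position of "W"; the result is unaffected
print(is_endswith_world_from_index7) # Output: True
# Check whether the part from index 0 up to (but not including) index 11 ends with "ello" (it does not)
is_endswith_ello = s.endswith("ello", 0, 11) # Note: s[0:11] is "Hello, Worl", which ends with "Worl", not "ello"
print(is_endswith_ello) # Output: False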
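The help text for endswith() also notes that suffix can be a tuple of strings, which the examples above do not use. A minimal sketch, using made-up file names, of filtering by several extensions at once:

```python
# Hypothetical file names, used only for illustration.
filenames = ["notes.txt", "photo.PNG", "report.md", "archive.tar.gz"]

# Passing a tuple returns True if any one of the suffixes matches.
text_suffixes = (".txt", ".md")
text_files = [name for name in filenames if name.endswith(text_suffixes)]
print(text_files)  # ['notes.txt', 'report.md']

# The comparison is case-sensitive, so normalise the case first
# when extensions may be written in upper case.
images = [name for name in filenames if name.lower().endswith(".png")]
print(images)      # ['photo.PNG']
```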