fluent-python

Chapter 01: The Python Data Model
Chapter 02: An Array of Sequences
Chapter 03: Dictionaries and Sets
Chapter 04: Text versus Bytes
Chapter 05: First-Class Functions

Chapter 01: The Python Data Model

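Python's greatest quality is consistency: after working with it for a while, you can often learn a new feature by making an informed guess.
If you come from another object-oriented language, though, some things look odd at first, for example why Python uses len(collection) instead of collection.len().
That apparent oddity is one instance of the Python data model, and once understood it is a key to writing Pythonic code.
The Python data model describes an API: if your objects conform to it, they work well with the rest of the language's features.
Web and GUI development both rely on frameworks, and when you use a framework you mostly implement methods that the framework calls.
If you think of Python itself as a framework, the data model specifies the interfaces of the methods we commonly need to implement, such as __getitem__(), discussed below.
When the Python interpreter meets certain syntax it calls certain methods whose names start and end with two underscores; for obj[key], for example, it calls __getitem__. If you want your object to play well with other Python objects, say because it stores its items sequentially, it has to implement these special methods too.
The language features covered by these special methods include: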
- Iteration
- Collections
- Attribute access
- Function and method invocation
- Object creation and destruction
- String representation and formatting
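- Managed contexts (with blocks)
The proper name for these is special methods; "magic method" is a slang term for the same thing. Read literally, __getitem__ is "under-under-getitem-under-under", which is obviously too long, so the common shorthand is "dunder-getitem" (double under getitem).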

A Pythonic Card Deck

The following very simple example demonstrates the power of implementing just two special methods, __getitem__ and __len__.

import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])


class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [
            Card(rank, suit) for suit in self.suits for rank in self.ranks
        ]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]


def main():
    deck = FrenchDeck()
    print(len(deck))
    print(deck[0])
    print(deck[-1])


if __name__ == '__main__':
    main()


# <===================OUTPUT===================>
# 52
# Card(rank='2', suit='spades')
# Card(rank='A', suit='hearts')

The part of this example most likely to puzzle beginners is collections.namedtuple. namedtuple builds classes that have only attributes and no custom methods; a typical use is a database record.
```
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

print(Card('7', 'diamonds'))

# <===================OUTPUT===================>
# Card(rank='7', suit='diamonds')
```
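Once namedtuple is understood, the rest of the code is easy to follow: each special method we implement gives the class a new capability:
- implementing __getitem__ gives us []
- implementing __len__ gives us len()
Special methods do more than provide the matching syntax (such as []); they also let the class cooperate with the standard library. To pick a random card we do not need to reinvent the wheel: random.choice works as is. Remember that random.choice can be used directly only because we implemented the special methods!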
```
In [20]: deck = FrenchDeck()

In [21]: from random import choice

In [22]: choice(deck)
Out[22]: Card(rank='4', suit='spades')

In [23]: choice(deck)
Out[23]: Card(rank='3', suit='hearts')

In [24]: choice(deck)
Out[24]: Card(rank='A', suit='spades')
```

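There is more: a capability such as [] brings its related capabilities along. Because __getitem__ delegates to the [] of self._cards, the deck also supports slicing. Two examples:

The first three cards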

In [25]: deck[:3]
Out[25]:
[Card(rank='2', suit='spades'),
 Card(rank='3', suit='spades'),
 Card(rank='4', suit='spades')]

Starting at index 12, take every 13th card (the four aces)

In [26]: deck[12::13]
Out[26]:
[Card(rank='A', suit='spades'),
 Card(rank='A', suit='diamonds'),
 Card(rank='A', suit='clubs'),
 Card(rank='A', suit='hearts')]

With __getitem__ implemented, the deck is also iterable

In [32]: for card in deck: print(card)
Card(rank='2', suit='spades')
Card(rank='3', suit='spades')
Card(rank='4', suit='spades')
Card(rank='5', suit='spades')
# ...

Reverse iteration works too (reversed() relies on __len__ and __getitem__)

In [33]: for card in reversed(deck): print(card)
Card(rank='A', suit='hearts')
Card(rank='K', suit='hearts')
Card(rank='Q', suit='hearts')
# ...

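Another benefit of __getitem__: the in operator works automatically. Without a __contains__ method, in does a sequential scan; a class that implements __contains__ can answer faster.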
```
In [34]: Card('Q', 'hearts') in deck
Out[34]: True

In [35]: Card('7', 'beasts') in deck
Out[35]: False
```
We can also sort the deck, although that needs a helper function (TODO).
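One way to fill in this TODO is a key function passed to sorted(). The sketch below ranks cards by rank first and then by suit. The suit order (clubs lowest, spades highest) and the names suit_values and spades_high are assumptions chosen for illustration; the deck class is repeated so the block runs on its own.

```python
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])


class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit)
                       for suit in self.suits for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]


# Assumed suit ordering for this sketch: clubs lowest, spades highest.
suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)


def spades_high(card):
    """Map a card to an integer so that sorted() can order the deck."""
    rank_value = FrenchDeck.ranks.index(card.rank)
    return rank_value * len(suit_values) + suit_values[card.suit]


deck = FrenchDeck()
ordered = sorted(deck, key=spades_high)  # sorted() iterates via __getitem__
print(ordered[0])    # lowest card under the assumed ordering
print(ordered[-1])   # highest card under the assumed ordering
```

sorted() returns a new list, so the deck itself stays in its original order; the class still has no __setitem__, so it cannot be shuffled or sorted in place.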
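Although FrenchDeck implicitly inherits from object, its functionality is not inherited: it comes from implementing special methods (the data model) and from composition (delegating to the _cards list). Implementing these special methods makes our own classes look and behave like the ones in the Python library.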

How Special Methods Are Used

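First, be clear that special methods are meant to be called by the Python interpreter, not by the programmer! In other words, you do not write my_object.__len__(); you write len(my_object), and if my_object is an instance of a user-defined class the interpreter calls the __len__ you implemented.
Why the qualifier "if my_object is an instance of a user-defined class"? Because for built-in types the interpreter can take a shortcut and read a field directly instead of calling a method, since a method call is more expensive.
For built-in types such as list and str, len(some_list) does not call a __len__ method (in CPython); it returns the value of the ob_size field of the underlying C struct.
Special methods are also not always invoked by an explicit function-like call. One example is the for statement: for i in x: implicitly invokes iter(x), which in turn may call x.__iter__().
The only special method that user code calls directly with any frequency is __init__ (to invoke the superclass initializer from your own __init__).
User code should also avoid inventing names of the form __foo__: even if such a name is unused today, a future version of Python may give it a special meaning.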
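The implicit call can be observed with a small sketch. Countdown is a hypothetical class invented for this note; it counts how many times the interpreter calls its __iter__, even though the code never calls that method by name.

```python
class Countdown:
    """Hypothetical class used only to show when __iter__ runs."""

    def __init__(self, start):
        self.start = start
        self.iter_calls = 0

    def __iter__(self):
        # Invoked by the interpreter whenever iteration starts.
        self.iter_calls += 1
        return iter(range(self.start, 0, -1))


c = Countdown(3)
items = [i for i in c]   # iteration triggers iter(c), which calls c.__iter__()
print(items, c.iter_calls)
```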

Emulating Numeric Types

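Special methods provide features such as the in operator, as shown earlier; in Python even the + operator is implemented through special methods.

Next we implement a two-dimensional vector class that supports addition, scalar multiplication and abs(); for example, Vector(2, 4) + Vector(2, 1) gives Vector(4, 5).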

from math import hypot


class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __abs__(self):
        return hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))

    def __add__(self, other):
        x = self.x + other.x
        y = self.y + other.y
        return Vector(x, y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)


def main():
    v1 = Vector(2, 4)
    v2 = Vector(2, 1)

    print(v1 + v2)
    v = Vector(3, 4)
    print(abs(v))
    print(v * 3)
    print(abs(v * 3))


if __name__ == '__main__':
    main()


# <===================OUTPUT===================>
# Vector(4, 5)
# 5.0
# Vector(9, 12)
# 15.0

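Note that besides __init__ the example implements five special methods (__repr__, __abs__, __bool__, __add__ and __mul__), yet main() never calls any of them directly.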

String Representation

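__repr__ is the basic one: the built-in repr() function calls it to get the string representation of an object for inspection, which is what the interactive console and debuggers show. Since that string is aimed at developers, the main requirement is that it be unambiguous; ideally it matches the source code needed to re-create the object.
Implementing __repr__ is optional: the interpreter supplies a default that tells instances apart by memory address, so without __repr__ you get output like <Vector object at 0x10e100070>.
Our __repr__ uses %r for the attributes on purpose: only the repr of each attribute shows the difference between Vector(1, 2) and Vector('1', '2').
There is also __str__, which is called by str() (and implicitly by print()); str() is meant for end users, while repr() is meant for developers.
If you only want to implement one of __str__ and __repr__, choose __repr__, because when no custom __str__ is available str() falls back to __repr__.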
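The fallback from str() to __repr__ is easy to check. OnlyRepr and Both are hypothetical classes written for this note, a minimal sketch rather than anything from the Vector example.

```python
class OnlyRepr:
    """Defines __repr__ only; str() falls back to it."""

    def __repr__(self):
        return 'OnlyRepr()'


class Both:
    """Defines both methods; str() and repr() now differ."""

    def __repr__(self):
        return 'Both()'

    def __str__(self):
        return 'a friendly Both'


r = OnlyRepr()
b = Both()
print(str(r), repr(r))   # both use __repr__
print(str(b), repr(b))   # __str__ for str(), __repr__ for repr()
print('%r vs %s' % ('1', '1'))   # %r keeps the quotes, %s drops them
```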

Arithmetic Operators

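We implemented two operator special methods above:
- __add__ is called by the + operator
- __mul__ is called by the * operator
Note that both methods return a new instance and do not modify either operand (self or other); this is the expected behaviour of infix operators.
Also note that we support vector * 3 but not 3 * vector, which breaks the commutative property of multiplication; this is fixed with __rmul__ in Chapter 13.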
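As a preview, here is a minimal sketch of how __rmul__ could close that gap. It assumes the scalar is a plain number and simply delegates to __mul__; the trimmed-down Vector below is repeated only so the block runs on its own, and this is one common way to do it, not necessarily the version developed in Chapter 13.

```python
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

    def __rmul__(self, scalar):
        # Tried for `3 * v` after int.__mul__ returns NotImplemented.
        return self * scalar


v = Vector(3, 4)
left = v * 3
right = 3 * v
print(left, right)
```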

Boolean Value of Custom Type

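Although Python has a bool type, it accepts any object wherever a boolean is needed; this differs from Java but resembles C.
To find out whether an object x is considered truthy or falsy, call bool(x).
Instances of user-defined classes are truthy unless __bool__ or __len__ is implemented:
- if __bool__ is implemented, bool(x) returns its result
- otherwise, if __len__ is implemented, bool(x) is False when __len__ returns 0 and True otherwise

Our __bool__ is very simple: it checks whether the magnitude of the vector is zero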

def __bool__(self):
    return bool(abs(self))
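The lookup order for truth values can be sketched with three hypothetical classes (Plain, Sized and Flagged are names invented for this note):

```python
class Plain:
    """No __bool__ and no __len__: instances are always truthy."""


class Sized:
    """Only __len__: truthiness follows the length."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class Flagged:
    """Both defined: __bool__ takes precedence over __len__."""

    def __len__(self):
        return 0

    def __bool__(self):
        return True


plain_result = bool(Plain())
empty_result = bool(Sized([]))
full_result = bool(Sized([1]))
flagged_result = bool(Flagged())
print(plain_result, empty_result, full_result, flagged_result)
```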

Overview of Special Methods

There are 83 common special method names in total.

47 of them implement operators

category	Method names and related operators
Unary numeric operators	__neg__ -, __pos__ +, __abs__ abs()
Rich comparison operators	__lt__ <, __le__ <=, __eq__ ==, __ne__ !=, __gt__ >, __ge__ >=
Arithmetic operators	__add__ +, __sub__ -, __mul__ *, __truediv__ /, __floordiv__ //, __mod__ %,
	__divmod__ divmod(), __pow__ ** or pow(), __round__ round()
Reversed arithmetic operators	__radd__, __rsub__, __rmul__, __rtruediv__, __rfloordiv__, __rmod__,
	__rdivmod__, __rpow__
Augmented assignment arithmetic operators	__iadd__, __isub__, __imul__, __itruediv__, __ifloordiv__, __imod__,
	__ipow__
Bitwise operators	__invert__ ~, __lshift__ <<, __rshift__ >>, __and__ &, __or__ |, __xor__ ^
Reversed bitwise operators	__rlshift__, __rrshift__, __rand__, __rxor__, __ror__
Augmented assignment bitwise operators	__ilshift__, __irshift__, __iand__, __ixor__, __ior__

The other 36 are not operators

category	Method names (operators excluded)
String/bytes representation	__repr__, __str__, __format__, __bytes__
Conversion to number	__abs__, __bool__, __complex__, __int__,
	__float__, __hash__, __index__
Emulating collections	__len__,__getitem__,__setitem__,__delitem__,__contains__
Iteration	__iter__,__reversed__,__next__
Emulating callables	__call__
Context management	__enter__,__exit__
Instance creation and destruction	__new__,__init__,__del__
Attribute management	__getattr__,__getattribute__,__setattr__,__delattr__,__dir__
Attribute descriptors	__get__,__set__,__delete__
Class services	__prepare__,__instancecheck__,__subclasscheck__

Why Len Is Not a Method

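len is not an ordinary method called as object.len(); it is a built-in function backed by the __len__ special method. The object.len() form would look more "OO", but Python is not Ruby, and in Python practicality matters more than purity: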
```
practicality beats purity
```
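For Python, len(x) is the more practical design: len() is called very often, and because it is a built-in function backed by a special method, built-in types can answer it more cheaply than a method call, by reading the length field of a C struct directly.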

Chapter 02: An Array of Sequences

Overview of Built-In Sequences

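The Python standard library offers a rich set of sequence types implemented in C.
By what their items hold, they fall into two groups:
- Container sequences hold references to their items, so the items can be of different types: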
  1. list
  2. tuple
  3. collections.deque
- Flat sequences store the item values themselves (not references) in their own memory, so all items must be of one primitive type:
  1. str
  2. bytes
  3. bytearray
  4. memoryview
  5. array.array
By mutability, they also fall into two groups:
- mutable: list, bytearray, array.array, collections.deque, memoryview
- immutable: tuple, str, bytes

List Comprehensions and Generator Expressions

The title may look cryptic; it names two quick and elegant ways of building a sequence:
- if the resulting sequence is a list, use a list comprehension (listcomp for short)
- if the resulting sequence is anything other than a list, use a generator expression (genexp for short)

List Comprehensions and Readability

First, an example that does not use a listcomp

symbols = 'abcde'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))

print(codes)

# <===================OUTPUT===================>
# [97, 98, 99, 100, 101]

That is not too bad, but Python can be more concise: codes can be created and filled in a single line with a list comprehension (listcomp)

symbols = 'abcde'
codes = [ord(symbol) for symbol in symbols]
print(codes)

# <===================OUTPUT===================>
# [97, 98, 99, 100, 101]

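The advantage of a listcomp is not only that it is shorter: its intent is unambiguous. A listcomp is always used to build a new list, whereas a for loop can serve many purposes, of which building a list is just one.
This is also a reminder to use listcomps only to build lists, and to keep them short: if a listcomp spans more than two lines, a plain for loop is probably more readable.

Speaking of multiple lines: inside [], {} and (), Python ignores line breaks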

In Python code, line breaks are ignored inside pairs of [], {}, or ().
So you can build multiline lists, listcomps, genexps, dictionaries and
the like without using the ugly \ line continuation escape.

Listcomps Versus map and filter

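For building a new list, a listcomp is usually the best choice. The functional built-ins filter and map can do the same job, but they are more cumbersome:
- The cumbersome filter + map version. filter(function, iterable) calls function on every item of iterable and returns an iterator over the items for which it returned true. map(function, iterable) is simpler: it calls function on every item of iterable. The functional way to build the list is then: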
```
symbols = '$¢£¥€¤'
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
print(beyond_ascii)

# <===================OUTPUT===================>
# [162, 163, 165, 8364, 164]
```
- The listcomp version is clearly simpler
```
symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
print(beyond_ascii)

# <===================OUTPUT===================>
# [162, 163, 165, 8364, 164]
```
More importantly, the listcomp is not slower than the functional version

Cartesian Products

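Cartesian product sounds intimidating, but it simply means every possible pairing of one item from each input. Here is the definition:

Given the sets A = {a, b} and B = {0, 1, 2}, their Cartesian product is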
{(a, 0), (a, 1), (a, 2), (b, 0), (b, 1), (b, 2)}
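The same definition can be written directly in Python. This is a minimal sketch: the names A, B, pairs and same_pairs are illustrative, lists stand in for the sets so the order is predictable, and itertools.product from the standard library is shown as an alternative way to get the same pairs.

```python
from itertools import product

A = ['a', 'b']
B = [0, 1, 2]

# Listcomp with two for clauses: one pair per combination of x and y.
pairs = [(x, y) for x in A for y in B]

# The standard library computes the same product lazily.
same_pairs = list(product(A, B))

print(pairs)
```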

An example makes this concrete. Suppose we make T-shirts in two colors and three sizes; the most common way to list every combination is a nested for loop

colors = ['black', 'white']
sizes = ['S', 'M', 'L']

for color in colors:
    for size in sizes:
        print((color, size))

# <===================OUTPUT===================>
# ('black', 'S')
# ('black', 'M')
# ('black', 'L')
# ('white', 'S')
# ('white', 'M')
# ('white', 'L')

With a listcomp the same thing is more concise

colors = ['black', 'white']
sizes = ['S', 'M', 'L']

tshirts = [(color, size) for color in colors for size in sizes]
print(tshirts)

# <===================OUTPUT===================>
# [('black', 'S'), ('black', 'M'), ('black', 'L'),
# ('white', 'S'), ('white', 'M'), ('white', 'L')]

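Another advantage of listcomps is that they can be broken across lines inside the [] for readability. Note that this version also swaps the two for clauses, so the result is ordered by size first and then by color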

colors = ['black', 'white']
sizes = ['S', 'M', 'L']

tshirts = [(color, size) for size in sizes
           for color in colors]
print(tshirts)

# <===================OUTPUT===================>
# [('black', 'S'), ('white', 'S'), ('black', 'M'),
#  ('white', 'M'), ('black', 'L'), ('white', 'L')]

Generator Expressions

To initialize a tuple, an array or another kind of sequence, you could first build a list with a listcomp and pass it as the argument to the constructor, for example tuple
```
t_from_l = tuple([n for n in range(5)])
print(t_from_l)

# <===================OUTPUT===================>
# (0, 1, 2, 3, 4)
```
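But that is wasteful: the temporary list exists only to initialize another sequence.
The most economical way to build a non-list sequence is a generator expression (genexp).

Syntactically, a genexp looks like a listcomp without the square brackets: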

listcomp:

symbols = 'abcde'
print([ord(symbol) for symbol in symbols])
# <===================OUTPUT===================>
# [97, 98, 99, 100, 101]

genexp for tuple:

symbols = 'abcde'
print(tuple(ord(symbol) for symbol in symbols))
# <===================OUTPUT===================>
# (97, 98, 99, 100, 101)

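In the example above the genexp is the only argument to tuple(), so the call's own parentheses are enough and the genexp looks like a listcomp with the brackets removed. If the constructor takes more than one argument, the genexp needs its own parentheses, so it looks like a listcomp with the square brackets replaced by parentheses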
```
import array
symbols = 'abcde'
print(array.array('I', (ord(symbol) for symbol in symbols)))
# <===================OUTPUT===================>
# array('I', [97, 98, 99, 100, 101])
```

A genexp can also stand in for the nested for loops used earlier to print the T-shirts:

colors = ['black', 'white']
sizes = ['S', 'M', 'L']

for color in colors:
    for size in sizes:
        print((color, size))

# <===================OUTPUT===================>
# ('black', 'S')
# ('black', 'M')
# ('black', 'L')
# ('white', 'S')
# ('white', 'M')
# ('white', 'L')

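Nothing is wrong with the loop above. The listcomp version of the same product, however, builds the whole list in memory before anything is printed; if the inputs are large (say 1000 items each), that is a list of a million entries. A genexp avoids the problem because it produces one item at a time and hands it to the for loop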
```
colors = ['black', 'white']
sizes = ['S', 'M', 'L']

for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirt)

# <===================OUTPUT===================>
# black S
# black M
# black L
# white S
# white M
# white L
```
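The one-at-a-time behaviour can be made visible by pulling items out of the genexp by hand with next(). This is only a sketch; the names gen, first, second and rest are illustrative.

```python
colors = ['black', 'white']
sizes = ['S', 'M', 'L']

gen = ('%s %s' % (c, s) for c in colors for s in sizes)

first = next(gen)    # only the first combination has been computed so far
second = next(gen)   # now the second one
rest = list(gen)     # consume whatever is left
leftover = list(gen) # a genexp can be consumed only once

print(first, second, rest, leftover)
```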

Tuples Are Not Just Immutable Lists

Many introductory Python texts describe tuples as "immutable lists", but that is only one of their two important roles. The other is as records with no field names.

Tuples as Records

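The items of a tuple can be treated as a record, provided we look at it a little more abstractly:
- each item of the tuple is one field of the record
- each item also has a position, and the position gives it its meaning; this is not as good as a field name, but it does tell the fields apart
If a tuple is treated only as an immutable list, the number of items and their order may not matter; if it is treated as a record, both matter a great deal

Look at the example below: every position has a definite meaning, and reordering the items of any tuple would destroy the information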

lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)
traveler_ids = [('USA', '31195855'),
                ('BAR', 'CE342567'), ('ESP', 'XDA205856')]

for passport in sorted(traveler_ids):
    print('%s/%s' % passport)

for country, _ in traveler_ids:
    print(country)

# <===================OUTPUT===================>
# BAR/CE342567
# ESP/XDA205856
# USA/31195855
# USA
# BAR
# ESP

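The example shows that Python works very well with tuples:
- the % formatting operator understands tuples and treats each item as a separate field
- the for loop knows how to unpack each tuple into separate variables on every iteration
Python can do this thanks to the tuple unpacking mechanism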

Tuple Unpacking

The most common use of tuple unpacking is parallel assignment: assigning the items of an iterable to a tuple of variables:
- The right-hand side is the iterable (here a list) and the left-hand side is the tuple of variables
```
a = [1, 2]
(b, c) = a
print(b)
print(c)

# <===================OUTPUT===================>
# 1
# 2
```
- Any iterable works, so the right-hand side can of course be a tuple too; this is assigning a tuple to a tuple
```
a = (1, 2)
(b, c) = a
print(b)
print(c)

# <===================OUTPUT===================>
# 1
# 2
```
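- The parentheses around the left-hand tuple are optional, which gives the most familiar form of tuple unpacking: the N items of some iterable are assigned to N comma-separated variables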
```
a = (1, 2)
b, c = a
print(b)
print(c)

# <===================OUTPUT===================>
# 1
# 2
```
- If the right-hand side is a tuple literal it needs no parentheses either, which gives the familiar idiom for swapping the values of variables without using a temporary variable
```
a = 1
b = 2

b, a = a, b
print(a)
print(b)

# <===================OUTPUT===================>
# 2
# 1
```
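- When passing a tuple to a function you must say explicitly that its items should be unpacked into separate arguments; otherwise the function receives the tuple as a single argument. Prefix the tuple with * to unpack it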
```
def foo(a, b='not set'):
    print(a)
    print(b)


foo((1, 2))
foo(*(1, 2))


# <===================OUTPUT===================>
# (1, 2)
# not set
# 1
# 2
```
- A function's return value is just another value: if the function returns an iterable, tuple unpacking can assign the result to several variables
```
def foo():
    return [1, 2]


a, b = foo()
print(a)
print(b)

# <===================OUTPUT===================>
# 1
# 2
```
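- Sometimes a function returns more information than we care about. In that case the variable `_` can serve as a placeholder (note that `_` is a real variable here, unlike the blank identifier in Go, so it ends up holding the last value assigned to it)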
```
def foo():
    return [1, 2, 3]


_, a, _ = foo()
print(a)
print(_)

# <===================OUTPUT===================>
# 2
# 3
```
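- Sometimes we do not know how many values there will be, or we want to collect a run of consecutive values into one list. Prefixing a variable with * tells the interpreter to pack all the remaining items into that variable as a list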
```
a, b, *rest = range(5)
print(a, b, rest)
a, b, *rest = range(3)
print(a, b, rest)
a, b, *rest = range(2)
print(a, b, rest)


# <===================OUTPUT===================>
# 0 1 [2, 3, 4]
# 0 1 [2]
# 0 1 []
```
- The * prefix can be applied to at most one variable, but that variable does not have to be the last one; it can appear in any position
```
a, *body, c, d = range(5)
print(a, body, c, d)
*head, b, c, d = range(5)
print(head, b, c, d)

# <===================OUTPUT===================>
# 0 [1, 2] 3 4
# [0, 1] 2 3 4
```

Nested Tuple Unpacking

Tuple unpacking goes one step further: the receiving tuple can be nested, as long as the iterable being unpacked has a matching nested structure
```
a, (b1, b2), c = [1, [2, 3], 4]
print(b1)
print(b2)

# <===================OUTPUT===================>
# 2
# 3
```
Note that Python 2 allowed nested tuples in function parameter definitions. In other words, the following definition is legal in Python 2
```
def fn(a, (b, c), d):
    pass
```
Python 3 no longer accepts this form; PEP 3113 explains the reasons in detail
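In Python 3 the usual replacement is to accept the nested value as one parameter and unpack it inside the function body. The function describe and its parameter names below are hypothetical, a minimal sketch of that pattern:

```python
def describe(name, coordinates, population):
    # Unpack the nested argument explicitly inside the body (Python 3 style).
    latitude, longitude = coordinates
    return '%s: %.1f, %.1f (%d)' % (name, latitude, longitude, population)


line = describe('Tokyo', (35.689722, 139.691667), 36933)
print(line)
```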

Named Tuples

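A tuple's positions (0, 1, 2, 3, ...) give its items only vague names that are hard to recognize when debugging, so Python provides collections.namedtuple, which you can think of as a tuple with named fields
The implementation of collections.namedtuple is designed to save memory: namedtuple is a factory that returns a subclass of tuple with field names. The names are stored in the class, not in each instance, so a namedtuple instance takes exactly the same amount of memory as a plain tuple

A namedtuple example: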

from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
print(tokyo)
print(tokyo.population)
print(tokyo.coordinates)
print(tokyo[1])

# <===================OUTPUT===================>
# City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
# 36.933
# (35.689722, 139.691667)
# JP

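As we can see, namedtuple returns a new class. It takes two arguments:
- the first is the class name
- the second is a space-delimited string that names the fields. As the earlier example showed, it can also be a list of strings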
```
Card = collections.namedtuple('Card', ['rank', 'suit'])
```
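In addition to what it inherits from tuple, every namedtuple class has the following three attributes:
- _fields: a tuple with the field names of the class
- _make(): builds an instance of the named tuple from an iterable
- _asdict(): returns a mapping of field names to values, which is handy for printing

An example of the three attributes: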

from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')

print(City._fields)
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data)
print(delhi)

for key, value in delhi._asdict().items():
    print(key + ":", value)

# <===================OUTPUT===================>
# ('name', 'country', 'population', 'coordinates')
# City(name='Delhi NCR', country='IN', population=21.935, coordinates=LatLong(lat=28.613889, long=77.208889))
# name: Delhi NCR
# country: IN
# population: 21.935
# coordinates: LatLong(lat=28.613889, long=77.208889)

Tuples as Immutable lists

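Tuples support all list methods that do not add or remove items, so append, remove and the like are missing

Tuples also lack __reversed__, but that method is only an optimization; reversed(my_tuple) works without it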

a = [1, 2, 3]
a.reverse()
print(a)

b = (1, 2, 3)
b = tuple(reversed(b))
print(b)

# <===================OUTPUT===================>
# [3, 2, 1]
# (3, 2, 1)

Slicing

Slicing is a feature common to list, tuple, str and every other sequence type. It is so useful that it has influenced the design of other languages, such as Go

Why Slices and Range Exclude the Last Item

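Excluding the last item in slices and ranges fits well with the zero-based indexing used by Python, C and many other languages. It has several advantages:
- When only the stop position is given, the length is obvious: my_list[:3] has three items, and so does range(3)
- When both start and stop are given, the length is stop - start: my_list[1:3] has two items
- A sequence can be split at any index x into my_list[:x] and my_list[x:] without overlap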
```
l = [10, 20, 30, 40, 50, 60]
print(l[:2])
print(l[2:])
print(l[:3])
print(l[3:])

# <===================OUTPUT===================>
# [10, 20]
# [30, 40, 50, 60]
# [10, 20, 30]
# [40, 50, 60]
```

Slice Objects

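An often overlooked feature of slices is the optional step after start and stop, which sets the stride between the items taken: s[start:stop:step]
With a positive step and no start, the slice always includes the first item, and each following item is step positions after the previous one, as in this example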
```
s = '1234567'
print(s[::3])

# <===================OUTPUT===================>
# 147
```

The step can also be negative: the slice then runs backwards, starting with the last item.

s = '1234567'
print(s[::-2])

# <===================OUTPUT===================>
# 7531

A step of -1 has a notable effect: it reverses the sequence

s = '1234567'
print(s[::-1])

# <===================OUTPUT===================>
# 7654321

The start:stop:step notation is valid only inside [] used as the index or subscript operator, where it produces a slice object: slice(start, stop, step).
To evaluate seq[start:stop:step], Python calls the special method __getitem__ like this
```
seq.__getitem__(slice(start, stop, step))
```
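A tiny class whose __getitem__ just returns its argument shows exactly what the interpreter passes in. MySeq is a throwaway class written for this note; the last line also shows that comma-separated indexes arrive as a tuple.

```python
class MySeq:
    def __getitem__(self, index):
        return index   # hand back whatever the interpreter passed in


s = MySeq()
plain_index = s[1]          # a single int
simple_slice = s[1:4]       # slice(1, 4, None)
stepped_slice = s[1:4:2]    # slice(1, 4, 2)
mixed = s[1:4:2, 9]         # a tuple holding a slice and an int

print(plain_index, simple_slice, stepped_slice, mixed)
```

A slice object also has an indices(length) method that normalizes missing or out-of-range bounds for a sequence of the given length; for example, slice(None, 10, 2).indices(5) returns (0, 5, 2).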
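This is our first encounter with slice objects. They are very useful because a slice can be assigned to a name, much like naming a range of cells in a spreadsheet. Here is one application

Suppose we have the invoice data below and want to print just two of its columns neatly (say, the unit price and the product description). How?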

0.....6.................................40........52...55........
1909  Pimoroni PiBrella                     $17.50    3    $52.50
1489  6mm Tactile Switch x20                 $4.95    2     $9.90
1510  Panavise Jr. - PV-201                 $28.00    1    $28.00
1601  PiTFT Mini Kit 320x240                $34.95    1    $34.95

The answer is to give each column a named slice and apply those slices to every line (each line is a string); when printing, we pick the columns we need

invoice = """
0.....6.................................40........52...55........
1909  Pimoroni PiBrella                     $17.50    3    $52.50
1489  6mm Tactile Switch x20                 $4.95    2     $9.90
1510  Panavise Jr. - PV-201                 $28.00    1    $28.00
1601  PiTFT Mini Kit 320x240                $34.95    1    $34.95
"""

SKU         = slice(0,  6)
DESCRIPTION = slice(6,  40)
UNIT_PRICE  = slice(40, 52)
QUANTITY    = slice(52, 55)
ITEM_TOTAL  = slice(55, None)

for item in invoice.split('\n')[2:-1]:
    print(item[UNIT_PRICE].strip(), item[DESCRIPTION].strip())

# <===================OUTPUT===================>
# $17.50 Pimoroni PiBrella
# $4.95 6mm Tactile Switch x20
# $28.00 Panavise Jr. - PV-201
# $34.95 PiTFT Mini Kit 320x240

Multidimensional Slicing and Ellipsis

In the numpy package, the [] operator also accepts several indexes separated by commas, for example

import numpy
a = numpy.arange(12)
a.shape = 3, 4
print(a)
print(a[2, 1])

# <===================OUTPUT===================>
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
# 9

The built-in sequence types in Python, however, are one-dimensional

Assigning to Slices

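If a sequence is mutable, a slice of it can appear on the left-hand side of an assignment as the target. The length of the target range may differ from the length of the right-hand side, but the right-hand side must be an iterable object, even if it has only one item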

l = list(range(10))
print(l)
l[2:5] = [20, 30]
print(l)

##########################################
# TypeError: can only assign an iterable #
# l[2:5] = 100                           #
##########################################

l[2:5] = [100]
print(l)

# <===================OUTPUT===================>
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [0, 1, 20, 30, 5, 6, 7, 8, 9]
# [0, 1, 100, 6, 7, 8, 9]

If a sequence is mutable, a slice of it can also be used in a del statement

l = list(range(10))
print(l)
del l[2:5]
print(l)

# <===================OUTPUT===================>
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [0, 1, 5, 6, 7, 8, 9]

Using + and * with Sequences

Sequences support the + and * operators:

For +, both operands must be of the same sequence type, otherwise the concatenation fails. The result is a new sequence, and neither operand is modified

l1 = [1, 2, 3]
l2 = [4, 5, 6]
print(l1 + l2)
print(l1)
print(l2)

# <===================OUTPUT===================>
# [1, 2, 3, 4, 5, 6]
# [1, 2, 3]
# [4, 5, 6]

For *, one operand is a sequence and the other an integer. The result is again a new sequence and the operands are not modified

l1 = [1, 2, 3]
print(l1 * 2)
print(2 * l1)
print(l1)

# <===================OUTPUT===================>
# [1, 2, 3, 1, 2, 3]
# [1, 2, 3, 1, 2, 3]
# [1, 2, 3]

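An expression of the form a * n (a a sequence, n an integer) is dangerous when the items of a are mutable, for example when they are lists themselves: the result holds several references to the same inner object, and that leads to surprises.

Below is a common mistake. The problem is that the outer list holds three references, and all three point to the same inner list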

a = [['_'] * 3] * 3
print(a)
a[1][2] = '0'
print(a)

# <===================OUTPUT===================>
# [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
# [['_', '_', '0'], ['_', '_', '0'], ['_', '_', '0']]

The code above is equivalent to the following

row = ['_'] * 3
board = []
for i in range(3):
    board.append(row)
print(board)
board[1][2] = '0'
print(board)

# <===================OUTPUT===================>
# [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
# [['_', '_', '0'], ['_', '_', '0'], ['_', '_', '0']]

Building Lists of Lists

Building a list of lists (a nested list) like this is a common need. The common mistake was shown above; the correct approach is a list comprehension

a = [['_'] * 3 for i in range(3)]
print(a)
a[1][2] = '0'
print(a)

# <===================OUTPUT===================>
# [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
# [['_', '_', '_'], ['_', '_', '0'], ['_', '_', '_']]

The correct version is equivalent to

board = []
for i in range(3):
    row = ['_'] * 3
    board.append(row)
print(board)
board[1][2] = '0'
print(board)

# <===================OUTPUT===================>
# [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
# [['_', '_', '_'], ['_', '_', '0'], ['_', '_', '_']]

Augmented Assignment with Sequences

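Augmented assignment operators such as += and *=, which go back to C, have their own special methods: __iadd__ for += and __imul__ for *=.
__iadd__ is optional, though. If it is not implemented, Python calls __add__ to compute a temporary object and then binds that object to the left-hand name, so a += b behaves like a = a + b and creates one extra intermediate object. For immutable sequences this new object is unavoidable, so we have to live with it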
```
t = (1, 2, 3)
print(id(t))
t *= 2
print(id(t))

# <===================OUTPUT===================>
# 4319626152
# 4318489384
```

For mutable sequences the intermediate object can be avoided, which is why the built-in list implements __imul__ (note that the id stays the same)

t = [1, 2, 3]
print(id(t))
t *= 2
print(id(t))

# <===================OUTPUT===================>
# 4304217736
# 4304217736

So if your class is a mutable sequence, you should implement __iadd__ and __imul__ yourself
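A minimal sketch of what that can look like follows. Bag is a hypothetical wrapper around a list, written only to contrast __add__ (new object) with __iadd__ (same object); a real mutable sequence would need more methods than this.

```python
class Bag:
    """Hypothetical mutable sequence wrapper used for illustration."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __add__(self, other):
        # + builds and returns a brand-new Bag.
        return Bag(self._items + list(other))

    def __iadd__(self, other):
        # += mutates in place and must return self,
        # because the result is bound back to the left-hand name.
        self._items.extend(other)
        return self


b = Bag([1, 2])
before = id(b)
b += [3, 4]        # uses __iadd__: same object afterwards
after = id(b)
c = b + [5]        # uses __add__: a different object
print(before == after, list(b), list(c))
```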

A += Assignment Puzzler

Here is a famous puzzle about +=. What happens when the following runs?
```
>>> t = (1, 2, [30, 40])
>>> t[2] += [50, 60]
```
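Multiple choice; the four options are:
a. t becomes (1, 2, [30, 40, 50, 60])
b. TypeError: 'tuple' object does not support item assignment
c. Neither
d. Both a and b
Anyone who knows a little Python knows that tuples are immutable, so even if an item of the tuple is itself mutable, t[n] cannot be the target of an assignment. People who know this usually pick b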

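In reality a also happens, so the answer is d: the TypeError is raised and t still ends up with the new value. Why? One way to investigate how Python works is to look at the bytecode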

Python 3.6.1 (default, Mar 23 2017, 16:49:06)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('s[a] += b')
  1           0 LOAD_NAME                0 (s)
              2 LOAD_NAME                1 (a)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_NAME                2 (b)
             10 INPLACE_ADD
             12 ROT_THREE
             14 STORE_SUBSCR
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE

BINARY_SUBSCR puts the value of s[a] (here a reference) on TOS (top of stack)
INPLACE_ADD performs TOS += b. TOS refers to our [30, 40], a mutable object, so the operation succeeds and [30, 40] becomes [30, 40, 50, 60]
STORE_SUBSCR performs s[a] = TOS. Because s is immutable, this step raises TypeError
The example teaches three lessons:
- Putting mutable items in tuples is not a good idea
- Augmented assignment is not an atomic operation
- Inspecting bytecode is a very good way to analyse what Python is doing
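The outcome can be confirmed without reading bytecode by catching the exception and then inspecting the tuple. This is a small sketch; the variable error is just a name chosen here.

```python
t = (1, 2, [30, 40])

try:
    t[2] += [50, 60]
except TypeError as exc:
    error = str(exc)     # the assignment back into the tuple failed...
else:
    error = None

print(error)
print(t)                 # ...but the list inside was already extended
```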

list.sort and sorted Built-In Function

list.sort sorts a list in place, that is, without making a copy

To keep callers from assuming that a new object is returned, list.sort returns None. This is an important convention of the Python API

Functions or methods that change an object in place should return
None to make it clear to the caller that the object itself was changed,
and no new object was created.

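The drawback of this convention is that such methods cannot be chained.
The counterpart of list.sort is the built-in function sorted, which does not work in place: it always returns a new list. Because it builds a new object, sorted accepts any iterable as its argument, including immutable sequences and generators.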
```
>>> sorted((1, 3, 2, 4, 5))
[1, 2, 3, 4, 5]
```
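The contrast between the two can be sketched as follows; the fruit names and variable names are illustrative only.

```python
fruits = ['grape', 'raspberry', 'apple', 'banana']

new_list = sorted(fruits)     # returns a new list
snapshot = list(fruits)       # the original is still unsorted at this point

result = fruits.sort()        # sorts in place and returns None

# sorted() also accepts a generator expression.
lengths = sorted(len(f) for f in snapshot)

print(new_list, result, fruits, lengths)
```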

list.sort and sorted both accept two optional keyword arguments:

reverse sorts in descending order; the default is False

>>> l = [1, 3, 2, 4, 5]
>>> l.sort(reverse=True)        # return None
>>> l
[5, 4, 3, 2, 1]

key is a one-argument function that is called on each item to produce the value used for sorting

>>> sorted(("abc", "defg", "hi"))
['abc', 'defg', 'hi']
>>> sorted(("abc", "defg", "hi"), key=len)
['hi', 'abc', 'defg']

Once a sequence is sorted it can be searched much faster. Python provides binary search in the standard library module bisect

Managing Ordered Sequences with bisect

Searching with bisect

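bisect(haystack, needle) runs a binary search on the sorted haystack to find a position for needle. What does this position mean? It is the index at which needle can be inserted into haystack. In other words, bisect always returns a valid position and never -1
A binary search that returns -1 belongs to algorithm exercises; in practice merely learning that an element is missing is of little use. Even when the element is not found, we at least want to know where it would go in the sorted sequence. The return value can therefore be passed straight to haystack.insert as the index, and haystack stays sorted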
```
>>> a = [1, 2, 3, 4]
>>> bisect.bisect(a, 2.5)
2
>>> a.insert(bisect.bisect(a, 2.5), 2.5)
>>> a
[1, 2, 2.5, 3, 4]
```
bisect also takes two optional arguments, lo and hi, which bound the part of the sequence that is searched. lo defaults to 0 and hi to the length of the sequence
```
>>> a = [1, 2, 3, 4, 5]
>>> bisect.bisect(a, 6, lo=0, hi=3)
3
>>> bisect.bisect(a, 6)
5
```
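bisect.bisect is actually an alias for bisect.bisect_right, which has a sibling called bisect_left. The two differ only when needle compares equal to items already in haystack, and the difference is the insertion position they return:
- bisect_right returns a position that places needle to the right of all equal items. This is the same as bisect.bisect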
```
>>> a = [1, 2, 3, 4]
>>> bisect.bisect(a, 2.0)
2
>>> bisect.bisect_right(a, 2.0)
2
>>> a.insert(bisect.bisect_right(a, 2.0), 2.0)
>>> a
[1, 2, 2.0, 3, 4]
```
- bisect_left, as the name says, returns a position that places needle to the left of all equal items
```
>>> a = [1, 2, 3, 4]
>>> bisect.bisect(a, 2.0)
2
>>> bisect.bisect_left(a, 2.0)
1
>>> a.insert(bisect.bisect_left(a, 2.0), 2.0)
>>> a
[1, 2.0, 2, 3, 4]
```
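Because bisect returns an insertion position rather than a found/not-found flag, it also works for table lookups. The sketch below maps numeric scores to letter grades; the breakpoints and grade letters are assumptions made up for the example.

```python
import bisect


def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    """Return the letter whose bracket contains score."""
    # bisect (bisect_right) puts a score equal to a breakpoint
    # into the higher bracket, e.g. 70 -> 'C'.
    position = bisect.bisect(breakpoints, score)
    return grades[position]


letters = [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
print(letters)
```

Using bisect_left instead would move scores that sit exactly on a breakpoint into the lower bracket.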

Inserting with bisect.insort

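bisect finds where needle goes in haystack, so can finding and inserting be combined? That is what bisect.insort does: it finds the position and inserts in one step, and it is faster than doing the two separately

The example below uses bisect.insort to insert random numbers into a list while keeping the list in ascending order at all times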

import bisect
import random

SIZE = 7
random.seed(1729)

my_list = []
for i in range(SIZE):
    new_item = random.randrange(SIZE*2)
    bisect.insort(my_list, new_item)
    print('%2d ->' % new_item, my_list)

# <===================OUTPUT===================>
# 10 -> [10]
#  0 -> [0, 10]
#  6 -> [0, 6, 10]
#  8 -> [0, 6, 8, 10]
#  7 -> [0, 6, 7, 8, 10]
#  2 -> [0, 2, 6, 7, 8, 10]
# 10 -> [0, 2, 6, 7, 8, 10, 10]

When a List Is Not the Answer

Because list is so convenient it tends to be overused in Python. Depending on the situation, another sequence type may be a better choice

Arrays

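If a list contains nothing but numbers, array.array is usually the better choice because it is more efficient
The price of that efficiency is that all items must have the same type (as in C). Every item then takes the same amount of memory, which makes the storage compact and easier to optimize:
- For a signed char, for example, the first argument to array is the typecode 'b', which determines the size in memory of each item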
```
>>> a = array.array('b')
>>> a
array('b')
>>> a.append(12)
>>> a
array('b', [12])
>>> a.append(1234)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: signed char is greater than maximum
```
- For double-precision floats, the typecode is 'd'
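A short sketch of the 'd' typecode in use; the values are arbitrary, and the failed append shows the type restriction at work.

```python
from array import array

# Build an array of doubles from a generator expression.
floats = array('d', (i / 4 for i in range(5)))

try:
    floats.append('not a number')   # items must match the typecode
except TypeError:
    rejected = True
else:
    rejected = False

print(floats, rejected)
```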

Memory Views

A memoryview lets you share memory between data structures without copying it first, which matters a great deal for very large data sets
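The no-copy behaviour can be seen with a bytearray: writing through a slice of the view changes the original buffer. The names data, view and part are illustrative.

```python
data = bytearray(b'abcdef')
view = memoryview(data)

part = view[2:4]      # slicing a memoryview gives another view, not a copy
part[0] = ord('X')    # writes through to the underlying bytearray

print(data, bytes(part))
```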

NumPy and SciPy

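NumPy is a package for numerical data: it provides multi-dimensional arrays and the operations on them
SciPy is built on top of NumPy and wraps battle-tested C and Fortran libraries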

Deques and Other Queues

A list can be used as a queue, but it is inefficient, especially when items are inserted or removed at the left end
collections.deque is a thread-safe double-ended queue.
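A few deque operations, as a minimal sketch. The maxlen of 10 is an arbitrary choice for the example; with a bounded deque, adding at one end silently discards items from the other.

```python
from collections import deque

dq = deque(range(10), maxlen=10)

dq.rotate(3)               # move the three rightmost items to the left end
after_rotate = list(dq)

dq.appendleft(-1)          # deque is full, so the rightmost item is dropped
after_appendleft = list(dq)

dq.extend([11, 22, 33])    # three items are dropped from the left
after_extend = list(dq)

print(after_rotate, after_appendleft, after_extend)
```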

Chapter 03: Dictionaries and Sets

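The dict type is not only used widely in our own code; it is also a fundamental part of Python's implementation
For that reason, dict is one of the most heavily optimized parts of the Python virtual machine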

Generic Mapping Types

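The collections.abc module provides the Mapping and MutableMapping ABCs, which formalize the interfaces of dict and similar types (in Python 2.6 to 3.2 these classes are imported from collections rather than collections.abc)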
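As a minimal sketch (assuming Python 3.3 or later, where collections.abc exists), isinstance checks against these ABCs can tell whether an object behaves like a mapping:

```python
from collections import OrderedDict, abc

my_dict = {}
ordered = OrderedDict()

dict_is_mapping = isinstance(my_dict, abc.Mapping)
ordered_is_mutable_mapping = isinstance(ordered, abc.MutableMapping)
list_is_mapping = isinstance([], abc.Mapping)

print(dict_is_mapping, ordered_is_mutable_mapping, list_is_mapping)
```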

One restriction of dict is that its keys must be hashable; values need not be. What does hashable mean?

An object is hashable if it has a hash value which never changes during its
lifetime (it needs a __hash__() method), and can be compared to other objects
(it needs an __eq__() method). Hashable objects which compare equal must have
the same hash value.

The atomic immutable types are all hashable, for example str, bytes and the numeric types

>>> hash(1)
1
>>> hash("abc")
-4275028297401921076
>>> hash('d')
-6398895540996775586

A frozenset is always hashable, because by definition its elements must all be hashable
```
>>> hash(frozenset([30, 40]))
-3232971350518600656
```

Tuples are more subtle: a tuple is hashable only if all of its items are hashable, like tt below. If it contains any unhashable item, like tl, it is not hashable

>>> tt = (1, 2, (30, 40))
>>> hash(tt)
8027212646858338501
>>> tl = (1, 2, [30, 40])
>>> hash(tl)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

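The Python Glossary has a sentence saying "All of Python's immutable built-in objects are hashable", which is inaccurate because not every tuple is hashable, as tl above shows
For user-defined types:
- If the class does not define __eq__, its instances are hashable: the hash value is derived from id(), and all instances compare unequal
- If the class defines a custom __eq__ that takes its internal state into account, instances can be hashable only if all the attributes involved are immutable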
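The second case can be sketched as follows. Point and EqOnly are hypothetical classes; the sketch assumes the attributes of Point are never reassigned after construction, which is what keeps its hash stable. In Python 3, a class that defines __eq__ without __hash__ has its __hash__ set to None, which makes its instances unhashable.

```python
class Point:
    """Value object: equality and hash are both based on (x, y)."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        return (isinstance(other, Point)
                and (self._x, self._y) == (other._x, other._y))

    def __hash__(self):
        return hash((self._x, self._y))


class EqOnly:
    """Defines __eq__ but not __hash__."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


lookup = {Point(1, 2): 'found'}
value = lookup[Point(1, 2)]              # an equal point finds the same entry
eq_only_hashable = EqOnly.__hash__ is not None

print(value, eq_only_hashable)
```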

A dict can be built in several ways; the forms below all produce equal dicts

>>> a = dict(one=1, two=2, three=3)
>>> b = {'one': 1, 'two': 2, 'three': 3}
>>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
>>> d = dict([('two', 2), ('one', 1), ('three', 3)])
>>> e = dict({'three': 3, 'one': 1, 'two': 2})
>>> a == b == c == d == e
True

dict Comprehensions

Since Python 2.7, the listcomp syntax has a variant for dicts, the dictcomp. Below is an example of a dictcomp

>>> DIAL_CODES = [(86, 'China'), (91, 'India'), (1, 'United States')]
>>> country_code = {country: code for code, country in DIAL_CODES}  # key and value are swapped here
>>> country_code
{'China': 86, 'India': 91, 'United States': 1}
>>> {code: country.upper() for country, code in country_code.items() if code > 66}
{86: 'CHINA', 91: 'INDIA'}

Overview of Common Mapping Methods

dict and its two main variants, defaultdict and OrderedDict, share a rich API, which we will simply call the mapping API

One method worth a closer look is update(). It is used as follows

d = {
    "one": 1,
    "two": 2
}

print(d)
d2 = {
    "one": 11,
    "two": 22,
    "three": 33
}

d2.update(d)
print(d2)

# <===================OUTPUT===================>
# {'one': 1, 'two': 2}
# {'one': 1, 'two': 2, 'three': 33}

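The way update(m) handles its argument is a typical example of duck typing. update first checks whether m has a keys() method:

If it does, m is treated as a mapping.

If it does not, update falls back to iterating over m, assuming its items are (key, value) pairs. Keyword arguments are accepted as well: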

d = {
    "one": 11,
    "two": 22,
    "three": 33
}
print(d)
d.update(one=111, two=222, three=333)
print(d)

# <===================OUTPUT===================>
# {'one': 11, 'two': 22, 'three': 33}
# {'one': 111, 'two': 222, 'three': 333}
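
The duck-typing rule can be seen with an argument that is not a dict at all. In this sketch, Config is a hypothetical class that only provides keys() and __getitem__; that is enough for update to treat it as a mapping.

```python
class Config:
    """Hypothetical class: not a dict, but it has keys() and __getitem__."""

    def __init__(self):
        self._data = {'one': 1, 'two': 2}

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


d = {}
d.update(Config())                      # has keys(): treated as a mapping
d.update([('three', 3), ('four', 4)])   # no keys(): iterated as (key, value) pairs
d.update(five=5)                        # keyword arguments work too
print(d)
```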

Another special method is setdefault(), which can save us both code and key lookups, as shown below

index = {
    1: "one",
    2: "two",
    3: "three"
}


# use three lines to set a value
occur = index.get(4, "")
occur += "four"
index[4] = occur
print(index)

# use only one line to set the value
ff = index.setdefault(5, "five")
print(ff)
print(index)

# <===================OUTPUT===================>
# {1: 'one', 2: 'two', 3: 'three', 4: 'four'}
# five
# {1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}

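By comparison, d[k] raises KeyError when the key is missing, so d.get(k, default) is often the better choice when a default value is acceptable, since no exception has to be handled. But when the value also needs to be updated in place, setdefault() is the method to reach for

setdefault() returns the value that is now stored in the dict under that key. If the value is a mutable object, such as a list, we can chain the initialization and the update into a single, more concise expression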

index = {
    1: ["one"],
    2: ["two"],
    3: ["three"]
}


# use three lines to set a value
occur = index.get(4, [])
occur.append("four")
index[4] = occur
print(index)

# use only one line to set the value
index.setdefault(5, ["five"]).append("plus")
print(index)

# <===================OUTPUT===================>
# {1: ['one'], 2: ['two'], 3: ['three'], 4: ['four']}
# {1: ['one'], 2: ['two'], 3: ['three'], 4: ['four'], 5: ['five', 'plus']}

Mappings with Flexible Key Lookup

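Looking up a key that does not exist is a very common situation, and often we want a default value in that case. Earlier we used setdefault to supply the default; defaultdict instead takes a default factory, a callable that is invoked to produce the value whenever a missing key is looked up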

import collections
index = collections.defaultdict(list)
print(index)
# "one" is not in the dict yet, so index["one"] is first set to list()
index["one"].append(1)
print(index["one"])

# <===================OUTPUT===================>
# defaultdict(<class 'list'>, {})
# [1]

What makes the missing-key handling work is the special method __missing__. It is only called by __getitem__ (that is, by d[k]), not by get() or __contains__
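
A small sketch of that rule, using a hypothetical dict subclass named ZeroDict that returns 0 for missing keys without storing them:

```python
class ZeroDict(dict):
    """Hypothetical dict subclass: missing keys read as 0 but are not stored."""

    def __missing__(self, key):
        return 0


z = ZeroDict(a=1)
via_getitem = z['b']     # d[k] -> __getitem__ -> __missing__
via_get = z.get('b')     # get() does not consult __missing__
contains = 'b' in z      # neither does the in operator
print(via_getitem, via_get, contains)
```

defaultdict builds on the same hook: its __missing__ calls the default factory and also inserts the result into the dict.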

Variations of dict

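The collections module offers several variations of dict:
- collections.OrderedDict: keeps keys in insertion order
- collections.ChainMap: holds a list of mappings that can be searched as one
- collections.Counter: a mapping that holds an integer count for each key
- collections.UserDict: a pure Python implementation of a mapping that works like dict, designed to be subclassed

Subclassing UserDict

In general, if we want to create our own mapping type, it is easier to extend UserDict than to extend dict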
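
One reason is that the built-in dict takes shortcuts that bypass methods overridden in a subclass. The sketch below assumes CPython behavior; UpperDict and UpperUserDict are hypothetical classes that both try to upper-case every key on assignment.

```python
from collections import UserDict


class UpperDict(dict):
    """Subclass of the built-in dict."""

    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


class UpperUserDict(UserDict):
    """Same idea, but built on UserDict."""

    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


a = UpperDict(one=1)       # dict.__init__ does not call our __setitem__
a['two'] = 2               # the [] operator does
b = UpperUserDict(one=1)   # UserDict.__init__ goes through update(), which uses __setitem__
b['two'] = 2
print(set(a), set(b))
```

With UserDict, the actual storage is a plain dict held in the data attribute, so super().__setitem__ simply writes into it.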

Immutable Mappings

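A dict is mutable by default, but sometimes you do not want users to change your mapping. That is where an immutable mapping becomes important

Since Python 3.3, the types module provides MappingProxyType, a read-only but dynamic view of an underlying mapping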

from types import MappingProxyType
d = {1: 'A'}
d_proxy = MappingProxyType(d)
print(d_proxy)

#####################################################################
# Traceback (most recent call last):                                #
#   File "c:/Users/hfeng/tmp/one.py", line 5, in <module>           #
#     d_proxy[1] = 'x'                                              #
# TypeError: 'mappingproxy' object does not support item assignment #
#####################################################################
# d_proxy[1] = 'x'

# <===================OUTPUT===================>
# {1: 'A'}

Set Theory

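Sets are a very common concept in computing, but they are a relatively late addition to Python (2.3). There are two set types:
- the mutable version: set
- the immutable version: frozenset

A set cannot hold two equal elements. Because uniqueness is checked by hashing, its elements must be hashable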

>>> l = ['spam', 'spam', 'eggs', 'spam']
>>> set(l)
set(['eggs', 'spam'])

A set itself is not hashable, because its contents may change, but a frozenset is

set Literals

The literal syntax for sets is simple: {1}, {1, 2}, and so on
```
>>> s = {1}
>>> type(s)
<type 'set'>
```
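But note that there is no literal for the empty set: {} is an empty dict, so an empty set must be written as set(). Python 3 did not add an empty-set literal either: {...} is a set with exactly one element, the Ellipsis object
```
>>> s = {...}
>>> type(s)
<class 'set'>
>>> s
{Ellipsis}
```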
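A literal such as {1, 2, 3} is not only more readable but also faster than set([1, 2, 3]). To evaluate the latter, Python has to look up the name set to fetch the constructor, build a list, and pass it to the constructor, whereas the literal compiles to a specialized BUILD_SET bytecode

There is no literal syntax for frozenset, so it must be created by calling the constructor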

>>> frozenset(range(10))
frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})

Set Comprehensions

Following the success of listcomps, Python 2.7 added comprehensions for both dict and set. A set comprehension looks like this
```
>>> { i for i in range(10, 20) }
{10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
```

Set Operations

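Sets support essentially all the operations of mathematical set theory: union (|), intersection (&), difference (-), symmetric difference (^), and subset/superset tests (<=, >=)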
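
A quick sketch of the infix set operators on two small example sets (the values are arbitrary):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b             # elements in either set
intersection = a & b      # elements in both sets
difference = a - b        # elements in a but not in b
sym_diff = a ^ b          # elements in exactly one of the two sets
is_subset = {3, 4} <= a   # subset test
print(union, intersection, difference, sym_diff, is_subset)
```

The infix operators require both operands to be sets; the equivalent methods, such as a.union(...), accept any iterable.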

dict and set Under the Hood

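Knowing that dict and set are both implemented with hash tables helps us understand their strengths and limitations
In Python 2, keys() returns a list, which is the simplest implementation we could think of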
```
>>> m
{1: 11, 2: 22}
>>> m.keys()
[1, 2]
```
In Python 3, however, keys() returns a dict_keys object, a view that behaves more like a set than like a list
```
>>> m
{1: 11, 2: 22}
>>> m.keys()
dict_keys([1, 2])
```
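
A brief sketch of what "set-like" means for the keys view in Python 3 (the second dict, other, is made up for the example):

```python
m = {1: 11, 2: 22}
other = {2: 'two', 3: 'three'}

keys = m.keys()
common = keys & other.keys()   # set intersection works directly on views
m[4] = 44                      # the view is live, so it reflects later changes
has_new_key = 4 in keys
print(common, has_new_key)
```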

Chapter 04: Text versus Bytes

Character Issues

The definition of a string is very simple: it is a sequence of characters
```
A string is a sequence of characters.
```
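The trouble lies in the word character:
- In Python 2, the items of a str are raw bytes, in practice treated as ASCII characters
- In Python 3, the items of a str are Unicode characters
This book is mainly about Python 3, but a great deal of existing code is still Python 2, so we will also spend considerable space on Python 2. One thing should be clear up front, though: the Python 3 design is the more sensible of the two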

Python2 problem

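In Python 2, string literals are of type str by default, and each item of a str is a byte, normally an ASCII-encoded character. A str can therefore be seen as a byte array, which is why there is no separate bytes type. (Note the name is bytes, not byte, because it denotes a sequence rather than a single item. Python 2.6 did introduce the name bytes, but only as an alias for str, which is not at all the bytes of Python 3)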
```
>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
>>> print type(b'byte type does not exist, it is str in python2')
<type 'str'>
```

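In Python 3, by contrast, the string type is still called str, but its meaning has changed: each item is a Unicode character. A separate bytes type is now necessary, because a byte sequence and a str are no longer the same thing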

>>> print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> print('strings are now utf-8 \u03BCnico\u0394é!')
strings are now utf-8 μnicoΔé!
>>> print(type(b'python3 has bytes type'))
<class 'bytes'>

Because str in Python 2 is a sequence of bytes, Python 2 needs a separate unicode type for text

>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
>>> print type(unicode('this is like a python3 str type'))
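<type 'unicode'>

Defaulting str to ASCII and having a separate unicode type were not Python 2's mistakes. The mistake was allowing unicode and str objects to be concatenated! Combining objects of different types requires an implicit conversion:

As long as the str contains only ASCII, it is implicitly decoded to unicode and the concatenation works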

>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
>>> s = "world"
>>> u"Hello" + s
u'Helloworld'

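But one day the str happens to contain non-ASCII bytes. When Python tries to decode it implicitly with the ASCII codec in order to concatenate it with our unicode object, the error appears!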

>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
>>> s = "世界"
>>> u"Hello" + s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

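The root of the problem is this "fail only when the data happens to be unsuitable" behavior: the bug stays hidden until the content of s is no longer pure ASCII!

Python 3 eliminates the problem at its root, because it performs no implicit conversion: both operands of a concatenation must be of the same type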

>>> print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> type("hello")
<class 'str'>
>>> type(b"world")
<class 'bytes'>
>>> "hello" + b"world"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes

Character Issues Continue

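The Unicode standard explicitly separates two concepts, the character and its byte representation:
- In Unicode terms, the identity of a character is its code point, a number from 0 to 1,114,111. That is a very large range, and only about a tenth of it has been assigned so far. For example, the code point of A is written "U+0041"
- A computer, however, only deals in bytes, so every code point must eventually be encoded to bytes before it can be stored or transmitted. Encoding is the process of converting code points to a byte sequence. Since there is more than one way to lay out those bytes, encoding always involves a named encoding. For example, in the UTF-8 encoding "U+0041" becomes \x41, while in the UTF-16LE encoding "U+0041" becomes \x41\x00
As a memory aid for telling encoding and decoding apart: Unicode text is the human-readable form, and the byte representation is the machine-readable form, so
- Converting a byte representation to human-readable Unicode is decoding (without decoding, how could a human read it?)
- Converting Unicode (code points) to a byte representation (whether UTF-8 or UTF-16) is encoding, the opposite of decoding

Below is an example of converting Unicode text to its UTF-8 byte representation. Note that the é in this example is stored in decomposed form, as e followed by the combining acute accent U+0301, which is why len(s) is 5 rather than 4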

>>> s = 'café'
>>> len(s)
5
>>> b = s.encode('utf8')
>>> b
b'cafe\xcc\x81'
>>> len(b)
6
>>> b.decode('utf8')
'café'

Note that b, being bytes, has no encode method, and s, being a str, has no decode method. This makes it much harder to mix the two up

>>> b.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>> s.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

The reason for this restriction is that in Python 2 both the byte type and the unicode type have encode as well as decode, and the result is very confusing

>>> type(s)
<type 'str'>
>>> s.encode('utf8')
'world'
>>> s.decode('utf8')
u'world'
>>> h = u'world'
>>> type(h)
<type 'unicode'>
>>> h.encode('utf8')
'world'
>>> h.decode('utf8')
u'world'

Byte Essentials

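We introduced the bytes type above as a built-in binary sequence. It is not the only one, however: there is also bytearray. The differences are:
- bytes is immutable and exists only in Python 3 (bytes in Python 2 is an alias for str, completely different from the Python 3 bytes)
- bytearray is mutable and exists in both Python 3 and Python 2 (since 2.6)
Now let us focus on the binary sequences (that is, bytes and bytearray). Each item of a bytes or bytearray is an integer from 0 to 255!

A str is different: each item of a str is itself a str of length 1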

>>> print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> a = bytes("abc", encoding="utf_8")
>>> type(a)
<class 'bytes'>
>>> type(a[0])
<class 'int'>
>>> s = "abc"
>>> type(s[0])
<class 'str'>

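It is worth pointing out that str is the only sequence type for which s[0] == s[:1]. This is a practical concession; for every other sequence type, s[i] is a single item while s[i:i+1] is a sequence of the same type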

>>> n = [1, 1, 1]
>>> n[0]
1
>>> n[:1]
[1]
>>> n[0] == n[:1]
False
>>> s = "111"
>>> s[0]
'1'
>>> s[:1]
'1'
>>> s[0] == s[:1]
True

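All this talk about bytes and str may still feel unstructured. Explaining the Python 3 types in terms of the Go types makes the picture much clearer:
- string in Go corresponds to str in Python 3
- rune in Go corresponds to a str of length 1 in Python 3
- []byte in Go corresponds to bytes in Python 3

In Go, a string can be converted to both []rune and []byte, so we can compare the two lengths:

If the string contains only ASCII, the two lengths are the same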

package main

import (
        "fmt"
        "os"
)

func main() {
        str := "cafe"
        b := []byte(str)
        fmt.Println(b)
        r := []rune(str)
        fmt.Println(r)
        os.Exit(0)
}

// <===================OUTPUT===================>
// [99 97 102 101]
// [99 97 102 101]

If the string contains characters whose UTF-8 encoding takes more than one byte, the two lengths differ

package main

import (
        "fmt"
        "os"
)

func main() {
        str := "café"
        b := []byte(str)
        fmt.Println(b)
        r := []rune(str)
        fmt.Println(r)
        os.Exit(0)
}

// <===================OUTPUT===================>
// [99 97 102 101 204 129]
// [99 97 102 101 769]

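With its distinct types, Go makes the difference between the following two clear:
- byte: the binary form is essentially meant for the computer (hence encodings such as UTF-8, because the computer wants to save space)
- rune: the Unicode form is essentially meant for humans (so each rune is stored in 32 bits, enough for any code point; space is no concern here, every character gets the maximum width)

In Python 3, bytes is the type meant for the computer, so you must say which storage layout to use. The computer naturally prefers compact layouts, which is one reason UTF-8 is so popular, but you really can choose UTF-32; it just takes more bytes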

>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> u8 = bytes('café', encoding='utf_8')
>>> u32 = bytes('café', encoding='utf_32')
>>> len(u8)
6
>>> len(u32)
24
>>> u8
b'cafe\xcc\x81'
>>> u32
b'\xff\xfe\x00\x00c\x00\x00\x00a\x00\x00\x00f\x00\x00\x00e\x00\x00\x00\x01\x03\x00\x00'

Whatever the encoding, what changes is the number of bytes used; the value of each individual byte can never exceed 255

>>> for c in u8: print(c)
...
99
97
102
101
204
129
>>> for c in u32: print(c)
...
255
254
0
0
99
0
0
0
97
0
0
0
102
0
0
0
101
0
0
0
1
3
0
0

bytearray is the mutable counterpart of bytes. It has no literal syntax of its own, so it is created by calling its constructor

>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> cafe = bytes('café', encoding='utf_8')
>>> cafe
b'cafe\xcc\x81'
>>> bytearray(b'cafe\xcc\x81')
bytearray(b'cafe\xcc\x81')

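Although binary sequences are really sequences of integers, displaying them as integers would be hard to read, so for the benefit of humans Python makes a concession in how it displays them:
- Bytes in the printable ASCII range are shown as the ASCII character itself
- Bytes for tab, newline, carriage return, and \ are shown as the escape sequences \t, \n, \r, and \\
- Every other byte value is shown as a hexadecimal escape sequence; for example, the null byte is shown as \x00
Binary sequences support most of the str API, except that:
- bytes lacks two groups of methods: the formatting methods format and format_map, and the methods that depend on Unicode data, such as isdecimal, isnumeric, and so on
- bytes has a class method that str lacks: fromhex, which builds a bytes object from pairs of hex digits (hex digits are not text meant for people, so str has no use for such a constructor)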
```
>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> bytes.fromhex('31 4B CE A9')
b'1K\xce\xa9'
```
Besides fromhex, there are more ordinary ways to create a bytes or bytearray:
- A str together with an encoding argument
- An iterable providing integers from 0 to 255
```
>>> bytes(range(2))
b'\x00\x01'
```
- An object that implements the buffer protocol. Creating a binary sequence from a buffer-like object is a fairly low-level operation
```
>>> import array
>>> numbers = array.array('h', [-2, -1, 0, 1, 2])
>>> octets = bytes(numbers)
>>> octets
b'\xfe\xff\xff\xff\x00\x00\x01\x00\x02\x00'
```
Creating a bytes or bytearray from any buffer-like source always copies the bytes. A memoryview, by contrast, lets you share memory between binary data structures!
Another module that deals with raw memory is struct, described next

Structs and Memory Views

The struct module converts between the following two representations:
- packed bytes
- a tuple of fields of different types
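
A minimal sketch of that round trip. The format string '<hI3s' is an arbitrary choice for illustration: little-endian without padding, one 16-bit signed integer, one 32-bit unsigned integer, and three raw bytes.

```python
import struct

fmt = '<hI3s'                                  # '<' little-endian, 'h' int16, 'I' uint32, '3s' 3 bytes
packed = struct.pack(fmt, -2, 1024, b'abc')    # tuple of fields -> packed bytes
fields = struct.unpack(fmt, packed)            # packed bytes -> tuple of fields
size = struct.calcsize(fmt)                    # number of bytes the format occupies
print(packed, fields, size)
```

Spelling out the byte order with '<' keeps the result identical on every machine; without a prefix, struct uses native byte order and alignment.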
TODO

Basic Encoders/Decoders

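Python ships with more than 100 codecs (encodings). Each has a name and usually several aliases. For example, all of the following strings name the UTF-8 encoding:
- 'utf_8'
- 'utf8'
- 'utf-8'
- 'U8'
These names can be passed as the encoding argument to functions such as:
- open()
- str.encode()
- bytes.decode()
Some common encodings are: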
- latin1
- cp1252
- cp437
- gb2312
- utf-8
- utf-16le
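
To make the difference between codecs concrete, the sketch below encodes the same short text with three of them. The sample text is an assumption chosen for illustration; the ñ is written as the escape \u00f1 so that it is guaranteed to be a single code point.

```python
text = 'El Ni\u00f1o'   # ñ as the single code point U+00F1

# Encode the same text with three codecs and keep the results by name.
encoded = {codec: text.encode(codec)
           for codec in ('latin_1', 'utf_8', 'utf_16le')}

for codec, data in encoded.items():
    print(codec, data, len(data))
```

latin_1 needs one byte per character here, utf_8 needs two bytes for the ñ only, and utf_16le needs two bytes for every character.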

Understanding Encode/Decode Problems

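Although encoding problems most commonly surface as a UnicodeError exception, the error actually reported is almost always one of its two more specific subclasses:
- UnicodeEncodeError (when converting str to a binary sequence)
- UnicodeDecodeError (when converting a binary sequence to str)
Loading a Python source file with an unexpected encoding generally raises an error as well, in that case a SyntaxError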

Coping with UnicodeEncodeError

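Most non-UTF codecs handle only a subset of the Unicode characters. When text is converted to bytes and some character has no representation in the target encoding, UnicodeEncodeError is raised (encoding produces the form meant for the computer)

By default, encode() with a given codec raises an exception whenever it cannot convert part of the text to bytes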

>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> city = 'São Paulo'
>>> city.encode('utf_8')
b'Sa\xcc\x83o Paulo'
>>> city.encode('cp437')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u0303' in position 2: character maps to <undefined>

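If we do not want an exception, we have three options:

Pass errors='ignore' to silently skip the characters that cannot be encoded. This is usually a very bad choice, because data is lost without warning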

>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> city.encode('cp437', errors='ignore')
b'Sao Paulo'

Pass errors='replace' to substitute `?` for each character that cannot be encoded

>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> city.encode('cp437', errors='replace')
b'Sa?o Paulo'

Pass errors='xmlcharrefreplace' to substitute an XML character reference

>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> city.encode('cp437', errors='xmlcharrefreplace')
b'Sa&#771;o Paulo'

Coping with UnicodeDecodeError

Just as not every byte is valid ASCII, not every byte sequence is valid UTF-8 or UTF-16. When a conversion from bytes to text fails for that reason, you get a UnicodeDecodeError

>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> octets = b'Montr\xe9al'
>>> octets.decode('utf_8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 5: invalid continuation byte

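The catch with decoding is that not every encoding is as strict as UTF-8. Many legacy 8-bit encodings, such as 'koi8_r' and 'iso8859_1', can decode any byte sequence without raising an error. So if a byte sequence that really does represent text is decoded with the wrong one of these encodings, the result is the kind of garbage (mojibake) we used to see so often when playing Japanese games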
```
>>> import sys;print(sys.version_info)
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> octets = b'Montr\xe9al'
>>> octets.decode('cp1252')
'Montréal'
>>> octets.decode('koi8_r')
'MontrИal'
```

SyntaxError When Loading Modules with Unexpected Encoding

For Python 2, the default source encoding is ASCII
For Python 3, the default source encoding is UTF-8
Neither ASCII nor UTF-8 accepts arbitrary byte sequences. So if a Python module contains bytes that cannot be decoded as UTF-8 (and you have not declared an encoding), you get a message like the following
```
SyntaxError: Non-UTF-8 code starting with '\xe1' in file ola.py on line
     1, but no encoding declared; see http://python.org/dev/peps/pep-0263/
     for details
```
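Since editors on Unix-like systems generally save files as UTF-8, a likely cause of this problem is source code that was saved as cp1252 on Windows. When the file is loaded, it is decoded as UTF-8 by default, which of course fails. One workable fix is to put the following at the very top of the file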
```
# coding: cp1252
```

How to Discover the Encoding of a Byte Sequence

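How do you find out the encoding of a byte sequence? The short answer: you can't! A byte sequence might just be binary machine code. If it was produced by encoding text, the only reliable way to know the encoding is to be told
When you have not been told, you can guess the encoding from the distribution of byte values in the data. Of course, a program can guess better than we can, and that is what chardet (the Universal Character Encoding Detector) does

BOM: A Useful Gremlin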

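Because UTF-16 and UTF-32 use multi-byte units (two or four bytes) to represent a character, the byte order of the machine comes into play. There are two conventions:
- little endian
- big endian
To tell the two apart, the generic utf-16 and utf-32 codecs prepend a BOM (byte-order mark, U+FEFF) to the encoded output to announce the byte order. For UTF-16 the BOM is:
- '\xff\xfe' for little endian
- '\xfe\xff' for big endian
A big advantage of UTF-8 is that it produces the same byte sequence for a given text regardless of machine endianness (endianness only matters for multi-byte units), so it needs no BOM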
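
A small sketch of the byte-order mark in practice, using the single character 'A' as sample text. The codecs module exposes the two UTF-16 BOMs as constants, which avoids hard-coding them.

```python
import codecs

text = 'A'
le = text.encode('utf_16le')       # explicit byte order: no BOM is written
be = text.encode('utf_16be')       # explicit byte order: no BOM is written
utf8 = text.encode('utf_8')        # single-byte units: byte order is irrelevant
with_bom = text.encode('utf_16')   # generic codec: BOM first, then native byte order

starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
round_trip = with_bom.decode('utf_16')   # the decoder reads the BOM and strips it
print(le, be, utf8, with_bom, starts_with_bom, round_trip)
```

Which of the two BOMs appears in with_bom depends on the machine running the code, which is why the sketch checks for either one.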

Handling Text Files

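The best practice for handling text is the Unicode sandwich, which has three parts (all discussed in a Python 3 context):
- On input, bytes should be decoded to str as early as possible
- The meat of the sandwich is your business logic, which should work exclusively with str objects. Your business logic should never encode or decode anything
- On output, str should be encoded to bytes as late as possible
Python web frameworks such as Django follow this practice, so in Django you only have to deal with str
One thing to watch out for: open() does not force you to pass encoding, neither for writing nor for reading, but you should always pass it explicitly. If you omit it, the platform default encoding is used, for example cp1252 on many Windows installations, which leads to the following bug (it does not occur when the default encoding is UTF-8)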
```
>>> open('cafe.txt', 'w', encoding='utf_8').write('café')
4
>>> open('cafe.txt').read()
'cafÃ©'
```

Normalizing Unicode for Saner Comparisons

TODO

Case Folding

TODO

Sorting Unicode Text

TODO

The Unicode Database

TODO

Dual-Mode str and bytes APIs

TODO

Chapter 05: First-Class Functions

In Python, functions are first-class objects. A first-class object is one that:
- can be created at runtime
- can be assigned to a variable or to an element in a data structure
- can be passed as an argument to a function
- can be returned as the result of a function
For comparison, here are some other first-class objects in Python besides functions:
- integer
- string
- dictionary

Treating a Function Like an Object

Let us start with a simple example. It shows that a Python function really is an object: it is an instance of the class function, and it has attributes of its own, such as __doc__

def factorial(n):
    '''return n!'''
    return 1 if n < 2 else n * factorial(n-1)

print(factorial(42))
print(factorial.__doc__)
print(type(factorial))

# <===================OUTPUT===================>
# 1405006117752879898543142606244511569936384000000000
# return n!
# <class 'function'>

The next example demonstrates the "first-class" nature of function objects

>>> fact = factorial
>>> fact(5)
120
>>> map(factorial, range(11))
<map object at 0x10ea31c50>
>>> list(map(factorial, range(11)))
[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

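Each of the "first-class" properties listed earlier is satisfied:
- The function was created in a console session, so it was created at runtime
- It was successfully assigned to the variable fact
- The function was passed as an argument to map
- Anything that can be assigned to a variable can also be returned as a result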

Higher-Order Functions

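A function that takes a function as an argument, or returns a function as its result, is called a higher-order function.
map in the example above is a higher-order function: its first argument must always be a function

Another example is sorted: it takes an optional argument key, which, if given, must be a function of one argument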

>>> fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']
>>> sorted(fruits, key=len)
['fig', 'apple', 'cherry', 'banana', 'raspberry', 'strawberry']

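Here we used len as the key, but any one-argument function can serve as the key. For example, below we define a function reverse that takes a single argument (which will be, of course, an item of the list being sorted)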
```
>>> def reverse(word):
...     return word[::-1]
...
>>> reverse('testing')
'gnitset'
>>> sorted(fruits, key=reverse)
['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']
```
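The best-known higher-order functions in the functional programming tradition, as Python has traditionally offered them, are:
- map
- filter
- reduce
- apply (removed in Python 3)
These functions look very "functional", but that style is not what Python 3 strives for (it may be a bit too functional). In Python, listcomps and genexps are usually the better replacement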

Modern Replacements for map, filter, and reduce

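map, reduce, and filter are core functions of functional programming, and Python has long provided built-in functions with exactly those names. Over time, however, this style came to be seen as not very Pythonic, because the code is hard to follow without some background in functional programming
map and filter are still built-ins in Python 3, but listcomps and genexps do the same job with ordinary Python syntax and are more readable:
- A listcomp can completely replace map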
```
>>> list(map(fact, range(6)))
[1, 1, 2, 6, 24, 120]
>>> [fact(n) for n in range(6)]
[1, 1, 2, 6, 24, 120]
```
- Where we used to need filter plus a lambda, a simple if clause in a listcomp does the job
```
>>> list(map(factorial, filter(lambda n: n % 2, range(6))))
[1, 6, 120]
>>> [factorial(n) for n in range(6) if n % 2]
[1, 6, 120]
```

Note that this is Python 3, so map and filter return lazy iterators (a map object and a filter object), not lists

>>> import sys; print('=====Current Python Version: %s.%s=====' % (sys.version_info.major, sys.version_info.minor))
=====Current Python Version: 3.6=====
>>> map(fact, range(6))
<map object at 0x10ea35160>

In Python 2, by contrast, map and filter both return a list

>>> import sys; print('=====Current Python Version: %s.%s=====' % (sys.version_info.major, sys.version_info.minor))
=====Current Python Version: 2.7=====
>>> map(fact, range(6))
[1, 1, 2, 6, 24, 120]

In Python 2, reduce was also a built-in function, on a par with map

>>> import sys; print('=====Current Python Version: %s.%s=====' % (sys.version_info.major, sys.version_info.minor))
=====Current Python Version: 2.7=====
>>> from operator import add
>>> reduce(add, range(101))
5050

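In Python 3, reduce was demoted: it is no longer a built-in but a function in the functools module, and must be imported before use. The reason, again, is that Python does not want to be too functional. If sum() can do the job, there is no need to bother with reduce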

>>> import sys; print('=====Current Python Version: %s.%s=====' % (sys.version_info.major, sys.version_info.minor))
=====Current Python Version: 3.6=====
>>> from functools import reduce
>>> from operator import add
>>> reduce(add, range(101))
5050
>>> sum(range(101))
5050

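The core idea of reduce is to turn a sequence of values into a single value. Besides sum, Python has two more built-in functions that work this way:
- all(iterable): returns True if every item of the iterable is truthy. Note that all([]) also returns True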
```
>>> all([1, 2, 3, 4])
True
>>> all([True, 0])
False
>>> all([0,0])
False
>>> all([])
True
```
- any(iterable): returns True if at least one item of the iterable is truthy. Note that any([]) returns False
```
>>> any([1, 0, 0, 0])
True
>>> any([0, False])
False
>>> any([])
False
```
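When working with higher-order functions, the need for small, one-off functions comes up all the time. That is the reason anonymous functions exist, and we cover them in detail next

Anonymous Functions

Anonymous functions also exist in other languages, such as Java. They are the so-called "one-off" functions, and in Python they are created with the lambda keyword

A "one-off" function created with lambda differs from an ordinary function in one respect:

The body of a lambda function must be a pure expression. In other words, the body
cannot contain an assignment or any other Python statement, such as while or try

Because of this limitation, lambda is typically used only for arguments to higher-order functions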

>>> fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']
>>> sorted(fruits, key=lambda word: word[::-1])
['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']

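The lambda syntax is just syntactic sugar, and can always be replaced by def. Strictly speaking, a lambda expression creates a function object, which is one kind of callable object. Python has several other kinds, introduced next

The Seven Flavors of Callable Objects

As mentioned above, a lambda is just one kind of callable object. Python has a built-in function, callable(), that tells whether an object is callable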
```
>>> i = 23
>>> callable(i)
False
>>> callable(lambda word: word)
True
```
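The Python data model documentation lists seven kinds of callable types, given below. Pay attention to the difference between function and method (a function does not belong to any class, while a method does):
- User-defined functions: created with def or lambda outside of a class body
- Built-in functions: module-level functions implemented in C, such as len()
- Built-in methods: methods of built-in classes, implemented in C, such as dict.get
- Methods: functions defined in the body of an ordinary class, implemented in Python rather than in C
- Classes: creating an instance is itself a call. Calling a class first runs its __new__ method to create the instance, then __init__ to initialize it. Because Python has no new operator, instantiating a class simply means calling it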
```
>>> str
<class 'str'>
>>> callable(str)
True
```
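- Class instances: if a class defines a __call__ method, its instances can be called like functions
- Generator functions: a function or method that uses the yield keyword is a generator function. What is callable is the generator function itself; calling it returns a generator object, which is an iterator, not a callable. yield resembles return in that it hands a value back to the caller, but instead of ending the function it suspends it, and execution resumes right after the yield when the next value is requested. More importantly, the body does not start running when the generator function is called; it runs lazily, only as values are requested from the generator object. This point is very important and we will return to it later. For now, here is a simple example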
```
# a generator function that yields items instead of returning a list
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1


print(sum(firstn(1000000)))     # too big to put the list in the memory

# <===================OUTPUT===================>
# 499999500000
```
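
To see when the body of a generator function actually runs, here is a small tracing sketch. countdown and the log list are hypothetical names introduced only for this illustration.

```python
log = []


def countdown(n):
    """Hypothetical generator function that records when its body runs."""
    log.append('started')
    while n > 0:
        yield n
        n -= 1
    log.append('finished')


gen = countdown(2)        # returns a generator object; the body has not run yet
before = list(log)        # snapshot of the log right after the call
first = next(gen)         # runs the body up to the first yield
rest = list(gen)          # resumes after the yield and drains the generator

func_is_callable = callable(countdown)   # the generator function is callable
gen_is_callable = callable(gen)          # the generator object is an iterator, not a callable
print(before, first, rest, log, func_is_callable, gen_is_callable)
```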

User-Defined Callable Types

Python functions are not the only things that can be called. Any Python object can be made callable, and all it takes is implementing the __call__ instance method

import random

class BingoCage:
    def __init__(self, items):
        self._items = list(items)
        random.shuffle(self._items)

    def pick(self):
        try:
            return self._items.pop()
        except IndexError:
            raise LookupError('pick from empty BingoCage')

    def __call__(self):
        return self.pick()

bingo = BingoCage(range(3))
print(bingo.pick())
print(bingo())
print(callable(bingo))


# <===================OUTPUT===================>
# 1
# 0
# True

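The example above is somewhat contrived: why would an object want to be callable at all? There has to be a motivation. In Python, the strongest motivation for making an object callable is to keep some state across invocations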
```
A class implementing __call__ is an easy way to create function-like
objects that have some internal state that must be kept across invocations
```

One such example in Python is a decorator implemented as a class (as opposed to the usual decorator implemented as a function)

class counted():
    """ counts how often a function is called """

    def __init__(self, func):
        self.func = func
        self.counter = 0

    def __call__(self, *args, **kwargs):
        self.counter += 1
        return self.func(*args, **kwargs)


@counted
def something():
    pass


for i in range(10):
    something()
print(something.counter)

# <===================OUTPUT===================>
# 10

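We know that a plain function can serve as a decorator, so why implement one as a class? For the reason given above: we want to keep some internal state (such as the number of calls). A plain function offers no obvious place for that, whereas an instance of a class can keep it in an instance attribute (here self.counter, one counter per decorated function). The remaining step is to make the instances callable, which is done by implementing the __call__ special method
A closure can also give a function internal state; we will cover closures later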
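
As a preview, here is one way the same counting decorator could be written with a closure. This is only a sketch: the name counted_fn and the choice to mirror the count in a counter attribute are assumptions made for comparison with the class version above.

```python
import functools


def counted_fn(func):
    """Closure-based variant: the call count lives in the enclosing scope."""
    count = 0

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal count             # rebind the variable owned by counted_fn
        count += 1
        wrapper.counter = count    # mirror the count as a function attribute
        return func(*args, **kwargs)

    wrapper.counter = 0
    return wrapper


@counted_fn
def something():
    pass


for _ in range(10):
    something()
calls = something.counter
print(calls)
```

The nonlocal declaration is what allows wrapper to rebind count; without it, count += 1 would raise UnboundLocalError.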

Function Introspection

Besides __doc__, function objects have many other attributes

def factorial():
    pass

for one in dir(factorial):
    print(one)

# <===================OUTPUT===================>
# __annotations__
# __call__
# __class__
# __closure__
# __code__
# __defaults__
# __delattr__
# __dict__
# __dir__
# __doc__
# __eq__
# __format__
# __ge__
# __get__
# __getattribute__
# __globals__
# __gt__
# __hash__
# __init__
# __init_subclass__
# __kwdefaults__
# __le__
# __lt__
# __module__
# __name__
# __ne__
# __new__
# __qualname__
# __reduce__
# __reduce_ex__
# __repr__
# __setattr__
# __sizeof__
# __str__
# __subclasshook__

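Let us look at a few important ones, starting with __dict__, the attribute that makes a function feel most like an ordinary object. It is a real dictionary that stores the attributes assigned to an instance; for a newly defined function it starts out empty ({}). We first look at a simple instance that is not a function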

class User():
    def __init__(self, name):
        self.name = name
    def display(self):
        print(self.name)

u = User('Alice')
print(u.__dict__)
u.display()
u.name = 'Bill'
u.display()

u.age = 23
print(u.name, u.age)
print(u.__dict__)

# <===================OUTPUT===================>
# {'name': 'Alice'}
# Alice
# Bill
# Bill 23
# {'name': 'Bill', 'age': 23}

A function is an instance too (an instance of the class function), so it has its own __dict__ and can store arbitrary attributes in it, even if that looks a little odd

def upper_case_name(obj):
    return obj.upper()

print(upper_case_name.__dict__)
upper_case_name.short_description = 'Customer name'
print(upper_case_name.__dict__)
print(upper_case_name.short_description)

# <===================OUTPUT===================>
# {}
# {'short_description': 'Customer name'}
# Customer name

Now let us look at the attributes that functions have but plain user-defined instances do not

class C:
    pass
obj = C()
def func():
    pass

print(sorted(set(dir(func)) - set(dir(obj))))

# <===================OUTPUT===================>
# ['__annotations__', '__call__', '__closure__', '__code__',
#  '__defaults__', '__get__', '__globals__', '__kwdefaults__',
#  '__name__', '__qualname__']

The table below summarizes these attributes

Name	Type	Description
__annotations__	dict	Parameter and return annotations
__call__	method-wrapper	Implementation of the () operator
__closure__	tuple	Function closure, bindings for free variables
__code__	code	Function metadata and function body compiled into bytecode
__defaults__	tuple	Default values for the formal parameters
__get__	method-wrapper	Implementation of the read-only descriptor protocol
__globals__	dict	Global variables of the module where the function is defined
__kwdefaults__	dict	Default values for the keyword-only formal parameters
__name__	str	The function name
__qualname__	str	The qualified function name, e.g. Random.choice
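
A short sketch that reads a few of these attributes from a live function. clip is a hypothetical function defined only so that there is something to inspect.

```python
def clip(text, max_len=80, *, ellipsis='...'):
    """Hypothetical function used only to inspect its attributes."""
    return text if len(text) <= max_len else text[:max_len] + ellipsis


name = clip.__name__                 # the function name
defaults = clip.__defaults__         # defaults of the positional parameters
kwdefaults = clip.__kwdefaults__     # defaults of the keyword-only parameters
code = clip.__code__
arg_names = code.co_varnames[:code.co_argcount]   # positional parameter names only
print(name, defaults, kwdefaults, arg_names)
```

In everyday code, inspect.signature() offers a friendlier view of the same information.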

From Positional to Keyword-Only Parameters

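One of Python's strengths is its very flexible handling of function parameters:

For instance, a parameter prefixed with a single '*' collects any extra positional arguments into a tuple bound to the name after the '*'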

def argfunc(*my_args):
    print(type(my_args))
    print(my_args)

argfunc(1, 2, 3, 4, 5)
argfunc()

# <===================OUTPUT===================>
# <class 'tuple'>
# (1, 2, 3, 4, 5)
# <class 'tuple'>
# ()

A parameter prefixed with '**' collects any extra keyword arguments into a dict bound to the name after the '**'

def argDict(**my_args):
    print(type(my_args))
    print(my_args)

argDict(a = 'one')

# <===================OUTPUT===================>
# <class 'dict'>
# {'a': 'one'}

Of course, most code combines the two, which is the *args, **kwargs you see everywhere in source code

def test(*args, **kwargs):
    print(args)
    print(kwargs)

test(1, 2, a= 23)
# <===================OUTPUT===================>
# (1, 2)
# {'a': 23}

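Python 3 makes this even more convenient by introducing keyword-only arguments

Keyword-only arguments exist only in Python 3. They can be passed only by keyword and are never captured by unnamed positional arguments. One use for them is to work around a limitation of '*' and '**' parameters: a Python keyword cannot be used as the name of a keyword argument. See the example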

def html_tag(*args, **kwargs):
    print(args)
    print(kwargs)


html_tag('div', 'hello', name="img")

###############################
# SyntaxError: invalid syntax #
###############################
# html_tag('div', 'hello', name="img", class='sidebar')


# <===================OUTPUT===================>
# ('div', 'hello')
# {'name': 'img'}

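In the example above, passing a keyword argument whose name is a Python keyword (class) fails: Python cannot parse it and reports a syntax error
This is where a keyword-only argument helps. In the function signature it must appear:
- after the '*' parameter, or after a bare '*' if the function takes no variable positional arguments. One of the two must be present
- before the '**' parameter, if there is one. The '**' parameter itself is optional

In the example above both *args and **kwargs are present, so the keyword-only parameter has to sit between them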

def html_tag(*args, cls=None, **kwargs):
    if cls is not None:
        kwargs['class'] = cls
    print(args)
    print(kwargs)


html_tag('div', 'hello', name="img")
html_tag('div', 'hello', cls='sidebar', name="img")


# <===================OUTPUT===================>
# ('div', 'hello')
# {'name': 'img'}
# ('div', 'hello')
# {'name': 'img', 'class': 'sidebar'}

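In fact, a keyword-only parameter does not depend on either of the following:

Neither *args nor **kwargs is required: a bare '*' in the signature is enough to make the parameters after it keyword-only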

def f(a, *, b=2):
    print(a, b)

f(1, b=3)

# <===================OUTPUT===================>
# 1 3

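A keyword-only parameter does not need a default value either, although a default helps signal to users that the parameter is meant to be passed by keyword. Without a default, the parameter is a mandatory keyword argument, as b is below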

def f(a, *, b):
    print(a, b)

f(1, b=2)
###############################################################
# TypeError: f() takes 1 positional argument but 2 were given #
###############################################################
# f(1, 2)

# <===================OUTPUT===================>
# 1 2