{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### Department of Discrete Mathematics, MIPT\n", "\n", "Nikita Volkov\n", "\n", "Based on http://www.inp.nsk.su/~grozin/python/\n", "\n", "# The numpy library\n", "\n", "The `numpy` package provides $n$-dimensional homogeneous arrays (all elements have the same type); an element cannot be inserted into or removed from an arbitrary position in place. `numpy` implements many operations on whole arrays. If a problem can be solved by a sequence of whole-array operations, the solution is about as efficient as in `C` or `matlab`, because most of the time is spent in library functions written in `C`.\n", "\n", "## One-dimensional arrays" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list can be converted to an array." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([0, 2, 1]), numpy.ndarray)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([0, 2, 1])\n", "a, type(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`print` displays arrays in a readable form." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 2 1]\n" ] } ], "source": [ "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `ndarray` class has many methods and attributes."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'T',\n", " '__abs__',\n", " '__add__',\n", " '__and__',\n", " '__array__',\n", " '__array_finalize__',\n", " '__array_interface__',\n", " '__array_prepare__',\n", " '__array_priority__',\n", " '__array_struct__',\n", " '__array_wrap__',\n", " '__bool__',\n", " '__complex__',\n", " '__contains__',\n", " '__copy__',\n", " '__deepcopy__',\n", " '__delitem__',\n", " '__divmod__',\n", " '__float__',\n", " '__floordiv__',\n", " '__getitem__',\n", " '__iadd__',\n", " '__iand__',\n", " '__ifloordiv__',\n", " '__ilshift__',\n", " '__imatmul__',\n", " '__imod__',\n", " '__imul__',\n", " '__index__',\n", " '__int__',\n", " '__invert__',\n", " '__ior__',\n", " '__ipow__',\n", " '__irshift__',\n", " '__isub__',\n", " '__iter__',\n", " '__itruediv__',\n", " '__ixor__',\n", " '__len__',\n", " '__lshift__',\n", " '__matmul__',\n", " '__mod__',\n", " '__mul__',\n", " '__neg__',\n", " '__or__',\n", " '__pos__',\n", " '__pow__',\n", " '__radd__',\n", " '__rand__',\n", " '__rdivmod__',\n", " '__rfloordiv__',\n", " '__rlshift__',\n", " '__rmatmul__',\n", " '__rmod__',\n", " '__rmul__',\n", " '__ror__',\n", " '__rpow__',\n", " '__rrshift__',\n", " '__rshift__',\n", " '__rsub__',\n", " '__rtruediv__',\n", " '__rxor__',\n", " '__setitem__',\n", " '__setstate__',\n", " '__sub__',\n", " '__truediv__',\n", " '__xor__',\n", " 'all',\n", " 'any',\n", " 'argmax',\n", " 'argmin',\n", " 'argpartition',\n", " 'argsort',\n", " 'astype',\n", " 'base',\n", " 'byteswap',\n", " 'choose',\n", " 'clip',\n", " 'compress',\n", " 'conj',\n", " 'conjugate',\n", " 'copy',\n", " 'ctypes',\n", " 'cumprod',\n", " 'cumsum',\n", " 'data',\n", " 'diagonal',\n", " 'dot',\n", " 'dtype',\n", " 'dump',\n", " 'dumps',\n", " 'fill',\n", " 'flags',\n", " 'flat',\n", " 'flatten',\n", " 'getfield',\n", " 'imag',\n", " 'item',\n", " 'itemset',\n", " 'itemsize',\n", " 'max',\n", " 'mean',\n", " 'min',\n", " 
'nbytes',\n", " 'ndim',\n", " 'newbyteorder',\n", " 'nonzero',\n", " 'partition',\n", " 'prod',\n", " 'ptp',\n", " 'put',\n", " 'ravel',\n", " 'real',\n", " 'repeat',\n", " 'reshape',\n", " 'resize',\n", " 'round',\n", " 'searchsorted',\n", " 'setfield',\n", " 'setflags',\n", " 'shape',\n", " 'size',\n", " 'sort',\n", " 'squeeze',\n", " 'std',\n", " 'strides',\n", " 'sum',\n", " 'swapaxes',\n", " 'take',\n", " 'tobytes',\n", " 'tofile',\n", " 'tolist',\n", " 'tostring',\n", " 'trace',\n", " 'transpose',\n", " 'var',\n", " 'view'}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(dir(a)) - set(dir(object))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our array is one-dimensional." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.ndim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `shape` attribute is a tuple of sizes along each axis; for an $n$-dimensional array it has $n$ entries." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3,)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`size` is the total number of elements in the array; `len` gives the size along the first axis (for a 1-dimensional array the two coincide)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, 3)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(a), a.size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`numpy` provides several types for integers (`int16`, `int32`, `int64`) and floating-point numbers (`float32`, `float64`)."
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(dtype('int64'), 'int64', 8)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.dtype, a.dtype.name, a.itemsize" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays can be indexed in the usual way." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays are mutable objects." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 3 1]\n" ] } ], "source": [ "a[1] = 3\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays can, of course, be used in `for` loops. But this gives up the main advantage of `numpy`, its speed. Whenever possible, prefer operations on arrays as a whole." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "3\n", "1\n" ] } ], "source": [ "for i in a:\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An array of floating-point numbers." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([0., 2, 1])\n", "b.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Exactly the same array, created with an explicit `dtype`." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 2. 
1.]\n" ] } ], "source": [ "c = np.array([0, 2, 1], dtype=np.float64)\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data type conversion." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n", "[0 2 1]\n", "['0.0' '2.0' '1.0']\n" ] } ], "source": [ "print(c.dtype)\n", "print(c.astype(int))\n", "print(c.astype(str))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays filled with zeros or ones. It is often better to create such an array first and then assign values to its elements." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0. 0.]\n" ] } ], "source": [ "a = np.zeros(3)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 1 1]\n" ] } ], "source": [ "b = np.ones(3, dtype=np.int64)\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create a zero-filled array with the same shape and type as another array, use" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros_like(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `arange` function is similar to `range`. Its arguments may be floating-point numbers. With floating-point arguments, avoid cases where *(stop - start)/step* is an integer, because then the inclusion of the last element depends on rounding errors. It is better for the end of the range to fall somewhere in the middle of a step."
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 2 4 6 8]\n" ] } ], "source": [ "a = np.arange(0, 9, 2)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 2. 4. 6. 8.]\n" ] } ], "source": [ "b = np.arange(0., 9, 2)\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sequences of numbers with a constant step can also be created with the `linspace` function. Both the start and the end of the range are included; the last argument is the number of points." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 2. 4. 6. 8.]\n" ] } ], "source": [ "a = np.linspace(0, 8, 5)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sequence of numbers with a constant step on a logarithmic scale, from $10^0$ to $10^1$." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 1.77827941 3.16227766 5.62341325 10. ]\n" ] } ], "source": [ "b = np.logspace(0, 1, 5)\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations on one-dimensional arrays\n", "\n", "Arithmetic operations are performed elementwise." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3.77827941 7.16227766 11.62341325 18. ]\n" ] } ], "source": [ "print(a + b)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1. 0.22172059 0.83772234 0.37658675 -2. ]\n" ] } ], "source": [ "print(a - b)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 
3.55655882 12.64911064 33.74047951 80. ]\n" ] } ], "source": [ "print(a * b)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 1.12468265 1.26491106 1.06696765 0.8 ]\n" ] } ], "source": [ "print(a / b)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 4. 16. 36. 64.]\n" ] } ], "source": [ "print(a ** 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the operands have different types, they are converted to the larger type." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3. 5. 7. 9.]\n" ] } ], "source": [ "i = np.ones(5, dtype=np.int64)\n", "print(a + i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`numpy` contains elementary functions that are also applied to arrays elementwise. They are called universal functions (`ufunc`)." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(<ufunc 'sin'>, numpy.ufunc)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sin, type(np.sin)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.90929743 -0.7568025 -0.2794155 0.98935825]\n" ] } ], "source": [ "print(np.sin(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the operands can be a scalar rather than an array." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3. 5. 7. 9.]\n" ] } ], "source": [ "print(a + 1)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 4. 8. 12. 
16.]\n" ] } ], "source": [ "print(2 * a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparisons produce boolean arrays." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False True True True False]\n" ] } ], "source": [ "print(a > b)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False False False]\n" ] } ], "source": [ "print(a == b)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False True True]\n" ] } ], "source": [ "c = a > 5\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The quantifiers \"there exists\" and \"for all\"." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, False)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.any(c), np.all(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In-place modification." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3. 5. 7. 9.]\n" ] } ], "source": [ "a += 1\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 2. 3.55655882 6.32455532 11.2468265 20. ]\n" ] } ], "source": [ "b *= 2\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 2. 
1.18551961 1.26491106 1.6066895 2.22222222]\n" ] } ], "source": [ "b /= a\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In array operations, division by 0 does not raise an exception; it yields `np.nan` or `np.inf` and emits a `RuntimeWarning`." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. nan inf -inf]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.5/dist-packages/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in true_divide\n", " if __name__ == '__main__':\n", "/usr/local/lib/python3.5/dist-packages/ipykernel/__main__.py:1: RuntimeWarning: invalid value encountered in true_divide\n", " if __name__ == '__main__':\n" ] } ], "source": [ "print(np.array([0.0, 0.0, 1.0, -1.0]) / np.array([1.0, 0.0, 0.0, 0.0]))" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(nan, inf, nan, 0.0)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan + 1, np.inf + 1, np.inf * 0, 1. / np.inf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sum and product of all elements of the array; maximum and minimum elements; mean and standard deviation."
] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(8.2793423935260435,\n", " 10.708241812210389,\n", " 2.2222222222222223,\n", " 1.1855196066926152,\n", " 1.6558684787052087,\n", " 0.40390033426607452)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.sum(), b.prod(), b.max(), b.min(), b.mean(), b.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Built-in elementwise functions and constants are available." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.41421356 1.08881569 1.12468265 1.26755256 1.49071198]\n", "[ 7.3890561 3.27238673 3.54277764 4.98627681 9.22781435]\n", "[ 0.69314718 0.17018117 0.23500181 0.47417585 0.7985077 ]\n", "[ 0.90929743 0.92669447 0.95358074 0.99935591 0.79522006]\n", "2.718281828459045 3.141592653589793\n" ] } ], "source": [ "print(np.sqrt(b))\n", "print(np.exp(b))\n", "print(np.log(b))\n", "print(np.sin(b))\n", "print(np.e, np.pi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes partial (cumulative) sums are needed. They will come in handy in our course." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 2. 3.18551961 4.45043067 6.05712017 8.27934239]\n" ] } ], "source": [ "print(b.cumsum())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sort` function returns a sorted copy; the `sort` method sorts in place." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.18551961 1.26491106 1.6066895 2. 2.22222222]\n", "[ 2. 
1.18551961 1.26491106 1.6066895 2.22222222]\n" ] } ], "source": [ "print(np.sort(b))\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.18551961 1.26491106 1.6066895 2. 2.22222222]\n" ] } ], "source": [ "b.sort()\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Concatenating arrays." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3. 5. 7. 9. 1.18551961\n", " 1.26491106 1.6066895 2. 2.22222222]\n" ] } ], "source": [ "a = np.hstack((a, b))\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Splitting an array at positions 3 and 6." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[array([ 1., 3., 5.]),\n", " array([ 7. , 9. , 1.18551961]),\n", " array([ 1.26491106, 1.6066895 , 2. , 2.22222222])]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.hsplit(a, [3, 6])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The functions `delete`, `insert`, and `append` do not modify the array in place; they return a new array in which some elements have been removed, inserted in the middle, or appended at the end." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3. 5. 7. 9. 1.26491106\n", " 2. 2.22222222]\n" ] } ], "source": [ "a = np.delete(a, [5, 7])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3. 0. 0. 5. 7. 9.\n", " 1.26491106 2. 
2.22222222]\n" ] } ], "source": [ "a = np.insert(a, 2, [0, 0])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 3. 0. 0. 5. 7. 9.\n", " 1.26491106 2. 2.22222222 1. 2. 3. ]\n" ] } ], "source": [ "a = np.append(a, [1, 2, 3])\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several ways to index an array. Here is an ordinary index." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]\n" ] } ], "source": [ "a = np.linspace(0, 1, 11)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.2\n" ] } ], "source": [ "b = a[2]\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A range of indices (a slice). This creates a new array header that points to the same data. Changes made through such an array are visible in the original array as well." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.2 0.3 0.4 0.5]\n" ] } ], "source": [ "b = a[2:6]\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.2 0.3 0.4 0.5]\n" ] } ], "source": [ "b[0] = -0.2\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.1 -0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]\n" ] } ], "source": [ "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A range with step 2."
] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.1 0.3 0.5 0.7 0.9]\n" ] } ], "source": [ "b = a[1:10:2]\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. -0.1 -0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]\n" ] } ], "source": [ "b[0] = -0.1\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array in reverse order. Note that the slice `a[len(a):0:-1]` stops before index 0, so the element `a[0]` is not included; `a[::-1]` reverses the whole array." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 0.9 0.8 0.7 0.6 0.5 0.4 0.3 -0.2 -0.1]\n" ] } ], "source": [ "b = a[len(a):0:-1]\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A subarray can be assigned a value: either an array of the right size or a scalar." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0. -0.2 0.3 0. 0.5 0.6 0. 0.8 0.9 1. ]\n" ] } ], "source": [ "a[1:10:3] = 0\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here again only a new header is created, pointing to the same data." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.1 -0.2 0.3 0. 0.5 0.6 0. 0.8 0.9 1. ]\n" ] } ], "source": [ "b = a[:]\n", "b[1] = 0.1\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To copy the array's data as well, use the `copy` method." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.1 0. 0.3 0. 0.5 0.6 0. 0.8 0.9 1. ]\n", "[ 0. 0.1 -0.2 0.3 0. 0.5 0.6 0. 0.8 0.9 1. 
]\n" ] } ], "source": [ "b = a.copy()\n", "b[2] = 0\n", "print(b)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list of indices can be given." ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.2 0.3 0.5]\n" ] } ], "source": [ "print(a[[2, 3, 5]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A boolean array of the same size can be given." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False True False True False True True False True True True]\n" ] } ], "source": [ "b = a > 0\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.1 0.3 0.5 0.6 0.8 0.9 1. ]\n" ] } ], "source": [ "print(a[b])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2-dimensional arrays" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 1.]\n", " [-1. 
0.]]\n" ] } ], "source": [ "a = np.array([[0.0, 1.0], [-1.0, 0.0]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.ndim" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 2)" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 4)" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(a), a.size" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1.0" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[1, 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `shape` attribute can be assigned a new value, a tuple of sizes along all axes. The array gets a new header; its data is unchanged." ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 1. 2. 3.]\n" ] } ], "source": [ "b = np.linspace(0, 3, 4)\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4,)" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.shape" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 1.]\n", " [ 2. 
3.]]\n" ] } ], "source": [ "b.shape = 2, 2\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array can be flattened into a one-dimensional array." ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 1. 2. 3.]\n" ] } ], "source": [ "print(b.ravel())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arithmetic operations are elementwise." ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2.]\n", " [ 0. 1.]]\n", "[[ 0. 2.]\n", " [-2. 0.]]\n", "[[ 0. 2.]\n", " [-1. 1.]]\n", "[[ 0. 1.]\n", " [ 1. 2.]]\n", "[[ 0. 2.]\n", " [ 1. 3.]]\n" ] } ], "source": [ "print(a + 1)\n", "print(a * 2)\n", "print(a + [0, 1]) # the second operand is extended to a matrix by repeating it as rows (broadcasting)\n", "print(a + np.array([[0, 2]]).T) # .T is the transpose\n", "print(a + b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Elementwise and matrix (Python >= 3.5 only) multiplication." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 1.]\n", " [-2. 0.]]\n" ] } ], "source": [ "print(a * b)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2. 3.]\n", " [ 0. -1.]]\n" ] } ], "source": [ "print(a @ b)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-1. 0.]\n", " [-3. 2.]]\n" ] } ], "source": [ "print(b @ a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiplying a matrix by a vector." ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1. 
-1.]\n" ] } ], "source": [ "v = np.array([1, -1], dtype=np.float64)\n", "print(b @ v)" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-2. -2.]\n" ] } ], "source": [ "print(v @ b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have an earlier version of Python, you can work with matrices through the `np.matrix` class, in which the multiplication operator performs matrix multiplication. (NumPy no longer recommends `np.matrix` for new code; `np.dot(a, b)` also works in any version.)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[ 2., 3.],\n", " [ 0., -1.]])" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.matrix(a) * np.matrix(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Outer product $a_{ij}=u_i v_j$" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 2.]\n", "[ 2. 3. 4.]\n" ] } ], "source": [ "u = np.linspace(1, 2, 2)\n", "v = np.linspace(2, 4, 3)\n", "print(u)\n", "print(v)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2. 3. 4.]\n", " [ 4. 6. 8.]]\n" ] } ], "source": [ "a = np.outer(u, v)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two-dimensional arrays that depend on only one index: $x_{ij}=u_j$, $y_{ij}=v_i$" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2.]\n", " [ 1. 2.]\n", " [ 1. 2.]]\n", "[[ 2. 2.]\n", " [ 3. 3.]\n", " [ 4. 4.]]\n" ] } ], "source": [ "x, y = np.meshgrid(u, v)\n", "print(x)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The identity matrix."
] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 0. 0. 0.]\n", " [ 0. 1. 0. 0.]\n", " [ 0. 0. 1. 0.]\n", " [ 0. 0. 0. 1.]]\n" ] } ], "source": [ "I = np.eye(4)\n", "print(I)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `reshape` method changes the shape in the same way as assigning to the `shape` attribute, but it returns a new array object and leaves the shape of the original unchanged." ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1.]\n" ] } ], "source": [ "print(I.reshape(16))" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 0. 0. 0. 0. 1. 0. 0.]\n", " [ 0. 0. 1. 0. 0. 0. 0. 1.]]\n" ] } ], "source": [ "print(I.reshape(2, 8))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A row." ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 1. 0. 0.]\n" ] } ], "source": [ "print(I[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looping over rows." ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 0. 0. 0.]\n", "[ 0. 1. 0. 0.]\n", "[ 0. 0. 1. 0.]\n", "[ 0. 0. 0. 1.]\n" ] } ], "source": [ "for row in I:\n", "    print(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A column." ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0. 1. 0.]\n" ] } ], "source": [ "print(I[:, 2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A submatrix." ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0.]\n", " [ 1. 
0.]]\n" ] } ], "source": [ "print(I[0:2, 1:3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A two-dimensional array can be built from a function. Note that `np.fromfunction` calls `f` once with whole index arrays rather than once per element, as the printed values of `i` and `j` show." ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 0 0 0]\n", " [1 1 1 1]\n", " [2 2 2 2]\n", " [3 3 3 3]]\n", "[[0 1 2 3]\n", " [0 1 2 3]\n", " [0 1 2 3]\n", " [0 1 2 3]]\n", "[[ 0 1 2 3]\n", " [10 11 12 13]\n", " [20 21 22 23]\n", " [30 31 32 33]]\n" ] } ], "source": [ "def f(i, j):\n", " print(i)\n", " print(j)\n", " return 10 * i + j\n", "\n", "print(np.fromfunction(f, (4, 4), dtype=np.int64))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Transposed matrix." ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 2.]\n", " [ 1. 3.]]\n" ] } ], "source": [ "print(b.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Joining matrices horizontally and vertically."
] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1]\n", " [2 3]]\n", "[[4 5 6]\n", " [7 8 9]]\n", "[[4 5]\n", " [6 7]\n", " [8 9]]\n" ] } ], "source": [ "a = np.array([[0, 1], [2, 3]])\n", "b = np.array([[4, 5, 6], [7, 8, 9]])\n", "c = np.array([[4, 5], [6, 7], [8, 9]])\n", "print(a)\n", "print(b)\n", "print(c)" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1 4 5 6]\n", " [2 3 7 8 9]]\n" ] } ], "source": [ "print(np.hstack((a, b)))" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1]\n", " [2 3]\n", " [4 5]\n", " [6 7]\n", " [8 9]]\n" ] } ], "source": [ "print(np.vstack((a, c)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sum of all elements; column sums; row sums." ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "39\n", "[11 13 15]\n", "[15 24]\n" ] } ], "source": [ "print(b.sum())\n", "print(b.sum(axis=0))\n", "print(b.sum(axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`prod`, `max`, `min`, etc. work the same way." ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9\n", "[7 8 9]\n", "[4 7]\n" ] } ], "source": [ "print(b.max())\n", "print(b.max(axis=0))\n", "print(b.min(axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The trace is the sum of the diagonal elements."
] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.trace(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multidimensional arrays" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[ 0 1 2 3]\n", " [ 4 5 6 7]\n", " [ 8 9 10 11]]\n", "\n", " [[12 13 14 15]\n", " [16 17 18 19]\n", " [20 21 22 23]]]\n" ] } ], "source": [ "X = np.arange(24).reshape(2, 3, 4)\n", "print(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Summation (the other operations work the same way)" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[12 14 16 18]\n", " [20 22 24 26]\n", " [28 30 32 34]]\n", "[ 66 210]\n" ] } ], "source": [ "# sum over axis 0 only: for fixed j and k, add up the elements with indices (*, j, k)\n", "print(X.sum(axis=0))\n", "# sum over two axes at once: for fixed i, add up the elements with indices (i, *, *)\n", "print(X.sum(axis=(1, 2)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear algebra" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-2.0" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.det(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inverse matrix." ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-1.5 0.5]\n", " [ 1. 0. 
]]\n" ] } ], "source": [ "a1 = np.linalg.inv(a)\n", "print(a1)" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 0.]\n", " [ 0. 1.]]\n", "[[ 1. 0.]\n", " [ 0. 1.]]\n" ] } ], "source": [ "print(a @ a1)\n", "print(a1 @ a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Solving the linear system $au=v$." ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.5 0. ]\n" ] } ], "source": [ "v = np.array([0, 1], dtype=np.float64)\n", "print(a1 @ v)" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.5 0. ]\n" ] } ], "source": [ "u = np.linalg.solve(a, v)\n", "print(u)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check." ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.]\n" ] } ], "source": [ "print(a @ u - v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Eigenvalues and eigenvectors: $a u_i = \\lambda_i u_i$. `l` is a one-dimensional array of the eigenvalues $\\lambda_i$; the columns of the matrix $u$ are the eigenvectors $u_i$." ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.56155281 3.56155281]\n" ] } ], "source": [ "l, u = np.linalg.eig(a)\n", "print(l)" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-0.87192821 -0.27032301]\n", " [ 0.48963374 -0.96276969]]\n" ] } ], "source": [ "print(u)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check."
] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.00000000e+00 1.66533454e-16]\n", "[ 0.00000000e+00 -4.44089210e-16]\n" ] } ], "source": [ "for i in range(2):\n", " print(a @ u[:, i] - l[i] * u[:, i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Applied to a one-dimensional array, the `diag` function builds a diagonal matrix; applied to a square matrix, it returns a one-dimensional array of its diagonal elements." ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-0.56155281 0. ]\n", " [ 0. 3.56155281]]\n", "[-0.56155281 3.56155281]\n" ] } ], "source": [ "L = np.diag(l)\n", "print(L)\n", "print(np.diag(L))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the equations $a u_i = \\lambda_i u_i$ can be collected into a single matrix equation $a u = u \\Lambda$, where $\\Lambda$ is the diagonal matrix with the eigenvalues $\\lambda_i$ on the diagonal." ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.00000000e+00 0.00000000e+00]\n", " [ 1.66533454e-16 -4.44089210e-16]]\n" ] } ], "source": [ "print(a @ u - u @ L)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore $u^{-1} a u = \\Lambda$." ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ -5.61552813e-01 2.77555756e-17]\n", " [ -2.22044605e-16 3.56155281e+00]]\n" ] } ], "source": [ "print(np.linalg.inv(u) @ a @ u)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's find the left eigenvectors $v_i a = \\lambda_i v_i$ (the eigenvalues $\\lambda_i$ are the same). They are the eigenvectors of the transposed matrix."
] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.56155281 3.56155281]\n", "[[-0.96276969 -0.48963374]\n", " [ 0.27032301 -0.87192821]]\n" ] } ], "source": [ "l, v = np.linalg.eig(a.T)\n", "print(l)\n", "print(v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The eigenvectors are normalized to 1: the diagonal entries of the products below are equal to 1." ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. -0.23570226]\n", " [-0.23570226 1. ]]\n", "[[ 1. 0.23570226]\n", " [ 0.23570226 1. ]]\n" ] } ], "source": [ "print(u.T @ u)\n", "print(v.T @ v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Left and right eigenvectors corresponding to different eigenvalues are orthogonal, because $v_i a u_j = \\lambda_i v_i u_j = \\lambda_j v_i u_j$, so $v_i u_j = 0$ whenever $\\lambda_i \\neq \\lambda_j$." ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 9.71825316e-01 0.00000000e+00]\n", " [ -5.55111512e-17 9.71825316e-01]]\n" ] } ], "source": [ "print(v.T @ u)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Integration" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [], "source": [ "from scipy.integrate import quad, odeint\n", "from scipy.special import erf" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [], "source": [ "def f(x):\n", " return np.exp(-x ** 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adaptive numerical integration (the limits of integration may be infinite). `err` is an estimate of the error."
] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.886226925453 0.8862269254527579 7.101318390472462e-09\n" ] } ], "source": [ "res, err = quad(f, 0, np.inf)\n", "print(np.sqrt(np.pi) / 2, res, err)" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.746824132812 0.7468241328124271 8.291413475940725e-15\n" ] } ], "source": [ "res, err = quad(f, 0, 1)\n", "print(np.sqrt(np.pi) / 2 * erf(1), res, err)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving to a file and reading from a file" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x = np.arange(0, 25, 0.5).reshape((5, 10))\n", "\n", "# Save x to the file example.txt with two digits after the decimal point and ';' as the delimiter\n", "np.savetxt('example.txt', x, fmt='%.2f', delimiter=';')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This produces the following file." ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.00;0.50;1.00;1.50;2.00;2.50;3.00;3.50;4.00;4.50\n", "5.00;5.50;6.00;6.50;7.00;7.50;8.00;8.50;9.00;9.50\n", "10.00;10.50;11.00;11.50;12.00;12.50;13.00;13.50;14.00;14.50\n", "15.00;15.50;16.00;16.50;17.00;17.50;18.00;18.50;19.00;19.50\n", "20.00;20.50;21.00;21.50;22.00;22.50;23.00;23.50;24.00;24.50\n" ] } ], "source": [ "! cat example.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it can be read back." ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]\n", " [ 5. 5.5 6. 6.5 7. 7.5 8. 8.5 9. 9.5]\n", " [ 10. 10.5 11. 11.5 12. 12.5 13. 13.5 14. 14.5]\n", " [ 15. 15.5 16. 16.5 17. 17.5 18. 18.5 19. 
19.5]\n", " [ 20. 20.5 21. 21.5 22. 22.5 23. 23.5 24. 24.5]]\n" ] } ], "source": [ "x = np.loadtxt('example.txt', delimiter=';')\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## numpy performance\n", "\n", "Consider a simple example: the sum of the first $10^8$ non-negative integers, from $0$ to $10^8 - 1$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4999999950000000\n", "CPU times: user 26.1 s, sys: 0 ns, total: 26.1 s\n", "Wall time: 26.1 s\n" ] } ], "source": [ "%%time\n", "\n", "sum_value = 0\n", "for i in range(10 ** 8):\n", " sum_value += i\n", "print(sum_value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slightly improved code" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4999999950000000\n", "CPU times: user 2.92 s, sys: 233 ms, total: 3.15 s\n", "Wall time: 3.14 s\n" ] } ], "source": [ "%%time\n", "\n", "sum_value = sum(range(10 ** 8))\n", "print(sum_value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Code using numpy functions" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4999999950000000\n", "CPU times: user 369 ms, sys: 444 ms, total: 813 ms\n", "Wall time: 846 ms\n" ] } ], "source": [ "%%time\n", "\n", "sum_value = np.arange(10 ** 8).sum()\n", "print(sum_value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simple, readable code runs about $30$ times faster than the explicit loop!\n", "\n", "Let's look at another example. 
We generate a matrix of size $500\\times1000$ and compute the mean of the column minima.\n", "\n", "Straightforward code that nevertheless uses some built-in Python functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.stats as sps" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 49 s, sys: 331 ms, total: 49.3 s\n", "Wall time: 49.3 s\n" ] } ], "source": [ "%%time\n", "\n", "N, M = 500, 1000\n", "matrix = []\n", "for i in range(N):\n", " matrix.append([sps.uniform.rvs() for j in range(M)])\n", "\n", "# minimum of each of the M columns\n", "min_col = [min([matrix[i][j] for i in range(N)]) for j in range(M)]\n", "# average over the M column minima\n", "mean_min = sum(min_col) / M\n", "print(mean_min)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Readable code using numpy functions" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 35 ms, sys: 0 ns, total: 35 ms\n", "Wall time: 33.4 ms\n" ] } ], "source": [ "%%time\n", "\n", "N, M = 500, 1000\n", "matrix = sps.uniform.rvs(size=(N, M))\n", "# axis=0 takes the minimum over the rows, giving one minimum per column\n", "mean_min = matrix.min(axis=0).mean()\n", "print(mean_min)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simple, readable code runs about 1500 times faster!"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The scipy library (the scipy.stats module)\n", "\n", "We will only need the `scipy.stats` module.\n", "Full description: http://docs.scipy.org/doc/scipy/reference/stats.html" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import scipy.stats as sps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "General principle:\n", "\n", "$X$ is a distribution with parameters `params`:\n", "\n", "* `X.rvs(size=N, params)` generates a sample of size $N$ and returns a `numpy` array\n", "* `X.cdf(x, params)` is the value of the distribution function at the point $x$\n", "* `X.ppf(q, params)` is the $q$-quantile\n", "* `X.mean(params)`, `X.var(params)`, `X.std(params)` are the expectation, the variance and the standard deviation\n", "\n", "In addition, the following functions are defined for continuous distributions:\n", "\n", "* `X.pdf(x, params)` is the value of the density at the point $x$\n", "* `X.logpdf(x, params)` is the logarithm of the density at the point $x$\n", "\n", "And for discrete ones:\n", "\n", "* `X.pmf(k, params)` is the probability of the value $k$\n", "* `X.logpmf(k, params)` is the logarithm of that probability\n", "\n", "The parameters may be the following:\n", "\n", "* `loc` is the shift parameter\n", "* `scale` is the scale parameter (continuous distributions only)\n", "* parameters specific to the distribution, for example `n` and `p` for the binomial distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, let's generate a sample of size $N = 200$ from the distribution $\\mathscr{N}(1, 9)$ and compute some statistics.\n", "In terms of the functions described above, $X$ = `sps.norm` and `params` = (`loc=1, scale=3`); `scale` is the standard deviation, so the variance is $9$." ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 10 sample values:\n", " [ 3.28928372 0.82650155 1.79310223 3.96558151 2.41541782 3.10161325\n", " 2.58963169 1.2317635 4.28081739 -1.77051388]\n", "Sample mean: 0.971\n", "Sample variance: 7.847\n" ] } ], "source": [ "sample = sps.norm.rvs(size=200, loc=1, scale=3)\n", "print('First 10 sample values:\\n', sample[:10])\n", "print('Sample mean: %.3f' % sample.mean())\n", "print('Sample variance: %.3f' % sample.var())" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Density:\t\t [ 0.10648267 0.12579441 0.13298076 0.12579441 0.10648267]\n", "Distribution function:\t [ 0.25249254 0.36944134 0.5 0.63055866 0.74750746]\n" ] } ], "source": [ "print('Density:\\t\\t', sps.norm.pdf([-1, 0, 1, 2, 3], loc=1, scale=3))\n", "print('Distribution function:\\t', 
sps.norm.cdf([-1, 0, 1, 2, 3], loc=1, scale=3))" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Quantiles: [-3.93456088 -2.8446547 1. 4.8446547 5.93456088]\n" ] } ], "source": [ "print('Quantiles:', sps.norm.ppf([0.05, 0.1, 0.5, 0.9, 0.95], loc=1, scale=3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's generate a sample of size $N = 200$ from the distribution $Bin(10, 0.6)$ and compute some statistics.\n", "In terms of the functions described above, $X$ = `sps.binom` and `params` = (`n=10, p=0.6`)." ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 10 sample values:\n", " [5 7 6 7 3 4 8 7 5 6]\n", "Sample mean: 6.065\n", "Sample variance: 2.331\n" ] } ], "source": [ "sample = sps.binom.rvs(size=200, n=10, p=0.6)\n", "print('First 10 sample values:\\n', sample[:10])\n", "print('Sample mean: %.3f' % sample.mean())\n", "print('Sample variance: %.3f' % sample.var())" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability mass:\t [ 0.00000000e+00 1.04857600e-04 2.00658125e-01 0.00000000e+00\n", " 6.04661760e-03]\n", "Distribution function:\t [ 0.00000000e+00 1.04857600e-04 3.66896742e-01 3.66896742e-01\n", " 1.00000000e+00]\n" ] } ], "source": [ "print('Probability mass:\\t', sps.binom.pmf([-1, 0, 5, 5.5, 10], n=10, p=0.6))\n", "print('Distribution function:\\t', sps.binom.cdf([-1, 0, 5, 5.5, 10], n=10, p=0.6))" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Quantiles: [ 3. 4. 6. 8. 
8.]\n" ] } ], "source": [ "print('Quantiles:', sps.binom.ppf([0.05, 0.1, 0.5, 0.9, 0.95], n=10, p=0.6))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a separate class for the multivariate normal distribution.\n", "As an example, let's generate a sample of size $N=200$ from the distribution $\\mathscr{N} \\left( \\begin{pmatrix} 1 \\\\ 1 \\end{pmatrix}, \\begin{pmatrix} 2 & 1 \\\\ 1 & 2 \\end{pmatrix} \\right)$." ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 10 sample values:\n", " [[-1.9861816 -0.94358461]\n", " [ 1.93376109 0.34449948]\n", " [ 1.76689 3.25707287]\n", " [ 1.14967263 -0.71283847]\n", " [ 1.44368489 1.27636574]\n", " [ 1.48994732 2.03350446]\n", " [ 2.02426618 1.21057156]\n", " [ 1.67851671 2.30199687]\n", " [ 1.90705893 2.1001483 ]\n", " [ 2.96734234 2.58021913]]\n", "Sample mean: [ 1.14018367 0.98307564]\n", "Sample covariance matrix:\n", " [[ 2.10650447 0.94076559]\n", " [ 0.94076559 1.87049463]]\n" ] } ], "source": [ "sample = sps.multivariate_normal.rvs(mean=[1, 1], cov=[[2, 1], [1, 2]], size=200)\n", "print('First 10 sample values:\\n', sample[:10])\n", "print('Sample mean:', sample.mean(axis=0))\n", "print('Sample covariance matrix:\\n', np.cov(sample.T))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A small trick :) The `loc` parameter can be an array, in which case each element of the sample is drawn with its own mean." ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.25874425 0.97813837 2.04639019 3.0187115 4.05480661 4.94792113\n", " 6.01970204 7.00142419 7.9675934 8.88900013]\n" ] } ], "source": [ "sample = sps.norm.rvs(size=10, loc=np.arange(10), scale=0.1)\n", "print(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes you need to generate a sample from a distribution that is not available in `scipy.stats`.\n", "To do this, create a class that inherits from 
`rv_continuous` for continuous random variables or from `rv_discrete` for discrete random variables.\n", "An example is given at http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, let's generate a sample from the distribution with density $f(x) = \\frac{4}{15} x^3 I\\{x \\in [1, 2] = [a, b]\\}$." ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 10 sample values:\n", " [ 1.8838009 1.80617825 1.09789444 1.65771829 1.72582776 1.57311372\n", " 1.7174875 1.99153808 1.90110246 1.69306301]\n", "Sample mean: 1.652\n", "Sample variance: 0.064\n" ] } ], "source": [ "class cubic_gen(sps.rv_continuous):\n", " def _pdf(self, x):\n", " return 4 * x ** 3 / 15\n", "cubic = cubic_gen(a=1, b=2, name='cubic')\n", "\n", "sample = cubic.rvs(size=200)\n", "print('First 10 sample values:\\n', sample[:10])\n", "print('Sample mean: %.3f' % sample.mean())\n", "print('Sample variance: %.3f' % sample.var())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a discrete random variable takes only a small number of values, there is no need to create a new class as shown above; you can specify the values and their probabilities explicitly."
] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 10 sample values:\n", " [3 1 1 3 3 1 3 1 1 1]\n", "Sample mean: 1.725\n", "Frequencies of the values in the sample: 0.575 0.125 0.3\n" ] } ], "source": [ "some_distribution = sps.rv_discrete(name='some_distribution', values=([1, 2, 3], [0.6, 0.1, 0.3]))\n", "\n", "sample = some_distribution.rvs(size=200)\n", "print('First 10 sample values:\\n', sample[:10])\n", "print('Sample mean: %.3f' % sample.mean())\n", "print('Frequencies of the values in the sample:', (sample == 1).mean(), (sample == 2).mean(), (sample == 3).mean())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }