These examples use the iris dataset CSV file downloaded earlier.
The setup code below is shared by all of the examples that follow:
# coding: utf-8
from blaze import data
from blaze.utils import example
iris = data(example('iris.csv'))
print(iris.peek())
Output:
    sepal_length  sepal_width  petal_length  petal_width      species
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa
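...
1. Selecting columns
There are two ways to select a single column:
# Option 1: attribute access
print(iris.species.peek())
# Option 2: indexing by column name
print(iris['species'].peek())
Output: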
        species
0   Iris-setosa
1   Iris-setosa
2   Iris-setosa
3   Iris-setosa
4   Iris-setosa
5   Iris-setosa
...
Pass a list of names to select several columns:
print(iris[['sepal_length', 'species']].peek())
Output:
    sepal_length      species
0            5.1  Iris-setosa
1            4.9  Iris-setosa
2            4.7  Iris-setosa
3            4.6  Iris-setosa
4            5.0  Iris-setosa
5            5.4  Iris-setosa
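...
2. Mathematical operations
Arithmetic operators and mathematical functions can be applied to columns:
from blaze import log
print(log(iris.sepal_length * 10).peek())
Mathematical functions such as log should be imported from blaze. Depending on the backend, they are translated to np.log, math.log, sqlalchemy.sql.func.log, and so on.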
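The backend translation described above can be pictured with a small plain-Python sketch. This is not how Blaze is implemented: backend_log is a hypothetical helper, and functools.singledispatch is swapped in here only to show one function name resolving to different implementations depending on the kind of data it receives.

```python
import math
from functools import singledispatch


@singledispatch
def backend_log(value):
    """Hypothetical helper: choose a log implementation from the input type."""
    raise NotImplementedError("no log backend for %r" % type(value))


@backend_log.register(float)
@backend_log.register(int)
def _(value):
    # Scalar "backend": call math.log directly.
    return math.log(value)


@backend_log.register(list)
def _(values):
    # Sequence "backend": apply the scalar rule element by element.
    return [math.log(v) for v in values]


# The first three sepal_length values from the sample rows, scaled by 10.
scaled = [5.1 * 10, 4.9 * 10, 4.7 * 10]
logs = backend_log(scaled)
print(logs)
```

The caller writes backend_log(...) once, and the choice of implementation is made from the data, which is the same division of labor that importing log from blaze provides.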
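3. Reductions
As with many Blaze operations, reductions such as mean and sum can be used either as methods or as standalone functions.
from blaze import mean  # needed for the function form below
print(iris.sepal_length.mean().peek())
# output: 5.843333333333334
print(mean(iris.sepal_length).peek())
# output: 5.843333333333334
4. Split-Apply-Combine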
The by operation performs a split-apply-combine computation. Its general form is:
by(table.grouping_columns, name_1=table.column.reduction(), name_2=table.column.reduction(), ...)
For example, find the shortest, longest, and average petal length for each species:
from blaze import by
print(by(iris.species, shortest=iris.petal_length.min(), longest=iris.petal_length.max(),
         average=iris.petal_length.mean()).peek())
Output:
           species  average  longest  shortest
0      Iris-setosa    1.462      1.9       1.0
1  Iris-versicolor    4.260      5.1       3.0
2   Iris-virginica    5.552      6.9       4.5
5. Adding new columns
Use the transform function to add new columns:
from blaze import transform
print(transform(iris, sepal_ratio=iris.sepal_length / iris.sepal_width,
                petal_ratio=iris.petal_length / iris.petal_width).peek())
Output:
    sepal_length  sepal_width  petal_length  petal_width      species  petal_ratio  sepal_ratio
0            5.1          3.5           1.4          0.2  Iris-setosa     7.000000     1.457143
1            4.9          3.0           1.4          0.2  Iris-setosa     7.000000     1.633333
2            4.7          3.2           1.3          0.2  Iris-setosa     6.500000     1.468750
...
6. Text matching
The like method keeps the rows whose text matches a pattern:
print(iris[iris.species.like('*versicolor')].peek())
Output:
    sepal_length  sepal_width  petal_length  petal_width          species
50           7.0          3.2           4.7          1.4  Iris-versicolor
51           6.4          3.2           4.5          1.5  Iris-versicolor
...
7. Relabeling columns
print(iris.relabel(petal_length='PETAL-LENGTH', petal_width='PETAL-WIDTH').peek())
Output:
    sepal_length  sepal_width  PETAL-LENGTH  PETAL-WIDTH      species
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
...
8. Examples
Blaze can solve many problems that come up in data analysis and scientific computing. For example:
- Combining separate, gzipped csv files
  # coding: utf-8
  from blaze import odo
  from blaze.utils import example
  from pandas import DataFrame
  print(odo(example('accounts_*.csv.gz'), DataFrame))
  Output:
     id     name  amount
  0   1    Alice     100
  1   2      Bob     200
  2   3  Charlie     300
  3   4      Dan     400
  4   5    Edith     500
- Split-Apply-Combine
  # data and example are imported in the setup code at the top
  from blaze import by
  t = data('sqlite:///%s::iris' % example('iris.db'))
  print(by(t.species, max=t.petal_length.max(), min=t.petal_length.min()).peek())
  Output:
             species  max  min
  0      Iris-setosa  1.9  1.0
  1  Iris-versicolor  5.1  3.0
  2   Iris-virginica  6.9  4.5
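For readers who want to see what split-apply-combine does step by step, here is a minimal plain-Python sketch of the pattern. It does not use Blaze, the summarize function is a hypothetical name, and the input is only the six (species, petal_length) pairs visible in the sample rows earlier on this page rather than the full dataset.

```python
from collections import defaultdict

# (species, petal_length) pairs copied from the sample rows shown earlier;
# this is a handful of rows, not the full iris dataset.
rows = [
    ("Iris-setosa", 1.4),
    ("Iris-setosa", 1.4),
    ("Iris-setosa", 1.3),
    ("Iris-setosa", 1.5),
    ("Iris-versicolor", 4.7),
    ("Iris-versicolor", 4.5),
]


def summarize(rows):
    """Hypothetical helper: group by species, then reduce each group."""
    # Split: collect the petal lengths for each species.
    groups = defaultdict(list)
    for species, petal_length in rows:
        groups[species].append(petal_length)
    # Apply and combine: reduce every group and gather the results.
    return {
        species: {
            "shortest": min(values),
            "longest": max(values),
            "average": sum(values) / len(values),
        }
        for species, values in groups.items()
    }


summary = summarize(rows)
for species, stats in sorted(summary.items()):
    print(species, stats)
```

Because only six rows are used, the numbers will not match the full-dataset output. The point is the shape of the computation: one pass to split, then one reduction per named result, which is what the keyword arguments to by express.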