Big Data Tools: Superset


Overview

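Apache Superset is an open-source, modern, lightweight BI analytics tool. It connects to many data sources, offers a rich set of chart types, supports custom dashboards, and has a friendly, easy-to-use interface.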

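Because Superset connects to common big data analytics engines such as Trino, Hive, Kylin, and Druid, and supports custom dashboards, it works well as the visualization tool for a data warehouse, serving the ADS layer.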


Official site: https://superset.apache.org/

Before You Install

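  • Superset has no official Windows support (hardly a concern, since few people run servers on Windows)

  • Superset is a web application written in Python and requires a Python 3.6+ environment

  • Superset recommends giving the virtual machine at least 8 GB of RAM and at least 40 GB of disk, so there is enough room for the operating system and all required dependencies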
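The requirements above can be checked before starting the install. The following is only a sketch: the thresholds are copied from the list, while the function names and the Linux-only memory probe through os.sysconf are my own assumptions, not anything Superset ships.

```python
import os
import shutil
import sys

# Thresholds taken from the notes above; adjust them to your own policy.
MIN_PYTHON = (3, 6)
MIN_RAM_GB = 8
MIN_DISK_GB = 40


def check_requirements(python_version, ram_gb, disk_gb):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.6" % major_minor)
    if ram_gb < MIN_RAM_GB:
        problems.append("only %.1f GB RAM, want at least %d" % (ram_gb, MIN_RAM_GB))
    if disk_gb < MIN_DISK_GB:
        problems.append("only %.1f GB disk, want at least %d" % (disk_gb, MIN_DISK_GB))
    return problems


def probe_host(path="/"):
    """Collect this host's values (Linux only: relies on os.sysconf)."""
    gib = 1024 ** 3
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return sys.version_info, ram_bytes / gib, disk_bytes / gib
```

On the target server, check_requirements(*probe_host()) returns the list of problems, if any.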

Python Environment

Install and update system dependencies

# 1. Install the build dependencies
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

# 2. Install or update gcc
yum -y install gcc

# 3. Python 3.7 and later also need libffi-devel
yum install libffi-devel -y

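Download and install Python

Budgets are often tight, so a single development server frequently hosts several Python versions to serve different "customers". For that reason I recommend installing Miniconda to switch between Python versions. The Superset documentation also strongly recommends installing Superset in a virtual environment.

Install Conda

Download the installer script first: Miniconda3-latest-Linux-x86_64.sh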

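# 1. Run the following command to install Miniconda and follow the prompts
bash Miniconda3-latest-Linux-x86_64.sh
# 2. Press Enter repeatedly to page through the license; when asked whether you accept it, type yes
Please answer 'yes' or 'no':'
>>> yes
# 3. Confirm the install path; the default (shown in brackets) is miniconda3 under the current user's home directory
[/root/miniconda3] >>> /opt/module/miniconda3
# 4. When asked whether to initialize conda, type yes
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
# 5. Output like the following means the installation succeeded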
==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup,
   set the auto_activate_base parameter to false:

conda config --set auto_activate_base false

Thank you for installing Miniconda3!

# 6. Stop auto-activating base: after installation, every new terminal activates Miniconda's default base environment; the following command disables that.
[root@paratera128 ~]# conda config --set auto_activate_base false

# 7. Configure conda mirrors hosted in China (add several)
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda config --set show_channel_urls yes

Python environment setup

Installing Python with conda is straightforward. For recent Superset versions, Python 3.7 or 3.8 is the best choice.

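# 1. Create an environment with a specific Python version
conda create --name superset python=3.7
# 2. Activate the superset environment; work happens in the conda Python 3.7 environment and does not affect the host's Python
conda activate superset
# 3. Leave the current environment
conda deactivate
# 4. Delete the virtual environment
conda env remove -n superset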

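Deploying Superset (Docker)

Install and start

# Clone the Superset repository with git; the project provides a ready-made Docker Compose setup (with separate development and production configurations)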
[root@paratera128 opt]# git clone https://github.com/apache/superset.git

# Enter the project directory
[root@paratera128 opt]# cd superset

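# This installation method depends heavily on the Docker Compose and Docker Engine versions. My local versions are shown below; with them, the version field in the docker-compose file from the repository has to be changed to 3.6 or lower. For the mapping, search for "docker and docker-compose version compatibility".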
[root@paratera128 ~]# docker --version
Docker version 18.03.1-ce, build 9ee9f40
(superset) [root@paratera128 ~]# docker-compose --version
docker-compose version 1.26.2, build eefe0d31
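Why 3.6 for this host? As a rough sketch, the check below parses the `docker --version` output and looks up the newest Compose file format the engine accepts. The table reflects Docker's published compatibility matrix as I understand it (only the rows relevant here), and the function names are hypothetical.

```python
import re

# Minimum Docker Engine release for each Compose file format
# (partial table; an assumption based on Docker's compatibility matrix).
MIN_ENGINE_FOR_COMPOSE_FORMAT = {
    "3.6": (18, 2, 0),
    "3.7": (18, 6, 0),
    "3.8": (19, 3, 0),
}


def parse_docker_version(output):
    """Extract (major, minor, patch) from `docker --version` output."""
    match = re.search(r"Docker version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized docker --version output: %r" % output)
    return tuple(int(part) for part in match.groups())


def highest_supported_format(engine_version):
    """Return the newest listed Compose file format the engine supports, or None."""
    supported = [
        fmt
        for fmt, minimum in MIN_ENGINE_FOR_COMPOSE_FORMAT.items()
        if engine_version >= minimum
    ]
    if not supported:
        return None
    return max(supported, key=lambda fmt: tuple(int(p) for p in fmt.split(".")))
```

For the engine shown above (18.03.1-ce), the lookup lands on format 3.6, which matches the edit described in the comment.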

# Make the startup scripts executable
[root@paratera128 superset]# chmod 777 docker
[root@paratera128 superset]# cd docker/
[root@paratera128 docker]# ls
docker-bootstrap.sh  docker-ci.sh  docker-frontend.sh  docker-init.sh  frontend-mem-nag.sh  pythonpath_dev  README.md  run-server.sh
[root@paratera128 docker]# chmod 777 *

# Pull the images and start the containers (up alone can do both in one step)
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml pull
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up -d
superset_cache is up-to-date
superset_db is up-to-date
Starting superset_worker_beat ... done
Starting superset_app         ... done
Starting superset_worker      ... done
Starting superset_init        ... done

# Create the admin user
[root@paratera128 superset]# docker exec -it superset_app flask fab create-admin
Username [admin]: admin
User first name [admin]: admin
User last name [user]: admin
Email [admin@fab.org]: admin
Password:
Repeat for confirmation:
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:10:42,285:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:10:42,293:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  warnings.warn(
Recognized Database Authentications.
Error! User already exists admin

# Initialize the database
[root@paratera128 superset]# docker exec -it superset_app superset db upgrade
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:11:58,693:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:11:58,700:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  warnings.warn(
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.

# Initialize Superset
[root@paratera128 superset]# docker exec -it superset_app superset init
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:12:47,375:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:12:47,382:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  warnings.warn(
Syncing role definition
2022-07-26 04:12:50,958:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2022-07-26 04:12:50,980:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2022-07-26 04:12:51,220:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2022-07-26 04:12:51,391:INFO:superset.security.manager:Syncing Gamma perms
Syncing granter perms
2022-07-26 04:12:51,554:INFO:superset.security.manager:Syncing granter perms
Syncing sql_lab perms
2022-07-26 04:12:51,705:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2022-07-26 04:12:51,874:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2022-07-26 04:12:52,034:INFO:superset.security.manager:Creating missing datasource permissions.
Creating missing database permissions.
2022-07-26 04:12:52,044:INFO:superset.security.manager:Creating missing database permissions.
Cleaning faulty perms
2022-07-26 04:12:52,056:INFO:superset.security.manager:Cleaning faulty perms

# Load the example data (optional)
[root@paratera128 yum]# docker exec -it superset_app superset load_examples
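After `up -d` returns, the app container can still take a while to start serving. A small polling helper makes that wait explicit. This is a sketch under assumptions: it uses Superset's /health endpoint, the base URL (for example http://localhost:8088) is whatever your deployment exposes, and the helper names are mine.

```python
import time
import urllib.error
import urllib.request


def superset_is_healthy(base_url, timeout=5):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() up to `attempts` times; return True as soon as it succeeds."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be wait_until_healthy(lambda: superset_is_healthy("http://localhost:8088")). Passing the probe as a callable keeps the retry loop testable without a running server.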

Docker Compose configuration

# docker-compose image version, user, and mounted-volume variables
x-superset-image: &superset-image apache/superset:latest
x-superset-user: &superset-user root
x-superset-depends-on: &superset-depends-on
  - db
  - redis
x-superset-volumes: &superset-volumes
  # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
  - ./docker:/app/docker
  - ./superset:/app/superset
  - ./superset-frontend:/app/superset-frontend
  - superset_home:/app/superset_home
  - ./tests:/app/tests

version: "3.6"
services:
# Redis backs Superset's Flask-Caching cache, which stores state from user actions such as dashboard filter state and Explore chart/table data
  redis:
    image: redis:latest
    container_name: superset_cache
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    volumes:
      - redis:/data
# PostgreSQL database (optional)
  db:
    env_file: docker/.env
    image: postgres:14
    container_name: superset_db
    restart: unless-stopped
    ports:
      - "127.0.0.1:5432:5432"
    volumes:
      - db_home:/var/lib/postgresql/data
# Superset server instance
  superset:
    env_file: docker/.env
    image: *superset-image
    container_name: superset_app
    command: ["/app/docker/docker-bootstrap.sh", "app"]
    restart: unless-stopped
    ports:
      - 8088:8088
    user: *superset-user
    depends_on: *superset-depends-on
    volumes: *superset-volumes
    environment:
      CYPRESS_CONFIG: "${CYPRESS_CONFIG}"

volumes:
  superset_home:
    external: false
  db_home:
    external: false
  redis:
    external: false


Deploying Superset (pip in a virtual environment)

Install and start

# Activate the superset environment
[root@paratera128 ~]# conda activate superset
(superset) [root@paratera128 ~]#
# Install system dependencies
yum install -y python-setuptools
yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel

# Install (or upgrade) setuptools and pip
pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/

# Install Superset
pip install apache-superset -i https://pypi.douban.com/simple/
# Or install a specific version
pip install apache-superset==1.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/

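When pip finishes with a "Successfully installed ..." message, the installation succeeded. The WARNING can be ignored: it only points out that running pip as root may grant excessive permissions, and it will not appear in production, where you would not install as root.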

Create the admin user

(superset) [root@paratera128 ~]# export FLASK_APP=superset
(superset) [root@paratera128 ~]# flask fab create-admin
Username [admin]: admin
User first name [admin]: admin
User last name [user]: admin
Email [admin@fab.org]: admin
Password:
Repeat for confirmation:
logging was configured successfully
2022-07-25 18:23:46,139:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 18:23:46,156:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  "Flask-Caching: CACHE_TYPE is set to null, "
Recognized Database Authentications.
Admin User admin created.

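Initialize the database

Superset is, in the end, a web application with its own metadata database, which has to be initialized. (The official installation guide runs this step before creating the admin user.)

# Install dataclasses (only needed on Python 3.6; it is built in from 3.7), then initialize the Superset database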
pip install dataclasses
superset db upgrade

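If you see: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
open python3.7/site-packages/superset/config.py for editing:

Search for "CACHE_TYPE" and change every value to "simple". The warning is harmless; it only means caching is turned off.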
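Editing files under site-packages gets overwritten on the next upgrade. A gentler option is to override the setting in a superset_config.py of your own, which Superset picks up from PYTHONPATH or from the path in the SUPERSET_CONFIG_PATH environment variable. The following is a minimal sketch; the timeout and key prefix are illustrative values I chose, not recommendations from the Superset docs.

```python
# superset_config.py
# Place this file in a directory on PYTHONPATH, or point the
# SUPERSET_CONFIG_PATH environment variable at it.

# Flask-Caching settings for Superset's general-purpose cache.
# "simple" is an in-process memory cache: enough to silence the warning on a
# single-node test install, but not shared between gunicorn workers.
CACHE_CONFIG = {
    "CACHE_TYPE": "simple",          # newer Flask-Caching releases prefer "SimpleCache"
    "CACHE_DEFAULT_TIMEOUT": 300,    # seconds (illustrative value)
    "CACHE_KEY_PREFIX": "superset_", # illustrative prefix
}
```

For anything beyond a test box, a shared backend such as the Redis instance used in the Docker setup is the more usual choice.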

Initialize base data

(superset) [root@paratera128 local]# superset init
logging was configured successfully
2022-07-25 02:24:19,136:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 02:24:19,148:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  "Flask-Caching: CACHE_TYPE is set to null, "
Syncing role definition
2022-07-25 02:24:27,821:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2022-07-25 02:24:27,920:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2022-07-25 02:24:28,026:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2022-07-25 02:24:28,410:INFO:superset.security.manager:Syncing Gamma perms
Syncing granter perms
2022-07-25 02:24:28,741:INFO:superset.security.manager:Syncing granter perms
Syncing sql_lab perms
2022-07-25 02:24:29,045:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2022-07-25 02:24:29,687:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2022-07-25 02:24:29,769:INFO:superset.security.manager:Creating missing datasource permissions.
Creating missing database permissions.
2022-07-25 02:24:29,776:INFO:superset.security.manager:Creating missing database permissions.
Cleaning faulty perms
2022-07-25 02:24:29,780:INFO:superset.security.manager:Cleaning faulty perms

Start the service

# Start from the command line with five worker processes, all bound to 192.168.137.128:8080
(superset) [root@paratera128 local]# gunicorn --workers 5 --timeout 120 --bind 192.168.137.128:8080 "superset.app:create_app()" --daemon
[2022-07-25 02:28:47 -0700] [104753] [INFO] Starting gunicorn 20.0.4
[2022-07-25 02:28:47 -0700] [104753] [INFO] Listening at: http://192.168.137.128:8080 (104753)
[2022-07-25 02:28:47 -0700] [104753] [INFO] Using worker: sync
[2022-07-25 02:28:47 -0700] [104756] [INFO] Booting worker with pid: 104756
[2022-07-25 02:28:47 -0700] [104757] [INFO] Booting worker with pid: 104757
[2022-07-25 02:28:47 -0700] [104758] [INFO] Booting worker with pid: 104758
[2022-07-25 02:28:47 -0700] [104759] [INFO] Booting worker with pid: 104759
[2022-07-25 02:28:47 -0700] [104760] [INFO] Booting worker with pid: 104760
logging was configured successfully
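Long gunicorn command lines are easy to mistype, as a stray dash in a flag shows. The same options can live in a gunicorn configuration file, which is plain Python. The file name and location are up to you; this sketch simply mirrors the values used above.

```python
# gunicorn.conf.py: the same options as the command line above, as a file.
bind = "192.168.137.128:8080"  # address and port to listen on
workers = 5                    # number of worker processes
timeout = 120                  # seconds before an unresponsive worker is restarted
daemon = True                  # detach from the terminal (same as --daemon)
```

Start it with: gunicorn -c gunicorn.conf.py "superset.app:create_app()"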

Troubleshooting

Install the following additional dependencies if they are missing:

pip install flask -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install wtforms_json -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_appbuilder -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_compress -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install celery -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_migrate -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_talisman -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_caching -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sqlparse -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install bleach -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install markdown -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pandas -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install parsedatetime -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pathlib2 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install simplejson -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install humanize -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install python-geohash -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install polyline -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install geopy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sqlalchemy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sqlalchemy-utils -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install cryptography -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install backoff -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install msgpack -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pyarrow -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install contextlib2 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install croniter -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install retry -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install isodate -i https://pypi.tuna.tsinghua.edu.cn/simple
# markupsafe 2.1.1 raises an error here; replace it with the older 2.0.1
(superset) [root@paratera128 superset]# pip show markupsafe
Name: MarkupSafe
Version: 2.1.1
Summary: Safely add untrusted strings to HTML/XML markup.
Home-page: https://palletsprojects.com/p/markupsafe/
Author: Armin Ronacher
Author-email: armin.ronacher@active-4.com
License: BSD-3-Clause
Location: /opt/module/miniconda3/envs/superset/lib/python3.7/site-packages
Requires:
Required-by: Jinja2, Mako, WTForms
(superset) [root@paratera128 superset]# python -m pip install markupsafe==2.0.1
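The post does not show the error itself. A likely cause, and this is my assumption, is that MarkupSafe 2.1.0 removed soft_unicode, which older Jinja2 releases still import. Under that assumption, a tiny helper can decide from the version string printed by `pip show` whether the pin is needed:

```python
def markupsafe_needs_pin(version):
    """Return True for MarkupSafe 2.1.0 and later, which no longer ship soft_unicode."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 1)
```

With the version shown above, markupsafe_needs_pin("2.1.1") is true, so installing 2.0.1 is the workaround.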

Fixing the error: No PIL installation found

(superset) [root@paratera128 local]# pip install pillow -i https://pypi.tuna.tsinghua.edu.cn/simple

(superset) [root@paratera128 local]# superset version
logging was configured successfully
2022-07-25 02:20:07,976:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 02:20:07,983:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Superset 1.3.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

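At this point, the Superset installation in a conda virtual environment is complete.

Accessing Superset

URL: http://ip:8088 (Docker deployment; the gunicorn example above listens on port 8080)

Username/password: admin/admin

Connecting to databases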

MySQL
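Superset asks for a SQLAlchemy URI when you add a database. For MySQL the usual form is mysql://user:password@host:port/database, which requires the mysqlclient driver in Superset's Python environment. Passwords with special characters must be URL-escaped. A small sketch, where the helper name and sample values are hypothetical:

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, database, port=3306):
    """Build a SQLAlchemy URI for MySQL, escaping special characters."""
    return "mysql://%s:%s@%s:%d/%s" % (
        quote_plus(user),
        quote_plus(password),
        host,
        port,
        database,
    )
```

The resulting string goes into the SQLAlchemy URI field of the database connection form.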


Trino

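Connecting to Trino requires installing the driver: https://superset.apache.org/docs/databases/installing-database-drivers/

pip must be installed first; a fairly recent version is required, so upgrade it after installing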

[root@paratera128 yum]# yum -y install epel-release
[root@paratera128 yum]# yum -y install python-pip
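# The pip/2.7 script only runs on Python 2; for python3, fetch the generic
# bootstrap script (older interpreters need the matching versioned script,
# for example pip/3.7/get-pip.py)
[root@paratera128 yum]# wget https://bootstrap.pypa.io/get-pip.py
[root@paratera128 yum]# python3 get-pip.py

# Install the driver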
[root@paratera128 yum]# pip install sqlalchemy-trino

# For a Docker deployment, the driver also has to be added to the container
[root@paratera128 superset]# touch ./docker/requirements-local.txt
[root@paratera128 superset]# echo "sqlalchemy-trino" >> ./docker/requirements-local.txt
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml build --force-rm
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up
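Once the driver is installed, the Trino connection is again a SQLAlchemy URI, this time of the form trino://user:password@host:port/catalog, optionally followed by a schema. The helper below is a sketch only; the default port 8080 and the sample catalog name are assumptions about a typical Trino setup, not values from this post.

```python
from urllib.parse import quote_plus


def trino_uri(user, host, catalog, port=8080, schema=None, password=None):
    """Build trino://user[:password]@host:port/catalog[/schema]."""
    credentials = quote_plus(user)
    if password is not None:
        credentials += ":" + quote_plus(password)
    uri = "trino://%s@%s:%d/%s" % (credentials, host, port, catalog)
    if schema is not None:
        uri += "/" + schema
    return uri
```

Leaving out the password fits clusters without authentication; the schema part is optional because Superset can browse schemas within the catalog.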


Building reports

A basic Table

[Screenshots of the Table chart setup]

Bar chart

Requirement: count new and returning users for each day of one month
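Behind the chart, this requirement is a GROUP BY over the day with one metric per user type. The sketch below reproduces that aggregation on an in-memory SQLite database. The user_visits table, its columns, and the sample rows are all made up for illustration; the real dataset in the warehouse will look different.

```python
import sqlite3

# Hypothetical schema: one row per user per active day, with is_new = 1 on
# the day the user first appeared and 0 afterwards.
SCHEMA = """
CREATE TABLE user_visits (
    visit_date TEXT    NOT NULL,  -- ISO date, e.g. 2022-07-01
    user_id    INTEGER NOT NULL,
    is_new     INTEGER NOT NULL   -- 1 = new user, 0 = returning user
)
"""

DAILY_COUNTS_SQL = """
SELECT visit_date,
       SUM(is_new)     AS new_users,
       SUM(1 - is_new) AS returning_users
FROM user_visits
WHERE visit_date >= ? AND visit_date < ?
GROUP BY visit_date
ORDER BY visit_date
"""


def daily_user_counts(conn, start_date, end_date):
    """Return (date, new_users, returning_users) rows for [start_date, end_date)."""
    return conn.execute(DAILY_COUNTS_SQL, (start_date, end_date)).fetchall()


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO user_visits VALUES (?, ?, ?)",
        [
            ("2022-07-01", 1, 1),
            ("2022-07-01", 2, 0),
            ("2022-07-01", 3, 0),
            ("2022-07-02", 4, 1),
            ("2022-08-01", 5, 1),  # outside the July window
        ],
    )
    return conn
```

In Superset terms, visit_date would be the time column at day granularity, and the two SUM expressions would be the metrics plotted as bars.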


Pie chart

Show the share of data in each frequency band
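A pie chart shows each group's count divided by the total. As a tiny sketch of that arithmetic, with made-up band labels standing in for the real column values:

```python
from collections import Counter


def band_shares(bands):
    """Return each band's share of all records, in percent."""
    counts = Counter(bands)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {band: 100.0 * n / total for band, n in counts.items()}
```

Superset computes this itself from a COUNT metric grouped by the band column; the helper only shows what the slices represent.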


Dashboard

The Chart components created above have all been saved to the same dashboard.

To add one, simply drag the Chart onto the dashboard.

Building on the API

Reference: https://superset.apache.org/docs/api

For example, to list the four Charts created above, use the chart list endpoint (GET /api/v1/chart/ in the API reference).

Called without parameters, it returns all columns and all data by default.
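From a script, the call takes two steps: log in to get an access token, then send the token as a Bearer header. The sketch below uses only the standard library and assumes database authentication (provider "db"); the base URL and credentials are placeholders, and the helper names are mine.

```python
import json
import urllib.request


def build_login_request(base_url, username, password):
    """POST /api/v1/security/login; the JSON response carries an access_token."""
    payload = {
        "username": username,
        "password": password,
        "provider": "db",  # assumption: users are stored in Superset's own database
        "refresh": True,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_chart_list_request(base_url, access_token):
    """GET /api/v1/chart/ with the token from the login response."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/chart/",
        headers={"Authorization": "Bearer " + access_token},
        method="GET",
    )


def fetch_json(request, timeout=10):
    """Send a prepared request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running instance (placeholder URL and credentials):
#   token = fetch_json(build_login_request("http://ip:8088", "admin", "admin"))["access_token"]
#   charts = fetch_json(build_chart_list_request("http://ip:8088", token))["result"]
```

Building the requests separately from sending them keeps the URL and header logic easy to inspect before anything touches the server.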

