Big Data Analytics for Mobile Communications: Data Mining and Machine Learning in Practice
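Full-color printing. New research and applications in 4G/5G wireless technology, machine learning, and data mining. Jointly recommended by Dr. Tian Suning, Chairman of AsiaInfo Technologies, and Dr. David Belanger, Chief Scientist of AT&T (American Telephone and Telegraph Company).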
About the Book
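Building on new research and applications in 4G/5G wireless technology, machine learning, and data mining, this book examines analytical methods and case studies. It approaches the industry from both engineering and social-science perspectives, with the aim of deepening readers' insight into the telecom industry and improving operators' operational efficiency. Using machine learning and data mining, the book tackles problems in mobile networks that traditional methods cannot solve. It presents methods, solutions, and algorithms that combine data science with mobile network technology.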
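This book can serve as a reference for:
- graduate and undergraduate students;
- researchers;
- mobile network engineers;
- business analysts and algorithm analysts;
- software development engineers.

It offers strong practical guidance and is a rare professional work in its field.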
About the Authors
About the First Author
Dr. Ye Ouyang
Chief Technology Officer and Senior Vice President, AsiaInfo Technologies
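Dr. Ouyang currently has overall responsibility for research, development, and innovation across AsiaInfo's technologies and products. Before joining AsiaInfo, he worked at Verizon, the largest mobile operator in the United States. There he was Manager of the Telecom AI Systems department and a Verizon Fellow.

Dr. Ouyang has extensive experience in R&D and in managing large teams in mobile communications. He has served as a scientist, researcher, R&D manager, and head of a large R&D team. His interdisciplinary research spans mobile communications, data science, and artificial intelligence. He focuses on R&D innovation and commercialization in areas including:
- 5G network intelligence;
- BSS/OSS convergence;
- telecom AI;
- network slicing;
- MEC;
- network experience perception;
- intelligent network optimization;
- 5G industry enablement;
- cloud-network convergence.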
Contents
Chapter 1 Overview
1.1 Big Data Analytics in the Telecom Industry
1.2 Drivers of Telecom Big Data Analytics
1.3 Benefits of Big Data Analytics to the Telecom Industry Value Chain
1.4 Scope of Telecom Big Data Implementation
1.4.1 Network Analytics
1.4.2 User and Market Analytics
1.4.3 Innovative Business Models
1.5 Book Outline
References
Chapter 2 Telecom Analytics Methodology
2.1 Regression Methods
2.1.1 Linear Regression
2.1.2 Nonlinear Regression
2.1.3 Feature Selection
2.2 Classification Methods
2.2.1 Logistic Regression
2.2.2 Other Classification Methods
2.3 Clustering Methods
2.3.1 K-Means Clustering
2.3.2 Gaussian Mixture Models
2.3.3 Other Clustering Methods
2.3.4 Applications of Clustering to Telecom Data
2.4 Forecasting Methods
2.4.1 Time Series Decomposition
2.4.2 Exponential Smoothing Models
2.4.3 ARIMA Models
2.5 Neural Networks and Deep Learning
2.5.1 Neural Networks
2.5.2 Deep Learning
2.6 Reinforcement Learning
2.6.1 Models and Policies
2.6.2 Reinforcement Learning Algorithms
References
Chapter 3 LTE Network Performance Trend Analysis
3.1 Network Performance Forecasting Strategies
3.1.1 Direct Forecasting Strategy
3.1.2 Analytical Model
3.2 Relationship Between Network Resources and Performance Indicators
3.2.1 Relationship Between LTE Network KPIs and Resources
3.2.2 Regression Model
3.3 Network Resource Forecasting
3.3.1 LTE Network Traffic and Resource Forecasting Model
3.3.2 Forecasting Network Resources
3.4 Application: Evaluating RRC Connection Establishment
3.4.1 Data Preparation and Feature Selection
3.4.2 Deriving the Relationship Between LTE KPIs and Network Resources
3.4.3 Forecasting the RRC Connection Establishment Success Rate
References
Chapter 4 Hot Device Readiness and Return Rate Analysis
4.1 Prediction Strategies for Device Return Rate and Device Readiness
4.2 Device Return Rate and Readiness Prediction Models
4.2.1 Mobile Communication Services in the Prediction Model
4.2.2 Parameter Acquisition and Storage
4.2.3 Analytics Engine
4.3 Implementation and Results
4.3.1 Device Return Rate Prediction
4.3.2 Device Readiness Prediction
Chapter 5 VoLTE Voice Quality Assessment
5.1 Assessing Voice Quality with POLQA
5.1.1 The POLQA Standard
5.1.2 Scalability and Diagnosability in Voice Quality Assessment
5.2 The CrowdMi Methodology
5.2.1 Classification Based on RF Features
5.2.2 Network Indicator Selection and Clustering
5.2.3 Relationship Between Network Indicators and POLQA Scores
5.2.4 Model Testing
5.3 Technical Details of CrowdMi
5.3.1 Record Classification
5.3.2 Selection of Network Indicators
5.3.3 Clustering
5.3.4 Regression
5.4 CrowdMi Prototype Design and Experiments
5.4.1 Client and Server Architecture
5.4.2 Testing and Results
References
Chapter 6 Analysis of Radio Resource Usage by Mobile Apps
6.1 Motivation and System Overview
6.1.1 Background and Challenges
6.1.2 Mobile Resource Management
6.1.3 System Overview
6.2 The AppWiR Crowdsourcing Tool
6.3 AppWiR Mining Algorithms
6.3.1 Selection of Network Indicators
6.3.2 The LOESS Method
6.3.3 Time-Series-Based Forecasting of Network Resource Usage
6.4 Implementation and Experiments
6.4.1 Data Collection and Study
6.4.2 Results and Accuracy
References
Chapter 7 Anomaly Detection in Telecom Data
7.1 Models
7.1.1 Gaussian Model
7.1.2 Time-Dependent Gaussian Model
7.1.3 Gaussian Mixture Model (GMM)
7.1.4 Time-Dependent Gaussian Mixture Model
7.1.5 Gaussian Probabilistic Latent Semantic Model (GPLSA)
7.2 Model Comparison
7.2.1 Sample Definition
7.2.2 Anomaly Identification
7.2.3 Comparison of Time-Dependent GMM and GPLSA
7.3 Simulation and Discussion
References
Chapter 8 Big-Data-Driven Self-Optimization of LTE Networks
8.1 SON (Self-Organizing Networks)
8.2 APP-SON
8.3 APP-SON Architecture
8.4 APP-SON Algorithms
8.4.1 Hungarian Algorithm Assisted Clustering (HAAC)
8.4.2 Unit-Regression-Assisted Determination of the Number of Clusters
8.4.3 DNN-Based Regression
8.4.4 Label Combinations of Each Cell in Time-Series Space
8.4.5 Similarity-Based Parameter Tuning
8.5 Simulation and Discussion
References
Chapter 9 Telecom Data and Marketing
9.1 Topics in Telecom Marketing
9.2 Overall Construction of the Social Network
9.2.1 Data Collection and Data Types
9.2.2 Network Extraction and Management
9.3 Measuring Network Structure
9.4 Modeling Consumer Behavior in Networks
References
Chapter 10 Contagious Customer Churn
10.1 Problem Statement
10.1.1 The Churn Problem
10.1.2 Social Learning and Network Effects
10.2 Processing Network Data
10.3 Dynamic Model
10.3.1 Model Introduction
10.3.2 Model Definition
10.3.3 Modeling Own Experience, Social Learning, and Social Network Effects
10.3.4 Model Estimation
10.4 Results
References
Chapter 11 Social-Network-Based Targeted Marketing
11.1 Channels of Network Effects
11.2 Social Network Data Processing
11.3 Modeling Strategy Issues
11.3.1 Linear Spatial Autoregressive Model
11.3.2 Social Network Interaction Model
11.3.3 Endogenous Peer Effects
11.4 Findings and Applications
11.4.1 Interpreting the Results
11.4.2 Social-Network-Based Targeted Marketing
References
Chapter 12 Social Influence and Dynamic Social Network Structure
12.1 Dynamic Model
12.1.1 Continuous-Time Markov Model Assumptions
12.1.2 Model Estimation and Identification
12.1.3 Multivariate Analysis of the Effect of Network Structure on Social Influence
12.2 Summary of Findings
12.2.1 Estimation Results of the Stochastic Actor-Oriented Dynamic Network Model
12.2.2 Meta-Regression Analysis Results
12.2.3 Strategy Simulation
12.3 Conclusion
References
Preface/Foreword
Foreword I
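Fifth-generation mobile communications (5G) and artificial intelligence (AI) are the newest general purpose technologies (GPTs) of the 21st century. The GPTs of the 19th and 20th centuries were chiefly electricity, the internal combustion engine, the computer, and the Internet. Like them, 5G and AI will greatly accelerate humanity's transformation from industrialization and informatization to digitalization.

Since the 3G era, telecom operators worldwide have gradually explored applying automation and intelligence technologies in their communication networks and business production systems. With the growth of big data, operators can now record fine-grained feature data on networks and services across the telecom ecosystem and retain it in data warehouses or data lakes. Analyzing this data effectively and accurately is the next step. Operators need to turn it into proactive and predictive decisions that make their networks and services run more efficiently. Doing so has become an important topic in the digital transformation of telecom operators worldwide.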
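Automated and intelligent analysis of massive data in the operator ecosystem is developing along two parallel tracks.

In the network domain, we call it Network Intelligence. Network Intelligence applies statistics, data science, artificial intelligence, and related technologies in network infrastructure or application management systems. The goal is to build agile, automated, and intelligent mechanisms for decisions and operations across the full network life cycle: planning, construction, optimization, and operation and maintenance. These mechanisms are usually carried by an intelligent information system. That system can take one of two forms:
- it can be integrated into the network infrastructure as part of it; or
- it can be a standalone intelligent network information system that manages and operates the network infrastructure through a set of standardized interoperability rules.

In the business domain, we call it Business Intelligence. Business Intelligence applies statistics, data science, artificial intelligence, and related technologies in the Business Support System (BSS). The goal is to build agile, automated, and intelligent mechanisms for decisions and operations across the full business life cycle of operation, maintenance, and commercial operations. These intelligent decision mechanisms are built into the production and operating systems of the business support framework. Examples include Customer Relationship Management (CRM), the billing system, and business analysis systems.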
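The author has extensive technical management experience in mobile communications. Over the past decade, he has personally witnessed, led, and practiced the rapid growth of telecom data science at U.S. operators.

This book is grounded in data science and mobile network theory and applied to operators' real business scenarios. It applies telecom big data and machine learning algorithms to a wide range of practical cases in both the network domain and the business domain. Every telecom scenario in the book is presented through empirical, quantitative data analysis. The author combines domain knowledge of telecom networks and services with machine learning algorithms to derive quantified, actionable decisions. The result is a body of valuable lessons for operators exploring data-driven network and business operations in the digital era.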
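This is a practice-oriented book that pairs telecom big data techniques with case analysis. It is suitable as a study reference for:
- graduate students in telecommunications, information science, and computer science;
- R&D staff at operators and at telecom hardware and software companies.

It is also suitable for any reader interested in mobile communications, data science, or artificial intelligence.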
Dr. Tian Suning
Chairman, AsiaInfo Technologies
Beijing, November 2020