According to a set of figures compiled by VentureBeat:
Investors dropped $681 million into A.I.-centric startups in Silicon Valley last year. This year, the number will likely reach $1.2 billion. Five years ago, total A.I. investment spiked at roughly $150 million.
A closer look, however, shows that almost all of this money has flowed into machine learning:
The truth is — artificial intelligence does not exist yet and every single company pretending to have one is in most cases arrogantly re-selling an old concept of machine learning – a technology first introduced in 1959 that truly started to take off in the 90s. Cloud technology, big data and amazing search algorithms finally became the fuel for this rocket. Systems and services could self-improve thanks to insane amount of statistical data pouring their way. But this has nothing to do with A.I.
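All this funding and media attention has made machine learning look like the next startup hotspot. In reality, machine learning places very high demands on talent, data, and computing power. Over the past few years the computing bottleneck has gradually eased thanks to cloud computing, since researchers and companies can simply buy cloud-based compute, but the talent and data problems still need solving. Data most of all.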
Ted Xiao and Gautham Kesineni, the two founders of ML@B (Machine Learning @ Berkeley), described how data currently constrains machine learning:
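As machine learning attracts more attention, its community support keeps growing. A large and highly varied body of public data is now available online, and many large companies, in the spirit of open source, have released high-quality datasets (such as YouTube 8M). For some specific research problems there are even standard datasets (such as The MNIST Database) used for research or for benchmarking algorithm performance.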
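For researchers and students, this can fairly be called a golden age of machine learning research. For actual businesses, however, obtaining a large amount of data on one specific problem remains the key to training a good model. Gautham told me that, for now, the gap that data creates between companies is not yet that pronounced; but one day data will become a barrier to entry, and that day is slowly approaching.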
Telling AI, DL, and ML apart
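Deep learning, one family of machine learning algorithms, is now used almost interchangeably with the term "artificial intelligence". Most media outlets, technology media included, cannot tell machine learning, deep learning, and artificial intelligence apart. Computer scientist Robby Goetschalckx analyzed the three concepts in detail on Quora, and brought Data Science into the comparison as well: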
Artificial Intelligence is the name of a very large research field, with numerous branches. Any approach to make a computer behave in a way which can be called “intelligent” falls under this field.
Machine Learning is a particular branch in AI. It focuses on algorithms which construct models based on observed data. An essential part is the learning: given different data, you could get a different model. Again, there are many sub-fields and branches, depending on the methodology used, and the problem specification (for example, do you just want to learn which examples are “good” or “bad”, or do you want the algorithm to learn what actions to take in a specific situation?).
Deep Learning is a particular technique in Machine Learning. It uses an Artificial Neural Network with many layers, using particular clever techniques to learn the optimal model parameters. Every individual node in the network represents some “feature” which helps in representing the input or assigning a class to the input. For example, if all the inputs are faces, a node might be “nose width” or “average skin tone” or something like that. Crucially, the people creating the network do not need to put these features in themselves. The algorithm will automatically decide which features work best to accomplish its task.
Data Science is a different field, related to AI. It looks at how algorithms can help to deal with vast amounts of data. It includes visualization of data (how to make a large dataset more easy to visualize for humans), Data Mining (automatically finding interesting patterns in data sets), or learning models (which leads to Machine Learning).
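To make the point about learning concrete ("given different data, you could get a different model"), here is a minimal sketch in plain Python. It is not taken from Goetschalckx's answer: the function name `fit_line` and both toy datasets are made up for illustration, and ordinary least squares merely stands in for "an algorithm that constructs a model from observed data".

```python
def fit_line(points):
    """Fit y = slope * x + intercept to (x, y) pairs by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Two hypothetical datasets observed from different underlying relationships.
dataset_a = [(0, 1), (1, 3), (2, 5), (3, 7)]  # points on y = 2x + 1
dataset_b = [(0, 4), (1, 3), (2, 2), (3, 1)]  # points on y = -x + 4

# Same algorithm, different data, different learned parameters.
model_a = fit_line(dataset_a)
model_b = fit_line(dataset_b)
print("model learned from dataset_a:", model_a)
print("model learned from dataset_b:", model_b)
```

The algorithm is identical in both calls; only the observed data changes, and with it the learned slope and intercept. A deep network follows the same pattern at a much larger scale, with many layers of parameters in place of two numbers.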
The product revolution driven by deep learning
As for deep learning, Eric Jang, an engineer on the Google Brain team, listed the products that deep learning is reshaping (in no particular order):
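Custom data compression, compressive sensing, data-driven sensor calibration, offline AI, human-computer interaction, gaming, artistic assistants, unstructured data mining, and voice synthesis.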
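Two of these deserve special emphasis: offline AI and voice synthesis. Offline AI lets devices, phones included, perform some intelligent function without a network connection. Examples include Google's own camera-based translation and some features of Apple's Siri; in the latest iOS 10, the face recognition introduced in Apple's Photos app also works offline. For now, though, offline AI has two major drawbacks. First, accuracy falls short, since a smartphone's computing power is limited. Second, the battery capacity of smartphones and other mobile devices holds back progress: face recognition in iOS 10 Photos, for example, runs only while the iOS device is charging.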
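As for voice synthesis, the technology is in fact already quite mature; in China, Alibaba Cloud has demonstrated its own AI product imitating the way Jack Ma speaks. One thing to be wary of: as AI develops rapidly, a new wave of social engineering attacks could do enormous damage, for example phone scams built on voice synthesis. On this topic I recommend an in-depth report by Markoff, a senior technology reporter at The New York Times.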
Is deep learning bringing the chip industry to a new crossroads?
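As deep learning begins to change the world, the whole technology supply chain has begun to restructure to some degree, especially as Moore's Law nears the end of its run. Cade Metz of Wired says he has been "pestered" over the past few days by people from consulting firms offering to pay him to analyze where the chip industry is heading. That is because Metz earlier reported on how Google is building chips for AI, and Wired last month also published an exclusive long-form account of Microsoft's research and development around FPGAs. The consulting firms are mostly acting on behalf of the clients behind them, the chipmakers, which shows how uncertain the chip industry is about artificial intelligence.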
Metz puts it candidly:
Today, Internet giants like Google, Facebook, Microsoft, Amazon, and China’s Baidu are exploring a wide range of chip technologies that can drive AI forward, and the choices they make will shift the fortunes of chipmakers like Intel and nVidia. But at this point, even the computer scientists within those online giants don’t know what the future holds.
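But on any foreseeable timeline, the future of the chip industry under the AI trend is certain to include the following two trends:
- A shift toward deep learning;
- A new generation of chips that can extend to smartphones.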