Case Study: Finding the Higgs Boson with Deepnets

In our previous blog post, we introduced Deep Neural Networks. In today's post, the second in our series of six, we'll use BigML's deep neural networks, called Deepnets, to walk through a supervised learning problem.

The Data

The dataset we are investigating is the Higgs dataset found at the UCI Machine Learning Repository. The problem, as explained in the original Nature Communications paper, is that particle accelerators create a huge amount of data. The Large Hadron Collider, for example, can produce 100 billion collisions in an hour, and only a tiny fraction may produce detectable exotic particles like the Higgs boson. So a well-trained model is immensely useful for finding the needle in the haystack.

Lucas Taylor / CERN (cern-ex-9710002-1): http://cdsweb.cern.ch/record/628469

This dataset contains 28 different numeric fields and 11 million rows generated through simulation, but imitating actual collisions. As explained in the paper, these 28 fields come in two kinds. The first 21 of the fields capture low-level kinematic properties measured directly by the detectors in the accelerator. The last seven fields are combinations of the kinematic fields hand-designed by physicists to help differentiate between collisions that result in a Higgs boson and those that do not. So some of these fields are “real” data as measured, and some are constructed from domain-specific knowledge.

This kind of feature engineering is very common when solving machine learning problems and can greatly improve prediction accuracy. But what if we don't have a physicist on hand to help with that feature engineering? Compared to other machine learning techniques (such as Boosted Trees), Deepnets can perform well using the low-level fields alone by learning their own high-level combinations, especially when the low-level fields are numeric and continuous.
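To make this concrete: the hand-designed high-level fields in the original paper are invariant masses of particle combinations. As a hedged sketch (not the paper's actual code), one such derived feature can be computed from low-level kinematics, assuming approximately massless particles described by transverse momentum, pseudorapidity, and azimuthal angle:

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two (approximately massless) particles from
    transverse momentum (pt), pseudorapidity (eta), and azimuth (phi)."""
    # Convert each (pt, eta, phi) triple to a four-vector (E, px, py, pz).
    px1, py1, pz1 = pt1 * math.cos(phi1), pt1 * math.sin(phi1), pt1 * math.sinh(eta1)
    px2, py2, pz2 = pt2 * math.cos(phi2), pt2 * math.sin(phi2), pt2 * math.sinh(eta2)
    e1, e2 = pt1 * math.cosh(eta1), pt2 * math.cosh(eta2)
    # m^2 = (E1 + E2)^2 - |p1 + p2|^2
    m2 = (e1 + e2) ** 2 - ((px1 + px2) ** 2 + (py1 + py2) ** 2 + (pz1 + pz2) ** 2)
    return math.sqrt(max(m2, 0.0))

# Two back-to-back particles in the transverse plane: the momenta cancel,
# so the invariant mass equals the total energy.
print(invariant_mass(1.0, 0.0, 0.0, 1.0, 0.0, math.pi))  # → 2.0
```

A Deepnet working only on the low-level fields has to learn non-linear combinations like this one on its own, which is exactly what we test below.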

To try this out, we created a dataset of just the first 1.5 million data points (for speed) and removed the last seven high-level features. We then split the data into an 80% training dataset and a 20% testing dataset. Next, we create a Deepnet using BigML's automatic structure search.
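BigML performs this split server-side. As a rough local sketch of the same idea (a shuffled 80/20 partition, not BigML's actual sampling code):

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut them into train/test subsets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = range(1_500_000)  # stand-in for the 1.5 million collision records
train, test = train_test_split(rows)
print(len(train), len(test))  # → 1200000 300000
```

The fixed seed makes the split reproducible, so the same held-out rows can be reused for every evaluation that follows.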


Deep neural networks can be built in a multitude of ways, but we’ve simplified this by intelligently searching through many possible network structures (number of layers, activation functions, nodes, etc.) and algorithm parameters (the gradient descent optimizer, the learning rate, etc.) and then making an ensemble classifier from the best individual networks. You can manually choose a network structure, tweak the search parameters, or simply leave all those details to BigML (as we are doing here).
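The search just described can be caricatured as sampling candidate configurations, scoring each, and keeping the best few for an ensemble. A minimal sketch, with an entirely hypothetical search space and a toy scoring function standing in for BigML's internals:

```python
import random

# Hypothetical search space standing in for the structures BigML explores.
SEARCH_SPACE = {
    "hidden_layers": [1, 2, 3, 4],
    "nodes_per_layer": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "softplus"],
    "learning_rate": [0.1, 0.01, 0.001],
}

def structure_search(evaluate, n_trials=20, top_k=3, seed=0):
    """Randomly sample network configurations, score each with `evaluate`,
    and return the top_k best to be combined into an ensemble."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        for _ in range(n_trials)
    ]
    return sorted(candidates, key=evaluate, reverse=True)[:top_k]

# Toy scoring function: pretend deeper, wider networks validate better.
toy_score = lambda c: c["hidden_layers"] * c["nodes_per_layer"]
best = structure_search(toy_score)
print(len(best))  # → 3
```

In practice the scoring step is the expensive part (each candidate network must be trained and validated), which is why handing the search to BigML is convenient.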

The Deepnet


Once the Deepnet is created, it would be nice to know how well it is performing. To find out, we create an Evaluation using this Deepnet and the 20% testing dataset we split off earlier.
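An Evaluation reports, among other metrics, accuracy and ROC AUC. As a hedged reminder of what those numbers mean (not BigML's implementation), both can be computed directly from held-out labels and predicted scores:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive example
    outscores a random negative one (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions on six held-out rows (1 = Higgs-like signal).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.3]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, preds))  # → 0.6666666666666666
print(roc_auc(y_true, scores))  # → 0.8888888888888888
```

Unlike accuracy, ROC AUC is threshold-free, which makes it a better summary for a needle-in-the-haystack problem like this one.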


We get an accuracy of 65.9% and a ROC AUC of 0.7232. We could perhaps improve these numbers further by extending the maximum training time. This is considerably better than what a Boosted Trees model with default settings achieves on the same data. Compare them here:


The Boosted Trees model has an accuracy of 59.4% and a ROC AUC of 0.6394. Boosted Trees simply can't extract as much information from these low-level features as a Deepnet can. This is a clear example of when you should choose Deepnets over other supervised learning techniques for your classification or regression problem.

Want to know more about Deepnets?

Stay tuned for more publications! In our next post, we will explain how to use Deepnets through the BigML Dashboard. Additionally, if you have any questions or want to learn more about how Deepnets work, please visit the dedicated release page. It includes the series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow, and the full webinar recording.
