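In the analytics space, is there anything as full of promise, hype, sexiness, and possible awesomeness as "big data?" I don't think so.
So what is big data really? No one quite knows.
As I interpret it, big data is the collection of massive databases of structured and unstructured data. The data sources include traditional (now considered puny) sources like corporate ERP/CRM systems and non-traditional (massive) sources like every technical ping from every human or mechanical sensor, all web behavior of every human across the entire Internet, increasingly digital data from analog sources like hospitals or the atmosphere, and (good lord!) tweets.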
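That explains why so much of the big data talk comes from Oracle, IBM, Microsoft, SAP, and other vendors. And not so much from practitioners, yet.
I believe in the promise of big data and in the deep insights that will result. But that should not be a surprise. All the way back in 2007 I was evangelizing a move away from the "small data" world of clickstream data to the "bigger data" world of using multiple data sources to make smarter decisions on the web. Clickstream + qualitative data + rigorous statistical analysis of outcomes + deep mining of data from competitive intelligence sources + rapid experimentation + more.
Here's the "bigger web analytics data" picture from 2007… multiplicity!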
The big data we are dealing with today puts the 2007 picture to shame. We have even more types of data, becoming ever more complex, distributed across multiple existences, and we are left with the task of parsing out terabytes of noise to get to a megabyte of signal.
That last part is what I love to focus on, what I worry about, what I think everyone should focus on. It is great that we have big data. It is greater that we have such amazing promise in that big data. It is sucky that almost no one knows what to do with it in the context of driving actual business value.
Hence my interest in big data is not about the zettabytes or Hadoop or unstructured variables or one of the n technical things that seem to dominate big data conversations.
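My interest is deeply, passionately rooted in trying to figure out how to take big data all the way to the bank (or world peace). How do we find insights? How do we structure the organizations that will use this data so that they get timely value from it? How do we take action? How do we find frameworks that spark a different way of thinking, so we don't make the mistakes we so brilliantly have made in the world of small data?
If we don't answer all those hows, big data will be a big disappointment.
Avoiding the big disappointment was the how on my mind as I prepared my keynote for the Strata 2012 Big Data conference. My goal was to use my TED-ish 15-minute timeslot to present my perspective on why driving big action was the big imperative for big data.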
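00:00 – 01:15 Intro. My favorite data comes from Kenyan farmer Zack Matere.
01:15 – 04:05 Part 1. The current flawed data org structure, its challenges, and the new optimal org structure to truly bring big action to big data.
04:05 – 06:20 Part 2. A framework, inspired by Donald Rumsfeld, for big data vendors to think about when creating solutions, and the unique space in which big data analysts should actually play (only the "unknown unknowns"!).
06:20 – 10:25 Part 3A. My first, tactical example: how to magically, automatically solve the problem of having millions of rows of data and no idea how to find the 15 valuable rows that could have a huge impact on the business. Leveraging interestingness!
10:25 – 15:00 Part 3B. My second, strategic example: leveraging forecasting, mining, and correlations to move from data collection to magically, automatically finding trends in the data that truly are the unknown unknowns, and identifying the causal factors behind those trends, so that we can move from data to action at a brisk pace.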
Here's the keynote…
[You can also watch this video on YouTube. You're also welcome to Like, Share, Tweet, Facebook, or +1 it on YouTube.]
It is not my hope to encourage you to copy/paste the strategy outlined, or to use the tools shown.
My hope is simply to inspire you to think a little differently about organization design, share a framework to influence the focus of your analysis, and find the types of practical solutions that will really spark profitability from all this big data.
I welcome your feedback and thoughts on the video and the solutions via comments. Please also share your experience with big data. Any big or small success you've had will inspire us all.
Preparing for my keynote also got me thinking about all the implications of big data and my own longish career in trying to create superb decision support systems. The database has moved from my floppy disk (true story) to an infinite storage cloud, yet, amazingly, some of the biggest challenges have remained the same.
So, big data revolutionaries…
Here are some rules from my experience in the small data world that I've come to believe also apply to the big data world, perhaps even more so. As you go about your big data journey you'll meet with even more immense success if you consider these valuable life lessons:
1. Don't buy the hype of big data and throw millions of dollars away. But don't stand still.
Take 15% of your decision making budget and give it to one really, really smart person (Ninja! OK, Data Scientist) and give that person the freedom to experiment in the cloud with big data possibilities for your company.
It is cheap. You can do dirty data warehousing pretty darn fast. You can find all the ugly warts and problems. You can be much smarter when you start to mainstream big data into your company, while preserving the data awesomeness that already exists in your company.
Structure your big data efforts, at least initially, to fail faster when you fail. Don't build the biggest, baddest big data environment over 32 months, only to realize it was your biggest, baddest mistake.
2. Big thinking about what big data should be solving for is supremely important.
I can't think of any other time in our lives when we could literally swim endlessly in an ocean of data without having anything to show for it. Big data is that world. If you don't know where you are going, you will get there and you'll be miserable (if your company has not fired you already, in which case you'll be miserable and sad).
I've advocated leveraging frameworks such as the Digital Marketing & Measurement Model in the web context to ensure that the analysis we do is deeply and powerfully grounded in what's important to the business. You have to have that one page, even if it is roughly defined by your Sr. Management. Have something.
If your management refuses, or is not visionary enough to provide you with even basic starting points, then build one by yourself. All it takes is a little business analysis. Here's my post: Five Steps to Finding a Purpose for Your Analysis.
When you have access to all this data, the answers you find will be surprising, the insights you deliver will be brilliant, and your impact on the business will be huge. But that can only happen if there is a model that defines the purpose of your sweet big data adventures.
3. The 10/90 rule for data success still holds.
For every $100 you have available to invest in making smart decisions, invest $10 in tools and vendor services, and invest $90 in big brains (aka people, aka analysis ninjas, aka you!).
I will admit that Oracle and IBM and SAS and solid state drives are very expensive. Nine times that to invest in big brains might seem egregious. Perhaps it is. Let the 10/90 rule be an inspiration to simply over-invest (way over-invest) in people, because without that investment big data will absolutely, positively, be a big disappointment for your company.
Computers and artificial intelligence are simply not there yet. Hence your BFF is natural intelligence. :)
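
If it helps to see the 10/90 rule as plain arithmetic, here is a minimal sketch. The function name `split_10_90` is my own invention for illustration, and it assumes the whole budget is split between just the two buckets named in the rule.

```python
def split_10_90(budget: float) -> tuple[float, float]:
    """Split a decision-making budget into (tools, people) per the 10/90 rule."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    tools = budget / 10       # 10% goes to tools and vendor services
    people = budget - tools   # the remaining 90% goes to big brains
    return tools, people


tools, people = split_10_90(100)
print(f"tools: ${tools:.0f}, people: ${people:.0f}")
```

Treat the ratio as a direction rather than an accounting rule: the point is to over-invest in people.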
4. Shoot for right time data, not real time data.
Real time data is almost insane to shoot for because even for the smallest decisions, you'll have to do a lot of analysis first (5 hours), then present it to your superior (1 hour), who will add two bullet items and send it to a team of people (20 hours), who will in turn argue about priorities and how much the data is wrong (16 days), but ultimately come to an agreement because the deadline to make the decision passed 7 days ago (20 seconds), and send the data to the big boss, who will read only the first part of the executive summary (three days) and decide that the data is telling her the opposite of what she has always known to work, so she'll make a decision based on her gut (5 seconds) and some action will be taken (14 days).
Total up those numbers. Was the real time data of any real value?
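
For anyone who wants the sum without a napkin, here is a small sketch. It assumes the steps happen strictly one after another and that every duration is wall-clock time; the step labels are my own shorthand for the story above.

```python
from datetime import timedelta

# Durations quoted in the story, in order. Labels are shorthand, not official names.
steps = [
    ("do the analysis", timedelta(hours=5)),
    ("present to superior", timedelta(hours=1)),
    ("superior sends it to a team", timedelta(hours=20)),
    ("team argues about priorities", timedelta(days=16)),
    ("team reaches agreement", timedelta(seconds=20)),
    ("big boss reads the summary", timedelta(days=3)),
    ("gut decision", timedelta(seconds=5)),
    ("action is taken", timedelta(days=14)),
]

# Start the sum from an empty timedelta so the durations add cleanly.
total = sum((duration for _, duration in steps), timedelta())
print(total)  # 34 days, 2:00:25
```

More than a month from data to action, in a process that was supposedly fed by real time data.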
OK, so that is way over the top. But every company has a complex decision making structure that is time consuming and therefore unable to react in real time. If you can't react in real time, why do you need real time data?
Understand when the right time for data is in your organization. Shoot for systems and processes that match the delivery of data (better still, insights) to that time frame. You'll reduce stress. You'll focus on big, important, strategic things (real time data is really good at driving the best companies to do tactical silly things). And you'll save a lot of money, because real time everything is really expensive!
Here's one way to check if you really need real time data: Does a human have to be involved from data receipt to taking action? If the answer is yes, then you don't need real time data, you need right time data. If the answer is no (say you have intelligence/rules driven automated systems), then you need real time data.
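
That check is simple enough to write down as a one-line rule. A minimal sketch, where the function name and the returned labels are hypothetical and exist only to illustrate the test:

```python
def data_timing_needed(human_in_loop: bool) -> str:
    """Apply the check: a human between data receipt and action means right time data."""
    return "right time" if human_in_loop else "real time"


# A weekly marketing review has people in the loop; an automated bidding rule does not.
print(data_timing_needed(human_in_loop=True))   # right time
print(data_timing_needed(human_in_loop=False))  # real time
```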
5. "Data quality sucks, get over it."
That was the title of my post from June 2006. Look how far we've come. :)
Multiply all of that a million times when it comes to big data. We will have dirty data. We will not know what to do with video or voice-to-text or (omg!) social media overload. We will be missing primary keys. We will be missing clean metadata (sometimes any metadata!). We will come to realize the limits of sentiment analysis. We will suffer through the painful business process fixes that usually result in good data.
Do the best you can in terms of collecting, processing, and storing data of the cleanest possible quality. Know when to shift to data analysis. Start making decisions. Make small ones at first. (Remember, even they will be revolutionary, as these datasets have never come together!) Make bigger ones over time, as you understand the limitations of what you are dealing with.
Here's the kiss of death: big data implementation projects where the first touch of an Analyst will come 18 months after the project was first conceived. You see, the world will have changed so dramatically in 18 months that nothing you could possibly have spec'ed for will still be relevant.
6. Eliminating noise is even more important than finding the signal.
Thus far in the history of data analysis, the objective of our queries has been to find the signal amongst all the noise in the data. That has worked very well. We had clean business questions. The data size was smaller, the data set was more complete, and we often knew what we were looking for. Known knowns and known unknowns. (See video above.)
With big data, it is so much more important to be magnificent at knowing what to ignore. You must know how to separate out all the noise in the disparate huge datasets to even have a fighting chance to start to look for the signal.
It is amazing but true. If you are not magnificent at knowing what to ignore, you'll never get a chance to pay attention to the stuff to which you should be paying attention.
Your business savvy. Your analytical gut instinct. Tuning your algorithms to first ignore and then hunt for insights. That is what will have a material impact.
Six simple rules for you revolutionaries to follow to ensure, well, revolutionary success.
If you are really thinking big data value, think CEO and not CIO/CTO. It will dramatically change the focus of your work, in a good way.
As always, it's your turn now.
Did you find the keynote to be of value? Did you find the framework to be of value? Will it drive you to change your approach to big data? With regard to the rules above… is there one rule that is your favorite? Is there one that should have been there but is missing? What is the biggest piece of big data advice you would share from your experience?
Please share your wisdom, recommendations, and feedback via comments.