Is there anything in the analytics space that is as full of promise, hype, sexiness, and possible awesomeness as "big data"? I don't think so.

So what is big data really? No one quite knows.

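As I interpret it, big data is the collection of massive databases of structured and unstructured data. The data sources include traditional (now considered puny) sources like corporate ERP / CRM systems, and non-traditional (massive) sources such as every technical ping from every human or mechanical sensor, all the web behavior of everyone on the entire Internet, the ever-increasing digitized data from analog sources like hospitals or the atmosphere, and (good lord!) tweets.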


Because so much of the big data talk is focused on the promise of zettabytes of data, big data also tends to be about massively parallel computing, amazing storage systems, Hadoop's MapReduce, and other such deeply technical delights.

That explains why so much of the big data talk comes from Oracle, IBM, Microsoft, SAP, and other vendors. And not so much from practitioners, yet.

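I believe in the promise of big data and the deep insights that will result from it. But that is no surprise. Going all the way back to 2007, I've been evangelizing a move away from the "small data" world of clickstream data to the "bigger data" world of using multiple data sources to make smarter decisions on the web. Clickstream + qualitative data + rigorous statistical analysis of outcomes + deep mining of data from competitive intelligence sources + rapid experimentation + more.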

Here's the "bigger web analytics data" picture from 2007… Multiplicity!


The big data we are dealing with today puts the 2007 picture to shame. We have even more types of data, becoming ever more complex, distributed across multiple existences, and we are left with the task of parsing out terabytes of noise to get to a megabyte of signal.

That last part is what I love to focus on, what I worry about, what I think everyone should focus on. It is great that we have big data. It is greater that we have such amazing promise in that big data. It is sucky that almost no one knows what to do with it in the context of driving actual business value.

Hence my interest in big data is not about the zettabytes or Hadoop or unstructured variables or any one of the n technical things that seem to dominate big data conversations.

My interest is deeply, passionately rooted in trying to figure out how to take big data all the way to the bank (or world peace). How do we find insights? How do we structure the organizations that will use this data to ensure they get timely value from it? How do we take action? How do we find frameworks that spark a different way of thinking, so we don't make the mistakes we so brilliantly have made in the world of small data?

If we don't answer all those hows, big data will be a big disappointment.

Avoiding that big disappointment is how I approached my keynote for the Strata 2012 big data conference. My goal was to take my TED-ish 15-minute timeslot to present my perspective on why driving big action was the big imperative for big data.

It was a great challenge, thanks to Strata co-chairs Edd Dumbill and Alistair Croll. In this post, I want to share the result with you.


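00:00 – 01:15 Introduction. My favorite data quote, from Kenyan farmer Zack Matere.

01:15 – 04:05 Part 1. The current flawed data org structure, its challenges, and the new optimal org structure to truly bring big action to big data.

04:05 – 06:20 Part 2. A framework, inspired by Donald Rumsfeld, for big data vendors to think about when creating solutions, and the unique space in which big data analysts should actually play (only the "unknown unknowns!").

06:20 – 10:25 Part 3A. My first, tactical example: how weighted sort auto-magically solves the problem of having millions of rows of data and not knowing how to find the 15 valuable rows that could have a huge impact on the business. Leveraging interestingness!

10:25 – 15:00 Part 3B. My second, strategic example: leveraging forecasting, mining, and correlations to move from data collection to, more auto-magically, finding trends in the data that truly are the unknown unknowns, and identifying the causal factors behind those trends so that we can move from data to action at a brisk pace.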
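To make the Part 3A problem concrete (millions of rows, only a handful worth acting on), here is a minimal sketch of a volume-weighted sort. It assumes a simple shrinkage formula and hypothetical names (`Row`, `prior_visits`, `top_rows`). It illustrates the general idea only; it is not the algorithm Google Analytics uses, nor necessarily the exact method shown in the keynote.

```python
from dataclasses import dataclass


@dataclass
class Row:
    keyword: str        # hypothetical dimension, e.g. a search keyword
    visits: int         # how much traffic the row represents
    bounce_rate: float  # fraction between 0 and 1


def weighted_bounce_rate(row: Row, site_avg: float, prior_visits: float) -> float:
    """Pull a row's bounce rate toward the site average when it has few visits."""
    weight = row.visits / (row.visits + prior_visits)
    return weight * row.bounce_rate + (1 - weight) * site_avg


def top_rows(rows: list[Row], n: int = 15, prior_visits: float = 100.0) -> list[Row]:
    """Return the n rows whose high bounce rate is backed by real volume."""
    total_visits = sum(r.visits for r in rows)
    if total_visits == 0:
        return []
    site_avg = sum(r.visits * r.bounce_rate for r in rows) / total_visits
    ranked = sorted(
        rows,
        key=lambda r: weighted_bounce_rate(r, site_avg, prior_visits),
        reverse=True,
    )
    return ranked[:n]


rows = [
    Row("tiny", 1, 1.0),          # 100% bounce, but a single visit
    Row("big-bad", 5000, 0.8),    # high bounce on real volume
    Row("big-good", 20000, 0.3),
]
for row in top_rows(rows, n=3):
    print(row.keyword, row.visits, row.bounce_rate)
```

A plain sort on bounce rate would put the one-visit row on top. Weighting by volume pushes it down so that rows with both a poor rate and meaningful traffic surface first. The `prior_visits` value is an arbitrary tuning knob in this sketch.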

Here's the keynote…

[You can also watch this video on YouTube. You're also welcome to "Like," "Share," "Tweet," "Facebook," or +1 it on YouTube.]

It is not my hope to encourage you to copy/paste the strategy outlined, or to use the tools shown.

My hope is simply to inspire you to think a little differently about organization design, to share a framework that influences the focus of your analysis, and to help you find the types of practical solutions that will really spark profitability from all this big data.

I welcome your feedback and thoughts on the video and the solutions via comments. Please also share your experience with big data. Any big or small success you've had will inspire us all.

Preparing for my keynote also got me thinking about all the implications of big data and my own longish career in trying to create superb decision support systems. The database has moved from my floppy disk (true story) to an infinite storage cloud, yet, amazingly, some of the biggest challenges have remained the same.

So, big data revolutionaries…


Here are some rules from my experience in the small data world that I've come to believe also apply to the big data world, perhaps even more so. As you go about your big data journey, you'll meet with even more immense success if you consider these valuable life lessons:

1. Don't buy the hype of big data and throw millions of dollars away. But don't stand still.

Take 15% of your decision making budget and give it to one really, really smart person (Ninja! OK, Data Scientist), and give that person the freedom to experiment in the cloud with big data possibilities for your company.

It is cheap. You can do dirty data warehousing pretty darn fast. You can find all the ugly warts and problems. You can be much smarter when you start to mainstream big data into your company, while preserving the data awesomeness that already exists there.

Structure your big data efforts, at least initially, so that when you fail, you fail faster. Don't build the biggest, baddest big data environment over 32 months, only to realize it was your biggest, baddest mistake.

2. Big thinking about what big data should be solving for is supremely important.

I can't think of any other time in our lives where we could literally swim endlessly in an ocean of data, without having anything to show for it. Big data is that world. If you don't know where you are going, you will get there and you'll be miserable (if your company has not fired you already, in which case you'll be miserable and sad).

I've advocated leveraging something like the Digital Marketing & Measurement Model in a web context to ensure that the analysis we do is deeply and powerfully grounded in what's important to the business. You have to have that one page, even if it is roughly defined by your Sr. Management. Have something.

If your management refuses, or is not visionary enough to provide you with even basic starting points, then build one by yourself. All it takes is a little business analysis. Here's my post: Five Steps to Finding a Purpose for your Analysis.

When you have access to all this data, the answers you find will be surprising, the insights you deliver will be brilliant, and your impact on the business will be huge. But that can only happen if there is a model that defines the purpose of your sweet big data adventures.

3. The 10/90 rule for data success still holds.

For every $100 you have available to invest in making smart decisions, invest $10 in tools and vendor services, and invest $90 in big brains (aka people, aka analysis ninjas, aka you!).

I will admit that Oracle and IBM and SAS and solid state drives are very expensive. Nine times that to invest in big brains might seem egregious. Perhaps it is. Let the 10/90 rule be an inspiration to simply over-invest (way over-invest) in people, because without that investment big data will absolutely, positively, be a big disappointment for your company.

Computers and artificial intelligence are simply not there yet. Hence your BFF is natural intelligence. :)

4. Shoot for right time data, not real time data.

Real time data is almost insane to shoot for because even for the smallest decisions, you'll have to do a lot of analysis first (5 hours), then present it to your superior (1 hour), who will add two bullet items and send it to a team of people (20 hours), who will in turn argue about priorities and how much the data is wrong (16 days), but ultimately come to an agreement because the deadline to make the decision passed 7 days ago (20 seconds), and send the data to the big boss, who will read only the first part of the executive summary (three days) and decide that the data is telling her the opposite of what she has always known to work, and she'll make a decision based on her gut (5 seconds), and some action will be taken (14 days).

Total up those numbers. Was the real time data of any real value?
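For readers who want to actually total up those numbers, here is a small sketch using Python's standard `datetime.timedelta`. The step labels are paraphrases added for illustration; the durations are the ones quoted in the paragraph above.

```python
from datetime import timedelta

# Durations quoted in the decision chain above; labels are paraphrased.
steps = [
    ("do the analysis", timedelta(hours=5)),
    ("present to superior", timedelta(hours=1)),
    ("superior adds bullets, sends to team", timedelta(hours=20)),
    ("team argues about priorities and data", timedelta(days=16)),
    ("team reaches agreement", timedelta(seconds=20)),
    ("big boss reads executive summary", timedelta(days=3)),
    ("big boss decides on gut", timedelta(seconds=5)),
    ("action is taken", timedelta(days=14)),
]

# sum() needs a timedelta start value, since the default start is the int 0.
total = sum((duration for _, duration in steps), timedelta())
print(total)  # 34 days, 2:00:25
```

Roughly 34 days pass between the data arriving and anything happening, which is the point: data that is seconds old buys nothing when the decision loop runs for a month.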

OK, so that is way over the top. But every company has a complex decision making structure that is time consuming and therefore unable to react in real time. If you can't react in real time, why do you need real time data?

Understand when the right time for data is in your organization. Shoot for systems and processes that match the delivery of data (better still, insights) to that time frame. You'll reduce stress. You'll focus on big, important, strategic things (real time data is really good at driving even the best companies to do tactical silly things). And you'll save a lot of money, because real time everything is really expensive!

Here's one way to check if you really need real time data: Does a human have to be involved from data receipt to taking action? If the answer is yes, then you don't need real time data, you need right time data. If the answer is no (say you have intelligence/rules driven automated systems), then you need real time data.
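The check above is simple enough to write down as code. This is only a sketch of the rule as stated; the `Step` type, the function name, and the example pipelines are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    needs_human: bool  # does a person have to touch this step?


def required_data_timing(steps: list[Step]) -> str:
    """Apply the rule: any human between data receipt and action means right time."""
    if any(step.needs_human for step in steps):
        return "right time"
    return "real time"


# Hypothetical pipelines, for illustration only.
weekly_review = [
    Step("collect data", needs_human=False),
    Step("analyst interprets", needs_human=True),
    Step("manager approves change", needs_human=True),
]
automated_rules = [
    Step("collect data", needs_human=False),
    Step("rules engine picks page variant", needs_human=False),
]

print(required_data_timing(weekly_review))
print(required_data_timing(automated_rules))
```

The useful part is not the function but the inventory it forces: listing every step between data receipt and action, and being honest about which ones need a person.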

5. "Data quality sucks, get over it."

That was the title of a post of mine from June 2006. How far we've come. :)

Multiply all of that a million times when it comes to big data. We will have dirty data. We will not know what to do with video or voice-to-text or (omg!) social media overload. We will be missing primary keys. We will be missing clean metadata (sometimes any metadata at all!). We will run into the limits of sentiment analysis. We will agonize over the painful business process fixes that are usually what produce good data.

Do the best you can in terms of collecting, processing, and storing data of the cleanest possible quality. Know when to shift to data analysis. Start making decisions. Make small ones at first. (Remember, even they will be revolutionary, as these datasets have never come together!) Make bigger ones over time, as you understand the limitations of what you are dealing with.

Here's the kiss of death: big data implementation projects where the first touch of an Analyst comes 18 months after the project was first conceived. You see, the world will have changed so dramatically in 18 months that nothing you could possibly have spec'ed for is relevant any more.


6. Eliminating noise is even more important than finding the signal.

Thus far in the history of data analysis, the objective of our queries has been to find the signal amongst all the noise in the data. That has worked very well. We had clean business questions. The data size was smaller, the data set was more complete, and we often knew what we were looking for. Known knowns and known unknowns. (See video above.)

With big data, it is so much more important to be magnificent at knowing what to ignore. You must know how to separate out all the noise in the disparate huge datasets to even have a fighting chance to start to look for the signal.

It is amazing but true. If you are not magnificent at knowing what to ignore, you'll never get a chance to pay attention to the stuff to which you should be paying attention.

Your business savvy. Your analytical gut instinct. Tuning your algorithms to first ignore and then hunt for insights. That is what will have a material impact.

Six simple rules for you revolutionaries to follow to ensure, well, revolutionary success.


If you are really thinking big data value, think CEO and not CIO/CTO. It will dramatically change the focus of your work, in a good way.

As always, it's your turn now.

Did you find the keynote to be of value? Did you find the framework to be of value? Will it drive you to change your approach to big data? With regard to the rules above… is there one rule that is your favorite? Is there one that should have been there but is missing? What is the biggest piece of big data advice you would share from your experience?

Please share your wisdom, recommendations, and feedback via comments.

Thank you.


  1. 1

    As always, a wonderful post. At a trade show last year I had a similar discussion with Addison Snell of Intersect360 about some of these issues.

    As a tech geek, I very much get absorbed into all of the hardware and software application details when it comes to big data. It's easy to get lost in all of that.

    But sometimes it's difficult to stay focused on that when you realize just how amazing it is that we have that data in the first place. It's quite awe-inspiring.

    Thank you for taking the time to share!

  2. 2

    The area of unknown unknowns is ambiguous. Pulling in big data sets often requires you to make assumptions and smooth over inconsistencies between different data.

    It seems likely that the more data you pull in, the more noise there may be and, as a result, the more chance there may be for folks to misinterpret the data or interpret it differently to support opposing viewpoints.

    That said, do you feel that big data can more easily lead folks astray than smaller, more contiguous data sets? How do you control for the differences?

    • 3

      Josh: Let me try to tease out some of the threads in your valuable comment…

      First, any data can lead the unaware astray. I don't think that is unique to big or small data. Sub optimal thinking in, sub optimal results out. :)

      Second, you are right that significantly higher data literacy will be required in our Marketing, Sales, Finance, and HR peers, and in us Analysts. (I'd mentioned at the start of the video why that is so important.) Many organizations are not there yet. Now is the time to stop defending the zombies and start evolving!

      Finally, connecting more data sets will bring more connectivity and interpretation challenges, but without them there is no juicy fruit to eat and no giant leaps forward. We can't, and likely won't, stay with a small data set or just one source because that likely means we stay stuck making small and perhaps incomplete decisions.

      Thank you as always for sharing your feedback, I appreciate that so very much.


  3. 4

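    That's a great post and presentation, Avinash. It reminds me of something Peter Fader said in an interview:

    "Some companies just feel: if we throw enough money at it, if we hire enough smart people, they'll figure out what to do with the data. No, that's not the way to go."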

  4. 5
    Craig Burgess says

    1) "Let's not be the kind of people who keep wasting time chasing quality well past the point of diminishing returns. Let's not be perpetual javascript hackers and sprop variable tweakers when we should be delivering value from the data."

    With so much data available, this is a constant struggle. There is a saying in the same spirit: "You are defined more by what you don't do, than by what you do." It leads to another insight I liked:

    2) "If you are not magnificent at knowing what to ignore, you'll never get a chance to pay attention to the stuff to which you should be paying attention."

    The issue also lies with what you are charged with doing vs. doing what you know charges you! Does it match up with what your boss(es)/CIO/CEO thinks you should be doing? Often not. This dissonance is like an annoying visual distraction that makes it even tougher to reflect and find those nuggets you KNOW are buried just below the surface!



  5. 6
    Tom Cruise says

    You must have a framework, and there are wonderful tools that help get away from the noise, from some wonderful Six Sigma process improvement-type tools to statistical modeling tools that calculate relationship values (like Information Values) or correlations (principal component analyses).

    The most important thing is to make the business link between the data-driven insight and the marketing action.

  6. 7

    This is a very serious post and hence I can't resist sharing this picture from @kimwatkins…

    Starting small with big data!

    Big data is going to be big. It is important to start them small! :)

    I'm not sure if this is optimal for your kids, but there is a message there about it never being too early to start learning.

    Thank you Kim!


  7. 8




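    IMO, deciding what to do with the findings, and how, is always a huge challenge. Especially at the enterprise level, change isn't something that happens 'overnight'. It can… and in plenty of cases… but it just doesn't :)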

    • 9

      Visitor: I normally reply to every comment I get on this blog, but you did not leave your real email address. But I publicly wanted to thank you for taking time to share your valuable experience in the comment above.

      It is immensely helpful to hear from people who've been there, trying to change the world. :)

      Thank you,


  8. 10
    Urvashi Pitre says

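    I must admit to being surprised by the sudden hype about "Big data" when so many of us have been talking about – using and driving change with – "big data" for so many years. The difference may be that we didn't have a catchy title for it. Saying multichannel data, or integrated data sources, may not have been as sexy as "beeg beeg data!"

    IMHO, the whole push still needs to be about solving important business problems, improving profitability, and using data to drive decisions. Big data is just the raw material for getting the job done. What drives change is not the sudden availability of big data (mainly because it's not that sudden) but whether people have access to information that helps them make decisions.

    My 2 cents. Would love to hear thoughts from others who have been in the data trenches for years.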

    • 11

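      Urvashi: Having been in the trenches starting with a 0.6 mb database in Access :), working up to a few hundred gigs in Sybase, to now crazy amounts in the cloud, I can completely empathize with your perspective.

      I do think there is a lot that is new in "big data." The types of data we deal with. The complexity of the analysis. The methods we use to store data (and throw it away). The types of questions we can answer that we may never have been able to answer before.

      Everything we've done on the IT and business sides has prepared us well to take advantage of this new opportunity, even if our dreaded enemies (six are outlined in this post) seem to be BFFs from our old world. :)

      Thanks so much for adding your perspective.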


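    • 12

      "…when so many of us have been talking about – using and driving change with – "big data" for so many years…"

      That is certainly true. IMHO, one of the main implications worth noting is the 'time stamp'. From a direct mail perspective, it was extremely difficult to time stamp your campaigns – unless you sent out a little goblin to stalk the mailman!!!

      While real time data can be powerful (IMO I don't think most of us are yet using it wisely in terms of resources/bandwidth), the availability of time stamps allows it to be integrated into the 'traditional' marketing database. From a 'multichannel' direct marketing perspective, IMHO, that is already a major change.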

  9. 13
    Richard Herron says

    Thanks for the succinct summary of big data issues, in addition to an impassioned performance captured on video.

    Two points really resonated with me: right time data and signal/noise issues.

    Real time reactive data use can be very tactical and beneficial but rarely, if ever, does it really provide great insight beyond its operational boundaries. It's easy to pop up the next recommended book or song, but your organization doesn't really learn anything from that. Valuable analysis takes a bit more time to create, more time to digest and internalize, and more time to execute against. Let's not confuse business rules or simple pattern matching with analysis.

    And the promise of big data is that there is a wealth of little data meandering around inside, struggling to be set free. Most of the extremely granular data that we can now collect has little to no informational value. Being able to remove as much noise as possible is the secret to liberating those nuggets of truth. I sometimes think that we actually have the same amount of important data now as we did years ago, only now we have surrounded it with more junk.




  10. 14

    How can you get the weighted sort turned on? I have read that it's not available for all Google Analytics accounts.

    I tried to set up your report from the video and the weighted sort option does not come on. Even after clicking the bounce rate sort.



  11. 17

    #6 hits home – as this growing sea of data (maybe universe of data?) keeps growing, we have to be able to focus on the important segments and then zero in on the elusive signal.


  12. 18

    In part 1 of the video you make a good case for addressing the org structure prior to embarking on the big data journey. I humbly propose a Rule 0:

    Smash any process that requires decisions to be referred to a central command-and-control authority. If that means smashing the central command-and-control authority then grab your pitchfork now.

    If you're just the data expert and other people make the decisions, get out of there now! Upstart companies without hierarchy problems will out-compete yours.

  13. 19


    Your keynotes are always provocative and wildly entertaining to me. Maslow would deeply appreciate your language as it relates to higher needs. Sometimes I think that your insights and strategies should be taught to the new young up and comers. Precisely because too many of the hippos will wallow in their own worlds of yesterday and self-belief.

    I know you have your startup Market Motive, but can you suggest some educational programs you like that can also remold future minds in line with your vision? Or was Market Motive a response to your feeling that not enough of this is being taught?

    • 20

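      Rob: I appreciate the kind words, thank you.

      Market Motive was most definitely founded because we felt there was a distinct lack of structured curriculum out there that helped create current generation analysts, from both an analytical knowledge perspective and an optimal thought process perspective. Every quarter I tell my students, only half jokingly: "If at the end of this course you've thought you know how to use the data, but not how to apply the right mental model to it, then I would have failed!" :)

      1. I have to be willing to do all the hard work necessary to prove the value of the "new world." Often we just go and evangelize. I think it is valuable to speak up, show a rough prototype, an in-depth alternative analysis. Because it is concrete.

      2. I like to frame things from the customer's perspective. "Here is the magnificent joy we can deliver." "This is how we will revolutionize their experience." "This is why the benefits we deliver will bring more glory."

      3. Competitors. I unflinchingly use the current/upcoming success of direct competitors to make concrete why change is a must. No CEO's ego wants to face that. :)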



  14. 21
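    Ned Kumar says

    As always, enjoyed your post and the keynote. I also loved your rules. The two I really feel people should pay attention to are "Shoot for right time data, not real time data" and "Eliminating noise is even more important than finding the signal". While all your rules should be followed, I think even just focusing on these two can provide the firm with tons of actionable insights that can beef up their bottom line. (Especially on the "noise" rule, sometimes analyzing the noise as the 'signal' can provide wonderful insights into why and how events happened.)

    Big data or not, I think the key is how we intend to use data from the various channels/sources to drive our business vision & goals (and that is why I liked your post & talk). Just because data is available does not always mean it has to be used (imo); trying to force a relationship with the data often pulls you away from the purpose of the analysis. [I know not everyone may agree with me :-)]. I say this because many people feel left out if they are not looking at unstructured data, text data, etc. My first question to them has always been "Is that data & analysis of that data necessary given your business vision and objectives?".

    And lastly, I love the quote you had. Here is another one I like by Guy Laurence, CEO of Vodafone: “Data on its own is impotent” :-)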


  15. 22

    The "Shoot for right time data, but not real time data" example is a great message and, above all, a big reality.

    The same goes for the 10/90 rule, which is very obvious, yet few firms really understand that it's we humans who make the difference, not tools or data.

    Also good to have your latest videos, which I look forward to on YouTube.

    Ranjan Jena

  16. 23



    Data science happens in two main ways. There's exploratory, hypothesis-driven analysis (think of dragging columns and rows into a pivot table in Excel to see what "unknown unknowns" emerge.) And there's reporting, the here-are-the-results analysis that is usually what goes to the boardroom.

    But speeding the human-machine interface dramatically increases the performance and productivity of an analyst. Think changing pivot tables when playing "what if." As you point out, 90% of your budget should go to smart humans. Making those humans as efficient and effective as possible is essential if you're going to reap something from that investment. Every time a precious analyst watches an hourglass or a spinning beachball, or goes to get a coffee while a report runs, you're squandering her.

    Realtime reporting isn't that useful except as an early warning system for error detection (sales are down a lot today so maybe the site is broken.) But realtime interactivity is more important than ever because the data is unstructured and the analyst's time is precious.

  17. 24


    The focus on people rather than tools and the focus on actionable insight is always important.

    I have only one thing to add: In my experience real time analytics is important. Some businesses rely heavily on real time analytics. 'Old school mass media', for example, tries to pick up the latest buzz online and include it in its newspapers or TV shows.


    • 25

      Ulrich: I want to completely underscore your last point: It always depends!

      There are certainly some scenarios where real time can be of value. Especially, as I mentioned, if automation is involved. The scenario you're describing with "old school media" is a good one: a nearly 100% automated tool, with a very light editorial touch, where real time decision making works very well.

      A different example of this is how engines like Google or Bing use 100s of signals and information being published right now to show the most relevant answer.

      But in almost all other cases I've not had the privilege of seeing companies do much with real time data (even after pumping millions of dollars into getting that data; it feels good to have, but little business gets done with it).


      • 26


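        Thanks a lot for your reply. I always appreciate your feedback. I had the luck to see a handful of cases where companies did great things with the insights gathered from real time data.

        But you are right: that rarely happens. People often spend money on projects that don't work.

        The 'it feels good to have data' problem is a common one. It is expensive, and it can be solved by choosing the right people.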

  18. 27
    Ned Kumar says

    @Alistair – I hear your underlying thought on wanting real time data and agree there can be certain benefits to it (if handled correctly and for the right reason).

    However (imo), we should separate the efficiency of tools, servers, and the human-machine interface from the need for high velocity, high frequency data input. Irrespective of the data being real time or not, if the analyst is not equipped with the right tools, or if the processing capabilities are not scaled enough, there is bound to be a lot of 'wastage' of man-hours watching the hourglass. And in this I completely agree with you that the firm/HIPPO should ensure that they don't just hire smart talent but also provide the right environment & tools for those minds to do their best.

    The way I interpreted Avinash's rule was: given your current business context, is it still worth it to go for real time data? Here again I agree with you that if real time data is readily available, or available with minimal effort, and there are no processing constraints, then yes, the analyst can definitely do exploratory analysis and even subject that data to some of the cool methodologies to see if any insights can be gained (but the question still remains: how soon can or will those insights be used for any decision?).

    However, if real time data is not readily available, or it takes a lot of effort to make it available, then in most cases I have dealt with it is probably not worth it, as Avinash says. Mainly for the reasons he mentioned. We can do all the analysis we like on real time data, but the structure of companies (especially large ones) is not set up for real time decisions.

    [Near] real time data without 'real-time' decision capability has no ROI. Insights by themselves are a whole lot less valuable than 'actionable insights' that are acted upon :-)

  19. 28
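    Deja Vu says

    I found your post quite interesting. I wrote an article that, on the surface, might be against your viewpoint (see the link in the website box). But on deeper reflection I think we agree on several points.

    1. Real time data: as many others here have pointed out, it has its uses. Think of all the power grids, power plants, manufacturing plants, UPS, FEDEX, airlines, banks, trading platforms, traffic control systems, and so on. A great many functions can use real time data for decisions. And it is not just about control. When you have data streams from thousands of sensors, the 14 day decision loop you mention has to be compressed. So there is enormous value in spending time to develop the right analytics/algorithms that can take raw data on multiple parameters and present it as actionable insight. There is usually also a place in these situations for "historical data" (data older than 24 hours, or going back even further). That can be analyzed too, but real time data has real time value.

    3. I agree with your 90/10 rule. Unfortunately, the human element is treated as a recurring expense, whereas most software (even that purchased as a service) is capitalized and depreciated. Also, little or no effort is spent on getting the actual recipients of the analysis to learn the tools provided. The attitude is more like "Bring it to me and I'll eat". The real revolution in data analysis will come when industry knowledge and data analysis converge. That won't happen if external providers are the ones interested in delivering solutions. It'll happen when tools are simplified enough that ordinary managers and time-constrained executives can do their own analysis without resorting to ninjas or data geeks. The "analyst" role must die. I'd restate your thesis as 10% for software, and 90% not for the "data guru" but for training the actual principals to discover their own insights. The difference is like taking a guided tour versus exploring on your own with a GPS.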


    • 29

      Deja Vu: I also think there is broad agreement between us.

      1. I'm not saying real time is completely useless. I'm saying that investing in eliminating humans is a worthy cause; it is perhaps the only way to ever make real time work.

      2. I'm sorry, I don't think I understand exactly what you are saying here. But I do agree that hiring the IBM consultant might not be the right answer. :)

      3. I don't think that the Analyst role will die out. I think it will move away from analysts being "glorified data pushers" to actually doing big, strategic, hard analysis to drive big, strategic, "unknown unknown" hard decisions.

      4. GIGO still rules. But to avoid it we have to strike a balance between the purity of the data we seek and the timeliness of decisions. It is not a simple call, but it is critical that decision makers get good at making these calls over time.


  20. 30


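    Thank you for encouraging people to spend money on intelligent people.

    Having been a 'middle manager' in the past, I know it is really hard to get upper management to understand that. Then they are always mad at us middle managers because the good people are quitting and we're paying a lot of money for tools, so why can't we get better analysis?!?! Because I'm spending all my time training new employees so that they can finally start working on their own, get double or triple the pay somewhere else, and quit, and then the cycle continues.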



  21. 31

    I have recently discovered your blog and I am an instant fan! I can totally identify with the view that eliminating noise is more important than finding the signal. As a practitioner of advanced analytics myself, I too am overwhelmed by the mindshare that technology has grabbed in this domain, with few success stories, if any.

    I have a couple of questions/comments and would like to hear from you:

    1. Are there any advances in data mining techniques that lend themselves especially well to big data? All talk about statistical significance became somewhat of a moot point as "data-mining" started dealing with very large (by the standards of the time) sample sizes. Are we going to say goodbye to other closed-form analytical techniques?

    2. Where do you think the biggest probability of success might be for big data? Are there underdogs that might run with the ball and benefit the most (for example manufacturing, logistics, etc.)?

    3. Are you aware of any success stories, big or small (no pun intended), with big data?

    • 32

      Mukul: Quick answers to your questions…

      3. Randomly go to any big data vendor, or do a Google query, and you'll find them touting plenty of success stories. Take them with a small grain of salt, and look for inspiration.


  22. 33


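    On real time, is the assumption being made about processing volumes coming out of innumerable sources in great variety and at a greater velocity, ending up in the palm of a decision maker's hand, to sip from that ocean, as if swallowing it all?

    Real time can also be about analysis 'auto magically' triggering other systems, or determining in-stream whether a visitor should see page A or page B or page Z31. Real time could be a sub-second response that relies on in-stream processing or in-memory computation, or it could be cutting a business task that took weeks down to hours, changing the basic business model so that a response can be delivered the same working day. You know this.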

    • 34

      Sridhar: My perspective on real time leaves aside any minor processing, storing, and, this might be crazy, reporting issues with real time. We have bigger, badder systems with every passing day capable of sifting through millions of rows in real time.

      The challenge is the time/process/bureaucracy/people/begging between "look here's some data" to "ok, do x with the insight" to "done, it's implemented."

      As I mention in the post, as you mention above, if you can automate the steps from "look, insight x, action" by eliminating human beings then you can do a lot with real time data.

      Please also see Alistair's wonderful comment on this thread. I agree with his valuable points.


      • 35

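        You are the Guru! Seriously! I entirely agree with your main submission i.e. system>human>think-think>sit a bit>act maybe/do something else. This is the enemy of real time, making it a waste of time, literally. Thank you for pointing out Alistair's comment. It rightly highlights the importance of analyst time getting wasted sifting through troves of unstructured data, which is one aspect of big data.

        I should have been clearer in the first place that I implied real time in the same vein as 'on-demand.' Apart from the obvious use cases (such as system triggered alerts), like what the super folks at Splunk are doing, I was referring to the latent/underlying changes that occur when business processes (transactions, workflows) can happen faster because of faster processing, so that T+2 can become T+1.

        In a system currently in use at my company, entities across more than 300 million numbers need to be validated. We cut the validation process down from nearly half a day, and the reporting time down to a few minutes. This isn't real time in the sense of sub-second response time, but what it does is change the business process and the way activities are chained. One 'real time activity', or a bunch of them across a business process, may change the business process as a whole. I am a great believer in technology, which in itself is apolitical – refer to excerpts from Langdon Winner's "The Whale and the Reactor: A Search for Limits in an Age of High Technology". My flourish aside, the impact of technology (read 'real time processing') can stay disconnected from the behavioral status quo of business folks until a few changes force them to deliver products or services at lower prices and higher quality. I am not just a 'real time' fan boy, but a technologist hoping to bring positive change.

        Apologies for my attempt at stoking the embers of this discussion when people have already moved on. Avinash, you are my man!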

  23. 36
    Jason Luis says

    I always enjoy your enthusiasm and insight.

    Quick comment: I just did a control + F and didn't find the word "creativity" anywhere.

    Creativity + intelligence + diligence will help us find the diamonds in the rough, see the potential that others don't, and help us deliver on the promise of big data. Smart people with frameworks will deliver expected results inside the box. Creative people will pick the lock and open up the opportunities! =)


    • 37

      Jason: That is great feedback, thank you.

      I concur with you about the three elements (C+I+D). There are perhaps a couple more we could add to that list (if we were thinking of Analysts). In this post, with the emphasis on the 10/90 rule, I wanted to stress at a macro level the importance of having "big smart brains" deployed against this complex task.

      My assumption is that if companies do hire, they'll hire the right people. But perhaps that is a flawed assumption. :)


  24. 38

    Working at a company where data is the lifeblood, this is AWE INSPIRING.

    I immediately shared this with some colleagues, many of whom have also been pondering how we drive more and more insight leading to meaningful action from the non-stop barrage of data we deal in, day in, day out. This just fuels the fires of inspiration.

  25. 39
    Vignesh says

    I am an amateur in the Professional World and trying to come to terms with it. Your blog has narrowed down the way I look at things and it is a complete Paradigm Shift. I owe you for the Insights. Cheers!


  26. 40
    Tech Gallon says








  27. 42
    Mamun Mahdeeb says

    Thank you for this nice post. It will be really helpful for us. You give us powerful information about how to use data from a real-world application perspective.



  1. […]
    “Information is powerful. But it is how we use it that will define us. Understand when is the right time for data in your organization. Know what to ignore. Move from data to action at light speed.” – Avinash Kaushik, Driving Big Action.

  2. […]
    The Big Data Imperative: Driving Big Action, http://www.kaushik.net

  3. […]

  4. […]
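    It all brings us back to a post Avinash has put together on the way to go forward with big data, where he looks at the things that you need to think about when implementing big data. The suggestion that you spend a small amount of money doing a small data integration is a good one (although the realities of business suggest that companies try to shoehorn bigger systems onto prototypes of smaller ones when they should start from scratch). The other thing that is often missed is the 90/10 rule.

  5. […]
    Avinash Kaushik, with his usual acuity, posted "The Big Data Imperative: Driving Big Action" on Occam's Razor.

  6. […]
    On Occam's Razor, Avinash Kaushik makes the case for avoiding data for data's sake and for seeking a course of action based on good analysis of the data: The Big Data Imperative: Driving Big Action

  7. […]
    Whether you hire subject experts, grow your own, or outsource the problem through the application, data only becomes "unreasonably effective" through the conversation that takes place after the numbers have been crunched. At his Strata keynote, Avinash Kaushik (@avinash) revisited Donald Rumsfeld's statement about known knowns, known unknowns, and unknown unknowns, and argued that the "unknown unknowns" are where the most interesting and important results lie. That's the territory we're entering here: data-driven results we would never have expected. We can only take our inexplicable results at face value if we're just going to use them and put them away. Nobody uses data that way. To push through to the next, even more interesting result, we need to understand what our results mean. Our second- and third-order results will only be useful when we understand the foundations on which they're based. And that's the real value of a subject matter expert: not just asking the right questions, but understanding the results and finding the story that the data wants to tell. Results are good, but we can't forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data. And those stories will become ever more important as we use data to build increasingly complex systems.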

  8. […]
    #architecture – “The Big Data Imperative – Driving Big Action” – 6 principles, including –

  9. […]
    What are you doing with your data?
    Are you driving closer to your goals?

  10. […]
    I stumbled across this article by Avinash Kaushik today: //www.dqnk120.com/avinash/b
    I think it fits this question well, because 'big data' seems to have become more of a sexy buzzword and has lost some of its meaning.

  11. […]
    According to LinkedIn founder Reid Hoffman, Web 3.0 will center around data. A recent article in WIRED covered this. But I found it interesting because I had also read a blog post by Avinash Kaushik, Google's digital marketing evangelist, that says the same thing. Big data: mo data, mo problems. A few things have been holding data back. In the Web 1.0 and 2.0 architectures, databases were siloed, proprietary, differently formatted, and/or non-existent. Now, the database-driven web envisioned by World Wide Web founder Tim Berners-Lee in 2009 is getting more and more attention.

  12. […]
    Turns out he is the father of the term “unknown unknowns” – things we do not know we don’t know – popularized by former secretary of defense Donald Rumsfeld and later by Avinash Kaushik as “the unique space in which big data analysts should actually play.”

  13. […]
    Possible successes, if you will call it that: Web analytics (where data is so easily grown because of the ease of data collection and the number of people and actions that can be measured) and politics (which was, actually, driven quite a bit by web analytics).

  14. […]

  15. […]
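    Anyone entering the field of digital marketing and analytics will quickly recognize Avinash Kaushik as an important thought leader in the industry. Kaushik has written extensively on his Occam's Razor blog about how to use big data to find insights that drive action in a timely manner. In his blog post "The Big Data Imperative: Driving Big Action," Kaushik acknowledges both the potential and the challenges of big data.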
