{"id":1960331,"date":"2014-03-31T20:11:39","date_gmt":"2014-03-31T14:41:39","guid":{"rendered":"http:\/\/stockviz.biz\/index.php\/?p=1960331"},"modified":"2014-09-09T12:32:01","modified_gmt":"2014-09-09T07:02:01","slug":"big-datas-big-blind-spots","status":"publish","type":"post","link":"https:\/\/stockviz.biz\/index.php\/2014\/03\/31\/big-datas-big-blind-spots\/","title":{"rendered":"Big Data&#8217;s Big Blind-spots"},"content":{"rendered":"<div class=\"row-fluid\">\n<div class=\"offset1\"><a href=\"http:\/\/portalvhds29z8xdrqhczq.blob.core.windows.net\/wordpress\/2014\/03\/lost-in-big-data2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1960341 aligncenter\" alt=\"big data's big problems\" src=\"http:\/\/portalvhds29z8xdrqhczq.blob.core.windows.net\/wordpress\/2014\/03\/lost-in-big-data2.jpg\" width=\"432\" height=\"243\" srcset=\"https:\/\/portalvhds29z8xdrqhczq.blob.core.windows.net\/wordpress\/2014\/03\/lost-in-big-data2.jpg 432w, https:\/\/portalvhds29z8xdrqhczq.blob.core.windows.net\/wordpress\/2014\/03\/lost-in-big-data2-300x168.jpg 300w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><\/a><\/div>\n<\/div>\n<p>Yesterday, we <a href=\"http:\/\/stockviz.biz\/2014\/03\/31\/models-dont-lie-incorrect-assumptions\/\" target=\"_blank\">discussed<\/a> how theoretical models can be used to draw biased conclusions by using faulty assumptions. If the models then get picked up without an understanding of those assumptions, it leads to expensive mistakes. But are empirical models free from such bias, especially if the data-set is big enough? 
Absolutely not.<\/p>\n<p>In an article titled &#8220;Big data: are we making a big mistake?&#8221; in the FT, author Tim Harford points out that by merely finding statistical patterns in the data, data scientists are focusing too much on <em>correlation<\/em> and giving short shrift to <em>causation<\/em>.<\/p>\n<div class=\"row-fluid well well-small\">But a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down.<\/div>\n<p>All the problems that you had in &#8220;small&#8221; data still exist in &#8220;big&#8221; data; they are just tougher to find. When it comes to data, size isn\u2019t everything: you still need to deal with <em>sampling error<\/em> and <em>sampling bias<\/em>.<\/p>\n<p>For example, it is in principle possible to record and analyse every message on Twitter and use it to draw conclusions about the public mood. But while we can look at all the tweets, Twitter users are not representative of the population as a whole. 
According to the Pew Research Internet Project, in 2013, US-based Twitter users were disproportionately young, urban or suburban, and black.<\/p>\n<p>Worse still, as the data set grows, it becomes harder to tell whether a pattern is <em>statistically significant<\/em>, i.e., whether it is unlikely to have emerged purely by chance.<\/p>\n<p>The whole article is worth a read; plan to spend some time on it: <a href=\"http:\/\/www.ft.com\/cms\/s\/2\/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz2xNnFhsoS\" target=\"_blank\">Big data: are we making a big mistake?<\/a><\/p>\n<h6 class=\"zemanta-related-title\" style=\"font-size: 1em\">Related articles<\/h6>\n<ul class=\"zemanta-article-ul\">\n<li class=\"zemanta-article-ul-li\"><a href=\"http:\/\/stockviz.biz\/index.php\/2013\/12\/30\/placebo-effects-in-music-wine-medicine-and-finance\/\">Placebo Effects in Music, Wine, Medicine and Finance<\/a> (stockviz.biz)<\/li>\n<li class=\"zemanta-article-ul-li\"><a href=\"http:\/\/stockviz.biz\/index.php\/2013\/11\/09\/the-little-book-of-behavioral-investing-the-siren-song-of-stories\/\">The Little Book of Behavioral Investing: The Siren Song of Stories<\/a> (stockviz.biz)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday, we discussed how theoretical models can be used to draw biased conclusions by using faulty assumptions. If the models then get picked up without an understanding of those assumptions, it leads to expensive mistakes. But are empirical models free from such bias, especially if the data-set is big enough? Absolutely not. 
In an article &hellip; <\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3471,9],"tags":[2761],"class_list":["post-1960331","post","type-post","status-publish","format-standard","hentry","category-investing-insight","category-your-money","tag-quant","entry"],"_links":{"self":[{"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/posts\/1960331","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/comments?post=1960331"}],"version-history":[{"count":0,"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/posts\/1960331\/revisions"}],"wp:attachment":[{"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/media?parent=1960331"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/categories?post=1960331"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/stockviz.biz\/index.php\/wp-json\/wp\/v2\/tags?post=1960331"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}