What the Heck is Big Data and What Do I Do With It?
iLunch Session – February 20th – Ontario Investment Centre
Hosted by Interactive Ontario, this session explained the opportunities for data use, from changes in the ways data is being collected to the tools available to search, analyze and visualize that data.
(Moderator) Emmanuel Evdemon - Entertainment Counsel
Entertainment Lawyer at Entertainment Counsel
Ray Sharma - XMG Studio
Founder and President of XMG Studio Inc.
Ali Ghafour - Viafoura
Co-founder of Viafoura
Jimmy Fan - Kontagent
VP of Business Operations at Kontagent
Bilal Khan - OneEleven
Managing Director of OneEleven
Emmanuel: Big Data differs from the market intelligence that we used to gather. Technology has allowed us to use much larger amounts of data – more quickly and from different sources as well. Some people have added veracity into the mix – finding more relevance in the data. Rather than having to compare chosen sets, you’re able to throw a huge amount of data in and find the trends, correlations, and veracity between different components.
Jimmy: In my business, we primarily help mobile-focused businesses by providing comprehensive analytics to understand their users, data science insights to make smart decisions, and in-application tools to take action immediately. We’re in 22,000 mobile applications worldwide, covering 400 million monthly active users and tracking 370 billion events per month. That’s where big data comes in for us.
Ray: XMG Studio is a mobile games studio – a startup that is dearest to my heart – and we’re trying to understand what to do with our data. We have a fashion game application that, for 21 months in a row, has been a top-25 grossing app in the US. We know what the favourite colour is in January versus July and the favourite dress in New York state versus Ontario, and over 95% of our users are female.
Ali: At Viafoura, some of our clients are large media companies, and we help them monetize their engagement. We provide the analytics behind the large volume of data that’s gathered and interpret user behaviour from that.
Bilal: OneEleven is Canada’s first big data accelerator. We’re an ecosystem development player, so we are focused on later-stage entrepreneurship. We have three core applications. The first is a community of top-tier entrepreneurs. Second, we’re in the process of building infrastructure that will give our entrepreneurs access to high-performance computing power so they can work through their data sets. The third is what I call our center for commercialization, which is really about creating the intersection between industry and entrepreneurship and having those two worlds collaborate in the data realm.
Emmanuel: And that’s why they’re our esteemed panelists – that’s amazing. Perhaps we can ask Jimmy, Ali, or Bilal to discuss the technical definition of big data versus the business one. Is there a difference?
Jimmy: Even if you look at the different definitions of big data online, there is no definitive definition other than some surrounding characteristics. From my technical perspective, data becomes big data when you can no longer use one big computer and an Excel spreadsheet and need to start clustering multiple machines to collect, store, process, and query that data. But it’s really the business aspect that helps complete that definition, which is the types of questions you start asking. The questions you start asking become the ones you actually want to understand. With web analytics, for instance, it can be the most popular pages, links, etc. But that’s really an artifact of the limitations of technology. What you really want to ask is: who are my most valuable users, and how do I keep them? So you shift the question from content-centric analysis to user-centric analysis and understand users throughout their lifecycle. It’s about finding patterns in users and taking action that will be valuable. So that’s my definition.
Ali: Yeah, that’s pretty interesting. In the mobile world, they were kind of forced to be better at describing a user-centric perspective. I think the web world is trying to catch up with that. So, back to the original question: two perspectives on what big data means. I’ll start with the business focus first. If you have a question you need to ask and can’t find an answer, then you usually have a big data problem. And for technology, if you have to start paying a lot of money to third-party providers, then that’s probably big data.
Bilal: Yeah, from a very non-technical perspective at OneEleven, it’s very hard to define what big data is, but you know it when you see it. And from a commercial perspective, big data is everything. I’m not exaggerating when I say that: our cell phones are constantly picking up data about us on a daily basis, as are all of the map programs that we use. GE, for instance, will be a major big data player and is building sensors into all of its manufacturing equipment.
Ray: I would define big data as finding non-linear relationships within the data; that’s my simple definition. An example I want to give you is the 2012 U.S. election. I was watching a presentation from Obama’s Chief of Staff and they were talking about doing things differently. His thesis was that if they did the same thing as in 2008, they would lose, and Obama thought, “Well… we did OK in 2008, didn’t we?” 2008 was the election where social media was big, and 2012 was the election where big data came to the forefront. Every individual in the U.S. got a customized message from either Obama or his wife. What they did was look at your demographics and your location and cross-reference them with your social graph (who you’re connected with on your social network… and sometimes you can’t get a lot of information from this). The single mother living on one side of the street got a different message than the family of four living at the other end. Another famous example: around 2005-2006 in Canada, a company called Goldcorp, then about a $100 million company, was almost out of money. They had a tremendous amount of mining data for an area where there was tons of gold, but they couldn’t find any. As a last hurrah, they put up a half-million-dollar prize and put the data on the web, and two developers out of Australia said that the gold was here, here, and here – and it was. Today Goldcorp is the second most valuable gold company from a market capitalization point of view. These are two good examples, but there are lots more.
Bilal: And just to chime in, big data is playing a big role in controversial spaces as well. In medical genetics, it is being used to identify trends and genetic sequences and to catch cancers before they happen. In Los Angeles, they are starting to use it to predict crime in certain areas. Many of us have watched Minority Report, and the way this data is being used is reminiscent of it. It’s really a fascinating space in society.
Emmanuel: In mining, data is traditionally kept under lock and key. This time it was really different, since they had everybody look at it, so a lot of people thought it was crazy at first! So with the opening up of big data to the public, do you see business changing? Is that a business trend we are looking at? Are companies like Google and others changing what we traditionally thought of as business?
Ray: My personal belief is that the big data trend is so big that, just like everyone has to have a web strategy, everyone will have to have a big data strategy. Every business will be competing on the basis of how it interprets its big data. They can choose not to and just go with what they feel, be artistic, and see how things go. But this is going to be a dimension of competition for everybody. The government now has a project called “Open Data”, and let me spend a minute on this because it relates to your point. I’m working with the federal government, and they have about 200,000 data sets available on data.gc.ca – and they’ve got really obscure data! For instance, Canada launched a global nuclear launch detection network, in which a certain type of frequency indicates a nuclear launch event. Immigration is one of the most popular data sets. Just so you know, all the data will be screened for privacy and security before it’s put out there. Every government agency, every regulatory body, and every constituent that feeds data up to those agencies: this is the mother of big data, and it’s just being released to the world. We’re actually working with Bilal at OneEleven, and that’s where we’re hosting the big Hackathon event, which the government is sponsoring.