This article first appeared in the St. Louis Beacon, July 24, 2012 - When the Sloan Digital Sky Survey began operations in 2000, it acquired more data about the heavens within weeks than had been compiled in the entire prior history of astronomy, according to a 2010 article in the Economist. Over the next decade it built up a staggering 140 terabytes of information.
The Large Synoptic Survey Telescope, slated to open in 2016, is expected to assemble that much data as well.
In just five days.
That’s every five days.
Welcome to the world of big data, where, from retail transactions to health-care records to YouTube uploads, companies and organizations are dealing with exponential increases in information that are starting to strain their ability to analyze and quantify what is being packed into their databases each day.
Gary Stiehr very much wants St. Louis to embrace that world. In fact, he thinks it’s here already.
“It’s something we really want to get integrated to the fabric of the city,” he said. “That may be a big vision, but I feel it has the potential to do that.”
Stiehr is describing the rather eclectically punctuated ///StampedeCon_2012.
Set for Aug. 1 on the Washington University School of Medicine campus, it will be the inaugural outing for the one-day event, which will feature nine speakers from a diverse roster of companies including Nokia, Facebook, Monsanto and Kraft Foods.
Designed to appeal to everyone from business executives to software developers to system administrators, the event tackles a topic of growing relevance in a complex world that now creates information at a rate that would have startled human society just 10 years ago. Trying to make sense of it all is becoming a challenge of ever-widening proportions.
Enter Stiehr, who said it all comes down to the “three v’s” – volume, velocity and variety. The first is obvious. The second refers to the real-time nature of constantly updated information. The third deals with the free-form nature of the content in question, which is now more open-ended than ever before.
“You not only have standard structured data like a lot of companies collect where question one is yes or no and question two is a number, so you know what your data looks like,” said Stiehr. “Now you have all this unstructured data like Twitter or Facebook posts. Going through and interpreting that is difficult.”
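To make that contrast concrete, here is a minimal Python sketch (the field names, sample posts and keyword lists are invented for illustration, not taken from any company Stiehr mentions). Structured records can be tallied directly; free-form posts have to be interpreted first, even crudely.

```python
# Hypothetical illustration: the same question ("do customers like the
# product?") asked of structured survey data vs. unstructured posts.
# All field names and sample records below are invented.

# Structured data: the schema says exactly what each field means.
survey_rows = [
    {"liked_product": True,  "rating": 9},
    {"liked_product": False, "rating": 3},
    {"liked_product": True,  "rating": 8},
]
approval = sum(r["liked_product"] for r in survey_rows) / len(survey_rows)
print(f"Survey approval: {approval:.0%}")  # trivially computable

# Unstructured data: free-form text has no schema, so even a rough
# answer requires interpretation -- here, a naive keyword match.
posts = [
    "Just tried the new release. Love it!",
    "ugh, crashed twice today :(",
    "Honestly not bad for the price",
]
POSITIVE = {"love", "great", "not bad"}
NEGATIVE = {"ugh", "crashed", "hate"}

def crude_sentiment(text: str) -> int:
    """Return +1, -1 or 0 from keyword hits; real systems need far more."""
    t = text.lower()
    score = sum(w in t for w in POSITIVE) - sum(w in t for w in NEGATIVE)
    return (score > 0) - (score < 0)

print([crude_sentiment(p) for p in posts])  # -> [1, -1, 1]
```

The gap between the two halves of the sketch is the point: the structured tally is one line, while the unstructured half already needs a keyword model, and production-grade interpretation of Twitter or Facebook posts is far harder still.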
But it’s often helpful for enterprises to do so if they truly want to understand the market they are engaging.
“That’s one of the reasons I really felt strongly about getting this going,” said Stiehr. “It’s going to apply to just about any business that wants to try to understand what their customers are like, wants to add value to their products. There is all this extra information, whether it is comments on blog posts or Facebook or any of a variety of user-generated sources.”
That sort of information can be vital for sentiment analysis and for keeping an eye on one’s online reputation. But the issues don’t end there. With the recent Supreme Court health-reform ruling, preventive care is expected to increase, Stiehr said. That will mean greater attention to analyzing patient records.
Meanwhile, companies can collect and quantify information from mobile applications and increase efficiency by understanding such mundane realities as how their data center is performing.
Retail is another area of focus.
“You can say, ‘What’s the lighting and electrical bill?’ in a particular part of your store that handles the pharmacy or the cosmetics area and ‘What are the sales from that area?’” he said. “You can start to look at the data collected and ask these questions in ways that previously were just too challenging to do for the IT infrastructure.”
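A toy version of that store-level question might look like the sketch below, with department names and dollar figures invented for illustration: join each department’s utility cost against its point-of-sale totals and compute sales per dollar of electricity.

```python
# Hypothetical sketch of the per-department question Stiehr describes.
# Department names and all dollar figures are invented.
from collections import defaultdict

# (department, monthly lighting/electrical cost in dollars)
utility_costs = [("pharmacy", 1200.0), ("cosmetics", 800.0), ("grocery", 4500.0)]

# (department, sale amount) -- in practice, millions of point-of-sale rows
sales = [("pharmacy", 54.10), ("cosmetics", 23.99), ("pharmacy", 12.50),
         ("grocery", 89.75), ("cosmetics", 41.00)]

# Aggregate sales by department, then join against utility costs.
sales_by_dept = defaultdict(float)
for dept, amount in sales:
    sales_by_dept[dept] += amount

for dept, cost in utility_costs:
    ratio = sales_by_dept[dept] / cost
    print(f"{dept}: ${sales_by_dept[dept]:.2f} sales per ${cost:.0f} power ({ratio:.3f})")
```

The logic is trivial at this scale; the “big data” challenge Stiehr alludes to is running the same join and aggregation over years of transactions and sensor readings across hundreds of stores.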
Challenging but no longer impossible. Increasingly, Stiehr said, tools solidified by search engines like Google and Yahoo! are helping to answer questions about how to quantify massive amounts of information. The answer isn’t always specialized hardware or high-end software. Instead, less cumbersome, open-source solutions are becoming an option for those dealing with the data deluge.
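The article doesn’t name those tools, but the best-known open-source example from that era is the MapReduce model behind Apache Hadoop, which grew out of Google’s research papers and Yahoo!’s engineering. A single-machine toy version of the pattern, counting words across documents, conveys the idea:

```python
# Toy MapReduce: on a real cluster, the map phase runs in parallel over
# shards of the input and the framework shuffles intermediate pairs to
# the reducers. Here both phases run in one process for illustration.
from collections import defaultdict
from itertools import chain

documents = ["big data big deal", "data about data"]

def map_phase(doc: str):
    # Emit a (word, 1) pair for every word in the document.
    for word in doc.split():
        yield (word, 1)

def reduce_phase(pairs):
    # Group pairs by key and sum the counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

counts = reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
print(counts)  # {'big': 2, 'data': 3, 'deal': 1, 'about': 1}
```

The appeal of the model is that neither function mentions machines or files: the same two-phase recipe that counts words on a laptop scales to terabytes once a framework distributes the work.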
As a St. Louisan, Stiehr said that big data could be a big deal for his hometown. A local IT infrastructure manager, he is the founder of the St. Louis High Performance Computing group. He said StampedeCon hopes to connect with local entrepreneurs because of the increasing amounts of capital flowing from both private venture firms and the government, fueling research and development in the nascent field.
“What we really want to do is get the word out about St. Louis and the fact that we have a lot of activity from high performance computing, which transfers readily, so we have a good core of expertise here,” he said. “There is a lot of energy in the startup community, and you see a lot of momentum building, especially in the last couple of years. So we thought this is a great time to get this started, especially here.”
Stiehr hopes the event will spare those interested in the topic a flight to New York or California just to hear about it.
“We want to get entrepreneurs and startup companies around here familiar with the concept of big data and let them know there is a lot of demand for this,” he said. “By having the speakers that we have, we hope to give attendees a good view of the challenges.”