Data science demands elastic infrastructure
Those companies that try to run big data projects in data centers may be setting themselves up for failure. Matt Asay explains.
As companies struggle to make sense of their increasingly big data, they're laboring to navigate the morass of technologies necessary to become successful. However, many will remain stymied, because they keep trying to force a necessarily fluid process, asking questions of their data, into outmoded, rigid data infrastructure.
Or as Amazon Web Services (AWS) data science chief Matt Wood tells it, they need the cloud.
While the cloud isn't a panacea, its elasticity may well prove to be the essential ingredient to big data success.
How much cloud do I need?
The problem with trying to run big data projects within a data center revolves around rigidity. As Matt Wood told me in a recent interview, this problem "is not so much about absolute scale of data but rather relative scale of data."
In other words, as a company's data volume takes a step function up or down, enterprise infrastructure can't keep up. In his words, "Customers will tool for the scale they're currently experiencing," which is great... until it's not.
In a separate conversation, he elaborates:
"Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements. Your resource mix is continually evolving--if you buy infrastructure, it's almost immediately irrelevant to your business because it's frozen in time. It's solving a problem you may not have or care about any more."
Success in big data depends upon iteration, upon experimentation as you try to figure out the right questions to ask and the best way to answer them. This is hard when dealing with a calcified infrastructure.
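To make Wood's point about relative scale concrete, here is a minimal sketch rather than a real capacity model. Every number in it is made up for illustration: the monthly volumes, the 20 percent headroom, and the function names are all assumptions, and it ignores cost, provisioning lag, and everything else a real plan would weigh. It compares capacity bought once for today's scale with capacity resized every month.

```python
# All figures are hypothetical: monthly data volume in terabytes,
# with step changes up and down.
demand_tb = [10, 10, 12, 40, 40, 15, 15, 60]


def fixed_capacity(demand, purchased_tb):
    """Capacity bought once, sized for the scale at purchase time."""
    return [purchased_tb for _ in demand]


def elastic_capacity(demand, headroom=0.2):
    """Capacity resized each month to demand plus a safety margin."""
    return [d * (1 + headroom) for d in demand]


def summarize(demand, capacity):
    """Return (total unmet demand, total idle capacity) in TB-months."""
    shortfall = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    idle = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    return shortfall, idle


# "Tool for the scale they're currently experiencing": 10 TB plus 20 percent.
fixed = fixed_capacity(demand_tb, purchased_tb=12)
elastic = elastic_capacity(demand_tb)

fixed_shortfall, fixed_idle = summarize(demand_tb, fixed)
elastic_shortfall, elastic_idle = summarize(demand_tb, elastic)

print(f"fixed:   shortfall={fixed_shortfall} idle={fixed_idle}")
print(f"elastic: shortfall={elastic_shortfall} idle={elastic_idle:.1f}")
```

In this toy scenario the fixed purchase falls short as soon as volume steps up, while the elastic setup carries some idle headroom every month but never runs out. That is the trade Wood is describing, stripped of real-world detail.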
A eulogy for the data center?
Of course, it's not quite so simple as "all cloud, all the time."
Data, it would seem, has to obey fundamental laws of gravity, as Basho CTO Dave McCrory told TechRepublic in an interview:
"Big data workloads will live in large data centers where they are most advantaged. Why will they live in specific places? Because data attracts data. If I already have a large quantity of data in a specific cloud, I'm going to be inclined to store additional quantities of large data in the same place. As I do this and add workloads that interact with this data, more data will be created."
Over time, enterprises will look to the public cloud for all the reasons Wood describes, but legacy data is unlikely to make the migration. There's simply no reason to try to house old data in new infrastructure. Not most of the time.
But some companies will find that they're more comfortable with existing data centers and will eschew the cloud. I'm not talking about hidebound enterprise curmudgeons who shout "Phooey!" every time AWS is mentioned, either. No, sometimes the most data center-centric of companies will be innovators like Etsy.
As Etsy CTO Kellan Elliott-McCrea informed TechRepublic, once Etsy had "gained confidence" in its ability to manage its Hadoop clusters (and other technology), they brought them in-house, netting a 10X increase in utilization and "very real cost savings."
Nor is Etsy alone. Other new-school web companies like Twitter have opted to run their own data centers, finding that this gives them greater control over their data.
You're no Twitter
As highly as you may estimate your abilities, the reality is that you're probably not an Etsy, Twitter, or Google. As painful as it is to say it, most of us are average. By definition.
This was Microsoft's great genius: rather than cater to the Übermensch of IT, Microsoft lowered the bar to becoming productive as a system administrator, developer, etc. In the process, Microsoft banked billions in profits by helping make a good sysadmin better or a decent developer good.
Regardless, all enterprises need to establish infrastructure that helps them to iterate. Some, like Etsy, may have figured out how to do this in their data centers--but for most of us, most of the time, Wood's advice rings true: "You need an environment that is flexible and allows you to quickly respond to changing big data requirements."
In other words, odds are that you're going to need the cloud.