Interest in data science has grown more than fivefold over the last five years, as you can see above.
However, it is still not clear to many how data science consulting differs from regular consulting. After all, consulting is supposed to be about making data-driven decisions. A critical difference is that data science consultants leave their clients with reusable operational models, whereas most regular consulting projects answer important but one-off questions and do not leave clients with operational decision-making models.
There’s a lot more to data science consulting and we answer all your questions:
- What is data science consulting?
- What’s the data science consulting ecosystem?
- What to consider when choosing data science consultants?
- What are the pitfalls for data science consulting projects?
What is data science consulting?
Data science consulting is the activity of effecting change by building up the client’s analytics skills, developing its competencies, and deepening its understanding of how its business works.
This process can be grouped under four main headings: strategy, consulting, development, and training.
The strategy part of the consulting explores what’s possible with data and makes a plan.
This part requires extensive knowledge of the use cases. Depending on the client’s industry, the data collection method, regulation, and objectives can be completely different. In one case, the objective may be to optimize the energy consumption of a plant, which can be achieved by collecting data from machinery and obtaining the necessary paperwork from the business owner. An FMCG firm trying to build a data pipeline to maximize sales, by contrast, may find its data collection limited by red tape: consumer protection and personal data protection rules mean the legal side of the work has to be considered.
Collaboration between different departments is key to success. Both the business and IT sides of the client need to be present when defining the problem and its possible solutions. The nature of data science makes the process more interdisciplinary and interdepartmental.
Strategy usually answers the following 6 questions:
- What to do?
- What to collect?
- How to collect?
- Where to store?
- How to protect?
- How to implement?
Consulting is the traditional step of the process. This is the step of using data to answer your business questions, which turns the strategy into insights. By its nature, the consulting part depends on the strategy part; usually, consulting is the natural result of strategy development. However, this may cause a conflict of interest if the validity of the model is evaluated by the same people who provide the consultation. Separating the steps therefore makes it easier to spot problems and insights.
Usually, the consulting part revolves around 4 main questions:
- What is the insight?
- What needs to be done?
- How should it be done?
- What should be the process?
Development is the activity of designing and building a modern data product or internal tool. This is closer to the IT part of data science consulting. Custom-tailored solutions for specific problems require a heavy emphasis on the development process.
This part has 3 main subjects as Steve Ballmer previously stated:
Developers! Developers! Developers!
Training boosts the data literacy of the client’s team. It makes sure the rest of the team is aware of the process and involved in improving the system. It also ensures that the team can grasp the main points and contribute meaningfully to the continuous improvement of the entire process. Feedback mechanisms function well when staff can report in real time how effective the data mechanism is.
Data Science Consulting Industry
The industry players can be categorized into four types: MBB, Historical Tech Companies, Start-ups, and Big-Data-Big-Companies.
MBB
These are the traditional consulting firms: McKinsey, BCG, and Bain. They have long served their clients with professional services, and they are now upgrading their offerings with more data-supported services such as advanced analytics.
McKinsey set up specialized teams for data analytics, and there are also ventures established specifically for this purpose. QuantumBlack is one of them. It was established to reimagine how organizations could continuously improve and outlearn their rivals, and it provides services for various industries.
BCG set up BCG Gamma as its advanced analytics unit. The BCG Gamma team comprises world-class data scientists and business consultants who specialize in the use of advanced analytics to get breakthrough business results. BCG Gamma combines advanced skills in computer science, artificial intelligence, statistics, and machine learning with deep industry expertise.
Bain provides its data science consulting through the Bain Advanced Analytics Group. Its work focuses on three disciplines (primary research, advanced analytics, and Big Data) and is rooted in the group's technical expertise, client experience, and knowledge of the latest data collection, analysis platforms and tools. The group says it brings the right mix of disciplines to each client, recognizing that every challenge is unique.
Historical Tech Companies
The most important players in this category are IBM and Accenture.
Accenture Analytics provides Big Data and related technology services to businesses and organizations seeking to harness the power of big data analytics. Accenture invests heavily in R&D, academic alliances, and incubation of emerging technologies to advance the industry’s thinking around big data and analytics. According to Accenture, its 900+ data scientists serve more than 2,000 analytics clients, 70 of which are Fortune Global 100 companies, and it has helped more than 50 global clients use their data to generate data equity, business value, and competitive advantage.
IBM provides Big Data consulting services. Its Big Data Services offering provides strategy, engineering, portfolio, and organization services to support clients' big data efforts. These services include implementing big data, analytics, and cognitive solutions and capabilities, and providing their ongoing maintenance, enhancement, and support.
Start-ups
Big Data startups are emerging quickly because Big Data itself is quickly moving from an emerging technology to a mature one. Companies that were startups five years ago are now key players. Startups are the junior players in the industry, with highly diverse capabilities, and they generally provide their services to a specific type of industry.
Datascope Analytics is one such firm. Datascope is a data science consulting company. They work closely with their clients, using creative processes inspired by the design community to help clients identify valuable and innovative ways to use data. They also make these ideas a reality, building out everything from quick proofs of concept to scalable production systems.
Maana is another example. Maana is the pioneer in knowledge-centric technology. The Maana Knowledge Platform turns human expertise and data into digital knowledge for employees to make better and faster decisions. They are backed by multiple investors.
Big-Data-Big-Companies
These are the firms dominating the big data industry. They grew fast enough to dominate the market. They have the first-mover advantage, and they are essentially the ones that shaped the industry's practices and rules.
Cloudera is one of them. Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites. Founded by leading experts on big data from Facebook, Google, Oracle and Yahoo, Cloudera’s mission is to bring the power of Hadoop, MapReduce, and distributed storage to companies of all sizes in the enterprise, Internet and government sectors.
The other one is Palantir Technologies. Palantir Technologies Inc. develops and builds data fusion platforms for public institutions, commercial enterprises, and non-profit organizations worldwide. The company offers Palantir Gotham, a platform that integrates, manages, secures, and analyzes enterprise data; and Palantir Metropolis, a platform that integrates, enriches, models, and analyzes quantitative data.
3 Factors to Consider when Choosing a Data Science Consultant
Do the team members have advanced degrees?
This is one of the major factors in deciding who to work with. Data science is increasingly becoming an industry crowded with people claiming to be data scientists, and a Ph.D. is usually one of the best proxies for genuine expertise. Brock Ferguson’s guide describes the journey of an academic becoming a data scientist and shows how much dedication is needed.
Do they have enough experience?
References matter. It is also important to check that the consultants have worked on a project in a similar setting. This shows that they can offer meaningful insight and know the practices of the specific industry.
Can they provide a long-term plan?
You need to make sure that the plan provided by the consultant is viable and can be upgraded regularly. Data science is a field of constant improvement, so it is important to see what potential they can offer. Think of it as a long-term investment: you may need consulting and updates again, so make sure they can plan over a longer horizon.
Pitfalls for data science projects
Kaggle, the data science competition community, ran a survey asking data scientists about the barriers they faced at work. Most of their answers shed light on the things that can go wrong in data science projects.
Out of these problems, 3 categories are relevant for data science projects:
- Data related issues
  - Dirty data
  - Data unavailable or difficult to access
  - Privacy related issues
- Organization/project related issues
  - Lack of management support
  - Lack of clear questions to answer
  - Result not used by decision makers
  - Lack of domain experts
  - Need to coordinate with IT
  - Integrating findings into decisions
- Tool limitations
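To make the "dirty data" barrier more concrete, here is a minimal Python sketch of the kind of quick data-quality check a team might run before a project starts. It is only an illustration: the function name `profile_rows`, the field names, and the sample records are hypothetical and not taken from the Kaggle survey or any specific tool.

```python
from collections import Counter


def profile_rows(rows, required_fields):
    """Count simple data-quality problems in a list of dict records.

    Hypothetical helper for illustration only: it flags missing or
    empty required fields and exact duplicate records.
    """
    issues = Counter()
    seen = set()
    for row in rows:
        # A required field counts as missing if absent, None, or blank.
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                issues[f"missing:{field}"] += 1
        # Build a hashable fingerprint of the record to detect exact duplicates.
        key = tuple(sorted((k, str(v)) for k, v in row.items()))
        if key in seen:
            issues["duplicate_row"] += 1
        seen.add(key)
    return dict(issues)


# Invented sample records, used purely to show the idea.
sample = [
    {"customer_id": "C1", "region": "North", "revenue": 120.0},
    {"customer_id": "C2", "region": "", "revenue": 80.0},
    {"customer_id": "C1", "region": "North", "revenue": 120.0},
    {"customer_id": "C3", "region": "South", "revenue": None},
]

report = profile_rows(sample, ["customer_id", "region", "revenue"])
print(report)
```

A report like this does not clean anything by itself, but it gives the client and the consultant a shared, early view of how much data work the project will need.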
In summary, your data science project is only as good as your data and your organization. With high-quality data and a committed organization, you will already have removed the most important barriers to data scientists’ efficiency.