If you've never experienced impostor syndrome, try being a data professional. There's so much going on: from research papers to new tool categories, major acquisitions, and constantly evolving architectural paradigms. Staying on top of it all is nearly impossible. However, one way of keeping track is by relying on experts in each subfield. They are your gatekeepers, filtering out what is essential and what isn't.
Not only is following and engaging with influential experts an excellent strategy for staying on top of things, but it can also be a rewarding form of networking. Slack, Twitter, and LinkedIn are the channels par excellence in which data discussions are happening. Engaging with them is probably the shortest path to landing a new gig, co-authoring an article, or collaborating on an open-source project.
In this article, you'll get to know fifteen experts who are influencing public opinion within their field of data expertise. There's no strict formula behind the decision to list them, but the following aspects were taken into account:
- The number of their followers on Twitter or LinkedIn
- The size of the organization they work for
- The seniority of their role in the organization
- An overall sense of the ripple they create through adjacent channels, such as Slack, email, and blogs
- The quality and originality of the content they produce, such as social media posts, opinions, articles, books, and podcasts
Finally, we also ensured that the various subfields of the data industry are covered: data science, data engineering and analytics, data governance/management, and data technology.
If one person was there at the advent of machine learning (ML) and has played a part in its spectacular growth, it's Andrew Ng. He coauthored dozens of online courses on data science and collaborated on hundreds of research papers. He publishes the latest news and interesting applications in the field of deep learning on his personal website and through The Batch, DeepLearning.AI's newsletter.
You should follow him if you want to stay on top of the work of deep learning's avant-garde.
When it comes to explaining complex topics in ML to laypeople, Cassie Kozyrkov is the queen. Her YouTube channel (Making Friends with Machine Learning) contains hundreds of videos where she introduces viewers to various topics and outlines tips and tricks for training ML systems.
She's also an astute writer and publishes opinionated pieces regarding decision science, business intelligence, and data science on her Medium blog.
Chip Huyen is a leading voice on MLOps and ML engineering. She writes comprehensive articles on her personal blog about ML systems. She authored Designing Machine Learning Systems, a reference work on creating and deploying algorithms in a production environment. She also teaches at Stanford.
She only writes a few articles a year, but they're thorough, thought-provoking, and usually widely shared throughout the datasphere. If you're scared of the technical side of machine learning, her piece "Why Data Scientists Shouldn't Need to Know Kubernetes" is a good starting point.
If you're into video content about broad topics in data science, Daliana Liu is the one to follow. On her YouTube channel, she invites prominent guests and covers inspiring data science challenges.
It's hard to tell where you'll see Joe Reis next. He has been a guest on various industry podcasts (such as The Data Stack Show, Data Engineering Podcast, and Super Data Science Podcast with Jon Krohn), speaks at events, coauthored Fundamentals of Data Engineering, and has a blog with widely shared articles. On top of that, he has interviewed many prominent data influencers himself in conversations that he livestreams on his LinkedIn profile.
Reis is your go-to influencer if you want to understand the benefits and pitfalls of the various data engineering architectural patterns and frameworks.
The Seattle Data Guy, or Benjamin Rogojan, is truly an authority on data engineering. Furthermore, he's a super active blogger on Medium. He churns out opinionated posts on data engineering tools and architectural patterns every two weeks. The scope of his YouTube channel is a lot broader, ranging from career advice to tool reviews. Also, have a laugh at the data memes he posts on his Twitter account and LinkedIn profile.
The number of Substack posts that Randy Au has written is simply astonishing. He covers topics ranging from UX to data science and data quality. However, the posts that created the biggest ripples are primarily about analytics: he's very adamant about data cleaning and data collection being integral to an analyst's job.
There's no one more influential in analytics than Benn Stancil. He mainly writes about the processes, tools, and roles required for analytics, but he's also an important voice in the debate regarding the modern data stack. His highly opinionated Substack posts are some of the most shared articles among data analysts.
His opinions are also distributed via The Analytics Dispatch, the weekly newsletter of Mode, a modern web-based analytics tool he cofounded. If you want to talk to him and discuss his articles, find him on Twitter or head over to the Locally Optimistic Slack channel, where he enjoys something close to guru status.
As CEO of dbt Labs, Tristan Handy has a lot to say about analytics (engineering). His data modeling and transformation tool basically gave birth to the modern data stack and the ELT paradigm.
Handy is a frequent guest on data-related podcasts. However, if you want to hear from him on a regular basis, follow his personal Substack or the dbt newsletter known as *The Analytics Engineering Roundup*.
When Zhamak Dehghani published her famous data mesh article "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" in 2019, she probably didn't expect the traction it would get. It set off a paradigm change in how managers look at organizing their analytics skills across a company. Her work culminated in her book Data Mesh: Delivering Data-Driven Value at Scale.
When she started Monte Carlo, Barr Moses appeared to be everywhere. Throughout 2019 and 2020, you couldn't escape her or her company. During that period, she coined and popularized the terms data reliability and data downtime.
Regarding data quality and observability, in particular, no one has more to say than Moses. She's talked about it during numerous online and in-person events and has been featured on various podcasts. Want to know more? She's an avid Twitter and LinkedIn user and regularly writes high-quality Medium posts.
Another prominent figure who appears to be everywhere is Prukalpa Sankar of data discovery tool Atlan. She's super active on LinkedIn and Twitter, maintains a high-quality Substack newsletter, and attends many in-person events in the Bay Area.
Although much of the content she produces is Atlan-related, she remains a leading voice on data quality and data catalogs.
While Maxime (Max) Beauchemin is a data engineer at heart, he has built two famous open-source tools and commercialized them: Airflow and Superset. He's not very active on social media and has just a few articles on his Medium. That said, Beauchemin has written remarkably influential pieces on various platforms regarding data engineering and the modern data stack:
- "The Rise of the Data Engineer" on freeCodeCamp
- "The Downfall of the Data Engineer" on his Medium
- "How the Modern Data Stack Is Reshaping Data Engineering" on the Preset blog
As mentioned, there's no particular platform on which Beauchemin is super active. However, if you've subscribed to other influencers' channels, you'll know when he's created a new piece of content. He's excellent at unpacking how new tools and paradigms affect the daily jobs of data professionals.
Few people know more about data technology than Matt Turck. There are two main reasons for that:
- Every year, he publishes an overview of all data tools by category on his blog with an accompanying article that describes how the market is evolving. The accompanying visuals with hundreds of icons are so popular that they've become commonplace on pitch meeting slides of consultancy firms.
- He interviews many prominent figures in the data industry and publishes videos and transcripts on his blog.
Although Python has become the lingua franca for many data tools, there's still a huge R community. Its leading light is Hadley Wickham. He's involved in dozens of the most-used R packages. You're using his work whenever you load a package from the tidyverse, like ggplot2 or dplyr.
He's the chief scientist at RStudio (soon Posit), the company behind the most popular R IDE, and has coauthored a handful of R-related books. He's also an adjunct professor of statistics at the University of Auckland, Rice University, and Stanford University. In the R ecosystem, nobody is more influential than Wickham.
This article listed fifteen influential figures in the data industry, more specifically in data science and machine learning, data engineering and analytics, data governance, and data technologies. Don't hesitate to follow them today.
If you found this article useful, head over to the Ikigai Labs blog page for more inspiring content. Ikigai Labs can support you with data-driven analysis and insights and is an ideal solution if your organization is looking to streamline data-heavy operations.