During my trip to KDD2019 in August, I had the pleasure of sitting down to chat with the co-chairs of the conference, Ankur Teredesai and Vipin Kumar. In the interview that follows, we discuss the growth of the KDD conference over the years and its changing focus. You can read my KDD2019 Field Report here.
insideBIGDATA: Please give our audience a brief introduction including your role here at KDD and also a little bit about your day job.
Ankur Teredesai: Sure, my name is Ankur Teredesai. I am the chief technology officer and co-founder of a company called KenSci. I’m also a Professor of Computer Science & Systems at the School of Engineering & Technology, University of Washington Tacoma, where I’ve been teaching since 2006. Over my career, I’ve spent around 25 years focused on machine learning and practicing data science. I’ve really seen the revolution from statistics to big data, to AI, to data science, and I’ve thoroughly enjoyed it. It’s a great place to be, and I’ve been part of the security community, so that’s my day job and my role. I also lead the Healthcare AI community in shaping how we make healthcare decisions more data-driven, and in changing physicians’ perception so they see that AI is here to help them, not replace them. I’m on that big mission.
With KDD, I was the information director for the past 10 years. Essentially, I was the spokesman for KDD. I did all the web development and made sure that KDD communications were in order and coordinated. So I’ve grown with the community as the community has grown, learning more about this conference and how to handle its different aspects. Two years ago, there was a bit of a gap: we had already decided on Anchorage as the venue, but folks were not sure who was going to lead the conference there. They asked me if I would step up and take that role.
Vipin Kumar: My role is similar to Ankur’s: we are trying to make sure we put a team together, get out of the way, and let them run the show. The history goes back more than 25 years, and we do have professional staff to help, so this is a big organization. In my day job, I am a professor of computer science at the University of Minnesota. I’ve been in academia for almost 36 years, working in this field all along. The field’s name has changed from time to time. We used to call it artificial intelligence. Then it became data mining, then predictive analytics, then data science, and now it’s back to being called machine learning and AI. So the labels have changed, but we keep doing the same thing. Each decade has brought a more powerful generation of computer algorithms. Now we’re seeing the times change, with everybody in the world interested in AI and how they can apply it.
insideBIGDATA: I think we’ve all seen a number of so-called “AI winters” where the technology didn’t fulfill the promises, right?
Vipin Kumar: Or sometimes people expect too much. I remember around ’84 or ’85, which was the previous big AI hype. People were purchasing computers worth $100,000 just to run AI algorithms; in today’s terms, that would be a three- or four-hundred-thousand-dollar machine. Nobody puts AI on their desk at that cost. So the promise was overhyped, and it also didn’t deliver. After a couple of years, it died down. But then the field of data mining came up. It was much more focused: it wasn’t promising intelligence, it was promising predictive analytics. In many ways it began as the field of this conference, and I would say it played a big role in jump-starting the next resurgence. It naturally became more closely related to big data and data science, and then machine learning. It took 20 years to reach this point, so we have felt a lot of these ups and downs.
Ankur Teredesai: I frame it as a triad of forces coming together. The first is the advent of the cloud, which made compute very accessible and, to a certain extent, cheap. Now that compute power costs not $100,000 but $3,000.
Vipin Kumar: And the old machine was 100 times more expensive, yet no more powerful than what we have today.
Ankur Teredesai: We sent a man to the moon with less compute power than what we have in our iPhones, or even our watches, today. The second force that I see, which has fundamentally changed, is the availability of data, especially in highly regulated markets. It used to be impossible for early data science or machine learning developers and scientists to access regulated data sets in areas like health care, finance, criminal justice, or banking. That has fundamentally changed the way we look at AI, including its ethics: policy gets more review, but at the same time there is more openness to explore these issues. The third force, I believe, is really regulation and policy. There’s much more awareness that without proper infrastructure and government investment in shaping AI policy, it’s all going to be done in the wild. So those three forces have really come together. A really concrete example of that would be the Affordable Care Act. Before it, you had the HITECH Act, which forced all the health care systems to digitize themselves and make medical records electronic. That created a generation of data collection within those systems. With the Affordable Care Act, there was a huge incentive to make that data actionable. The Affordable Care Act was not so much about patients, honestly; it was more about making the decisions that health care systems make more accountable. So combine that policy with the availability of compute power to actually handle and manage that data and transform it so it was ready for machine learning and AI, plus the availability of the cloud.
insideBIGDATA: How has this transformation in our industry affected the KDD conference? How has it evolved since 1995, when the first KDD happened?
Vipin Kumar: Yes, but it actually dates back to 1989, with smaller workshops. So there have been 25 years of the conference, with five years of smaller workshops before that.
So it started in the late ’80s, and many of the sparks you can see in the field go back to then. If you go back to 1970 and earlier, people in the artificial intelligence community were investigating how to build a machine that could truly become intelligent. Then a group of people started thinking about algorithms that could do pattern recognition, look at images, and find things in them. These algorithms were not looking for intelligence; they were trying to get something done. At that time, this community was pushed out of the AI umbrella, and they started this conference. Now, judging by a lot of the talks here and a lot of the work on AI, the communities have sort of come back together, I believe, with deep learning frameworks. Back in the late ’80s, some statisticians and some algorithm designers independently developed simple algorithms that started showing promise. That started a trend of asking what we can do with algorithms, which became the new trend of data mining: what can we do with large-scale data? Different generations of algorithms have come about since then.
People have been analyzing data for as long as we have been around. Think of astronomers looking at the sky and trying to figure out what’s happening up in the heavens. Then statisticians came along and started analyzing data, doing the science of data. But new generations of algorithms have come about roughly every 10 years, and for each decade you can say what the highlight was. Every day you see new innovations, and I think the confluence of machine learning we see today comprises all of these developments.
I’ll give you one more example. In 1950, one of the founders of computer science, Herbert Simon, predicted that within 20 years computers would beat humans at chess. Then 1970 came and nothing happened; computers were still struggling. The first time a computer beat the reigning world chess champion in a match was in 1997, with IBM’s Deep Blue, after top-class people had worked on the problem for decades. It was considered a huge milestone that fulfilled the promise Herbert Simon had made so many years earlier. This algorithm had a lot of expertise downloaded from chess experts into a computer program that otherwise knew nothing about chess beyond the rules.
We have to realize that over the last 40 years there have been developments in computing, a tremendous amount of computing power that nobody could have imagined, and also data availability and data regulation. But the generations of algorithms are what this community is about. Part of the evolution came because the computer science community kept developing faster and faster computers, and that credit can be fully claimed by the field of computer science, because that’s what computer scientists set out to do. The second thing is that we are trained in our profession to come up with new tricks, new algorithms, and new recipes.
insideBIGDATA: Our industry keeps accelerating, and I think the acceleration has increased to an incredible extent in the last five years. How does that play out?
Vipin Kumar: There has been acceleration in every decade that I can point to. This one is just amazing.
insideBIGDATA: So how do you translate that acceleration into content for the KDD conference? I mean, how do you get a sense of how the industry is changing, so that you offer content at the show that attracts attendees, pleases them, and gives them what they came for?
Vipin Kumar: One way to think about it: I have been attending this conference since the ’90s, going back to ’95, ’96. So the question would be what kinds of things were being talked about in the ’90s, or in 2000, or in 2010, versus now. I have all the proceedings going back to the beginning on my bookshelf. One thing you would notice is that today there is not a single aspect of our lives that’s not being touched by these algorithms. You name it, and I will give an example. If I can’t, Ankur will find one. Just think of anything: it’s amazing that we’re now finding applications in areas we would have thought this technology would never touch.
insideBIGDATA: I think that’s new. I mean, the fact that it’s so pervasive. Every industry. Every walk of life is being touched by it. I don’t think that’s ever happened before.
Ankur Teredesai: The only thing I would add is that there are conferences, and there are conferences. What is unique about KDD as a conference is its early founders and the folks who attended, including women and others. I’ve been involved in KDD for the last 17 years or so, and what I loved about the community the first time I attended was that it was a great home for both applied and theoretical researchers. That early interaction with real-world, application-minded folks brought agility to the conference. We didn’t pin ourselves down by saying, “Hey, data mining is one thing, and that’s all we want to do.” We kept the doors open for evolving the community and shaping it side by side with industry. We always had industry participation, but the ratios used to be different: it was primarily an academic conference with a few industry participants. Over the years, and especially in the last decade, we’ve seen the numbers shift a lot toward industry.
So if you think about the structure of the conference, you have the research track and you have the applied track. The applied data science track is very impactful because this is where industry gets to share things that are in progress as well as deployed. So there’s a huge emphasis on deploying the algorithms being developed in research. That’s why in my talk yesterday, I focused on how we get there. We have spent the last decade focused on going from research to industry faster. But the question I want to encourage the audience to think about is, “Should we focus on going faster, or is it time now to focus on doing it better?” Sometimes those two can be very orthogonal. The position I’m taking today is to put a stake in the ground and say we have understood how to go faster. Now we need to invest significant energy to go deeper and get better at translating results from research to industry. That’s one aspect of it.
The second aspect is the recent recognition that data mining, data science, AI, and machine learning are becoming very verticalized. Significant domain expertise is needed to solve a problem end-to-end. Vipin has worked in multiple domains in his career, collaborating with environmental folks, healthcare folks, advertising folks, and search engine folks. That’s very characteristic. I have made a similar journey, from advertising to social networks to healthcare. What we have done now that is very interesting and different is add the concept of “theme days” to the conference, so that we can start small movements and be inclusive: inviting epidemiologists, Earth and geospatial sciences researchers, and folks working on other data science topics like deep learning, who can still find a home at KDD.
So as long as we continue to foster that spirit of diversity in topical thinking and stay broad, I see this community thriving.
insideBIGDATA: I think you’ve succeeded in communicating that message of an equal split between industry and academia. I was at the conference lunch earlier today talking to some attendees, and I asked, “What is your perspective on KDD?” One gentleman basically repeated what you just said. He told me he’s gone to other conferences, like NIPS and a few others, and said, “This one is kind of evenly split with industry.” That came from a random attendee, which is pretty cool. So just briefly, consulting your crystal ball, what do you think will happen with the conference in the next few years?
Ankur Teredesai: I think there’s going to be amazing growth in this conference. More than the conference, I feel very proud that we have set up the community in the right direction, because the conference is just the tip of the iceberg. Underneath it is the whole iceberg of community that helps ensure that the best minds in the world working in this field submit their papers to this venue. Then there is the process of reviewing those papers and making sure they are high quality and meet the bar: continuing the double-blind review process and keeping it fair and open to everyone, not just well-funded scientists with deep pockets and access to specially controlled data sets. That comes out in the papers. Investing in that is going to be the next big challenge for the community.
In terms of growth, I have no doubt that next year in San Diego we’re going to see 5,000 people, hopefully. It’s just like any startup in any industry: the first thousand people are the hardest to get. Once you have a community of your first 500 to 1,000 people, going from 1,000 to 2,000 is troublesome; a lot of people come and go, but you sustain some sort of movement for three or four years. Then you reach a size of 3,000 in a place as logistically difficult to get to as Anchorage, where people have spent time and resources to get here. We had to cap registration a month ago at 3,200 attendees. So we are there, for next year and beyond.