With “Foundations of Computer Vision”, Antonio Torralba, Phillip Isola, and William Freeman take a closer look at a changing field
If you were to assemble a “dream team” of professors to teach you about computer vision, it would probably include Antonio Torralba, Phillip Isola, and Bill Freeman. In April, that dream became a reality with MIT Press’s publication of “Foundations of Computer Vision”, an 800+ page textbook aimed at bringing clarity and coherence to a dynamic research field–and its roots. The textbook has garnered praise from many corners, with Alexei Efros, Professor of Electrical Engineering and Computer Science at UC-Berkeley, noting, “It is as if Bach, Mozart, and Chopin were to collaborate on a music textbook”. We sat down with Torralba, Isola, and Freeman to learn more.
Computer vision is an enormous topic–how did you decide on the scope, level, and specific goals of the textbook? Did you have a vision (pardon the pun) of your ideal student / audience?
Bill Freeman: We felt the field of computer vision lacked a coherent story and we wanted to provide that. Of course, during the time that we wrote the book, the field underwent a revolution, moving to neural network techniques, which now dominate other methods. That created a new need for a coherent story: how do the new methods relate to the older methods? What parts of the older methods should survive? So some parts of the book show the underpinnings of current methods and relate them to more classical techniques in computer vision or signal processing. Another tension in the field is vision versus learning. Is computer vision just machine learning applied to images, or is there a science of vision that we need to keep in mind? We think there is a science of vision, and one of my goals for the book was to “put the vision back in vision and learning”.
Antonio Torralba: We wanted to write a book that could transmit our passion for the field, not just as an engineering discipline, but also as a science. The book makes connections with other disciplines (like visual psychophysics, neuroscience, …) and across time (presenting both classic and modern methods).
Even for three authors, an 800-page textbook is daunting. How long have you been working on Foundations of Computer Vision, and how did you divide up the tremendous amount of work to be done?
Bill Freeman: Based on our first email to MIT Press, we’ve been working on the book for 13 years. The division of labor was organic, often based on who gave which lectures in the MIT computer vision class we all helped to teach. In later years, as the field raced ahead, Antonio and Phillip wrote more chapters and my parts of the book became smaller. We envision two primary audiences for the book: students learning computer vision (in courses or by themselves), and current researchers in computer vision.
Antonio Torralba: I have the impression that we started to write the book because we thought that the task would be easier. It is hard to start to write the first page if you think you still have 799 more pages to go.
You decided to specifically cover issues of ethics and fairness in the textbook; why are those topics, specifically, of current concern in computer vision?
Bill Freeman: Computer vision is having a huge impact on society, in safety, surveillance, law enforcement, and advertising, with much coverage and discussion in the popular press. We felt it was important to join that discussion, and to raise those issues for current researchers and students joining the field.
You emphasized, in the book’s summaries, that you wanted to cover both classic computer vision and machine learning; in your experience, do students need to understand one to grasp the other?
Phillip Isola: These days, yes. Machine learning is at the core of most computer vision systems, and it’s important to understand it. One way to think of computer vision systems is that they contain some parts we can write down equations for, and other parts that are so detailed that no human can work them all out. For these latter parts, we need algorithms that automatically engineer the systems for us, and that’s where machine learning comes in. Another perspective is that learning also plays a big role in how humans see. We learn how to recognize different objects through experience with them. So learning-based vision is very much in line with biology, and biological vision is still the best vision system we know.
Antonio Torralba: Researchers have been thinking about vision for a long time. Many of the current approaches were seeded by the previous work. To be able to innovate, one needs to have a good culture of the field and its history.
You noted that you wanted to write a book that presumed very little pre-existing background knowledge. Once a student has taken a course with your textbook, what will they be prepared to do next? What’s the *next* learning step after this book, in your opinion?
Bill Freeman: The next step, after the book, and even while reading the book, is to jump in and join the field, applying the techniques to your own problems, developing new techniques, and/or reading, with background knowledge from the book, what others are doing. We included chapters on doing research, writing papers, and giving talks. We really want the book to help people as they join the field. For someone who doesn’t plan to do research, we hope the book will illuminate the vision aspects of the AI revolution.
MIT EECS is now two years into offering 6-4, a groundbreaking major in Artificial Intelligence and Decision-Making. How did developing the major help inform the creation of this textbook?
Antonio Torralba: The new major is a large project that involves a wide range of disciplines. The computer vision class has undergone multiple changes over the years, and the new classes created for the major have had an impact on the book. For instance, Phillip, together with Stefanie Jegelka, created the new Deep Learning class.
How did you decide what to leave out?
Phillip Isola: We didn’t try to cover all the latest systems and applications. There are just too many of them. Instead we tried to cover the fundamentals that we think will be here to stay.
You note that this book relies extensively on illustrations and examples honed within your own classrooms. Tell me about the process of refining those illustrations–how do you know when a specific drawing or example is communicating exactly what it needs to?
Phillip Isola: My favorite part of working on the book was making the illustrations. One thing we tried to do is make the figures show real data, rather than just being cartoons. From teaching, I had many cartoon figures that had been effective at giving an intuition but weren’t always perfectly accurate to reality. I tried to remake these figures using real systems and real results. Sometimes it worked and reality matched the cartoons very nicely. Other times my intuitions were off, and in making it real I found new ways to present things. In the end, for most of the figures, you can trust that they are not hiding much; this is really how the data look.
Antonio Torralba: We did put a lot of effort into making new figures that illustrate the concepts covered in the text. It was quite fun to do.
Did you solicit student input or feedback on any of your chapters? How have your own students responded to the publication of the book?
Bill Freeman: Yes, we gave drafts of chapters to colleagues who were experts in those topics, to hear their feedback and corrections. Their comments were crucial, and we thank them in the preface.
Antonio Torralba: Also, some students helped with some of the experiments, and also read the book and helped improve its clarity.
The AI and ML fields are rather notorious for a rapid pace of development, which must make developing any kind of educational curriculum, from a course, to a major, to a textbook, challenging. How did you balance trying to give this book a long lifespan, while keeping its material as current as possible?
Bill Freeman: We kept on writing, while our editor urged us to “Let It Go”, like the song.
Antonio Torralba: We started writing this book when computer vision approaches did not work very well. It was an exciting area of research, with a lot of promise and it was starting to have a real impact, but most solutions were very brittle. We ended the writing when one could have an AI writing pieces of the book! So, yes, the advancement has been amazing.
To give the book a long lifespan we have focused on the foundations and not so much on the state of the art or specific implementations. Most of our examples are chosen to illustrate the basic principles of each approach.
How do you foresee your book being used outside of EE, CS, and AI & D departments? Do you feel there are areas where it really helps students to have that dual literacy in computer vision and another field?
Bill Freeman: We have students in the computer vision class from many departments, including urban planning, architecture, mechanical engineering, physics and biology. As computer vision becomes more reliable, it will be a tool with broad impact.
If you could snap your fingers and increase the general public’s knowledge of computer vision immediately, what is one key point you would like all laypeople to understand?
Phillip Isola: Look at the pixels! Look at the data. Computer vision is all about the structure of visual signals, and modern systems are mostly about learning from visual data. So you really have to look closely at the data, the pixels, to understand how these systems work, and what is going on when they make mistakes.