A PhD in Numbers

Conducting PhD research can be a long endeavor, involving much more than the publications listed on Google Scholar. Having recently submitted my thesis, I look back in this article on my time as a PhD researcher in terms of numbers. This way, I hope to shed some light on what a PhD can look like in terms of everyday work.

Introduction

Update: Added a section on the papers I read.

Most PhD programs share the same goal of performing original academic research. In this sense, a PhD can be seen as a qualification to perform independent research and is usually required to work as a professor at a university or as a researcher in general. In many disciplines, a PhD involves publications in peer-reviewed journals and/or at conferences. This is generally the publicly visible part of a PhD: Publications are not only easily accessible online but often also involve talks or posters at the corresponding conferences.

Despite this common goal of conducting original research, PhD programs vary widely across disciplines, countries and institutions. This not only affects everyday work, but also determines preferred journals and conferences as well as the value placed on number and quality of publications and authorship. Even within computer science, habits can be radically different across research areas, resulting in very different PhD experiences.

In this article, I want to look back at the last four and a half years of my life and summarize my PhD in numbers. In the beginning, most PhD students primarily care about the number of publications, posters and talks since these directly impact metrics such as citations or h-index. This is understandable since these metrics also drive hiring decisions later on. However, a PhD also entails many other activities, some of which require significant time commitments.

Time
234 Weeks
1169 Days
Four and a half years, broken down into weeks and workdays (5 per week), show the commitment that starting a PhD entails. Note that the days do not include any weekends. Unfortunately, depending on ambitions and culture, I found that PhD students work on weekends quite frequently, for example, to meet deadlines or because of travel.

Travel
3 Physical Conferences
3 Physical Summer Schools
12 Virtual Conferences/Workshops
3 (Physical) Retreats
2 Travel for Talks/Awards
I traveled to 3 conferences and 3 summer schools in person. These were all in the first two years of my PhD, before the pandemic. Usually, this took 5-7 days each, without any vacation but often including travel on weekends. In addition, we had 3 retreats of 5 days each, and I traveled for talks/awards 2 times, 8 days in total. During the pandemic, I (virtually) attended 12 conferences and/or workshops, but often just went through recorded talks in a fraction of the time that an in-person conference takes.

Conferences
5 Accepted
3 Rejections
1 Not Submitted/Withdrawn
Overall, I got 5 papers accepted at conferences, mostly clustered towards the end of my PhD. This is because some papers were rejected once or multiple times, which "artificially" extends projects and makes many PhD students look more productive towards the end. Also counting one paper that I ended up not submitting, nearly 80% of my papers did not work out on the first try! This highlights that rejections, resubmissions and withdrawals are an integral part of doing original research.

Journals
1 Accepted
1 Submitted
2 Revisions
Considering journals, a paper can be accepted conditional on a major or minor revision. At a conference, such a paper might be rejected. At a journal, in contrast, it might be accepted after a revision if the idea or experiments are promising enough. However, this comes at the "price" of a significantly longer publication process. Such revisions were part of all my journal papers and submissions.

Coding and Writing
∼ 1308 Code Commits (5.59/Week)
∼ 1234 LaTeX Commits (5.28/Week)
∼ 25 Public Repositories
Before getting published, every paper needs to be written. When working predominantly empirically, papers also involve quite a bit of coding. Unfortunately, I had to migrate all my repositories from Mercurial to Git in July of 2020. Extrapolating the 5.59 commits per week to a total of 234 weeks results in roughly 1308 commits in total, just for writing experiment code. Ignoring vacation and travel, this is roughly one commit per workday, which is reasonable given that I worked on my own code base. On top of that, I needed roughly 5 LaTeX commits per week for papers, slides and posters. However, this usually does not translate to one commit per workday, as paper writing was usually limited to a few weeks for each project, and the count does not account for Overleaf. I published my source code (both experimentation and LaTeX code) in 25 GitHub repositories. Often, this also involved significant work to clean up, refactor and document code.
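The extrapolation above is simple enough to reproduce in a few lines of Python. This is only a sketch using the figures quoted in this article; the function and variable names are illustrative and not taken from any script actually used for these statistics.

```python
# Figures quoted in the article; all names here are illustrative only.
WEEKS = 234                    # length of the PhD in weeks
WORKDAYS = 1169                # workdays, excluding weekends
CODE_COMMITS_PER_WEEK = 5.59   # average weekly rate of code commits


def extrapolate(rate_per_week: float, weeks: int) -> int:
    """Scale an average weekly rate to a total over the given number of weeks."""
    return round(rate_per_week * weeks)


code_commits = extrapolate(CODE_COMMITS_PER_WEEK, WEEKS)
commits_per_workday = code_commits / WORKDAYS

print(f"Estimated code commits: {code_commits}")
print(f"Code commits per workday: {commits_per_workday:.2f}")
```

The result is an estimate, not a count: it assumes the weekly rate was constant over the whole PhD.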

Reviews
97 Total
30 Distinct Venues/Journals
Besides publishing myself, I spent quite some time reviewing for conferences and journals. In total, without accounting for workshops, I reviewed 97 papers. If I assume two hours per review, plus another hour for rebuttal/discussion or revisions, this results in 36 working days. This is a bit more than an average of one day per venue. Personally, I can only recommend starting to review as early as possible as it also helps to improve writing and stay up-to-date.
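The estimate of 36 working days can be reconstructed as follows. Note that the 8-hour working day is an assumption made for this sketch; it is not stated above, but it is the value for which the numbers work out. All names are illustrative.

```python
# Figures quoted in the article.
PAPERS_REVIEWED = 97
VENUES = 30
HOURS_PER_REVIEW = 2        # writing the review itself
HOURS_PER_DISCUSSION = 1    # rebuttal/discussion or revisions

# Assumption for this sketch: one working day has 8 hours.
HOURS_PER_WORKDAY = 8

total_hours = PAPERS_REVIEWED * (HOURS_PER_REVIEW + HOURS_PER_DISCUSSION)
review_days = total_hours / HOURS_PER_WORKDAY
days_per_venue = review_days / VENUES

print(f"Total reviewing time: {total_hours} hours")
print(f"Working days spent reviewing: {review_days:.1f}")
print(f"Average days per venue: {days_per_venue:.1f}")
```

With a shorter or longer working day, the number of days changes accordingly, but the total of 291 hours follows directly from the per-review assumptions.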

Reading
375+ Summarized
177 Summaries on ShortScience
850+ References
After being asked on Twitter, I tried to check how many papers I read, in addition to the reviews above. Luckily, I developed a habit of keeping track of papers I read by annotating them or writing summaries and organizing them in a list of references. It turns out I wrote quite a few summaries and compiled a significant list of references on (adversarial) robustness in deep learning. This also highlights that reading papers and staying up-to-date take up a significant portion of a PhD, and that learning how to read efficiently and quickly identify relevant papers can be extremely useful.

Talks
17 Conference/External
∼ 18 Internal
15 Posters
Another big part of my PhD was presentations of any type, including talks and posters. In total, I gave 17 external talks, including talks at workshops or conferences as well as invited talks in other research groups. Moreover, these talks were often accompanied by a poster. On top of that, I gave roughly 18 internal talks, for example, in seminars, reading groups or team meetings. Internal talks often require less preparation, but are no less important for being successful as a PhD candidate.

Communication
17343 Emails Received (14.84/Day)
5755 Emails Written (4.94/Day)
∼ 211 Meetings (0.9/Week)
∼ 2549 Slides (10.9/Week)
Between October 2017 and March 2022, I received 17343 and sent 5755 emails. This is despite using other communication channels such as Slack, as well. On average, this amounts to 15 received and 5 sent emails each day. I also want to emphasize that these do not include many automatic emails, for example, from IT ticketing systems. On top, only counting meetings with my PhD advisors, I created roughly 2549 slides to prepare for roughly 211 meetings. One advantage of having two PhD advisors is the frequency of meetings: I had a meeting nearly every week — in 90% of weeks.
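The rounded averages in this paragraph can be checked with a short sketch. It assumes that the per-day averages are taken over the 1169 workdays and the meeting rate over the 234 weeks from the Time section; the names are illustrative.

```python
# Figures quoted in the article.
WORKDAYS = 1169
WEEKS = 234
EMAILS_RECEIVED = 17343
EMAILS_SENT = 5755
MEETINGS = 211

# Assumption: averages are taken over workdays (emails) and weeks (meetings).
received_per_day = EMAILS_RECEIVED / WORKDAYS
sent_per_day = EMAILS_SENT / WORKDAYS
meetings_per_week = MEETINGS / WEEKS

print(f"Emails received per workday: about {round(received_per_day)}")
print(f"Emails sent per workday: about {round(sent_per_day)}")
print(f"Share of weeks with an advisor meeting: {meetings_per_week:.0%}")
```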

Students
26 Interviewed
5 Worked With
2 Supervised Theses
2 Workshop Papers
Working with students is another interesting part of doing a PhD. Unfortunately, in Saarbrücken, there are many good research groups in computer science. As a result, there was a lot of competition in terms of finding good students. Nevertheless, I worked with 5 students and supervised two theses. Two students managed to get workshop papers accepted. Note, however, that I had to interview a total of 26 students to find suitable candidates to work with.

IT
0.5-0.75h/Day IT Admin Duties
1559 Ticket Emails Received
As in many groups, PhD students have duties besides research. This might include organizing retreats, assisting with teaching, or — as in my case — being responsible for IT issues. I spent, on average, 30-45min per day. This amounts to roughly 6-10% of my time. This is based on the first year of my PhD where I actually kept track of the time I spent on IT-related topics. While I couldn't look up the exact number of tickets I worked on, I received a total of 1559 automated ticket notifications throughout my PhD.
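The share of 6-10% follows from the daily time spent on IT if one assumes an 8-hour working day. That assumption is made for this sketch only and is not stated above; the names are illustrative.

```python
# Daily time spent on IT duties, in hours (30-45 minutes, as quoted above).
IT_HOURS_LOW = 0.5
IT_HOURS_HIGH = 0.75

# Assumption for this sketch: one working day has 8 hours.
HOURS_PER_WORKDAY = 8

low_share = IT_HOURS_LOW / HOURS_PER_WORKDAY
high_share = IT_HOURS_HIGH / HOURS_PER_WORKDAY

print(f"Lower bound: {low_share * 100:.2f}% of working time")
print(f"Upper bound: {high_share * 100:.2f}% of working time")
```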

Finally, I also want to highlight some aspects where these numbers would be 0, that is, things I did not do at all. Nevertheless, I saw many PhD students spend a lot of time on some of these things, so I wanted to highlight them:

  • 0 workshops/challenges organized: I started working on one, which did not work out.
  • 0 research visits: mainly due to the pandemic, but I interned at DeepMind instead.
  • 0 retreats organized: retreats in our group were organized by PhD students.
  • 0 teaching assistantships: I was in charge of IT duties instead.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.