ANEESH KUDRIMOTI - MARCH 22, 2023
EDITOR: AAYUSHI SINGH
In this paper, I investigate whether there is a causal relationship between lockdown (i.e., social-distancing) measures during the COVID-19 pandemic and collaboration in economics research. To do so, I randomly selected two groups of NBER working papers: 30 papers published before significant lockdown measures took effect in April 2020, and 30 papers published after these measures were put in place. Using a series of regression models, I attempted to isolate the effect of the lockdown measures on collaboration while controlling for coauthor gender as a potential confounder. However, the estimates were too imprecise to support a clear conclusion about the effects of pandemic measures on collaboration.
(1.1) Background/Motivation of Study:
A great deal of research has examined the negative effects of the pandemic on productivity, innovation, and mental health in academic research. This is not surprising: initial lockdown restrictions forced many research institutions to shut down or scale back operations, delaying existing research and forcing a sudden shift to remote work. However, little research has examined the flip side of this phenomenon. Running parallel to the perceived negative effects of the pandemic on research is the rapid growth of video-conferencing platforms like Zoom. To put this into perspective, between 2020 and 2021, Zoom meeting participants increased by 2900%. This growth suggests that people adapted to the challenges of the pandemic by embracing new technologies in order to stay connected and keep working despite social-distancing measures. Against this backdrop, two questions arise: (1) Has the need to communicate virtually merely brought collaboration back to pre-pandemic levels, or has it facilitated even greater collaboration and productivity? (2) How can we measure such an increase in collaboration, and what threshold must a change clear to be considered significant? In this paper, I seek to address these questions, with a focus on collaboration networks in economics research. I chose this field in particular because it frequently involves collaboration across academic institutions and countries, making it a well-suited setting for examining how social-distancing measures affected collaboration networks that span geographical barriers.
(1.2) Literature Review:
Before turning to the methodology of this study, I briefly review the existing literature in the field. Although existing studies primarily examine the effect of lockdown measures on productivity in academic research rather than on collaboration, they still offer valuable insight into the many ways the pandemic has shaped research and affected researchers. The literature was instrumental in developing the regression model presented in Section 2.2 because it helped me identify factors that may be associated with both collaboration and the effects of the pandemic. This literature review has three parts, all in relation to the pandemic: productivity in the sciences, productivity in economics research, and gender disparities in research. In each part, I summarize several relevant findings as well as limitations that my research seeks to address.
(1.2.1) An overview of literature on productivity in S.T.E.M. research
Much of the research on fields outside of economics, largely in S.T.E.M. fields, explores two contrasting effects of the pandemic on research output. On one hand, the pandemic limited researchers' ability to conduct lab-based scientific work, which is essential to advancing research in the sciences. As one paper published by Ithaka S+R highlights, the closure of labs and lab-based research activities during initial lockdown measures not only hindered research progress and projects, but also created additional pressures related to indirect costs. These include universities' pressure to secure stable revenue streams as flexibility in federal funding diminishes, and the challenge of supporting existing research without the academic instruction that often subsidizes it. As outlined in a paper published by Springer Nature, the indefinite timeline for continuing research, coupled with unstable funding sources, has taken a toll on scientists. Postgraduate and early-career researchers, in particular, have been deprived of networking and publishing opportunities, leading to grim career prospects. Although this literature reinforces the pandemic's negative effects on research output, an article published by the NIH notes that the pandemic has led to a massive influx of scientific publications on COVID-19, which account for 10-20% of current biomedical research. Thus, output in the sciences has paradoxically been both drastically limited by the pandemic and expanded by the need for medical research to limit the spread of COVID-19. While much of this literature is not directly related to collaboration in economics, it sheds light on factors such as limited capabilities and opportunities for researchers, funding issues, and general uncertainty that have affected research productivity and, plausibly, collaboration during the pandemic.
However, much of this literature relies on qualitative evidence (e.g., interviews and personal accounts) rather than quantitative data, which may limit its validity. In my paper, I therefore take a more data-driven approach to measuring changes in coauthor networks resulting from the pandemic.
(1.2.2) An overview of literature on productivity in Economics research
A journal article published by Oxford Academic provides valuable findings on the effect of the pandemic on productivity in economics and finance. Kruger and coauthors focus on these fields in particular because research in them relies heavily on feedback and constructive discussion, which traditionally take place at in-person seminars, conferences, and informal office conversations, much of which ceased in March 2020. While my paper centers on collaboration, their study provides a valuable basis for analyzing the pandemic's effects on research productivity and collaboration in economics and finance.
To measure productivity, the researchers examined working papers posted on the Social Science Research Network (SSRN) by faculty at top-50 U.S. economics and finance departments. They quantified production as the frequency with which faculty posted papers to SSRN, and used a difference-in-differences model to test for a statistically significant change in research output before and after the onset of the pandemic. The key finding is that following the onset of COVID-19, research production in economics and finance (measured by the posting of working papers) increased by 29%. This figure suggests that economics research proved resilient, and perhaps even adapted, in the face of the pandemic, in contrast to the S.T.E.M. fields discussed above. It may also signal an increase in collaboration that contributed to the rise in research output.
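Kruger and coauthors' exact specification is not reproduced here. The core idea of a difference-in-differences comparison can still be sketched in a few lines of Python. This is a minimal, generic 2x2 version: a hypothetical "treated" group exposed to the pandemic shock is compared with a hypothetical comparison group, and all input numbers are invented for illustration rather than taken from their paper.

```python
def diff_in_diff(treat_pre: float, treat_post: float,
                 ctrl_pre: float, ctrl_post: float) -> float:
    """2x2 difference-in-differences estimate.

    The change in the treated group minus the change in the comparison group;
    subtracting the comparison group's change nets out any time trend that is
    common to both groups.
    """
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)


# Hypothetical mean papers posted per author per year (not figures from Kruger et al.)
estimate = diff_in_diff(treat_pre=2.0, treat_post=2.6, ctrl_pre=2.0, ctrl_post=2.1)
print(round(estimate, 2))  # 0.5
```

In a regression setting the same quantity is the coefficient on the interaction between a treated-group indicator and a post-period indicator.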
The finding most relevant to my paper was an “increased reliance on past coauthorship networks” among faculty and “larger production gains for authors that are more central to the network”. While this finding highlights the importance of coauthorship networks for research output, it does not show how collaboration dynamics may have shifted during the pandemic. In other words, it does not reveal whether researchers overcame geographical barriers to maintain their coauthor networks, which is especially relevant given the unique circumstances of remote work and reduced in-person interaction. In my paper, I focus on capturing this potential shift in collaboration dynamics by defining a metric that measures collaboration in relation to geographical barriers.
(1.2.3) An overview of gender disparities in economics research
An interesting result common to much of the literature I examined was a striking gender disparity in research production. In a journal article on the impact of COVID-19 on research, published by the NIH, the authors note that early analyses of publications in and outside of the sciences show that “female academics are publishing less and starting fewer research projects than their male peers”. The authors specifically point to the increased family and childcare responsibilities that women faced while working from home during the pandemic. In a related article published in The Guardian, the authors interview several female academics in the UK to gain insight into this potential gender gap. One female academic explains the disparity in terms of the historic wage gap, saying “because she earns less, and can be more flexible about when she works, the bulk of the childcare falls to her.” Both articles offer qualitative, anecdotal evidence of a gender disparity in research production, but neither demonstrates a statistically significant difference in production between men and women. The Kruger et al. study of economics and finance research discussed above incorporates this perceived difference into its model to limit potential noise in its regression estimates. Its key finding on this point is that “women between the age of 35 and 49 experienced a production increase that is 0.31 papers per year smaller than men in the same age group, a difference that is statistically significant at the 1% level.” In addition, the researchers found a mean 6% increase in production for women aged 35–49, compared with a mean 32% increase for men aged 35–49, from before to after the onset of the pandemic. Both statistics again show a sizable and statistically significant difference in production by gender.
While my paper is not specifically focused on these differences, research production and research collaboration are likely correlated, which implies that gender may also be associated with differences in collaboration. I therefore incorporated gender into my statistical model in order to isolate the effect of geographical barriers on research collaboration. The goal is to reduce omitted variable bias when estimating the causal effect of pandemic social-distancing measures on collaboration in economics. Section 2.2 explains in more detail how gender enters the model.
(2) Data Collection and Research Design
(2.1) Data Cleaning and Collection
My paper primarily relies on the metadata of the NBER working paper series, which contains details such as titles, coauthors, abstracts, and issue dates of NBER working papers from 1973 to 2023, and is publicly accessible. NBER working papers are particularly well-suited for this study for three reasons: (1) they are working papers, so the collaboration behind each paper likely occurred close to its publication date rather than years earlier; (2) each is authored by at least one NBER affiliate, which lends them credibility; and (3) all the papers are related to economics, the main focus of this research. I learned of this data through a blog article by economist Alex Albright, which focuses on data surrounding publications in economics and provides some interesting descriptive analysis related to this issue. Here is a look at the first few rows of the raw dataset:
About 1,200 working papers are published in the NBER series each year, and the full data set contains nearly 33,000 entries across 41 variables. I therefore cleaned the raw data to extract the information relevant to my overarching research question.
As mentioned in Section 1.1, I want to determine whether the introduction of lockdown measures, and the resulting rise in virtual communication, has brought collaboration back to pre-pandemic levels or possibly facilitated even greater collaboration and productivity. I therefore defined a metric that captures both changes in collaboration and the geographical barriers introduced by lockdown measures: the average pairwise distance between coauthors. Formally, this measure is:
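Assume the metric averages the distance over every distinct pair among the n_p recorded coauthors of paper p (at most three in this study), and that d(u_i, u_j) is the geographic distance in kilometers between the institutions u_i and u_j of coauthors i and j. Under those assumptions it can be written as:

$$
\theta_p = \frac{2}{n_p(n_p - 1)} \sum_{i=1}^{n_p - 1} \sum_{j=i+1}^{n_p} d(u_i, u_j), \qquad n_p \ge 2
$$

With three coauthors, this is simply the mean of the three pairwise distances. Coauthors at the same institution contribute d = 0, and the formula is undefined for single-author papers.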
This metric has two potential implications. On one hand, lockdown measures may have prevented NBER affiliates from working with colleagues in close proximity. As a result, their communication networks may have expanded beyond their own university to researchers at other institutions. Once everyone communicates remotely, contacting a colleague down the hall costs roughly the same as contacting a researcher anywhere else, so proximity no longer matters. This would show up as a higher expected distance between coauthors after lockdown measures were instituted. On the other hand, factors mentioned in the literature, such as gender disparities, a reliance on past coauthorship networks, and varying demand across disciplines, could actually diminish collaboration as a whole. This would show up as a lower expected distance between coauthors after the pandemic began. Because the metric can move in either direction, it does not presuppose either outcome, which makes it a reasonable proxy for collaboration.
To obtain the data for this metric, I cleaned the raw data set to include the title of each paper, its set of coauthors (each in their own column), and the issue date split into three columns containing year, month, and day. I then split the data set into papers published between 2016 and 2019 and papers published between 2019 and 2023, and randomly selected 30 papers from each group. For each sampled paper, I manually recorded the universities and genders of the first three coauthors. I used NBER affiliate profiles for this and, for coauthors who were not NBER affiliates, faculty websites. Once I had recorded each coauthor's university, I computed the average pairwise distance between coauthors in kilometers. For the analysis, I encoded gender as a binary variable (1 = female, 0 = male), stored in the coauthor gender columns. I encoded year as a binary variable (1 = post-April 2020, 0 = pre-April 2020), stored in the year_bin column. I chose April 2020 because by then lockdown measures were typically firmly in effect worldwide. Finally, I merged the two data sets back together. Here is a look at the first few entries of the cleaned data set used in my analysis:
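The distance and encoding steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for this study:

* It assumes each university has already been geocoded to (latitude, longitude) coordinates.
* It uses the haversine great-circle formula as the distance measure; the text does not say how distances were computed.
* It uses April 1, 2020 as a hypothetical exact cutoff for year_bin.

All function names are hypothetical.

```python
from __future__ import annotations

import math
import random
from datetime import date
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
LOCKDOWN_CUTOFF = date(2020, 4, 1)  # hypothetical exact cutoff for year_bin


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def avg_pairwise_distance(locations: list[tuple[float, float]]) -> float | None:
    """Mean distance over all distinct coauthor pairs; None for fewer than two coauthors."""
    pairs = list(combinations(locations, 2))
    if not pairs:
        return None
    return sum(haversine_km(a, b) for a, b in pairs) / len(pairs)


def year_bin(issue_date: date) -> int:
    """1 if the paper was issued on or after the (hypothetical) cutoff, else 0."""
    return int(issue_date >= LOCKDOWN_CUTOFF)


def sample_papers(papers: list[dict], n: int = 30, seed: int = 0) -> list[dict]:
    """Draw n papers without replacement; the seed makes the draw reproducible."""
    return random.Random(seed).sample(papers, n)


# Illustrative points on the equator, one degree of longitude apart
print(round(avg_pairwise_distance([(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]), 1))  # 148.3
```

Seeding the sampler is a design choice for reproducibility. Rerunning the sketch with the same seed redraws the same 30 papers.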
(2.2) Research Design + Modeling
To estimate the causal effect of lockdown measures during the COVID-19 pandemic on collaboration in economics research, I ran a series of regressions in which the lockdown is represented by the year indicator (year_bin). In the first specification, I regressed the average distance between coauthors on this indicator using the following model. For simplicity, θ denotes the average distance between coauthors:
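Based on the surrounding description, the simple model can be written as follows. Here i indexes papers, Year_i is the year_bin indicator (1 if paper i was issued after April 2020, 0 otherwise), and ε_i is an error term:

$$
\theta_i = \beta_0 + \beta_1 \, \text{Year}_i + \varepsilon_i
$$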
In this initial regression, I predict the average distance between coauthors for a given paper based on whether the paper was published before or after initial lockdown measures were instituted in April 2020. Using a binary indicator for year is crucial here. With a binary indicator, the estimated coefficient B1_hat equals the difference between the mean distance between coauthors (θ) after April 2020 and the mean before it. This is equivalent to a two-sample difference-in-means t-test (with pooled variance) comparing coauthor distance before and after lockdown measures were implemented. The regression framework becomes more useful, however, once additional independent variables are added.
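The equivalence between the regression slope and the difference in group means can be checked numerically. The sketch below fits ordinary least squares by hand on a small set of hypothetical distances, invented for illustration and not drawn from the study's data. It then compares the intercept with the pre-period mean and the slope with the post-minus-pre difference in means.

```python
from statistics import mean


def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) from an OLS regression of y on a single regressor x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical coauthor distances in km (invented for illustration)
year_bin = [0, 0, 0, 1, 1, 1]
theta = [1200.0, 2500.0, 3100.0, 900.0, 2800.0, 4000.0]

b0, b1 = ols_simple(year_bin, theta)
pre_mean = mean(t for t, d in zip(theta, year_bin) if d == 0)
post_mean = mean(t for t, d in zip(theta, year_bin) if d == 1)
print(round(b0, 2), round(pre_mean, 2))              # intercept equals the pre-period mean
print(round(b1, 2), round(post_mean - pre_mean, 2))  # slope equals the difference in means
```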
To isolate the effect of lockdown measures, and to capture the possible role of gender in collaboration discussed in Section 1.2.3, I added the genders of coauthors 1, 2, and 3 as independent variables in my regression model:
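The multiple regression extends the simple model with the three gender indicators, each equal to 1 if the corresponding coauthor is female. Matching the Beta labels used in Section 3, it can be written as:

$$
\theta_i = \beta_0 + \beta_1 \, \text{Year}_i + \beta_2 \, \text{Gender}_{1,i} + \beta_3 \, \text{Gender}_{2,i} + \beta_4 \, \text{Gender}_{3,i} + \varepsilon_i
$$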
Each coefficient in the multiple linear regression model represents the change in the expected average distance between coauthors associated with a one-unit change in that variable, holding all other variables fixed. Each gender variable is also binary, so its coefficient represents the expected difference in coauthor distance when that coauthor is female rather than male.
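A minimal pure-Python sketch of multiple OLS via the normal equations (X'X)b = X'y makes the "holding all other variables fixed" interpretation concrete. It is illustrative only. The data are generated from hypothetical coefficients with no noise, so the fit should recover them. In practice one would use a statistics library such as statsmodels, which also reports standard errors and p-values.

```python
from itertools import product


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular X'X: regressors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """Return OLS coefficients [b0, b1, ..., bk] for y regressed on the columns of rows."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(xi[p] * xi[q] for xi in X) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical design: every combination of (year_bin, gender1, gender2, gender3)
rows = [list(combo) for combo in product([0, 1], repeat=4)]
true_beta = [1000.0, 50.0, 20.0, -30.0, 10.0]  # invented values, not the study's estimates
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows]

beta_hat = ols(rows, y)
print([round(b, 6) for b in beta_hat])  # recovers true_beta up to floating-point rounding
```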
(3) Results and Discussion
The results of the initial simple regression model are detailed below:
The results of the multilinear regression model are detailed below:
Based on the simple regression model, there appears to be a positive association between the institution of lockdown measures and the average distance between coauthors. This is evidenced by the coefficient on ‘year_bin’ (Beta-1 in the model), which implies that after April 2020, the expected distance between coauthors increased by 119 km. The intercept represents the expected distance between coauthors before the pandemic, about 2309.9 km. However, the standard error of the Beta-1 estimate, 582.6 km, is large relative to the estimate itself. The p-value of the Beta-1 coefficient is 0.88, far above the 0.01 threshold for a 1% significance level. The null hypothesis of no difference in mean coauthor distance before and after the pandemic therefore cannot be rejected. I chose a strict 1% significance level to limit the risk of a false positive. With a p-value of 0.88, however, the estimate would not be significant at the conventional 5% or 10% levels either. Thus, the estimate of Beta-1 is not statistically significant. It is also difficult to draw any economically meaningful conclusion from the simple regression coefficient, given the likelihood of omitted variable bias. In this context, omitted variable bias could arise from leaving out variables that potentially affect the average distance between coauthors. These include whether coauthors are established (e.g., tenure-track or non-tenure-track), their gender, and a plethora of other circumstances specific to each author. I therefore extend this model by including gender as an additional control variable.
In the multiple linear regression model, which controls for coauthor gender, there appears to be a negative association between the institution of lockdown measures and the average distance between coauthors. The coefficient on ‘year_bin’ (Beta-1) suggests that, holding gender constant, the expected distance between coauthors decreased by 160 km after April 2020. Notably, coauthor gender also appears to be associated with coauthor distance. The coefficient estimates for coauthor-1 gender (Beta-2) and coauthor-2 gender (Beta-3) are moderately positive, while the estimate for coauthor-3 gender (Beta-4) is strongly negative. Under the interpretation of the metric in Section 2.1, a positive coefficient indicates greater (more geographically dispersed) collaboration for papers with a female coauthor in that position, and a negative coefficient indicates less. However, these estimates should be interpreted with caution given their large standard errors. The p-value for the Beta-1 coefficient is 0.90, far above 0.01. The null hypothesis of no difference in mean coauthor distance before and after the pandemic therefore cannot be rejected at the 1% level. The same conclusion holds for the Beta-2, Beta-3, and Beta-4 estimates. Thus, the results of this study are inconclusive and require further investigation.
The study also has limitations associated with the collected sample. Although the original NBER working paper data are broadly representative of economics research, the way the sample was drawn and analyzed could be improved. One option is to collect coauthor universities, genders, and distances for as much of the raw data as possible and then draw a large random sample, ideally at least 10% of the roughly 33,000 working papers. A larger random sample would likely reduce the standard errors of the estimates substantially, leading to more precise and more representative results.
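The following is a rough back-of-the-envelope illustration, not a result. Assume the residual variance and the pre/post split stay about the same as in the current sample of n = 60 papers. Standard errors then shrink roughly in proportion to 1/√n. Moving to a sample of 3,300 papers (10% of 33,000) would therefore scale the 582.6 km standard error on Beta-1 in the simple model to about:

$$
\text{SE}_{3300} \approx 582.6 \times \sqrt{\frac{60}{3300}} = \frac{582.6}{\sqrt{55}} \approx \frac{582.6}{7.416} \approx 78.6 \text{ km}
$$

The actual standard error in a larger sample would depend on that sample's own variability.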
Apart from sampling, the model may also suffer from correlation between the gender variables and year, known as multicollinearity. The literature reviewed in Section 1.2.3 suggests that the pandemic reduced research production among women. If this also affected collaboration among female coauthors, year_bin would be correlated with the coauthor 1, 2, and 3 gender variables in the model.
The danger of multicollinearity is that the regression coefficients become very sensitive to the data: even minor changes can cause large swings in the estimates. As a result, the coefficients become unreliable and difficult to interpret, and their standard errors are inflated, which may partly explain the large standard errors seen here. Therefore, unless future studies can resolve this issue, the results of this study remain inconclusive.
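One way to check this concern directly is the variance inflation factor (VIF). With two regressors, VIF = 1/(1 - r^2), where r is their Pearson correlation. Holding the residual variance fixed, the standard error of each coefficient is inflated by a factor of √VIF relative to uncorrelated regressors. The sketch below computes the VIF for year_bin and a single gender indicator using hypothetical data. With all four regressors, the full VIF instead regresses each regressor on all of the others.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_regressors(x1: list[float], x2: list[float]) -> float:
    """Variance inflation factor when a model has exactly two regressors."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical indicators for eight papers (not the study's data)
year_bin = [0, 0, 0, 0, 1, 1, 1, 1]
gender1 = [0, 0, 0, 1, 0, 1, 1, 1]

r = pearson(year_bin, gender1)
vif = vif_two_regressors(year_bin, gender1)
print(round(r, 3), round(vif, 3))  # 0.5 1.333
```

A common rule of thumb treats VIF values above roughly 5 or 10 as a sign of problematic multicollinearity.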
Lastly, there is potential for omitted variable bias, as the regression model includes only four independent variables while many other factors likely matter. This is suggested quantitatively by the R^2 of 0.04 for the multiple linear regression, which falls below 0 once adjusted for the number of regressors. Year and coauthor gender therefore account for very little of the variability in the distance between coauthors, so a host of other factors likely affect it. Strictly speaking, a low R^2 does not by itself prove bias, which requires omitted factors that are also correlated with year_bin. It does, however, show how much remains unexplained. This could be improved by first addressing the limitations discussed above and then including additional relevant variables. For example, in some papers only the first-listed coauthor is an NBER affiliate. Affiliate status could affect each coauthor's role in the paper and the extent to which each is willing to collaborate with peers in the field. This could be captured by a coauthor status variable indicating whether each author is an NBER affiliate. Identifying further variables requires additional research and could help both isolate the effect of lockdown measures and uncover new causal relationships.
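The negative adjusted R^2 follows from the standard adjustment formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), where k counts the regressors excluding the intercept. The sketch below applies it assuming all n = 60 sampled papers enter the regression with k = 4 regressors. Because the reported R^2 of 0.04 is rounded, the result is approximate.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than regressors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Assumes n = 60 papers (30 pre, 30 post) and the four regressors in the model
print(round(adjusted_r2(0.04, n=60, k=4), 3))  # -0.03
```

The adjustment penalizes each added regressor. A model whose regressors explain almost nothing can therefore score below zero.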
In this study, I sought to determine whether lockdown measures instituted during the COVID-19 pandemic had a causal effect on collaboration in economics research. To investigate this, I used a publicly available record of NBER working papers from 1973 to the present day. I divided the working papers into those published between 2016 and 2019 and those published between 2019 and 2023, and randomly sampled 30 papers from each group. For each paper, I recorded the universities, genders, and distances between coauthors. I then ran a series of regressions to estimate the causal effect of lockdown measures while accounting for potential gender differences in research collaboration. However, because the regression estimates had large standard errors, the results were inconclusive and further investigation is required.
Disclaimer: The views published in this journal are those of the individual authors or speakers and do not necessarily reflect the position or policy of Berkeley Economic Review staff, the Undergraduate Economics Association, the UC Berkeley Economics Department and faculty, or the University of California, Berkeley in general.