Monday 9 August 2010

Collaboration: An Example


Set-up: I spent much of July working on a particular project in the hopes it would be included in the results sent to a conference at the end of August by one of the groups at CERN. Several other students from different universities did the same. We were all given the outline of the analysis, and then wrote our own code, adding in our particular methods to calculate the results. When the editor began pulling things together for this note, my project had insufficient statistics for the results to be included, but I was asked to make some of the plots used to describe the part of the analysis we all had in common. I made very nice versions of the plots and submitted them to the editor on a Friday, thinking that beyond cosmetic changes I was more or less done with the note.

Monday evening, 6 p.m.: editor calls a meeting with me and two other students to say that our results don't match. Namely, student X, who made the table, has selected a different number of events than I did for the plots that are supposed to go with the table. We are told to figure out the difference and resolve it as soon as possible.

So much for French class that night. Instead I park myself at my computer to start figuring out exactly which events are in which plots and comparing this to similar lists produced by X for his tables. This is made more difficult by the fact that my work computer has started randomly freezing and crashing, and has managed to corrupt a chunk of the datasets I had set up in the process. So I am trying to compare numbers piecemeal with X using the uncorrupted datasets while reproducing those that got messed up. My computer continues to freeze throughout this process. We eventually reach the point where no further comparisons can be made without a more methodical comparison, and sign off Skype chat. I set up my computer to clean up the last five corrupted files, submit all the analysis jobs to the cluster, and catch the last bus home at 12:40.



Tuesday, 9 a.m.: I arrive at my desk to take stock of my jobs. All have successfully finished. I submit analysis jobs on the newly reproduced files, and compile the rest of my results in a list to send to X. I also compare my list to one he sent me previously, and identify discrepancies. I continue to update these lists as my analysis jobs finish. By lunch, we have identified our differences and by late afternoon established what caused them all. I recreate the plots and send them to the editor. I am able to leave CERN in time to do a little grocery shopping and make myself some dinner.

Wednesday, 10 a.m.: I receive an email from the editor saying the plots still don't match the tables. I send a new event list to X and ask for an updated one from him. The discrepancy is down to eight events. I spend another evening at CERN on Skype chat with X, comparing the information on these events and explaining what procedures we used to make certain calculations. Differences all established, I recreate the plots, including several new elements requested by the editor, and go home. The time is 10:30 p.m.

Thursday, 10 a.m.: I receive an email from the editor saying the plots still don't match the tables. I send a new event list to X and ask for an updated one from him. Upon closer inspection, the list contains exactly the same events as were discrepant the day before. I ask X why these are still here, as I had shown him where in both the analysis spec we were given and in the text of the note we were working on the description that matched my procedure and not his. He said that he had asked the editor and been told to do it his way. I was not told this. I refrained from responding to avoid saying anything contentious, and instead appealed to my post-doc for help. A flurry of emails between the editor and my post-doc ensued, with the result being that my procedure was adopted. This accounted for half our discrepancy. X identifies the reason for the other half, and we both set about rerunning everything and reproducing our plots and tables.

The complete reanalysis on my part involves about 9 million data events and 40 million Monte Carlo events. It can take several hours, particularly if my colleagues are also using all the computers to run analyses, as they were that evening. I leave the jobs to run and head into Geneva, as is my wont on Thursday nights. When I arrive home, a handful of my jobs have not finished running. At about one a.m., the last of them finishes, and I remotely download the results, reproduce the plots, and submit them to the editor.

Friday, 8:30 a.m.: the working group meeting in which this note will be presented begins. I find that being present in a meeting where your results will be presented not by you is almost as bad as having to present them yourself. It is noted that one of the plots does not match the tables, instead being discrepant by ONE event. The discrepancy is at least further down the line of the analysis. I leave the meeting a little early to produce a new event list. It is en route to X when we receive the email from the editor telling us to find the difference.

The difference is found to once more be a procedural one, one that affects a certain calculation by about a percent. X is using the wrong method, though it is debatable if my method is truly the correct one. The editor supplies a third method and tells us to do that. Both X and I must reanalyze everything. I submit all the necessary jobs and head home. After dinner, I remotely log into my work computer (which still crashes periodically) to check progress. The full set of jobs completes after eleven p.m. I download the results and recreate the plots. Four now perfectly match X's numbers. Four are completely screwed up. I send in the good ones, find the bug, and resubmit everything. I go to bed after midnight.

Saturday, 7 a.m.: I turn my computer back on. My jobs have all successfully completed. I download the results, create the final four plots, note that they match the results of the table, and submit them. I give myself permission to not think about physics for the rest of the weekend (lasts for about four hours).

Epilogue--Monday, 5:30 p.m.: I receive an email from the editor saying the angular range used in part of the analysis needs to be changed and everything redone. Also, the title indicating these plots are submitted for approval needs to be added. For once, no one else is running analysis jobs on the cluster. I submit new plots to the editor at 7:40 p.m.

1 comment:

Rose Ledezma said...

I think I would have gone crazy at least ten times. Maybe twenty. Maybe fifty.