How to organize a PhD when buried under a mountain of data

I will preface this by saying I am not an organized person; if you need proof, just look at the picture of my desk below.

Research projects are inevitable in life: their topics range from planning a trip or event to writing a PhD. At least for me, one of the hardest things about doing research projects is staying organized. But more on that later.

My desk

What is data and how can it become a mountain? Data is defined by the Oxford English Dictionary as “Facts and statistics collected together for reference or analysis.” Nowadays data has infiltrated every aspect of our lives. One of the primary tasks during my PhD has been to identify how microorganisms use basaltic rock as a substrate. To do this I have collected tomography data at a variety of scales (from data sets that resolve features tens of micrometers across to data sets that can be used to observe features larger than 500 nanometers). Now that it is collected, I have to analyse it all. As Pavel has mentioned in earlier posts, a tomography dataset consists of thousands of individual files that together can be used to create a 3D rendering of the object that was scanned.

It is because of this that I have ended up with a mountain of data to climb. The computer on my desk in the image above has 8 TB of storage. Next to my desk is a server with a capacity of ~65 TB, and scattered around my office and apartment are more than 15 portable hard drives, each with a capacity of at least 3 TB. At last look, I had over 40 TB of primary data, all of which must be stored in duplicate, and most of these data will balloon to 3 times their original file sizes during the analysis process.
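To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not in the text above: every portable drive is counted at the stated minimum of 3 TB, all of the primary data (not just "most") triples during analysis, and only the primary data is kept in duplicate. The variable names are purely illustrative.

```python
# Rough storage budget using the figures quoted above (all values in TB).
# Assumptions: drives counted at their 3 TB minimum, ALL primary data
# triples during analysis, and only primary data is stored in duplicate.

desktop_tb = 8
server_tb = 65
portable_drives = 15
portable_tb_each = 3

primary_tb = 40
copies = 2          # primary data is stored in duplicate
growth_factor = 3   # size during analysis relative to the original files

capacity_tb = desktop_tb + server_tb + portable_drives * portable_tb_each
duplicated_primary_tb = primary_tb * copies
analysis_tb = primary_tb * growth_factor
needed_tb = duplicated_primary_tb + analysis_tb

print(f"Minimum capacity on hand:     {capacity_tb} TB")
print(f"Primary data, two copies:     {duplicated_primary_tb} TB")
print(f"Working data during analysis: {analysis_tb} TB")
print(f"Upper-bound total needed:     {needed_tb} TB")
```

Because the capacity figures are minimums and the growth figure is an upper bound, the result is only a feel for the scale, not an exact budget.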

Datasets of this size are nothing new, and an entire field, Big Data, is dedicated to figuring out how to analyse, store, and manage them. Organizing and managing these kinds of data is not very different from organizing any data or primary research you might conduct during a PhD project, an MSc project or everyday life. The only difference here is magnitude.

I started my PhD over 2.5 years ago, and I went in naively thinking that setting up some folders to save things in an organized fashion would be enough. Little did I know that I would end up with so much data, and ultimately I have had to devise a system for managing it all on the fly. I would not recommend that. It makes things very confusing and is rather unhelpful.

When managing personal datasets and personal research, there is no best method, so to speak. The best organization system is one that gets used and one that works for the individual. Note: this is not true for widely used datasets, where versioning, a robust naming method, and consistent organization are key. That said, there are a few things that I have found make life much easier. Choose a method and stick with it. For example, if you start by putting the date in every file name so you know when the file was originally created, then you should continue with that.

Personally, for everyday work and everyday analyses I have a panoply of folders that are split up into categories, as you can see in the image below. I also store everything in a paid Dropbox account (not an advertisement, I just love the service) so that all the files are automatically stored in the cloud as well, and very basic versioning is performed. This works passably well for me, but may not work for everyone.
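As a small illustration of the "date in every file name" convention, here is a minimal Python sketch. The helper name `dated_name`, the example file names, and the choice of the YYYY-MM-DD format are all assumptions made for the example, not a prescribed scheme; any format works as long as it is applied consistently.

```python
from datetime import date
from pathlib import Path


def dated_name(original: str, created: date) -> str:
    """Return the file name prefixed with its creation date as YYYY-MM-DD."""
    path = Path(original)
    return f"{created.isoformat()}_{path.name}"


# Hypothetical file names, used only to show the convention.
names = [
    dated_name("scan_notes.txt", date(2017, 11, 2)),
    dated_name("scan_notes.txt", date(2017, 3, 9)),
]

# With year-month-day prefixes, plain alphabetical sorting is chronological.
for name in sorted(names):
    print(name)
```

One reason to prefer year-month-day order is that the files then line up chronologically in any file browser without extra effort.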

File organization tree

So why does this matter for anyone who is not doing a big academic research project? Everyone has research projects, even if they do not necessarily think of them in that way. Where do I want to go on vacation? Where do I want to host a party? What is the best restaurant in my price range in my city? These are all questions which can be researched in everyday life. There are many ways to do so: a fair number of people like to fly by the seat of their pants, while others create detailed dossiers of their options. Those who take a lackadaisical approach may have once found the perfect restaurant, but cannot remember where it was or how they found it. They then end up not being able to return (I do this all the time). Alternatively, some may compile documents with tens of vacation options only to decide that they are not going this year. Finding a method of organizing files, data, etc. that works for you can streamline your entire research process. I know it certainly worked that way for me.

Rocks that pop!

  • Discovery

In 1972, the scientists onboard the French research vessel Jean Charcot made an amazing discovery during the “Midland” cruise: rocks that pop! From the seafloor of the Atlantic Ocean they retrieved some glassy basaltic pebbles that exploded noisily, much like firecrackers, and jumped merrily to a height of up to one meter on the ship’s deck. A decade later, another geologic expedition aboard the RV Akademik Boris Petrov made the same surprising discovery in a complex region of the Mid-Atlantic Ridge that contains vast areas of lava flows (see previous post) as well as heavily faulted terrain with intact blocks of deep crust. These rare forms of lava rock are interesting not only because of their spectacular behaviour but above all because of their richness in gas and the information they provide on the deep Earth.

Figure 1: a) Photo of a popping rock. Volcanic glass in black and rounded vesicles. b) Photo of a thin section of popping rock (Sarda, 1990).


Doing a PhD – What is scientific research like?

When thinking about your career prospects you may wonder what it would be like to stay at university and go on to complete a PhD with the aim of working in scientific research after that. You may ask yourself what kind of struggles you will encounter, or how different it is from working in a company. Or you may just wonder how different it is from undergraduate and master’s level science studies. Is it for you? Let’s find out.

PhD vs. BSc or MSc

In comparison to studying for a degree at bachelor’s level, a PhD will focus on a very specific topic in great detail and at a high level, while during your bachelor’s degree you will have covered a very wide range of topics more superficially and written a thesis that is more descriptive, or more helpful for learning methods and concepts, than it is for advancing science. In comparison to master’s level, it depends. In research-oriented master’s degrees, the master’s thesis or dissertation will be a first taste of what research is actually like, albeit at a smaller scale. On the other hand, industry-oriented master’s degrees will be more relevant to the interests of a company or industry sector, and may therefore require skills that are more suited to that particular field of industry and applied science.


Why do minerals have colours?

Because some of you asked during ‘La Fête de la Science’ in Paris why minerals have different colours, I decided to write this post about it.

First of all, I need to define the electromagnetic spectrum, and the energetic distribution of electromagnetic waves. But what is a wave? Let’s make that clear by looking at waves in the ocean (figure 1).

Figure 1: Ocean waves (source:


One day of hard X-ray radiation

Most people understand geology as a science that deals mainly with large-scale objects and processes, such as subduction zones and sedimentary basins. Nonetheless, even the largest-scale event can sometimes only be understood by looking at its tiny components. In fact, small-scale processes such as dissolution and precipitation of rock-forming minerals, together with other chemical reactions, control what happens on a large scale.

There are many instrumental techniques to investigate objects at a small scale: different types of microscopy, spectroscopy and so on. One of the relatively new techniques, developed in the 1970s, is X-ray tomography. It is a nondestructive method that allows reconstruction of the 3D structure of an object.


Fig 1. Radiogram of a hand. Bones are brighter because of higher X-ray absorption level.

How does this work? Perhaps you know how radiography works: X-rays emitted by a source penetrate the object of interest, for example your body, and interact with its matter. During the interaction, part of the X-rays is absorbed while the rest reaches the photo-detector. Roughly speaking, the fraction of X-rays absorbed depends on the density of the matter: for example, bones absorb more X-rays than organs and skin. The part of the X-rays that reaches the detector produces a 2D projection image of the object: a radiogram.
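The dependence of absorption on the material can be sketched with the Beer-Lambert law, a standard description of how a beam is attenuated in a uniform slab: the transmitted fraction is exp(-μ·x), where μ is the attenuation coefficient and x the thickness. The short Python example below is only an illustration; the two coefficients are hypothetical values picked to show the trend, not measured data.

```python
import math


def transmitted_fraction(mu_per_cm: float, thickness_cm: float) -> float:
    """Fraction of X-rays passing through a uniform slab (Beer-Lambert law)."""
    return math.exp(-mu_per_cm * thickness_cm)


# Hypothetical attenuation coefficients (per cm), for illustration only.
materials = {"soft tissue": 0.2, "bone": 0.5}
thickness_cm = 2.0

for name, mu in materials.items():
    passed = transmitted_fraction(mu, thickness_cm)
    print(f"{name}: {passed:.0%} transmitted, {1 - passed:.0%} absorbed")
```

The material with the larger coefficient lets fewer X-rays through, which is why it stands out from its surroundings on the detector.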

Tomography uses a series of radiograms of the object, made from different view angles, and produces not a 2D but a 3D density map of the object. Both X-ray radiography and tomography are well known to the general public through their medical applications: radiography is used to detect broken bones, and computed tomography (CT) is widely used to detect tumors. CT is also highly useful when geologists are looking at rocks.

In practice, tomography of rock samples can be done with lab-scale equipment (like in Karin’s post), which is comparable in size to a medical CT scanner. However, in order to see the tiniest details, a long exposure time is needed to produce a radiographic projection with high spatial resolution. It could take a whole day or even several days to make a full tomogram of a sample. Luckily, there is a way to shorten tomography acquisition time: use more powerful X-ray sources!

The best X-ray sources available today are provided by synchrotrons. Usually it is hard to get working time on a synchrotron: there is a special application procedure (for which you have to write a smart project proposal) and strong competition between research groups. This summer our group got lucky and we got 24 hours of synchrotron time at the ESRF (European Synchrotron Radiation Facility) in Grenoble, France (Fig. 2).

Fig 2. View of the ESRF synchrotron from above. X-rays are delivered to beamlines distributed all along this huge ring (see how small the cars are!)

On the evening of the 7th of July, after the working day, we went to Grenoble for image acquisition of our rock samples. On arrival we were accommodated in guest houses on the ESRF campus. It was about a 5-minute walk from the guest houses to the tomography beamline ID19 where we were working. The ESRF campus is cozy, green, and surrounded by alpine mountains on all sides. It is full of trees and green grass with rabbits grazing on it. There are bike paths and bike racks everywhere for getting from one place to another (look at the ring size in Fig. 2).

Fig 3. In the control room of the beamline
Fig 4. In the hutch, where you place your rock sample for analysis (the hutch walls are made of lead to protect people against radiation when the X-rays are switched on!)

The next day at 8 am we started our 24-hour shift at the ID19 beamline. It is a very long (145 m) line because it takes huge distances to focus X-rays to a tiny spot for analysis at high spatial resolution. Hence the building housing the hutch is located off to the side of the main ring. There were two rooms at our disposal: the experimental hutch itself and the control room with computers and other control equipment (Figs. 3 and 4).

People working at ESRF have a peculiar sense of humor. Computers in the anteroom have fairly typical computer names: Lysithea, Ganymedes, Callisto, and Siegfrid, but the monitors that showed what was going on in the experimental hutch (which is closed when the X-rays are on) are “Big brother 1”, “Big brother 2”, “Big brother 3” and “Big sister”. By the way, Big brother 1 was looking at the sample stage, so we could see the stepper motors moving the sample to put it in focus, and turning the sample during tomography.

Fig 5. Big brother 1 is watching our sample

From Montpellier, where our lab is, we brought 12 samples (natural dunite and peridotite samples from Oman and experimental olivine samples), 5 of which were imaged at two different resolutions. The resolutions were chosen so that we could image the whole sample at medium (0.65 µm) resolution and a subvolume of the sample at high (0.16 µm) resolution. Imaging at high resolution allows us to see smaller pores, cracks and inclusions, but the imaged volume gets smaller. Depending on the sample size, it took 2 to 4 scans of subparts to get a view of the whole sample. One scan takes around 20 minutes, which is quite fast for a 3D image with a 0.65 µm resolution! So our schedule for these 24 hours was really tight, but we managed to get images of almost all our test samples.
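To get a feel for why the schedule was tight, the figures above can be turned into a rough estimate of pure scanning time. The Python sketch below makes several assumptions that are not stated in the text: every one of the 12 samples gets a full medium-resolution pass, each of the 5 high-resolution samples needs exactly one extra scan, and a high-resolution scan also takes about 20 minutes. Sample changes, alignment and any failed scans are ignored.

```python
# Rough estimate of pure scanning time from the figures quoted above.
# Assumptions: all 12 samples are scanned at medium resolution, each of the
# 5 high-resolution samples needs one extra scan, every scan takes 20 min.

SAMPLES = 12
HIGH_RES_SAMPLES = 5
SCAN_MINUTES = 20
HIGH_RES_SCANS_PER_SAMPLE = 1  # hypothetical: one subvolume per sample


def scanning_hours(medium_scans_per_sample: int) -> float:
    """Total scanning time in hours for a given number of scans per sample."""
    medium_scans = SAMPLES * medium_scans_per_sample
    high_scans = HIGH_RES_SAMPLES * HIGH_RES_SCANS_PER_SAMPLE
    return (medium_scans + high_scans) * SCAN_MINUTES / 60


low_hours = scanning_hours(2)   # every sample needs only 2 scans
high_hours = scanning_hours(4)  # every sample needs 4 scans

print(f"Scanning time: between {low_hours:.1f} and {high_hours:.1f} hours")
```

Even before counting the time spent mounting and aligning each sample, scanning alone takes up a large share of a 24-hour shift under these assumptions.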

Fig 6. 2D tomogram slice of a natural dunite sample

The visit to the ESRF allowed us to obtain almost 1 terabyte of high-resolution 3D images in only 24 hours. Compared to a conventional CT scanner, synchrotron radiation is the only way to get such a huge amount of data in such a short period of time. We are grateful to high-energy physics for providing such a tool. And not only physics, but also other fields of science such as chemistry and mathematics contribute to achievements and developments of new methods in geology.

By the way, these links to other fields of science are one more argument that geology is a science.

“Geology isn’t a real science!”

Dr. Sheldon Cooper, physicist, from “The Big Bang Theory”, and his strange relationship with geology! (still image from “The Big Bang Theory”)

Some of you probably know “The Big Bang Theory” sitcom, and if not, I strongly recommend it. One of the characters in this funny American sitcom is a physicist, and he firmly believes that geology cannot be considered a real science. This is a pretty strong statement and it makes us think... So let’s borrow this exclamation, “Geology isn’t a real science!”, and reflect on what geology is.

You probably know the stereotype of a geologist: shorts with a thousand pockets, a hammer, a compass and a magnifying glass. But what is this sort of David Attenborough looking for?