Episode 25: Preserving History…About Earth and Beyond

Putting the first man on the moon in 1969 cemented America's dominance in the space race. And as we have continued exploration into the vast unknown, the purpose of each mission is about collecting data, terabytes of priceless and irreplaceable data. How does our body react to weightlessness? Can plants grow in space? Was there ever life on Mars? On this episode, NASA’s Tom Stein explains how and why he and his team collect, preserve, and distribute terabytes of irreplaceable data generated by the space program.

Listen In To Learn

  • How NASA captures data in context that makes it meaningful
  • How to safely store data in an easily-retrievable format
  • How to build a robust data backup system
  • How space exploration is moving toward data-driven discovery

Collecting and Storing Scientific Data Is Keeping Astronauts In Space

What’s the most valuable data you store in the cloud? For most of us, it’s pictures of family and friends. These photos are priceless to each individual, certainly. It’s definitely worth storing a backup in the cloud and having them on a local disk.

Now imagine your data was priceless not just to you, but to the entire human race. How would you store, say, images from the last moon mission? How would you capture not just the data, but the context that makes the data meaningful, and then preserve it for future generations? Most importantly, how would you keep data secure and make it freely available around the world?

Answering these tough questions is a full-time job for Tom Stein, NASA Planetary Data System Geosciences Node Operations Manager at Washington University. On this episode, Stein explains how he and his team collect, preserve, and distribute terabytes of irreplaceable data generated by the space program, data that is ultimately being used to inform and improve the space program.

Featuring Luminary: Tom Stein, NASA Planetary Data System Geosciences Node Operations Manager, Washington University

Tom Stein is the operations manager and senior IT lead for NASA’s Planetary Data System Geosciences Node at Washington University in St. Louis. He has been with WUSTL for more than 25 years. With an information management background, Stein is a technology advocate who seeks to incorporate technology advances into tools and processes that improve access to and understanding of NASA mission data. He leads development of new approaches and interfaces, including the Analyst’s Notebooks, a web app for planetary scientists and the general community that details Mars rover missions. In addition, Stein is involved with efforts to standardize planetary data management and access worldwide and chairs the International Planetary Data Alliance.

For all those discoveries that we’re finding now, the data driven aspect of looking through all of these returned data is going to bring a lot more discoveries that we, I think, 10, 20 years from now, are going to look at and say, wow, that is incredible what was found out of that mission.

— Tom Stein, NASA Planetary Data System Geosciences Node Operations Manager at Washington University

Luminaries Hosts

  • Mark Schaefer Author, Consultant, College Educator. Mark is a leading authority on marketing strategy, consultant, blogger, podcaster, and the author of six best-selling books, including "KNOWN." He has two advanced degrees and studied under Peter Drucker in graduate school. Some of his clients include Microsoft, GE, Johnson & Johnson and the US Air Force.
  • Douglas Karr Technologist, Author, Speaker. Pre-Internet, Douglas started his career as a Naval electrician before going to work for the newspaper industry. His ability to translate business needs into technology during the advent of the Internet paved the way for his digital career. Douglas owns an Indianapolis agency, runs a MarTech publication, is a book author, and speaks internationally on digital marketing, technology, and media.

ANNOUNCER: Luminaries, talking to the brightest minds in tech.
MAN: And my hope is that we come together to share more than technology, and expertise, and products, but that we share a vision of a future that is better than today, a vision of technology as the driver of human progress.
ANNOUNCER: Your hosts are Mark Schaefer and Douglas Karr.
[RUMBLING]
MARK SCHAEFER: And we have liftoff, ladies and gentlemen, for another episode of Luminaries, where we speak to the brightest minds in tech. This is Mark Schaefer, with my co-host, Doug Karr.
Are you excited about this one, Doug?
DOUGLAS KARR: Ah, yes.
[LAUGHTER]
MARK SCHAEFER: Are you geeking out on this one?
DOUGLAS KARR: Yeah. I’m feeling a little weightless.
MARK SCHAEFER: As you may have guessed from our opening, we’re going to be talking about something space-related, NASA-related today. And our guest is Tom Stein.
Tom, I’m going to have to read your title here because it’s a lengthy one. It means you must be very, very important. His title is the NASA Planetary Data System Geosciences Node Operations Manager for Washington University. Welcome.
TOM STEIN: Thanks very much.
MARK SCHAEFER: Tell us– tell us a little bit about yourself.
TOM STEIN: Yeah. It’s a pleasure to be here. Don’t let the long title fool you. Nobody really knows what I do, sometimes including myself.
DOUGLAS KARR: Man, that is like the best job. You work for NASA and nobody knows what you do. Sign me up.
TOM STEIN: Yeah, exactly. Yeah. So what we’re charged to do at the Planetary Data System Working Group is to work with planetary missions and space science data providers to define the data products and then archive them for the long term for the science community.
MARK SCHAEFER: Yeah. I was looking over your website. And, again, I mean literally, I grew up in the Apollo era. And I am the biggest space geek ever. I’m like wearing my NASA hat right now, anticipating that you would be here.
I had a scrapbook when I was a kid. I collected everything about going on– that was going on with NASA and Apollo. And I knew who all the astronauts were. So basically it sounds like you’re kind of NASA’s library. Am I describing that right?
TOM STEIN: Yeah. I think that’s a fair description in one sense. We are tasked with preserving all those science data and giving access to the data. But really our task is more than acting as librarians. We start working with data providers years before a mission even launches because we want to make sure that the science data are well-defined and well structured, for both archiving purposes, but also for daily use by the researchers.
MARK SCHAEFER: So I mean like back in the Apollo days, what did they do before they had you? I mean, where is that data? Do you guys still have that?
TOM STEIN: Right. So the data are still around on rolls of film, negatives, and prints, and so forth. But what happened that kind of woke NASA up was in the early ’80s they started thinking, gee, we’re so focused on the accomplishments during the mission. We need to make sure that we have a plan in place to save the data and have that plan in place before launch so that when the mission is over, people can return to it year after year.
MARK SCHAEFER: Yeah. That’s right. So I mean the data, has a lot of that been digitized from, like, Apollo, Mercury?
TOM STEIN: It has been. Yeah. And what is striking to me is a lot of those data are in constant use. Even 40 years later, a lot of the rock samples continue to be looked at by the science community and studied even to this day. You can imagine it’s not a simple process to send another mission up to Venus, or to the Moon, or wherever.
MARK SCHAEFER: Yeah.
TOM STEIN: So once you have those data, you want to get the most out of them.
MARK SCHAEFER: Yeah. That’s for sure.
TOM STEIN: But on top of that, there’s also temporal studies. Now that we’ve been to some of these places and enough time has passed, we can go back and look and say, what’s happened to this area 30 years– 30 years since we went there first.
MARK SCHAEFER: Oh, interesting. Yeah, of course.
TOM STEIN: Those changes.
MARK SCHAEFER: Of course.
DOUGLAS KARR: I’d like to learn more about you. I mean, did you ever think you’d be building an archival data system for NASA?
TOM STEIN: Well, I was originally at the Smithsonian Institution. And I came to Washington–
MARK SCHAEFER: You’re such a name dropper, I swear.
TOM STEIN: Let me pick that up off the floor here.
MARK SCHAEFER: And then you worked for Beyonce, right?
DOUGLAS KARR: Yeah. Let’s just hit all the bases. There you go.
TOM STEIN: But I came to Washington University specifically to work on this project. But that was 25 years ago. I had no idea, of course, where we would end up today.
MARK SCHAEFER: Wow. Things really changed.
TOM STEIN: When we started this whole project, and it was about two years before I came aboard, we were working on a VAXstation, nine-track tape drives. We didn’t have a hard drive. The first hard drive we bought was $10,000 and one gigabyte, if you can believe that. So, yeah, things have changed a little.
MARK SCHAEFER: That’s amazing. So what– I mean I want to follow up on those questions. I mean, what’s your background that prepared you for this?
I mean, you mentioned that you worked for Smithsonian. I mean, are you a data guy, are you an IT guy, are you a historian? I mean, what’s your background that led you to this cool job?
TOM STEIN: Well, what I have to admit is that my background is as murky as my job title.
[LAUGHTER]
So I have a geology degree.
MARK SCHAEFER: That’s crazy.
TOM STEIN: And I was working on a volcanism project at the Smithsonian, doing IT development and things. And actually developed an interactive computer interface that was used in a traveling exhibit. It was the first time they put a keyboard out on the floor of the Smithsonian.
And all I can say is, back in those days, especially as a geologist, I didn’t know what it meant to do testing beforehand. Thank goodness for the 386s and the turbo boost buttons, if you remember those.
MARK SCHAEFER: Yes.
TOM STEIN: That made all the difference. But the IT side of things developing and stuff, that just kind of is a gift from God or something. I really didn’t take classes, although I’ve gone on to earn a degree in IT.
DOUGLAS KARR: That’s a great story. What’s the favorite part of your work? What makes you go, wow, I can’t believe I’m here?
TOM STEIN: Yeah. What I really enjoy is just the variability that each day brings and the ability to be creative in developing, whether it’s the web applications that we put up to allow people to take a look at the missions and kind of replay them and live through them again or just working with different partners. We’re working with about 10 different missions right now and maybe 40 or 50 instrument teams, to work through their data, get it prepared, and also put it online.
MARK SCHAEFER: That’s really cool. So I mean do you actually get to sit in on the NASA meetings? Do you get to go to Houston or Cape Canaveral?
TOM STEIN: So I’ve had some of those opportunities. It’s been pretty fantastic. One of the challenges that we had, that came up almost 20 years ago now, there was a Rover test out in the Mojave Desert, where the scientists put themselves kind of in a trailer. And a couple engineers took the test Rover out in the field.
Scientists were in the trailer trying to pretend that they were actually doing a Mars mission. And after a couple of days, they’re starting to realize– we’re all taking our own notes. But we kind of lost track of what our decisions were along the way. Why did we decide to drive left or not right?
And so my boss came back and said, hey, let’s try to develop a system where we can keep track of those things. And so we did that with 20-years-ago technology of hand-coding web pages of day one, day two, day three kind of things. And as the Opportunity and Spirit rovers were preparing for launch, we started thinking about how can we make a robust system so that after these 90 days or 180 days– hopefully the mission gets extended– how do we capture, not just the data but the intent behind it, put it in context? So we put that system together, not thinking that one of those 90-day Rovers would still be running today, 14 years later.
MARK SCHAEFER: Yeah. So much fun, so interesting. Now, I’m imagining that security is a big issue for you. And we’re going to talk a little bit later about how a lot of this data is also open to the public. I mean just government, and university, and NASA. In some ways, I’m thinking maybe you have a target on your back. So tell us a little bit about what sort of provisions you have for security. Tell us what you can at least in terms of your strategy and the sort of technology you have to help you there.
TOM STEIN: Sure. A real plus for us is none of the data that we’re working with are classified. So at least the stress is kind of gone. I mean, certainly we can’t lose what we have. I mean it’s a national treasure. We’re really tasked with a lot of value.
But we do at times have people doing those kind of regular denial of service, trying to hack in. Most of our security comes from our firewall configuration. But we also control everything from the network to the keyboard. And so it’s really in our hands how we deal with that.
We had an incident, perhaps six, seven years ago, where two FBI agents came to my office.
MARK SCHAEFER: Oh, my gosh. That’s the start of a good day. I said, hello.
TOM STEIN: And they asked what do you guys do? And I’m thinking, I’m pretty sure you already know or you wouldn’t be here. But in the end, while they couldn’t tell me a whole lot because I don’t have clearance– and they said, well, some outsiders– outside the US had been hopping through Department of Defense systems checking one after the other. And then they went from the sixth DOD system into ours, or tried to. They couldn’t get in. But they wondered what are you guys doing that somebody is going to be–
MARK SCHAEFER: Oh, interesting.
TOM STEIN: –go from DOD into your space. I have no idea why. And I wasn’t arrested. And they never came back. So I felt pretty good about that.
MARK SCHAEFER: Maybe they like the Apollo program, like I do.
[LAUGHTER]
Maybe they’re just geeky.
DOUGLAS KARR: And I’m curious why, out of all the universities in the country, why did NASA select Washington University for this work?
TOM STEIN: Yeah. Well, this might sound a little self-serving. I think it’s one of the smartest things NASA ever did. And I don’t mean just choosing Washington University. But our group focuses on data about the surface processes of other planets. There are a handful of other groups in our planetary data system that focus on rings, on atmospheres, and so forth.
So we have different groups at different institutions. But the idea behind it is to locate the work where science research is being done. Washington University has got a very rich history in planetary science. My boss has been involved with every Mars mission since the 1970s, for example. And he’s the PI on Opportunity.
And so instead of going to an IT location where people are really good at developing IT solutions, but maybe don’t fit the needs. And then that’s what we really see. So it’s an opportunity to take IT abilities and that knowledge domain on the science side and merge them. And maybe, getting back to your original question, that’s kind of where I positioned myself, I think, or have been positioned as a bridge between the two. So that also is part of what makes the job challenging, is to put all those groups together.
MARK SCHAEFER: And I was interested in that because I work for a university. I’ve been teaching at Rutgers University for about eight years. And I’ve witnessed firsthand what a bureaucratic institution a university is. Love you Rutgers.
So you’re this bridge really between two incredibly complex and, let’s face it, bureaucratic institutions, a university and the government. That’s got to be a big part of your job, just being a mediator, I would think.
TOM STEIN: Well, we’re really fortunate at Washington University to have some flexibility and freedom to operate. As I said, we’re running– we’re even allowed to run our own email servers, which maybe these days is a wish that a lot of groups still have or at least that one IT guy in the back who wears the Birkenstocks wishes he still had. But we’ve got some good people doing the paperwork on both sides, on the NASA side and the university side, to kind of help keep us out of the nitty-gritty of that contract work. But at the same time, we do have recompetes, and reviews, and so forth that we have to prepare for.
On the other hand, it keeps us really focused on what we’re doing. And as we look to our customers who really drive what our focus is, having the strong support behind us on the paperwork side really.
MARK SCHAEFER: Who is your customer? Would it be NASA or is it the people who are accessing the data?
TOM STEIN: Well, I guess technically NASA is our customer because we’re in a sort of a subcontract role. And we have a five-year contract that’s kind of been recompeted, and reviewed, and so forth for nearly 30 years now. But what I think of as a customer is both our data providers and our end users.
And really, it’s the general public at large. I mean these are funded by US taxpayers. The data are 100% available for free to anybody. And because we’re not having borders on the data, anybody in the world. And it happens. We got people from all over the world coming in and downloading data and using it.
MARK SCHAEFER: That’s so cool.
DOUGLAS KARR: Wow. And it’s a ton of data. You guys are collecting one to two terabytes a month.
TOM STEIN: Right. Right. That’s what’s been archived. The way NASA works, the policy is about six or seven months are given to data providers, the science teams, instrument teams, to work through the data they acquire, to validate it, to make sure that it’s in shape. And then we release it on a regular schedule.
So those releases can end up being about three terabytes per month, depending on the mission set that comes up. Every month, we’re releasing data from some mission or other. So that volume is growing a little bit at a time. We’ve largely followed Moore’s law over the past years.
At the same time, there’s only so many missions that will go up. We’re working on the InSight lander and Mars 2020 as immediate missions. But already, we’re looking into the 2030s and working with some folks on missions that are scheduled to launch at that time.
DOUGLAS KARR: And with, obviously, people, scientists, and other researchers accessing this data from all over the planet, can you share any stories of some of the discoveries that have been made?
TOM STEIN: Right. The really cool thing is that the number of discoveries that are out there are so numerous, with Curiosity Rover that just landed in 2012– well, already five-plus years ago. It seems like just yesterday. It was only a couple of years into the mission when the number of science papers that were published by team members was less than the number of science papers published by people outside the science team. Already the data were being looked at and researched in greater numbers.
MARK SCHAEFER: Is that students or academics or just people who are doing it for fun?
TOM STEIN: We’ve got all kinds of people coming in. At first, we were thinking our target audience is going to be senior research scientists at universities or at NASA centers, and graduate students as well, and in the field. However, we’re finding undergraduates are using it. Last month, at the Lunar and Planetary Science Conference, a teacher came up to me and said, my high school students are using your interface and pulling down your data to support their senior research projects.
So we have, across the board, people doing it. We even have mission teams from other space agencies outside the US who are preparing for missions to the Moon or to Mars, saying we’re looking at your system to figure out how we can capture the information presented when we go up.
DOUGLAS KARR: Yeah.
MARK SCHAEFER: Wow.
TOM STEIN: So that’s pretty exciting.
MARK SCHAEFER: That is quite a compliment.
TOM STEIN: It is. And one of the agencies that we’re involved with is called the International Planetary Data Alliance. It sounds kind of like a Star Trek thing. So get another hat out, I guess.
[LAUGHTER]
But one of our goals has been to work on internationalizing the planetary science standards and setting up interoperability between agencies. And that’s worked really well. Our six-month release policy from NASA, which is a big change from the way NASA used to do things decades ago, where scientists would just sit on their data for years. So that policy has been adopted by a majority of these other space agencies, who now also are releasing their data, not only in a timely fashion, but for free. And that’s pretty exciting.
MARK SCHAEFER: So do other countries exchange their data too, or is this unique?
TOM STEIN: No. They’ve come onboard doing this. And there are certainly different data flows from folks like the European Space Agency, JAXA in Japan, the Indian Space Agency, and so on, and other national agencies within there. We have representatives from about 18, 19 countries that are involved in this process. And it’s a very organic culture. And really, it’s kind of like a hands-across-the-sea sort of endeavor.
MARK SCHAEFER: Yeah. That’s cool. Has there been any practical applications that have solved problems here on Earth that you can think of, that were kind of interesting to you, something beyond just like science projects?
TOM STEIN: Well, certainly we’re learning a lot about traveling in space, developing not only a place for humans to land on the Moon, but how to support long-term activities, to develop opportunities to actually build materials there and make it more habitable. Now, when I say “we,” I don’t mean our university, but certainly the community at large.
The data have a lot of uses. And what I think is pretty exciting is we’re really moving– well, into– we’re really moving into Jim Gray’s fourth paradigm of data-driven discovery. I know that concept’s been around for about a decade.
And on the astronomy side, who’s way ahead of us doing this because their instruments are largely Earth-based and their data flow is enormous. But even we’re starting to get to a point where a majority of the data that are collected on planetary missions won’t be seen by human eyes. And so we’re looking to adapt the way our data are presented to the public so that we can apply some of these rigorous data science approaches to the data and find out what’s there.
There was recently published, last week or two, a paper on mud cracks observed on Mars, which speaks a lot to the past climate history of the planet and something we can look at and apply to our own planet because Mars and Earth are similar in so many ways, despite some size differences, distance from the Sun, and so forth. But really, for all those discoveries that we’re finding now, the data-driven aspect of looking through all of these returned data is going to bring a lot more discoveries that we, I think, 10, 20 years from now, are going to look at and say, wow, that is incredible what was found out of that mission.
DOUGLAS KARR: This data is priceless obviously.
TOM STEIN: Absolutely.
DOUGLAS KARR: And so I’m curious, what extra steps do you guys take from an archival, and back-up, and everything to make sure that this is never lost?
TOM STEIN: Right. Great question. And it would keep me up at night if I felt like we weren’t doing it right. We have our primary data store, which is tiered, things like our database servers and so forth, set up on all flash. But even our entire archives, the rest of it’s sitting on spinning disk, that we mirror over to another spinning disk set in another location every day. And then we’ve got a tape backup. And our final fallback is a fourth copy at Greenbelt, Maryland where NASA has a large center for kind of putting the stuff on the shelf in the event something really bad happens.
I mean we don’t take lightly that responsibility of keeping the data. We get a lot of requests from people saying, hey, try our cloud or try this system.
And even university initiatives of centralizing data store opportunities– could you imagine if you’re a university researcher, who’s got a 18-month grant to do something, that’s not time to spin up an entire IT solution. So having a centralized opportunity at the university, it makes sense for them to jump in. But for us we’ve got to be really careful about what we do. So that’s why we maintain that entire ecosystem.
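[EDITOR'S NOTE: Keeping several copies of an archive, as described above, only helps if the copies are checked against each other. The Python below is a minimal sketch of one common way to do that, comparing SHA-256 checksums between a primary copy and a mirror. It is not the PDS Geosciences Node's actual tooling; the function names and the directory layout are assumptions made for illustration.]

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data products need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path, relative to root, to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_copies(primary: dict[str, str], mirror: dict[str, str]) -> dict[str, list[str]]:
    """List files that are absent from the mirror or whose contents differ."""
    missing = sorted(name for name in primary if name not in mirror)
    changed = sorted(
        name for name in primary
        if name in mirror and primary[name] != mirror[name]
    )
    return {"missing": missing, "changed": changed}
```

[Running build_manifest on each copy and passing the two results to compare_copies gives a report that can be reviewed after every nightly mirror. The same comparison could be repeated for each additional copy.]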
MARK SCHAEFER: It sounds like you’ve got a good collaboration with Dell, too. I mean, you’re using a lot of Dell products to help you solve some of these issues and keep that safe and also accessible.
TOM STEIN: That’s absolutely true. We are running Dell practically across the board. We’ve got the PowerEdge just supporting our VMware solution, with virtualizing all of our servers. Except, I think, one is not virtualized. We’ve got the trifecta of the All-Flash, the VMAX, and the Isilon that support our data store. I do my development on my Inspiron laptop.
DOUGLAS KARR: So it’s all the way down to the finger tips.
TOM STEIN: Even my underpants are from Dell, I think.
MARK SCHAEFER: TMI. TMI. This is a family show, Tom. We’re trying to keep this– But that’s awesome. That’s awesome.
And are you thinking about innovating in any ways how you’re, like, presenting the data? So in new graphical interfaces or ways that might be more fun and accessible to, like, schoolchildren?
TOM STEIN: Yeah. That’s a great question. NASA has a couple of portals that are very popular, with especially schools. There’s some planetary atlas that has a lot of images, kind of picture-of-the-day sort of things, the press releases.
We have developed a web application that kind of merges all the data from the Rovers together. It’s not just, here’s this camera’s image or that spectrum that was collected, but tying in the Rover traverse map, where you can pan and zoom, click on a spot, find the targets there. But also, incorporating with that, other technologies.
We’ve had a machine learning experiment that came out of the Jet Propulsion Lab, to scrape through abstracts from a science conference that looked for targets from the mission against common elements and minerals that were being researched. Looking automatically, finding those relationships so that now, as we pulled out into our interface, you can search for, show me all the targets where there was potassium found. And then it has that excerpt out of the paper saying, this is the reference and why we’re claiming that you’re going to find potassium there.
So that’s part of that data-driven discovery concept that we’re trying to bring home. And I’d say probably about half of the tool set that we built into our interface has come because somebody on the outside has asked us. Somebody said, hey, how far apart are these two rocks on an image? And we discovered that where there are stereo pairs and there’s XYZ data available, we could add to our interface the ability to click points, put in the measurements, develop elevation profiles, and let people download those data. So there’s a lot of value-add in what we’re doing, on top of just saying, oh, here’s a directory of images.
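[EDITOR'S NOTE: The measurement tool described above comes down to simple geometry once each clicked pixel has an XYZ coordinate. The Python below is a rough sketch of that idea, not the Analyst's Notebook code. It assumes coordinates are in meters and that z is elevation with positive values pointing up; real mission site frames may use a different convention. The names distance and elevation_profile are hypothetical.]

```python
import math

# One clicked point as (x, y, z) in meters. The axis convention is an assumption.
Point = tuple[float, float, float]


def distance(a: Point, b: Point) -> float:
    """Straight-line 3D distance between two clicked points."""
    return math.dist(a, b)


def elevation_profile(points: list[Point]) -> list[tuple[float, float]]:
    """Return (horizontal distance travelled, elevation) for each clicked point.

    Horizontal distance accumulates along the path in the x-y plane, so the
    result can be plotted directly as an elevation profile.
    """
    profile = []
    travelled = 0.0
    for index, (x, y, z) in enumerate(points):
        if index > 0:
            prev_x, prev_y, _ = points[index - 1]
            travelled += math.hypot(x - prev_x, y - prev_y)
        profile.append((travelled, z))
    return profile
```

[For two rocks at (0, 0, 0) and (3, 4, 0), distance returns 5.0, meaning the rocks are five meters apart.]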
MARK SCHAEFER: That would be so much fun, I think, just dreaming up new ways to present the data. That’s really, really awesome. If our listeners wanted to learn more about you, and what you’re doing, and get involved with your project, and have access to the data themselves, where would they go? How would they find you on the web?
TOM STEIN: The best thing to look for is PDS Geosciences. And that’s going to take them to our node home page. And there are links into our orbital data tools, our landed mission data tools, to the archives. And also certainly a way to get in touch with us because we also do a lot of kind of– hand-holding might be a little bit of a strong term– but working with users, and students, and people who are interested. And understand that they– these are not your average jpeg off the digital camera or off the phone. These are science data. And it takes a little bit of effort to get into it. But once you do, it’s pretty cool to see the results.
MARK SCHAEFER: Tom, the data scientist, with a very, very long title, we sure appreciate you. We appreciate your time. It’s been so much fun for a NASA geek like me to learn about what you do. And we appreciate all of you for listening today, for spending time with us. We never take that for granted.
On behalf of Doug Karr, this is Mark Schaefer signing off for now. We’ll see you on the next episode of Luminaries.
ANNOUNCER: Luminaries, talking to the brightest minds in tech, a podcast series from Dell Technologies.
[MUSIC PLAYING]