DeSci Mic 🎙️ How AI is Essential to Advancing DeSci

Hey everyone.

Okay, awesome.

Yeah, welcome back, everyone, to another episode of DeSci Mic. This week, we are revisiting an important and recurring topic around DeSci,

0:04:51 - 0:04:59

the intersection of AI and DeSci, specifically how AI is essential to advancing DeSci.

0:05:00 - 0:05:09

We'll be hearing about the Macrocosmos AI protein folding subnet on Bittensor, which is really exciting.

0:05:10 - 0:05:11

Eager to learn more.

0:05:11 - 0:05:21

We're also joined by ZeroX Mikey F from WaterBearSci, who is going to be joining us to talk about their specific approach to AI.

0:05:22 - 0:05:26

And we have also got a fan favorite on DeSci Mic returning.

0:05:26 - 0:05:32

That's Science Stanley, who works at Stanford and also Lilypad Tech. So we'll be hearing from him

0:05:32 - 0:05:39

too. As usual, I'm joined by my co-host Aaron McGinnis, who's actually behind the DeSci Mic

0:05:39 - 0:05:49

account this week. Crypto Shrimp PhD is away this week. So yeah, Erin, did you want to kick it off?

0:05:50 - 0:05:57

Yeah, I'm super excited for this conversation. It's really leading off of some of the discussion

0:05:57 - 0:06:15

last week, which Michael from WaterBear was a part of, and which really prompted deeper discussion on how AI is an essential piece of advancing DeSci and what the future of science looks like.

0:06:15 - 0:06:22

So to kick off this conversation, I would love to pick up where we left off last week.

0:06:22 - 0:06:29

And Michael, if you can kind of maybe recap some of that conversation

0:06:29 - 0:06:38

and then tie into just the broader topic of using AI to advance DeSci.

0:06:38 - 0:06:42

Yeah, I think last week I was just saying that, like,

0:06:43 - 0:06:47

I don't think that, humans are gonna be doing research

0:06:47 - 0:06:48

in the same way.

0:06:48 - 0:06:53

What I think what's gonna happen is like AI is going to do,

0:06:55 - 0:06:56

like there's a couple of questions here.

0:06:56 - 0:06:59

One is like, what is AI able to do

0:06:59 - 0:07:02

and what is AI not able to do both now

0:07:02 - 0:07:06

and then like maybe like 50 years from now. So I say like in the

0:07:06 - 0:07:16

near future, AI is going to do all the stuff that traditional scientists do. So the role of humans

0:07:16 - 0:07:27

will be to contribute data and then all of the analysis will be done using AI. So you'll just like upload data sets

0:07:27 - 0:07:28

and then it will just be like,

0:07:28 - 0:07:29

oh yeah, beep, beep, beep.

0:07:29 - 0:07:31

Here's 10 questions that I've generated

0:07:31 - 0:07:34

that are interesting based on that data set.

0:07:35 - 0:07:36

And also, could you go fetch

0:07:36 - 0:07:38

these other hundred rows of data?

0:07:39 - 0:07:40

That'll help me complete my analysis.

0:07:41 - 0:07:44

And like initially what will happen

0:07:44 - 0:07:47

is like you'll be able to sort of move that AI

0:07:47 - 0:07:48

in different directions.

0:07:48 - 0:07:51

But I think at some point down the road,

0:07:51 - 0:07:53

it'll just sort of tell you what you need to know.

0:07:53 - 0:07:55

So that's like the near term stuff.

0:07:55 - 0:07:58

And then I think in the longer term,

0:07:58 - 0:08:01

if synthetic data works and if like,

0:08:04 - 0:08:09

like more advanced modeling works of physical systems.

0:08:09 - 0:08:13

I don't even know if you're going to need humans to collect the data anymore.

0:08:14 - 0:08:17

I think it'll probably be able to be done all in silico.

0:08:18 - 0:08:20

I don't know how close that one is, though.

0:08:20 - 0:08:21

That could be many, many years out.

0:08:22 - 0:08:24

And we'll find out.

0:08:25 - 0:08:32

So yeah, it was just like a broad, broad topic that I think like AI is going to be doing a lot of science in the future.

0:08:34 - 0:08:35

Completely.

0:08:35 - 0:08:46

And part of this is also decentralization, being able to allow some of this collaboration or interoperability with that as well. Can you

0:08:46 - 0:08:54

touch on that and maybe loop some different trends or involvements you might have?

0:08:54 - 0:08:57

Wait, say that again. I didn't quite understand. Say it again.

0:08:57 - 0:09:05

Yeah. So that piece of it was pretty heavily focused on AI and science.

0:09:13 - 0:09:13

How do some of like blockchain or decentralization approaches fit into this whole equation?

0:09:20 - 0:09:20

Yeah. I mean, I think that in the future people will be rewarded based on the data they contribute.

0:09:24 - 0:09:25

So basically it'll work by like you upload a data set.

0:09:26 - 0:09:28

I don't know of any technology to do this today,

0:09:28 - 0:09:29

but it's certainly possible,

0:09:29 - 0:09:32

but based on how much information is in your data.

0:09:32 - 0:09:34

So you contribute a really good piece of data,

0:09:34 - 0:09:35

like it's really valuable,

0:09:35 - 0:09:40

then you'll be rewarded more heavily versus say everyone

0:09:40 - 0:09:41

contributes like shades of blue.

0:09:41 - 0:09:43

And then all of a sudden you contribute like red and everyone's

0:09:43 - 0:09:50

like, wow, red is a really different color, this is super new. Then basically what will happen is

0:09:50 - 0:09:56

you'll be rewarded more because your data was more useful, and that'll be used to train the AI,

0:09:56 - 0:10:00

and then anyone who does inference on that model will basically pay a little bit of money, and then,

0:10:00 - 0:10:05

you know, it goes back to the people who contributed the data. And to track

0:10:05 - 0:10:11

that, I think it's probably easiest to do it with the blockchain. Like, if you were to develop a system

0:10:12 - 0:10:19

where you wanted to track where data came from,

0:10:19 - 0:10:24

you could do it in a database, or you could do it on the blockchain.

0:10:25 - 0:10:27

The blockchain, you could do it either way,

0:10:27 - 0:10:29

but I think the blockchain is the more natural place

0:10:29 - 0:10:33

to do it because then you can track where it came from

0:10:33 - 0:10:36

and the payment rails are already built in

0:10:36 - 0:10:37

and all that sort of stuff.

0:10:37 - 0:10:41

So that's like the sort of interaction between like AI

0:10:41 - 0:10:44

and the blockchain is like for incentives

0:10:44 - 0:10:45

and for rewarding people.
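
The reward idea described here can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the speaker says he knows of no existing technology that does this, so the novelty measure (distance to the nearest previously uploaded value), the function names, and the fee figure are all hypothetical.

```python
def novelty(value, existing):
    """Toy novelty score: distance from `value` to its nearest earlier upload."""
    if not existing:
        return 1.0  # arbitrary default for the very first contribution
    return min(abs(value - other) for other in existing)


def split_fees(contributions, fee_pool):
    """Split `fee_pool` among contributors in proportion to novelty at upload time.

    `contributions` is an ordered list of (contributor, value) pairs.
    """
    seen = []
    scores = {}
    for contributor, value in contributions:
        scores[contributor] = scores.get(contributor, 0.0) + novelty(value, seen)
        seen.append(value)
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: fee_pool * score / total for name, score in scores.items()}


# A single 1-D feature standing in for "color": three similar values, then one very different one.
uploads = [("alice", 240.0), ("bob", 235.0), ("carol", 245.0), ("dave", 0.0)]
payouts = split_fees(uploads, fee_pool=100.0)
```

In this toy run the contributor who uploads the very different value ("red" among the "blues") receives most of the pool, which is the behavior the speaker describes. A real system would need a far more careful measure of incremental information.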

0:10:48 - 0:10:56

Completely. Can you maybe talk a little bit about the work you're doing in sleep as maybe a tangible

0:10:56 - 0:11:02

example of what this could end up looking like or how people might be able to interact with

0:11:02 - 0:11:07

everything you're talking about? Yeah, so we're trying to develop the world's best sleep data set.

0:11:08 - 0:11:12

So the way we're doing that is we started this thing called WaterBear.

0:11:12 - 0:11:15

And initially, WaterBear started off as a sleep data set

0:11:15 - 0:11:20

where people upload their Oura Ring data and then sort of launch.

0:11:21 - 0:11:27

Then they can have the ability to enroll in different research projects. So you can say

0:11:27 - 0:11:32

like, oh, when I sleep east-west, I sleep better. When I sleep north-south, I sleep better. When I

0:11:32 - 0:11:36

sleep on an Eight Sleep bed, I sleep better or worse. When I sleep with or without a person,

0:11:36 - 0:11:42

I sleep better or worse. When I take this supplement, I sleep better or worse. And all

0:11:42 - 0:11:50

these things are just like very complicated. And we wanted to collect a large data set both of the conditions going in and then the final

0:11:50 - 0:11:56

results as measured by the Oura Ring data and so we're tracking all that data and

0:11:56 - 0:12:01

then we developed like a platform for people to basically analyze their own

0:12:01 - 0:12:07

data sets using, like, it's a combination of ChatGPT and pandas

0:12:07 - 0:12:10

that we built and a little bit of UI magic as well,

0:12:10 - 0:12:14

that basically like writes code to give you information

0:12:14 - 0:12:16

about statistical significance,

0:12:17 - 0:12:18

prompts you for next question.

0:12:18 - 0:12:21

So it'll just look at your data set and be like,

0:12:21 - 0:12:24

oh, here's three questions that I thought of based

0:12:24 - 0:12:30

on this data set that you provided me. And you can look at your data, but you can also

0:12:30 - 0:12:35

look at the group's data, and you can say, like: Are there any outliers here? What is the R-squared

0:12:35 - 0:12:41

value? Is this statistically significant? And you can begin to, like, interactively do science on

0:12:42 - 0:12:45

your sleep data set sort of with the help

0:12:45 - 0:12:50

of the AI and what's really cool about this is that you don't just need to be

0:12:50 - 0:12:55

you don't need to be a sleep expert you don't need to have a PhD in sleep or

0:12:55 - 0:13:00

bio to be able to do this because the AI gets you up to speed very quickly and

0:13:00 - 0:13:06

it can both write the code to do the analysis, but also it has all that sort of background information

0:13:07 - 0:13:10

on what are the different interactions

0:13:10 - 0:13:12

between these things that cause,

0:13:15 - 0:13:18

all the biology behind sleep.
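
The three questions mentioned above (outliers, R-squared, statistical significance) can be sketched in a few lines. This is a stdlib-only illustration, not the WaterBear platform's code, which the speaker says is built on ChatGPT and pandas. The data values, the z-score cutoff, and the use of a permutation test for significance are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import fmean, stdev


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data shows an R^2 at least as large as observed."""
    rng = random.Random(seed)
    observed = r_squared(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if r_squared(xs, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


def outliers(values, z_cutoff=2.5):
    """Values lying more than `z_cutoff` sample standard deviations from the mean."""
    mean, sd = fmean(values), stdev(values)
    return [v for v in values if abs(v - mean) / sd > z_cutoff]


# Made-up example data: daily caffeine intake (mg) against a sleep score.
caffeine_mg = [0, 50, 100, 150, 200, 250, 300, 350]
sleep_score = [88, 85, 83, 80, 76, 75, 70, 68]
r2 = r_squared(caffeine_mg, sleep_score)
p = permutation_p_value(caffeine_mg, sleep_score)

# Made-up nightly hours of sleep with one unusually short night.
hours = [7.1, 6.8, 7.4, 7.0, 6.9, 7.2, 2.0, 7.3, 7.1, 6.9]
flagged = outliers(hours)
```

With only a handful of nights, a result like this says little about cause and effect. The point is only that the questions a participant might ask the AI map onto short, checkable pieces of code.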

0:13:18 - 0:13:21

And so we think, and as ChatGPT gets better,

0:13:21 - 0:13:24

I think this will improve and improve,

0:13:24 - 0:13:25

and down the road,

0:13:25 - 0:13:34

we want to train, fine-tune a specific model so that people could get more information

0:13:34 - 0:13:37

around sleep as it relates to them.

0:13:37 - 0:13:41

And all that information would be coming from this data set that we've collected around

0:13:41 - 0:13:45

all these different interventions and all this private data basically

0:13:45 - 0:13:50

and so we would hope people would pay for it and then all that money would sort of go back to

0:13:50 - 0:13:52

the people that contributed to the data set

0:13:52 - 0:14:00

And so then we also sort of broadened the platform, so that we're not just doing sleep now.

0:14:00 - 0:14:05

We're doing ones on psychedelics, we're doing ones on microplastics,

0:14:05 - 0:14:10

we're doing ones on nutrition and the Blueprint diet. So we're sort of using this

0:14:10 - 0:14:17

platform, and anyone can create a project on it as well and use these tools to mesh together AI,

0:14:17 - 0:14:47

blockchain, and science. Those are super compelling examples of how the intersection of AI and DeSci can be used to advance science as well as one's own personal health. intersection that might be either building off of some of the projects you're working on or

0:14:47 - 0:14:52

adjacent to while you're focused on building all of this out?

0:14:52 - 0:15:03

Hmm. Yeah. I mean, I think we, I think the area of data analysis and like,

0:15:00 - 0:15:10

and AI is a big topic that will be interesting.

0:15:10 - 0:15:17

I think measuring the value of information that's used for training a model will be very interesting.

0:15:17 - 0:15:26

If someone figured out how to say like, what is the incremental information that is able to be extracted from this data

0:15:26 - 0:15:27

would be interesting.

0:15:29 - 0:15:30

Those would be my two areas

0:15:30 - 0:15:32

that I think are quite interesting.

0:15:32 - 0:15:34

And also like robot process automation,

0:15:34 - 0:15:37

just like automating science collection,

0:15:37 - 0:15:40

automating data collection using a cloud lab

0:15:40 - 0:15:42

or a robot would be really cool.

0:15:48 - 0:15:54

Definitely, there are a lot of really exciting opportunities across this whole intersection of how we can use AI and more decentralized technologies or

0:15:54 - 0:16:09

more privacy and security focused approaches for advancing science and health. And Macrocosmos, I know you're taking kind of another approach to

0:16:09 - 0:16:18

what the future of AI and DeSci at this intersection point can look like. So I might want to start by

0:16:18 - 0:16:27

having you share a bit about what Macrocosmos is to begin with, and then we'll start getting into the subnet you recently launched.

0:16:28 - 0:16:30

Yeah, so thanks, Erin, and thanks for having us.

0:16:31 - 0:16:33

So I'm Oscar Zanis here at Macrocosmos.

0:16:33 - 0:16:39

We are an AI research lab that spun out of the Opentensor Foundation,

0:16:39 - 0:16:41

who are the original builders of Bittensor.

0:16:42 - 0:16:47

And as of today, we're about 20 engineers, AI researchers, PhDs,

0:16:47 - 0:16:56

working on how we can build the world's most powerful incentive mechanisms for the provision

0:16:56 - 0:17:10

of intelligence. And Bittensor is a decentralized AI, decentralized compute protocol that is about four or five years old now.

0:17:11 - 0:17:27

It was conceived by Const and others. I guess, you know, the community to write rules and incentive mechanisms for people to create,

0:17:28 - 0:17:33

I guess, problems that can be solved. And Stefan will articulate this a little bit more, but

0:17:33 - 0:17:39

we are basically game makers for trying to create incentive systems and games for people to provision

0:17:39 - 0:17:46

compute and intelligence resources to solve tricky problems. And our bread and butter in this space has been

0:17:46 - 0:17:51

the creation of, I guess, what a lot of people see today as the very sexy elements of artificial

0:17:51 - 0:17:56

intelligence so we were the original builders of the first subnet that proved you could do

0:17:56 - 0:18:00

decentralized inference at scale so what that looks like is a thousand different large language

0:18:00 - 0:18:07

models all competing to give the best response in a computationally efficient and effective manner.

0:18:07 - 0:18:11

We've since sort of expanded out to broader topics, in that we're currently

0:18:11 - 0:18:16

hosting a pre-training competition on Bittensor where the team developing

0:18:16 - 0:18:20

foundational pre-trained models at 7 billion beat the efficiency

0:18:20 - 0:18:23

of Falcon 7 billion by 50 times, in that it

0:18:23 - 0:18:26

took 50 times less resources

0:18:26 - 0:18:29

for a decentralized, incentivized, competitive team

0:18:29 - 0:18:31

to train a foundational model.

0:18:31 - 0:18:36

And last week we launched our first protein folding subnet.

0:18:36 - 0:18:39

And Stefan and I both have research backgrounds

0:18:39 - 0:18:41

as do a lot of our team.

0:18:41 - 0:18:43

And for us, this was saying,

0:18:43 - 0:18:48

can we create mechanisms and ways for

0:18:48 - 0:18:53

decentralized communities to organize compute around problems that are not

0:18:53 - 0:18:59

just sexy and interesting from an AI perspective but are meaningfully able to

0:18:59 - 0:19:03

push forward the boundaries of humanity and solve a lot of humanity's tricky

0:19:03 - 0:19:10

problems and we thought protein folding was a really elegant expression of how can people organize

0:19:10 - 0:19:12

resources to solve these problems.

0:19:12 - 0:19:15

And there's a lot of reasons for that.

0:19:15 - 0:19:22

I guess most obviously Google obviously released AlphaFold3, which is a sort of very, very

0:19:22 - 0:19:25

high profile AI product that is not an LLM.

0:19:25 - 0:19:26

And secondarily to that,

0:19:26 - 0:19:30

sort of the first decentralized citizen science project

0:19:30 - 0:19:31

was arguably Folding@home.

0:19:31 - 0:19:33

So we see protein folding

0:19:33 - 0:19:36

as being a really beautiful expression of Bittensor's ability

0:19:36 - 0:19:40

to organize people and resources around a problem,

0:19:40 - 0:19:42

but more importantly,

0:19:42 - 0:19:46

to help see if decentralized communities can do things that are a bit more

0:19:46 - 0:19:52

altruistic and a bit more interesting. And we see protein folding as the first subnet on Bittensor

0:19:52 - 0:19:59

that is doing something that is ultimately a little bit more altruistic in a broader sense.

0:19:59 - 0:20:11

But more importantly, we're really interested to see how much the community and how much a decentralized community can decide how important those things are in a distributed manner.

0:20:11 - 0:20:18

Because in traditional sort of capitalist systems, often it's the thing that makes the most money that gets the most attention.

0:20:19 - 0:20:23

Whereas our thesis is in decentralized systems, a lot of people actively care about these things.

0:20:23 - 0:20:25

And we think

0:20:25 - 0:20:30

there's a lot of ability to organize resources to solve these problems. And for now it's protein

0:20:30 - 0:20:36

folding but in the future it might span to anything from particle physics to perhaps solving

0:20:36 - 0:20:41

some of the challenges around climate science or climate change. And I'll pause for breath there and

0:20:41 - 0:20:46

hand over to Crux, my co-founder, who probably has a little bit more to say about protein folding specifically.

0:20:48 - 0:20:50

Sure. Thanks, Will.

0:20:50 - 0:20:58

So I guess just to zoom out a little bit and motivate a little bit more what we're here to do.

0:20:59 - 0:21:07

There's clearly an opportunity for sort of massive collaboration at scale through incentivized decentralized systems.

0:21:07 - 0:21:12

There's an enormous amount of latent compute in the world that can be harnessed and ideally

0:21:12 - 0:21:18

oriented towards humanitarian goals, let's say. However, one of the unique challenges that's

0:21:18 - 0:21:23

presented with that is, if it's a truly decentralized system, that typically means it's anonymized as

0:21:23 - 0:21:29

well, so the participants are not necessarily identifiable. What typically comes with this as a result

0:21:29 - 0:21:32

is it can be quite adversarial.

0:21:32 - 0:21:36

What this means is because it's effectively an economic vehicle, people compete to make

0:21:36 - 0:21:41

the most of the token within the context of that subnet or that environment.

0:21:41 - 0:21:45

But this does not always lead to the results that one would like.

0:21:45 - 0:21:51

I often use the phrase, well, borrow the phrase, following the letter of the law, but not following

0:21:51 - 0:21:55

the spirit of the law. What this typically means in decentralized systems that are anonymized

0:21:55 - 0:22:02

is people will find ways to game the system. So imagine Folding@home.

0:22:04 - 0:22:07

This is great because everyone in that community wanted to participate.

0:22:08 - 0:22:10

People were willingly participating.

0:22:10 - 0:22:11

And most important, it was voluntary.

0:22:12 - 0:22:17

This more or less aligns everyone's effort vector towards sort of a common good scenario.

0:22:18 - 0:22:24

What's different here is when you add the economic layer to this as sort of a blockchain-based system,

0:22:20 - 0:22:27

what can quite quickly happen

0:22:27 - 0:22:33

is sort of a race to the bottom, where people find ways to subvert the rules or perform

0:22:33 - 0:22:37

the calculations differently or sort of tinker with the outputs.

0:22:37 - 0:22:40

And the result of this can be a very rapid degradation of the quality of the compute

0:22:40 - 0:22:41

that's being carried out.

0:22:42 - 0:22:46

And so what Macrocosmos is here to do, what we're doing with the protein folding subnet

0:22:46 - 0:22:47

and what we're doing at large

0:22:47 - 0:22:50

is trying to sort of really click into place.

0:22:50 - 0:22:53

How does one build an adversarially robust algorithm

0:22:53 - 0:22:55

that allows people to cooperate at scale,

0:22:55 - 0:22:59

even when, again, it's anonymized, it's highly adversarial.

0:22:59 - 0:23:01

And that is a very non-trivial problem.

0:23:01 - 0:23:04

And in fact, it's often case specific.

0:23:04 - 0:23:06

So solving this in the context of

0:23:06 - 0:23:11

evaluating the quality of natural language, which is inherently quite fuzzy and subjective,

0:23:12 - 0:23:17

presents one challenge that we've been working on for over a year. Whereas on the other hand,

0:23:17 - 0:23:25

protein folding actually maps quite gracefully onto a decentralized system because it's very objective.

0:23:29 - 0:23:33

A protein that has been folded in a more biologically appropriate way has, let's say, thermodynamic properties like its energy and so on and so forth.

0:23:34 - 0:23:37

And these things are sort of, it's impossible to game that system.

0:23:37 - 0:23:39

Nature, you cannot game it.

0:23:39 - 0:23:47

And this is a fantastic, I guess, match for a decentralized system.

0:23:47 - 0:23:54

People are competing to provide basically the most protein folding molecular dynamics simulation steps per second.

0:23:54 - 0:23:57

And that's kind of the axis that everyone's racing towards.
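
The point about objectivity can be illustrated with a toy sketch: a validator recomputes a physical energy from the submitted coordinates itself, so a participant cannot simply report a flattering number. This is an assumption-laden illustration, not the subnet's actual scoring. The Lennard-Jones pair potential stands in for a real molecular dynamics force field, and the names `rank_submissions`, `miner_a`, and `miner_b` are hypothetical.

```python
import math


def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy over all particle pairs (reduced units)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def rank_submissions(submissions):
    """Recompute each configuration's energy and sort from lowest to highest."""
    scored = [(lennard_jones_energy(coords), name) for name, coords in submissions.items()]
    return sorted(scored)


r_min = 2 ** (1 / 6)  # separation at which the pair potential reaches its minimum
submissions = {
    "miner_a": [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)],  # sits at the energy minimum
    "miner_b": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],    # particles too far apart
}
ranking = rank_submissions(submissions)
```

Because the score is a function of the coordinates alone, the ranking depends on the physics rather than on anything the submitter claims.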

0:23:57 - 0:24:00

And you better believe the race to the bottom is going to happen anyway.

0:24:00 - 0:24:06

But why not make it something which is focused towards basically just cranking out as

0:24:06 - 0:24:12

many biologically interesting proteins rather than once again kind of going off the rails and people

0:24:12 - 0:24:19

are instead competing along the wrong axes. I guess the last point I'll make on this as

0:24:19 - 0:24:25

well, and something that is really remarkable about Folding@home, which of course inspired

0:24:25 - 0:24:30

us to actually take on this current endeavor, is it has often been said that Bitcoin itself

0:24:30 - 0:24:33

is the largest supercomputer in the world.

0:24:33 - 0:24:40

And this is kind of a really great soundbite to emphasize how much potential there is for

0:24:40 - 0:24:46

decentralization and for the blockchain to harness all of that. Similarly, Folding@home,

0:24:46 - 0:24:51

not many people know this, but with sort of the beginning of COVID, as the world collectively

0:24:51 - 0:24:55

focused a lot more on trying to contribute in any way we could towards the acceleration of

0:24:55 - 0:25:03

treatments and vaccines and so on and so forth, Folding@home adoption actually rose so much

0:25:03 - 0:25:08

at the beginning of 2020 that it became the world's first exaflop compute system.

0:25:08 - 0:25:22

So this goes to show that an enormous amount of progress was made in molecular biology research because Folding@home was an appropriate vehicle for people's voluntary efforts to be aligned towards humanitarian goals.

0:25:22 - 0:25:26

And we're trying to replicate that. We know that the opportunity is large

0:25:26 - 0:25:28

and with well-crafted systems,

0:25:28 - 0:25:31

we believe that the results will speak for themselves.

0:25:35 - 0:25:36

Amazing.

0:25:36 - 0:25:38

If people are listening in right now

0:25:38 - 0:25:42

and they're excited about what you're building

0:25:42 - 0:25:48

and maybe an opportunity to contribute towards this more

0:25:48 - 0:25:56

collaborative and decentralized approach to exploring other types of ways to fold proteins.

0:25:56 - 0:26:03

How might they be able to get involved? Are they like, is the subnet usable for them?

0:26:04 - 0:26:06

What are different opportunities for people?

0:26:07 - 0:26:09

Yeah, great question, Aaron.

0:26:09 - 0:26:10

Thanks for asking that.

0:26:10 - 0:26:13

So really where we're getting to right now

0:26:13 - 0:26:15

is we've sort of released this a week ago.

0:26:15 - 0:26:17

We're stabilizing the network,

0:26:18 - 0:26:20

which is to say we're sort of fleshing out

0:26:20 - 0:26:22

any of those early bugs that can happen

0:26:22 - 0:26:23

when you take a system like this into production.

0:26:23 - 0:26:29

And trust me, there's plenty. What we've got our sights firmly set on here is inviting

0:26:29 - 0:26:34

researchers to come in for fully funded grants, a completely free research experience from their

0:26:34 - 0:26:39

point of view, because it's kind of a mutually beneficial symbiosis here. We want to demonstrate

0:26:39 - 0:26:45

that Bittensor can bring real utility to researchers, especially people that want to carry out high

0:26:45 - 0:26:51

caliber, high impact research. And for whatever reason, those researchers don't always necessarily

0:26:51 - 0:26:56

get first choice when it comes to HPCs and clusters and traditional avenues for them to

0:26:56 - 0:27:01

access compute at scale. What we want is to invite them in to work with us so that we can show that

0:27:01 - 0:27:05

Bittensor can also address these really imperative problems.

0:27:08 - 0:27:08

And of course, they benefit from unlimited access to compute.

0:27:11 - 0:27:11

They can basically steer the tank, for lack of a better phrase.

0:27:16 - 0:27:19

So they have this massive computational system that they can use on demand, oriented towards the specific parameters for their research.

0:27:20 - 0:27:21

And that's our dream here.

0:27:21 - 0:27:22

The invitation is open.

0:27:22 - 0:27:28

We are accepting applications for people that want to start using the subnet as a research

0:27:28 - 0:27:28

vehicle.

0:27:29 - 0:27:35

And yeah, I guess the only thing I would add there as well is protein folding is our first

0:27:35 - 0:27:36

excursion into the space.

0:27:36 - 0:27:41

But as Will mentioned, we're both very passionate about research and we're trying to think about

0:27:41 - 0:27:45

how we can come up with a sort of more general set of principles

0:27:45 - 0:27:50

to translate traditional research problems into ones that can become adversarially resistant

0:27:50 - 0:27:57

distributed problems, benefiting from the massive parallelization opportunity, but also making sure

0:27:57 - 0:28:03

that they can't be subverted and they're basically exploit resistant. So it's going to be moving

0:28:03 - 0:28:05

really fast. Protein folding is going to be

0:28:05 - 0:28:08

our first endeavor, but we hope not the last.

0:28:11 - 0:28:18

Amazing. That's super exciting. I know in some of our conversations, there's particular passion

0:28:18 - 0:28:29

amongst the team on physics. Do you see, I know there's some mention of like particle physics exploration. Are there

0:28:29 - 0:28:38

any other kind of hopes for within the DeSci ecosystem at large or other visions kind of

0:28:38 - 0:28:47

along those lines? Absolutely. Well, what we've been trying to identify here is what are the biggest lever arms that we can provide?

0:28:47 - 0:29:01

We can solve very specific problems like protein folding, but there's certain fundamental operations, let's say, that are much more powerful because they're so general and they're so agnostic to the specific context of certain scientific problems.

0:29:01 - 0:29:05

And I believe one example of this is matrix diagonalization, right?

0:29:05 - 0:29:10

This is kind of like the spine which holds up the body of so many different disciplines.

0:29:10 - 0:29:15

In fact, it goes from modern machine learning with singular value decomposition all the way

0:29:15 - 0:29:17

to optimizing routes.

0:29:18 - 0:29:22

Of course, a plethora of physics problems all depend on diagonalizing huge wave function

0:29:22 - 0:29:27

matrices and so on and so forth. So we think that strategically, we actually

0:29:27 - 0:29:30

want to double down on the most primitive problems that

0:29:30 - 0:29:33

have the most sort of access opportunity,

0:29:33 - 0:29:37

and then translating that into a specific use case,

0:29:37 - 0:29:40

whether it's particle physics, condensed matter physics.

0:29:40 - 0:29:43

That's simply a matter of switching out

0:29:43 - 0:29:46

the sort of the math back end, so to speak,

0:29:46 - 0:29:48

and having access to our subnets to power that.
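
For readers unfamiliar with the primitive being discussed, here is a minimal sketch of matrix diagonalization using the classical Jacobi rotation method for a small symmetric matrix. It is written in plain Python for illustration only and is not the subnet's implementation; in practice a tuned library routine would be used, and the function name `jacobi_eigenvalues` and the example matrix are assumptions.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries; zero means the matrix is diagonal.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # apply the same rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# A small symmetric tridiagonal matrix of the kind that appears in discretized physics problems.
example = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
eigenvalues = jacobi_eigenvalues(example)
```

The same routine applies regardless of where the matrix comes from, which is the sense in which the speaker calls the operation general: only the code that builds the matrix changes between use cases.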

0:29:51 - 0:30:00

That's super exciting. I'm really eager to see how as more researchers engage with what

0:30:00 - 0:30:06

you've built out and just more scientists across the board engage with some of these

0:30:06 - 0:30:13

different subnets and networks and compute capabilities, what might be possible as we

0:30:13 - 0:30:31

keep advancing science overall. And Stanley, I know you're also involved in the AI compute space with some of your work at Lilypad. Can you share with us how Lilypad is

0:30:31 - 0:30:40

approaching DeSci and maybe looking at it from another angle, such as discovering different drugs?

0:30:40 - 0:30:51

Hey, yeah, absolutely. And also, good morning. So happy to be here. Wanted to also

0:30:51 - 0:31:00

just shout out WaterBear for such an incredible project. As a person who is drinking coffee as

0:31:00 - 0:31:05

quickly as possible this morning because he struggles with sleep hygiene sometimes.

0:31:05 - 0:31:10

I think it's just hard to overstate the importance that sleep plays in health.

0:31:11 - 0:31:13

So like, yeah, doing God's work over there.

0:31:14 - 0:31:16

Yeah, we like you as well.

0:31:17 - 0:31:22

Oh, well, listen, mutual love fest and that there'll be more love as I drink coffee too.

0:31:23 - 0:31:25

But a hundred percent,

0:31:25 - 0:31:31

let's talk some proteins and, um, you know, all of these kind of infrastructure questions of,

0:31:31 - 0:31:37

uh, kind of storing, accessing, scheduling, compute via the blockchain are so interesting.

0:31:38 - 0:31:46

Um, but at, at Lilypad, um, I'm, uh, kind of focused on the science, and it has been so fun.

0:31:46 - 0:32:00

And it kind of actually is an interesting configuration that came as a result of my kind of serial tendency to come into Twitter spaces and complain that researchers don't have access to compute.

0:32:00 - 0:32:06

The good folks at Lilypad reached out, and they said, what could we be doing with this compute?

0:32:07 - 0:32:13

And man, is it so cool to hear about other projects in the space and folding.

0:32:13 - 0:32:25

I'm a person who came to kind of molecular engineering as a passion through Folding@home. Me and my friends would go up on Sand Hill Road and dumpster dive for

0:32:25 - 0:32:33

computers that we could Frankenstein into some truly ridiculous clusters in the chase for points.

0:32:35 - 0:32:39

But yeah, then kind of maybe some of the specific stuff that I'm looking at would be,

0:32:40 - 0:32:51

you know, maybe the process that's directly downstream from folding. I'm currently doing work to set up docking as a system that could run on decentralized infrastructure. And then

0:32:51 - 0:32:57

docking is kind of interesting. That's where, you know, you fold proteins into these shapes, but

0:32:57 - 0:33:03

you care about the shapes because the shapes determine how they interact with other objects,

0:33:03 - 0:33:06

how they dock with other objects.

0:33:06 - 0:33:10

So the docking simulation kind of starts maybe once you've folded,

0:33:10 - 0:33:13

and then you're sort of simulating the kind of particulars

0:33:13 - 0:33:17

of how these kind of lock and key configurations happen

0:33:17 - 0:33:22

between a protein and other types of molecules.

0:33:23 - 0:33:25

So yeah, that would be like kind of the main thing

0:33:25 - 0:33:27

I'm up to at the moment at Lilypad.
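
As a rough illustration of the lock-and-key idea described above, the sketch below scores rigid placements of a toy ligand against a toy protein pocket with a Lennard-Jones-style energy and keeps the lowest-energy one. Everything here is hypothetical (`lj_energy`, `best_pose`, the coordinates, the parameters); it is not how Lilypad or any real docking tool works, and real docking also searches over rotations and flexible conformations with much richer scoring functions.

```python
import numpy as np

def lj_energy(protein, ligand, sigma=1.0, epsilon=1.0):
    """Sum of 12-6 Lennard-Jones energies over all protein-ligand atom pairs."""
    # Pairwise distances, shape (n_protein_atoms, n_ligand_atoms).
    d = np.linalg.norm(protein[:, None, :] - ligand[None, :, :], axis=-1)
    d = np.maximum(d, 1e-6)  # avoid division by zero for overlapping atoms
    r6 = (sigma / d) ** 6
    return float(np.sum(4.0 * epsilon * (r6 ** 2 - r6)))

def best_pose(protein, ligand, offsets):
    """Try each rigid translation of the ligand; return (energy, offset) of the lowest-energy one."""
    scored = [(lj_energy(protein, ligand + off), tuple(off)) for off in offsets]
    return min(scored)

# Toy "pocket": two protein atoms on the x-axis with a gap between them.
protein = np.array([[-1.2, 0.0, 0.0], [1.2, 0.0, 0.0]])
ligand = np.array([[0.0, 0.0, 0.0]])  # a single-atom "key"
# Candidate translations of the ligand along the y-axis.
offsets = np.array([[0.0, y, 0.0] for y in np.linspace(-3.0, 3.0, 13)])

energy, offset = best_pose(protein, ligand, offsets)
```

In this toy setup the search settles on the placement that sits between the two pocket atoms, which is the "key in the lock" configuration; the exhaustive loop over `offsets` is the part that becomes expensive, and parallelizable, at realistic scale.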

0:33:41 - 0:33:42

Awesome.

0:33:42 - 0:33:51

I think my internet is a tad bit spotty. So Stanley, definitely please keep elaborating on

0:33:51 - 0:34:00

some of kind of even more downstream from this, but it looks like when it might get to the stage

0:34:00 - 0:34:06

of interacting with people and some of the other involvements you have,

0:34:06 - 0:34:12

especially looking at different rare diseases. Oh, yeah, totally. And man, I have so much more

0:34:12 - 0:34:18

to say about proteins and molecular engineering. And hopefully, you know, we'll be able to kick

0:34:18 - 0:34:25

that ball back and forth a little bit. Because I have to say, on the topic of sleep hygiene, since AlphaFold3

0:34:25 - 0:34:30

came out, it's been a pretty epic moment. I really think we're living through history.

0:34:31 - 0:34:36

So, you know, I think the race to understand what we can do with a tool like that, but also,

0:34:36 - 0:34:41

you know, I think there's a beautiful decentralized science story playing out where a number of

0:34:41 - 0:34:46

groups are kind of rushing to reproduce AlphaFold3 in the open source.

0:34:48 - 0:34:50

So yeah, maybe we can loop back around to that.

0:34:50 - 0:34:57

But yeah, 100% got to shout out, you know, something we're doing on a different side of DeSci AI,

0:34:57 - 0:35:00

which is the Stanford Rare Disease AI Hackathon.

0:35:01 - 0:35:06

We'll try to post a link to the site for that. But high level, you know, we're

0:35:06 - 0:35:12

just hoping to create high quality models and validation sets for, you know, the medical and

0:35:12 - 0:35:18

clinical practice around rare genetic diseases. So we're doing this through this really cool

0:35:18 - 0:35:26

collaborative intelligence process, aka a hackathon. We have, you know, a couple dozen teams from all over the

0:35:26 - 0:35:31

world, each working on different specific problems, training different specific models.

0:35:32 - 0:35:38

And then we're having those models interacted with and validated by a network of experts in

0:35:38 - 0:35:57

rare genetic disease. And this is just really interesting because in some ways, when you think of like all the different knowledge communities that exist in the world, I think rare disease communities have some of the deepest wisdom.

0:35:57 - 0:36:05

You know, you have these people who, you know, because they've had to, have become experts on their condition, but also incredibly decentralized,

0:36:05 - 0:36:13

fragmented knowledge. And, you know, I think it's always an acknowledged challenge that it's hard to

0:36:13 - 0:36:18

connect with the right communities of support when you do have a rare illness. So yeah, it's

0:36:18 - 0:36:23

really interesting to kind of look at the process of training an intelligence as, you know,

0:36:23 - 0:36:25

something that can happen collaboratively

0:36:25 - 0:36:30

that can kind of mobilize these communities and that can, in a really powerful way, hopefully

0:36:30 - 0:36:36

distill some of the wisdom that's present and then put it to work for patients. So yeah, that's

0:36:36 - 0:36:42

been really exciting. And yeah, in about a month, we'll be doing a kind of demo day where

0:36:42 - 0:36:48

teams will be presenting models and we'll be doing some kind of collaborative validation with our medical mentors.

0:36:52 - 0:36:59

Amazing. That's super exciting. And I know some of the work you're doing here under like the

0:36:59 - 0:37:27

umbrella of research to the people is in collaboration with Stanford. Are there other either larger academic or university institutions or private sector as well who are leaning into either leveraging AI heavily in some of these investigations or a more decentralized approach to solving some of the problems that they're really passionate about?

0:37:28 - 0:37:30

Oh, I think 100%.

0:37:30 - 0:37:47

And I think we're seeing a lot Genes that kind of acts as a research

0:37:47 - 0:37:56

intermediary for a lot of institutions that do genetic medicine. And yeah, it was just kind of

0:37:56 - 0:38:01

so cool because the contributors were very high level machine learning engineers. The projects

0:38:01 - 0:38:07

were very high quality and they were being done with data that, you know, had never

0:38:07 - 0:38:12

really been analyzed before at all. And so it was really something effective and beautiful that

0:38:12 - 0:38:16

happened and kind of specifically because you put, you know, just the right problem in front

0:38:16 - 0:38:25

of the right people. So yeah, I am definitely seeing more of that happening, but I think we need even more.

0:38:35 - 0:38:36

And I have to say, for the Macrocosmos folding subnet, man, I'm going to be keeping that in mind.

0:38:39 - 0:38:40

Because resources like that and access to engineering talent like that,

0:38:45 - 0:38:51

I think there's a lot of people working on little niche areas of science that don't even know it's available. And so I really also would love to shout out the organizers of this space

0:38:51 - 0:38:55

for just having the conversation because, you know, I think the more we're talking about

0:38:55 - 0:39:01

what's increasingly possible in silico and, you know, how many engineers and builders are out

0:39:01 - 0:39:08

there who really like want to pitch in for good causes. Yeah, I think that we're going to connect the dots, get a really beautiful picture drawn.

0:39:09 - 0:39:15

I'm not sure if you'll be able to dumpster dive the hardware you need to compete on our subnet,

0:39:15 - 0:39:16

but we invite you to come join us anytime.

0:39:18 - 0:39:19

Oh, no, absolutely.

0:39:19 - 0:39:20

But that's what I mean.

0:39:20 - 0:39:25

Like things that, you know, we saw happening in microcosm long ago are

0:39:25 - 0:39:31

happening at a worldwide scale in macrocosm, you could say. Ah, listen, you

0:39:31 - 0:39:35

caught me, that's just where I was going, sir. And I have to say, 100%, like I

0:39:35 - 0:39:43

said, want to get some, hopefully, proteins folding over there. And what we found

0:39:43 - 0:39:47

super interesting is the pace of the community. We launched an

0:39:47 - 0:39:53

initial CPU miner when we initially launched the subnet. There's already guys who've built GPU

0:39:53 - 0:39:58

miners that are 10 times faster than the base miner, and would love to see some of

0:39:58 - 0:40:06

those open source AlphaFold3s you were talking about, Stanley, getting into play on the subnet. So super exciting.

0:40:07 - 0:40:07

Oh, heck yeah.

0:40:11 - 0:40:11

And, you know, maybe one way or another we can connect after this.

0:40:16 - 0:40:19

And if nothing else, you guys, my original training was in HPC for particle physics. So it sounds like we might have some stuff to nerd out on.

0:40:20 - 0:40:25

But, yeah, you know, diffusion-based systems like AlphaFold3, like learners

0:40:25 - 0:40:26

for molecular structure,

0:40:27 - 0:40:32

I think they're just going to get more and more powerful, obviously effective

0:40:32 - 0:40:33

for medicine.

0:40:33 - 0:40:38

I was excited earlier when someone mentioned kind of like engineering of thermostability,

0:40:38 - 0:40:41

and that's been something I've been working on a lot.

0:40:41 - 0:40:45

And, and I think that like, you know, when you think about the total space of

0:40:45 - 0:40:49

all possible proteins, which is, you know, something we can think about mathematically

0:40:49 - 0:40:56

to some degree, we've only really scratched the surface of what proteins can exist. And so I think

0:40:56 - 0:41:01

that even beyond medicine, like when we really have control and powerful tools for like precision

0:41:01 - 0:41:07

engineering of proteins and enzymes, I think it's going to rearrange the whole economy.

0:41:07 - 0:41:08

It's very exciting stuff.

0:41:10 - 0:41:12

But it's going to take a lot of compute.

0:41:17 - 0:41:18

Absolutely.

0:41:19 - 0:41:24

All of these different spaces are making really exciting progress.

0:41:20 - 0:41:28

Were there any other additional points

0:41:28 - 0:41:36

that you wanted to hop back to, whether related to Lilypad or some of the conversation on protein

0:41:36 - 0:41:48

folding? Stanley? No, I think I kind of covered the main stuff. You know, one small nerdy

0:41:48 - 0:41:53

thing that I'm really excited about is actually protein language models. So I'm sort of just

0:41:53 - 0:42:00

finishing up a small paper about using environmental data sets to fine-tune protein

0:42:00 - 0:42:08

language models. But yeah, to me, tools like AlphaFold, tools like, you know,

0:42:08 - 0:42:14

DiffDock and docking simulation, and then protein language models, these are just tools that are

0:42:14 - 0:42:22

going to add up to some real magic in the short term. If I may, you know, there was a tidbit from

0:42:22 - 0:42:25

earlier in the conversation that I would like to circle back on.

0:42:30 - 0:42:31

You know, it's been mentioned that, you know, AI is becoming, whether it's like, you know,

0:42:37 - 0:42:44

large language model based or some other approaches, you know, increasingly better and competitive with humans on doing sort of like repeatable sort of like rote tasks, like, you know,

0:42:44 - 0:42:45

for example, like data entry

0:42:45 - 0:42:47

or, you know, this type of thing.

0:42:47 - 0:42:52

And I think it's also been demonstrated that like, you know, AI is also capable of like

0:42:52 - 0:42:57

inference, you know, generating hypotheses and testing hypotheses and so forth, right?

0:42:58 - 0:43:11

So, you know, I mean, how does this change the relationship of human researchers and scientists to the endeavor of science, right?

0:43:20 - 0:43:34

Can we believe that at some point, whether it's soon or in the future, AI will be able to steer scientific discovery in terms of what are the important goals to pursue?

0:43:34 - 0:43:40

Or will that be sort of like a uniquely human role for the foreseeable future?

0:43:42 - 0:43:44

I'd happily take a first pass at this one.

0:43:48 - 0:43:48

I would personally delineate it into sort of

0:43:52 - 0:43:52

different time horizons so that we can speak with perhaps a little bit more confidence.

0:43:58 - 0:44:06

What I think has already been demonstrated with some success is sort of embedding physics priors into AI models. So sort of finding a way to get a model that respects things like the conservation of energy, momentum and time and so on and so forth.

0:44:07 - 0:44:16

And the value of that is they are much more effective at researching interesting scenarios and they become much more attuned to what might be novel and interesting.
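
One common way to picture a physics prior is as an extra term in the training loss. The sketch below is only an assumption about what that could look like, not the speaker's method: it adds a penalty for energy drift along a predicted trajectory of a 1D harmonic oscillator. The names `energy` and `physics_informed_loss`, the unit mass and spring constant, and the `weight` parameter are all hypothetical.

```python
import numpy as np

def energy(state, mass=1.0, k=1.0):
    """Total energy of a 1D harmonic oscillator; state[..., 0] is position, state[..., 1] is velocity."""
    x, v = state[..., 0], state[..., 1]
    return 0.5 * mass * v ** 2 + 0.5 * k * x ** 2

def physics_informed_loss(pred, target, weight=1.0):
    """Mean squared data error plus a penalty on energy drift along the predicted trajectory."""
    data_term = np.mean((pred - target) ** 2)
    e = energy(pred)
    conservation_term = np.mean((e - e[0]) ** 2)  # zero when energy is conserved
    return float(data_term + weight * conservation_term)

t = np.linspace(0.0, 2.0 * np.pi, 50)
truth = np.stack([np.cos(t), -np.sin(t)], axis=-1)  # exact solution, constant energy
decaying = truth * np.exp(-0.1 * t)[:, None]         # a prediction that leaks energy

loss_exact = physics_informed_loss(truth, truth)
loss_leaky = physics_informed_loss(decaying, truth)
```

A trajectory that conserves energy pays no extra penalty, while one that leaks energy is penalized even where its data error is small. That is the sense in which the constraint steers a model toward physically plausible predictions.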

0:44:17 - 0:44:30

What you're basically wanting to do is create a novelty detection system, which is what scientific discovery is effectively doing. It's just a massive novelty search in a way. Longer term, it's much harder to know how autonomous those systems

0:44:30 - 0:44:34

would be, in my opinion. I know that's not the most fun and quotable response, but that's the

0:44:34 - 0:44:40

academic honesty in me not really wanting to go out on a limb there. I think that the systems,

0:44:40 - 0:44:49

first of all, need to understand what the sort of constraints that reality imposes upon their experiments and the kind of reasoning that they're able to perform.

0:44:49 - 0:44:51

And I think that we've had promising results so far.

0:44:51 - 0:44:57

I think it's only going to get better as we are able to embed that more and more deeply.

0:44:58 - 0:45:02

And look, I'll be watching with bated breath to see how far this gets.

0:45:03 - 0:45:10

But think of it more like a coding copilot that's going to become more and more competent. And it's a continuum. I don't see

0:45:10 - 0:45:15

this as zero one. It's going to be a productivity boost in the interim and maybe longer term. It's

0:45:15 - 0:45:20

going to be minimal direction. Something like a Jarvis from Iron Man that's kind of like your

0:45:20 - 0:45:26

helpful assistant that's also an absolute badass researcher, but doesn't necessarily need to become entirely autonomous

0:45:26 - 0:45:28

to bring that 95% of value.

0:45:29 - 0:45:30

So that's my honest take on that.

0:45:42 - 0:45:47

Anyone else have any other additions to that question?

0:45:55 - 0:46:07

Awesome. Well, we have some time now for any questions from the audience as well. So, if audience members have questions, feel free to request the mic

0:46:07 - 0:46:15

to dive deeper into this conversation on the intersection between AI and decentralization

0:46:15 - 0:46:22

to advance science and de-sci. Or you can leave a comment on this space and we can get to that and

0:46:22 - 0:46:27

kind of bring that into the voice conversation as well.

0:46:30 - 0:46:36

Given that, speakers, anyone up here, if you'd like to reiterate a point as well,

0:46:36 - 0:46:39

there's opportunity for us to dive into that right now too.

0:46:40 - 0:46:52

In the meantime, we have one person who's come up, but Michael, go. Oh, I was just gonna say,

0:46:52 - 0:47:02

like, I wonder at what point it will just be able to calculate the need for data. Like,

0:47:02 - 0:47:08

will we ever get to a point where data is completely not needed because we're

0:47:08 - 0:47:13

able to calculate everything from, like, first principles? That would be like a

0:47:13 - 0:47:14

real game changer.

0:47:15 - 0:47:18

And like if synthetic data is able to actually work,

0:47:18 - 0:47:20

that would be really,

0:47:20 - 0:47:20

really crazy.

0:47:20 - 0:47:25

And then there'd be sort of no limit on how fast science could go.

0:47:25 - 0:47:32

Because right now science is still sort of bottlenecked by data capture.

0:47:33 - 0:47:39

But if synthetic data is going to work, like then there's really no sort of limit

0:47:39 - 0:47:43

on what is possible with science.

0:47:44 - 0:47:46

But I think that's still a big question

0:47:46 - 0:47:48

and given a different time horizon as well.

0:47:54 - 0:47:58

I see Sterling's got his hand up.

0:47:58 - 0:47:59

I'd like to give him a chance first, maybe.

0:48:01 - 0:48:02

Cool, cool.

0:48:03 - 0:48:04

I actually, yeah,

0:48:04 - 0:48:16

I have a novel situation right now where I'm doing an AI project through a DeSci Catalyst Molecule model.

0:48:20 - 0:48:27

captured from patients who have Alzheimer's and dementia. And then that data can be analyzed by

0:48:27 - 0:48:33

an AI model. And then that, we're hoping, can be kind of licensed out

0:48:33 - 0:48:39

to be used in further treatment down the line to kind of advance that area.

0:48:40 - 0:49:05

I'm curious, does anybody... Where do you kind of delineate the line between, so you have data, right? So we're using like, let's just say, WAV files or OGG files. That's the data. Then there's AI code, which is basically Python right now.

0:49:05 - 0:49:06

And then there's a model.

0:49:08 - 0:49:10

Where do you guys kind of see the delineation

0:49:10 - 0:49:12

and where is it DeSci, where is it not?

0:49:12 - 0:49:15

Is there a good framing to look at this

0:49:15 - 0:49:17

in terms of from people who are a little bit further on

0:49:17 - 0:49:19

in using AI in these kinds of contexts?

0:49:19 - 0:49:22

This would be my first time incorporating AI

0:49:22 - 0:49:23

in a study like this.

0:49:24 - 0:49:29

But yeah, does anybody have any thoughts on how to kind of talk about it?

0:49:29 - 0:49:32

Where is it DeSci? Where is it not?

0:49:32 - 0:49:39

Or any novel kind of insights into kind of how to navigate that conversation around utilizing AI in this kind of,

0:49:39 - 0:49:42

is like a real example to the Catalyst model.

0:49:42 - 0:49:47

If anybody has any thoughts, I'd love to hear any feedback or something like that.

0:49:49 - 0:49:50

I'd say if you can like,

0:49:50 - 0:49:51

if you can like separate your,

0:49:51 - 0:49:57

what you're trying to do into chunks that you can like give people.

0:49:57 - 0:49:57

And good

0:49:58 - 0:49:58

to meet you, Sterling.

0:49:58 - 0:49:59

I don't know if we've ever talked,

0:49:59 - 0:50:00

we've interacted on the,

0:50:00 - 0:50:01

on Twitter a bunch.

0:50:01 - 0:50:05

I'd say if you can give people like chunks of tasks

0:50:05 - 0:50:06

for them to do,

0:50:06 - 0:50:07

so maybe it's some analysis

0:50:07 - 0:50:09

or some sort of data collection

0:50:09 - 0:50:12

or some sort of finding related work

0:50:12 - 0:50:13

or recruiting patients,

0:50:13 - 0:50:15

that would be like a chunk of,

0:50:15 - 0:50:16

that would be something that I would distinguish

0:50:16 - 0:50:19

something as a DeSci project as,

0:50:19 - 0:50:20

as if there's,

0:50:20 - 0:50:22

if it's not just one person doing everything,

0:50:23 - 0:50:26

but you're sort of rewarding people with token rewards

0:50:26 - 0:50:28

and sort of changing the incentives around.

0:50:31 - 0:50:32

Yeah, yeah.

0:50:32 - 0:50:32

Thank you, Waterbear.

0:50:33 - 0:50:34

Yeah, good to interact with you as well.

0:50:34 - 0:50:36

This is Michael also.

0:50:36 - 0:50:38

I do DeSci NYC.

0:50:38 - 0:50:39

That's how we've interacted previously.

0:50:39 - 0:50:40

Oh, okay, great.

0:50:41 - 0:50:41

Wow, okay.

0:50:42 - 0:50:42

Well, awesome.

0:50:43 - 0:50:43

Even better.

0:50:44 - 0:50:46

Cool, yeah. Yeah. So yeah, I think

0:50:46 - 0:50:53

that's a good way to look at it for me. And that's even actually in the research contract

0:50:53 - 0:50:57

that was signed when the minting happened was that there's a delineation between,

0:50:57 - 0:51:08

I think I listed it as five potential pieces of the equation, which is there's the data, there's the model, there's sort of the

0:51:08 - 0:51:14

intangible intellectual property about how the data is kind of collected and stored even.

0:51:15 - 0:51:19

You know, there's like a question of like the methodology of how it's going to be encrypted

0:51:19 - 0:51:25

and abstracted so that it's protecting patient data. There's also a piece of that too.

0:51:26 - 0:51:27

So yeah, I think I like that.

0:51:27 - 0:51:29

That's probably the word I'm looking for

0:51:29 - 0:51:31

is kind of the chunking of different components

0:51:31 - 0:51:36

of all pointed to by that one singular token.

0:51:37 - 0:51:38

So I like that kind of thing, that framing.

0:51:38 - 0:51:39

So thank you.
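
The "chunks pointed to by one token" framing can be sketched as a manifest: each component gets its own content hash, and a single root hash covers the whole manifest. This is a hypothetical illustration only; the component names, the placeholder bytes, and the `build_manifest` helper are assumptions, and nothing here reflects the actual research contract or how any tokenization platform stores references.

```python
import hashlib
import json

def digest(payload: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(payload).hexdigest()

def build_manifest(components: dict) -> dict:
    """Hash each component separately, then hash the sorted manifest as a whole."""
    entries = {name: digest(blob) for name, blob in components.items()}
    canonical = json.dumps(entries, sort_keys=True).encode("utf-8")
    return {"components": entries, "root": digest(canonical)}

# Placeholder bytes stand in for real files (recordings, training code, model weights).
components = {
    "data": b"placeholder for audio recordings",
    "code": b"placeholder for training scripts",
    "model": b"placeholder for trained weights",
    "collection_method": b"placeholder for the collection and storage protocol",
    "privacy_method": b"placeholder for the encryption and de-identification scheme",
}
manifest = build_manifest(components)
```

Because each chunk is hashed on its own, one piece (say, the model) can be updated or licensed separately while the others stay verifiably unchanged, and the single `root` value is the only thing a token would need to reference.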

0:51:42 - 0:51:50

I guess to answer an earlier question about synthetic data, I believe this can be a natural

0:51:50 - 0:51:55

sort of fit with what I was saying about baking in the laws of physics.

0:51:55 - 0:51:56

There's multiple ways to do that.

0:51:56 - 0:52:00

You can enforce it at the level of an architecture and, for example, a convolutional neural network

0:52:00 - 0:52:06

is a classic example of something that embeds sort of spatial invariance into a model's understanding of the world.

0:52:07 - 0:52:13

And there are ways that you can demonstrably do the same with perhaps more deep physics principles or scientific principles.
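
To make the convolution point concrete, here is a small check, under simplifying assumptions, that a convolutional filter treats a shifted input the same way it treats the original: filtering a shifted signal gives the shifted version of the filtered signal. Strictly this property is called translation equivariance; invariance follows once a pooling step such as a global maximum is applied. The helper `circular_conv` and the example values are made up for illustration, and wrap-around boundaries are used so the identity holds exactly.

```python
import numpy as np

def circular_conv(signal, kernel):
    """1D circular cross-correlation: slide the kernel over the signal with wrap-around."""
    n = len(signal)
    return np.array([
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ])

signal = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 1.0, 0.0])
kernel = np.array([1.0, -2.0, 1.0])
shift = 3

# Shift first, then filter...
shifted_then_filtered = circular_conv(np.roll(signal, shift), kernel)
# ...versus filter first, then shift.
filtered_then_shifted = np.roll(circular_conv(signal, kernel), shift)

# A global max over positions discards location, giving a shift-invariant feature.
pooled_original = circular_conv(signal, kernel).max()
pooled_shifted = shifted_then_filtered.max()
```

The two orderings agree, which means the architecture itself carries the assumption that a pattern means the same thing wherever it appears, with no training data needed to teach it.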

0:52:13 - 0:52:20

But another way you can do it as well is you can just give it loads and loads of samples of data, synthetic data, experimental data, doesn't really matter.

0:52:20 - 0:52:25

And of course, eventually it will condense that representation somewhere within the model's

0:52:25 - 0:52:29

weights or whatever the architecture you're using is. And in fact, this is something I did for

0:52:29 - 0:52:37

several years before working with Will on sort of decentralized intelligence. I was working on

0:52:37 - 0:52:42

state of the art, basically building AI models for state of the art semiconductor manufacturing.

0:52:42 - 0:52:49

And what we found is for these atomic level precise manufacturing processes,

0:52:49 - 0:52:52

where you're basically depositing a layer of atoms at a time onto a crystalline

0:52:52 - 0:52:57

structure to be used as a III-V wafer, to use a little bit of the technical

0:52:57 - 0:52:58

language of it.

0:52:58 - 0:53:02

We found that we could actually just show it lots and lots and lots of examples

0:53:02 - 0:53:06

of various thermodynamic phenomena and a machine learning model would learn.

0:53:06 - 0:53:09

So I think that synthetic data is a lot more convenient

0:53:09 - 0:53:11

and a lot less invasive, and frankly, it's a lot easier.

0:53:11 - 0:53:15

It's a lot more passive than designing bespoke architectures

0:53:15 - 0:53:17

that capture the particular laws

0:53:17 - 0:53:19

or whatever it is that you need.

0:53:19 - 0:53:29

So I'm pro-synthetic data, let it be known.

0:53:32 - 0:53:32

Yeah, I mean, I think it works up until a point,

0:53:38 - 0:53:41

and we're not at that point yet to know where it sort of cuts off. At some point, you have to...

0:53:45 - 0:53:46

a cesium atom or something.

0:53:47 - 0:53:48

Something super, super...

0:53:48 - 0:53:50

Like, what is the basic build?

0:53:50 - 0:53:53

I don't know that we have the knowledge to figure out, like,

0:53:53 - 0:53:56

what is the basic building block enough to build up everything else.

0:53:56 - 0:53:57

But, I mean, it is possible.

0:53:57 - 0:54:00

But this is sort of an engineering...

0:54:00 - 0:54:02

A very, very, very difficult engineering question now.

0:54:03 - 0:54:08

I think you'll probably slowly ascend the terms in the Taylor series expansion.

0:54:09 - 0:54:13

That's what I think your synthetic data is going to do. You're going to slowly get some model of

0:54:13 - 0:54:19

the first term, and then the second, and then the third. But I mean, sure, it's only as rich as the

0:54:19 - 0:54:23

data that you teach it with. And that's under the assumption that the model is sort of infinitely

0:54:23 - 0:54:27

responsive to the data it's given, which is not always true either. There's always biases in there.

0:54:27 - 0:54:34

But look, I'm very pragmatic with these things. I'm not a theorist, if you can't tell. And so

0:54:34 - 0:54:38

for that reason, I just sort of need to specify how accurate my model needs to be to be useful.

0:54:39 - 0:54:43

And I'm quite happy to just gather that many terms in perturbation theory and then get on

0:54:43 - 0:54:44

with my day job, guys.
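
The "specify the accuracy, then gather that many terms" attitude can be shown with an ordinary Taylor series. The sketch below uses exp(x) purely as a stand-in for whatever quantity a model is approximating; it is an analogy for the speaker's point, not a claim about how synthetic-data training actually proceeds. The function names `taylor_exp` and `terms_needed` are hypothetical.

```python
import math

def taylor_exp(x, n_terms):
    """Partial sum of the Taylor series of exp(x) around 0, using n_terms terms."""
    return sum(x ** k / math.factorial(k) for k in range(n_terms))

def terms_needed(x, tolerance, max_terms=100):
    """Smallest number of terms whose partial sum is within `tolerance` of math.exp(x)."""
    for n in range(1, max_terms + 1):
        if abs(taylor_exp(x, n) - math.exp(x)) <= tolerance:
            return n
    raise ValueError("tolerance not reached within max_terms")

coarse = terms_needed(1.0, 1e-2)  # a loose accuracy target
fine = terms_needed(1.0, 1e-6)    # a much tighter one
```

A looser tolerance is satisfied with fewer terms than a tighter one, which is the pragmatic trade-off being described: decide how accurate the model has to be to be useful, and stop there.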

0:54:44 - 0:54:53

Man, really interesting stuff. And I just love the point too that there

0:54:53 - 0:55:00

really are quite a few layers, from the data to the model to the processes and operations around,

0:55:00 - 0:55:06

you know, getting and working with the data. And I always myself like to remember that

0:55:06 - 0:55:15

the DeSci movement is kind of a movement defined or pursued relative to what exists currently.

0:55:15 - 0:55:21

I mean, there always have been decentralized projects in science to different degrees,

0:55:22 - 0:55:26

and decentralization can happen on a spectrum. And so I think that for

0:55:26 - 0:55:32

me the heart of the movement is a recognition that science has gotten to a place where it

0:55:32 - 0:55:38

errs on the side of centralization too often, and that, yeah, the core heart of DeSci is sort of like

0:55:38 - 0:55:51

being a counterweight to kind of help us, you know, find the right configuration for the right problem. And Sterling, I do just love that you're working on kind of cognitive health, because like,

0:55:51 - 0:55:56

A, I think that's such an important and under addressed area, particularly in terms of patient

0:55:56 - 0:56:02

impact. But also, I think some of the most interesting decentralized data work is being

0:56:02 - 0:56:06

done in cognitive science. I recently have been doing

0:56:06 - 0:56:12

a little bit of multimodal training with the human connectome data set. And the human connectome

0:56:12 - 0:56:16

data set is something you might be able to say more about than me, but it's sort of

0:56:16 - 0:56:23

translationally integrated brain scans with the kind of cognitive and behavioral inventories for

0:56:23 - 0:56:25

a number of patients over a

0:56:25 - 0:56:31

time series. And so you can actually kind of start to connect the dots or the pixels, so to speak,

0:56:31 - 0:56:37

between what's happening physically in the brain and what's happening cognitively and behaviorally

0:56:37 - 0:56:44

in the patient. And again, just sort of no silver bullet, no single answer for like, what is DeSci

0:56:44 - 0:56:45

doing or what should it do.

0:56:46 - 0:56:55

But kind of an interesting example, right, that the Human Connectome Project does have labs from all over the world working together to pool this data.

0:56:55 - 0:57:06

But I would maybe mention that they're all doing it with a very rigid schema and, you know, with very similar cognitive and behavioral metrics.

0:57:06 - 0:57:15

And so, you know, even in a decentralized science project, like there's maybe some room to open it up to researchers to say like, hey, what kind of data would be valuable for you guys?

0:57:15 - 0:57:19

What kind of metrics would be most valuable to integrate?

0:57:20 - 0:57:26

But anyway, that's just my two cents and high level, though, just very grateful for the work you're doing, Sterling.

0:57:26 - 0:57:32

And, you know, I think it's remarkable to like how much of our cognition is carried in our voice.

0:57:32 - 0:57:42

I think maybe you could even tell us some more about this, but I think some of the most reliable early warning indicators for dementia are ones you can hear in the voice, right?

0:57:46 - 0:57:48

Yeah, actually.

0:57:56 - 0:58:02

My computer's acting up. Yeah. So yeah, voice is like a really good biomarker for a lot of the development of neurocognitive disorders. And it's actually surprising that it's really not even

0:58:02 - 0:58:05

measured in studies.

0:58:08 - 0:58:09

And I think the reason is because it's been too subjective.

0:58:14 - 0:58:18

There are doctors, really good doctors can listen to a patient on the phone and immediately determine there's something, there's early warning signs for dementia.

0:58:18 - 0:58:25

But it's been impossible to standardize that until now AI has gotten so good that you can actually develop a

0:58:25 - 0:58:35

model that can learn how a doctor is able to hear the problems. And so that's what our objective is

0:58:35 - 0:58:41

with this particular study, is to have authenticated dementia patients speaking

0:58:41 - 0:58:46

into a standard recording device, probably an iPhone,

0:58:47 - 0:58:55

and then really running hardcore AI models on that and saying, okay, this, like, you know,

0:58:55 - 0:59:01

really developing that as a, could become a really good biomarker that's used across all,

0:59:01 - 0:59:06

that's used for years to come. So that's the hope with that study.
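
Before any model can learn from voice recordings, the audio is usually turned into numeric features. The sketch below shows one common first step, a log-magnitude spectrogram, computed on a synthetic tone rather than real patient audio. It is an assumption about a plausible preprocessing stage, not a description of the study's pipeline; `log_spectrogram`, the frame sizes, and the 8 kHz sample rate are all hypothetical choices.

```python
import numpy as np

def log_spectrogram(samples, frame_len=256, hop=128):
    """Log-magnitude short-time Fourier transform: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([
        samples[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitudes)

# Synthetic stand-in for a recording: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)

features = log_spectrogram(tone)
# Frequency bin with the most energy on average, converted back to hertz.
peak_bin = int(np.argmax(features.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 256
```

Each row of `features` summarizes a short slice of time, so a downstream model sees how the frequency content of the voice changes over the recording. Using a standard recording device, as described above, helps keep those features comparable across patients.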

0:59:06 - 0:59:12

Oh, man, I just love it. And then you imagine with these new multimodal models that are coming out,

0:59:12 - 0:59:19

like if you had, you know, features from audio, from these behavioral and clinical inventories,

0:59:19 - 0:59:24

and then also the patient's brain scans. And then again, you kind of go like, oh, well,

0:59:24 - 0:59:26

this is where DeSci could be

0:59:26 - 0:59:31

doing really interesting stuff, right? Because if in an imagined future, every experimental

0:59:31 - 0:59:36

protocol and methodology, including the human connectome project is on chain, then it's like,

0:59:36 - 0:59:44

you know, oh, pull requests to add an audio data layer to the contribution spec. Oh, anyway, man,

0:59:44 - 0:59:47

that's just so exciting. And I hope you'll kind of,

0:59:47 - 0:59:52

you know, share information about that work as you go, because I think that could be incredibly

0:59:52 - 0:59:59

high impact. Yeah, I want to kind of keep my project as much of an open book as possible.

1:00:00 - 1:00:09

So I do try to share as many of the milestones as we hit as we go, because I'm learning this as much as every one of you are as well.

1:00:09 - 1:00:13

And I think I don't want to be the only person who does something novel.

1:00:14 - 1:00:19

I think academia is ripe for disruption big time.

1:00:20 - 1:00:22

So I'm here with all of you.

1:00:23 - 1:00:23

So, yeah.

1:00:25 - 1:00:26

Completely.

1:00:26 - 1:00:27

Sounds super cool.

1:00:28 - 1:00:29

I like it a lot as well.

1:00:30 - 1:00:31

I'm a big fan of Sterling.

1:00:33 - 1:00:33

Yeah, it's fun.

1:00:34 - 1:00:45

One quick thing I'll say is it's funny because when I was a kid, my mom stole me away from school to go chase whales in Canada to go listen to them with a hydrophone.

1:00:45 - 1:00:50

So she was obsessed with trying to understand how whales communicate. And I was like, why?

1:00:51 - 1:00:56

I was like, I'd rather be playing PlayStation, Twisted Metal with my friends. And now as I'm

1:00:56 - 1:01:02

older, I'm like, oh my God, like I got, like what a gift that I got to go do that. And now

1:01:02 - 1:01:07

I get to use that in my work today. It's pretty, it feels right for me to do this kind of.

1:01:08 - 1:01:08

So yeah.

1:01:09 - 1:01:10

It's so funny you say that.

1:01:11 - 1:01:12

And I know we're just hitting the time,

1:01:12 - 1:01:15

but over at New Atlantis Labs,

1:01:15 - 1:01:17

just started working with a really incredible

1:01:17 - 1:01:18

machine learning engineer,

1:01:18 - 1:01:21

who is actually where I know most of the stuff

1:01:21 - 1:01:24

about audio diagnosis of cognitive function.

1:01:24 - 1:01:33

And then, yeah, we've recently been applying some of the same tools that he used to featurize audio data on whale song.

1:01:34 - 1:01:41

And I did some work maybe three or four years ago building transformers around crow vocalizations.

1:01:40 - 1:01:48

And we had some really good results, and I think that the whale language

1:01:48 - 1:01:53

problem is several orders of magnitude more complicated, because even just understanding,

1:01:53 - 1:02:02

you know, the features, because they're so, you know, acoustically precise. Like, these

1:02:02 - 1:02:06

are creatures that can kind of like sense the exact dimensions

1:02:06 - 1:02:12

of a Coke bottle from 300 yards with sonar. Like, the sort of intricacy of their voices is

1:02:12 - 1:02:17

mathematically just even hard to kind of wrap your head around from a spectrographic perspective,

1:02:17 - 1:02:24

like kind of just looking at the graph of the audio wave. But yeah, I actually do have a feeling

1:02:24 - 1:02:25

that sort of within the next couple years,

1:02:25 - 1:02:31

we actually will get some serious advancement in understanding whale language. It's very exciting.

1:02:34 - 1:02:45

Amazing. So many exciting advances across all of science and all of the incredible work everyone here is doing and within the broader

1:02:45 - 1:02:54

DeSci, scientific AI, and compute landscapes as well. Wanted to say thank you so much to the

1:02:54 - 1:03:02

Macrocosmos team for joining in on this space today and sharing a bit about the protein folding

1:03:02 - 1:03:07

subnet they recently launched on BitTensor, to Michael and

1:03:07 - 1:03:14

the work he's doing at WaterBear specifically on sleep data and making that more accessible

1:03:14 - 1:03:21

and understandable to the individual person, as well as allowing understanding at a more

1:03:21 - 1:03:25

collective level. And then Stanley, of course, with all of the work

1:03:25 - 1:03:32

you're doing across rare diseases, the hackathon, from a compute perspective at Lilypad,

1:03:33 - 1:03:42

and just other involvements across protein space as well. So thank you to everyone for tuning in.

1:03:42 - 1:03:47

Sterling, we would love to dive into some of the work you're doing in future spaces too.

1:03:47 - 1:03:52

Personally, I would love to have one on neuro-related things and the brain.

1:03:52 - 1:03:55

So I think that's definitely in store.

1:03:55 - 1:04:00

With that, we have a space every Wednesday at 4 p.m. UTC.

1:04:00 - 1:04:05

So looking forward to seeing everyone back next week and invite your friends.

1:04:06 - 1:04:15

If you have a topic you would like to be a theme for a week or you know someone else who would be quite interesting, please have them reach out as well.

1:04:16 - 1:04:22

I'm Erin McGinnis and the listeners, Merrick or the DeSci Mic accounts are all great ways to get connected.

1:04:23 - 1:04:25

Thanks for joining us, everyone.

1:04:25 - 1:04:26

Have a great week.
