Glossary
a
- AI agent
A software program that can interact with its environment, collect data, and use those data to perform tasks directed at specific goals. While the human determines the goals, the agent decides autonomously on the measures required to achieve them.
- AI model
The finished result of the learning phase of an AI algorithm is referred to as a model; it contains the relationships learned from the information used in the learning process. However, this knowledge is not available in the form of explicit, deterministic if/then rules but is contained implicitly in the data structure of the AI algorithm and its internal links. Such an AI algorithm is therefore regarded as a black box whose internal data processing cannot be understood without considerable additional technical effort (the problem addressed by explainability; see XAI). Models are either fully trained and ready for productive use or, in the form of what are referred to as “foundation models,” are trained for certain capabilities but still need to be adapted to a specific application context through further learning phases.
- AI model weight
The stored knowledge that a model has acquired. Technically, these weights are numerical values that the model uses during computations to respond to queries. More importantly, however, these weights summarize everything the model has learned and all the information it possesses.
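As a purely schematic illustration (not taken from any specific model), the NumPy sketch below reduces a “model” to a handful of weights: the numbers in w and b are invented values, and answering a query is just arithmetic over them.

```python
import numpy as np

# Schematic illustration: a "model" reduced to its weights.
# A single linear unit computes y = w . x + b; the numbers stored in
# w and b are the weights, i.e., everything this toy model "knows".
w = np.array([0.8, -1.2, 0.05])   # illustrative weight values
b = 0.3

def predict(x: np.ndarray) -> float:
    """Use the weights during computation to respond to a query."""
    return float(w @ x + b)

print(predict(np.array([1.0, 0.5, 2.0])))  # output derived purely from the weights
# Sharing the model's knowledge amounts to sharing these numbers, e.g.:
# np.savez("weights.npz", w=w, b=b)
```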
- Application Programming Interface (API)
A set of functions and protocols that enable software applications to communicate with each other to exchange data, properties, and functions.
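The following sketch illustrates the idea with two hypothetical functions, get_record and update_record, standing in for a published interface; the names and fields are invented for illustration only.

```python
# Minimal sketch: an "API" is the agreed set of calls two programs use to talk.
# The consumer only needs to know the names, parameters, and return types,
# not the provider's internal implementation.

def get_record(record_id: int) -> dict:
    """Provider side: return data for a record (internals hidden from the caller)."""
    return {"id": record_id, "status": "ok"}

def update_record(record_id: int, fields: dict) -> bool:
    """Provider side: apply changes and report success."""
    return bool(fields)

# Consumer side: exchanges data and functionality only through the published calls.
record = get_record(42)
updated = update_record(record["id"], {"status": "archived"})
print(record, updated)
```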
- Artificial intelligence (AI) and machine learning (ML)
Artificial intelligence refers to methods from a subfield of computer science that aim to reproduce the human representation of knowledge, or human reasoning on the basis of known facts, in computer algorithms and calculations. The best-known form of artificial intelligence is machine learning (ML), in which the rules of data processing, i.e., the generation of a program output from an input, are not predefined by the program code but are learned by the computer itself through a step-by-step training process. Other AI approaches, such as symbolic AI, decision trees, and evolutionary algorithms, feature mainly in specific applications. An AI system generally consists of a defined algorithm that specifies how information is processed during learning and a data structure in which the learned knowledge and the identified relationships are stored.
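As a minimal, purely illustrative contrast between a hand-coded rule and a learned one (all values invented), the sketch below derives a decision threshold from example data instead of having the programmer write it.

```python
# Conventional program: the decision rule is fixed in the code.
def classify_fixed(value: float) -> str:
    return "high" if value > 10.0 else "low"

# Machine learning (toy version): the rule is derived from labeled examples
# during a training step instead of being written by the programmer.
examples = [(3.0, "low"), (5.0, "low"), (12.0, "high"), (15.0, "high")]

def train_threshold(data):
    lows = [x for x, label in data if label == "low"]
    highs = [x for x, label in data if label == "high"]
    return (max(lows) + min(highs)) / 2.0   # learned boundary between the classes

threshold = train_threshold(examples)

def classify_learned(value: float) -> str:
    return "high" if value > threshold else "low"

print(threshold, classify_learned(8.0), classify_learned(13.0))
```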
- Artificial neural networks (ANN) and deep learning (DL)
In artificial neural networks, the functioning of human neuronal cells and their interactions is simulated. Regardless of the actual purpose, the basic technical principle of ANNs is a data structure for storing and processing information that is loosely modeled on networks of biological neurons. This data structure is taught, in what is referred to as a training process, using extensive data tailored to the problem-solving capability to be achieved. The best-known and most important learning method is deep learning (DL). In this process, the structure is incrementally changed until the model, i.e., the finished structure, generates the desired output for a given input. This mechanism is capable of processing extensive data and learning complex data patterns and knowledge relationships. Artificial neural networks and deep learning are currently among the most important forms of AI.
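A drastically simplified sketch of this training principle follows: a single linear “neuron” implemented in NumPy has its weights adjusted step by step until its outputs approach the desired ones. A real deep network has many layers and nonlinearities; the data here are synthetic and the values invented.

```python
import numpy as np

# Minimal sketch of the training idea behind deep learning: weights are adjusted
# incrementally so that the output moves toward the desired output.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # training inputs
true_w = np.array([1.5, -2.0, 0.5])         # hidden relationship to be learned
y = X @ true_w                              # desired outputs

w = np.zeros(3)                             # untrained weights
for step in range(500):                     # incremental training process
    error = X @ w - y                       # how far the output is from the target
    gradient = X.T @ error / len(X)
    w -= 0.1 * gradient                     # small corrective adjustment

print(np.round(w, 3))                       # approaches the relationship hidden in the data
```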
b
- Biological design tools (BDT)
Tools trained on amino acid sequences or other biological data rather than natural language, which support the design of new proteins or other biological agents and can generate biological sequences as outputs. Such bio-specific models can learn the favorable properties of biomolecules and suggest promising candidates for laboratory testing, reducing the number of experiments needed to identify desirable properties. For instance, UniRep assists researchers in engineering proteins based on their function, while ESMFold focuses on structure; both aid the design of better therapies and improve protein engineering for more efficient biomanufacturing. Other examples include RFDiffusion and protein language models such as ProGen2. These tools have the potential to enable advances in protein engineering and design that address crucial problems for human health and the environment.
- Bystander tumor cells
Neighboring tumor cells.
c
- Chimeric
Cells whose genes differ from those of the rest of the organ or tissue.
- CRISPR (Clustered regularly interspaced short palindromic repeats)
Technology that allows parts of the genome to be edited by removing, adding, or altering sections of the DNA sequence. It involves two essential components: a guide RNA that matches the target gene and, usually, CRISPR-associated protein 9 (Cas9), an endonuclease (an enzyme that cleaves the DNA backbone) that causes a double-stranded DNA break, allowing modifications to the genome.
- Culture-based microbial techniques
Growing microorganisms by allowing them to reproduce in a predefined culture medium under controlled laboratory conditions. Not all microorganisms grow under such conditions.
d
- Data and training (AI)
A current-generation AI is trained with the help of data that must be tailored to the problem-solving capability to be achieved. These data can be text, images, videos, or other information. Part of the data is used in the training process proper, in which the data are processed incrementally by the AI, adapting the application’s internal data storage structure. The remaining data are then used to check the results of the training process. Because the data largely determine the subsequent results of the AI, the quality and availability of training data play a significant role. Such data are often compiled by specialized companies, curated by hand, and offered on the market as an economic good.
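The split described above can be sketched as follows; the data are invented input/output pairs, and the 80/20 ratio is simply a common convention.

```python
import random

# Sketch: part of the data is used for training, the rest is held back
# to check the trained model's results.
data = [(x, 2 * x + 1) for x in range(100)]     # illustrative input/output pairs
random.seed(0)
random.shuffle(data)

split = int(0.8 * len(data))
training_set = data[:split]      # used to adapt the model's internal parameters
validation_set = data[split:]    # never shown during training; used only for checking

print(len(training_set), len(validation_set))   # 80 training, 20 validation examples
```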
- Decision trees
Decision trees are a type of data structure for storing knowledge in which a tree is used as a model to draw conclusions from the observations contained in the training dataset. Learned rules are represented by the nodes and branches of the tree, and conclusions by its leaves. After training, the model can also be used to represent explicitly and graphically the rules that lead to a decision.
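The hand-written sketch below mimics such a tree: the features and thresholds are invented stand-ins for values that training would normally derive from data, internal nodes hold the rules, and leaves hold the conclusions.

```python
# Schematic decision tree: internal nodes hold learned rules, leaves hold conclusions.
tree = {
    "rule": ("temperature", 38.0),                 # node: "is temperature > 38.0?"
    "yes": {
        "rule": ("cough", 0.5),                    # node: "is cough score > 0.5?"
        "yes": {"leaf": "likely infection"},       # leaf: conclusion
        "no": {"leaf": "monitor"},
    },
    "no": {"leaf": "healthy"},
}

def decide(node: dict, sample: dict) -> str:
    """Follow branches from the root until a leaf (conclusion) is reached."""
    if "leaf" in node:
        return node["leaf"]
    feature, threshold = node["rule"]
    branch = "yes" if sample[feature] > threshold else "no"
    return decide(node[branch], sample)

print(decide(tree, {"temperature": 39.2, "cough": 0.8}))   # -> "likely infection"
```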
- DNA polymerase
An enzyme that synthesizes DNA by joining nucleotides into longer chains.
- DNA primers (for PCR)
Short segments of DNA designed to be complementary to the beginning and end of the target sequence in the sample.
e
- Evolutionary algorithms
Evolutionary algorithms are optimization methods based on the principles of natural evolution; genetic algorithms are their best-known variant. With their help, solutions can be optimized on the basis of stochastic processes by emulating mechanisms of natural evolution such as selection, mutation, crossover, and “survival of the fittest.” Evolutionary algorithms also generate a solution to a problem from within the system rather than predetermining it in a deterministic manner.
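A toy version of such a loop, with an invented fitness criterion (“match a target bit string”), might look as follows; it is a sketch of the principle, not of any particular production algorithm.

```python
import random

# Toy evolutionary algorithm: a population of candidate solutions is improved
# through selection, crossover, and mutation.
random.seed(0)
TARGET = [1] * 20                                    # illustrative optimum: all ones

def fitness(candidate):                              # "survival of the fittest" criterion
    return sum(c == t for c, t in zip(candidate, TARGET))

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]

for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                        # selection of the fittest
    offspring = []
    while len(offspring) < 20:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, 19)
        child = a[:cut] + b[cut:]                    # crossover of two parents
        if random.random() < 0.2:
            i = random.randrange(20)
            child[i] = 1 - child[i]                  # random mutation
        offspring.append(child)
    population = parents + offspring

best = max(population, key=fitness)
print(fitness(best), "/ 20")                         # quality of the best solution found
```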
g
- Genomics
The study of the entire set of genes in the genome of a cell. Genetics is the study of individual genes, whereas genomics is the study of the entire genome (all of an organism’s genes), interactions among genes, and the way in which the environment affects them.
- GPU and AI chips
The graphics processing unit (GPU) is a specialized computing chip that was originally developed to accelerate computer graphics and image processing in PCs, smartphones, and games consoles. Because they can run algorithms with massive parallelism, GPUs are also suitable for non-graphical applications such as training neural networks and mining cryptocurrencies. In contrast to the central processing unit (CPU), which can be understood as the control center of a computer, GPUs are generally optimized for continuous operation under full load and draw electrical power of several hundred watts, with corresponding power-supply and cooling requirements. Due to the massive boom in AI, GPUs are also increasingly being optimized specifically for AI applications and produced as complete device units that can be interconnected by the hundreds or thousands in specialized data centers. Such device units can reach a continuous power consumption of more than 1,000 watts, which has a significant impact on the power supply and cooling requirements of data centers.
l
- Lipidomics
The study of the lipid composition of biological samples. Microbes contain a number of characteristic lipids and lipoproteins, which aids the rapid detection of pathogens using mass spectrometry systems.
- LLM, LMM, and AGI
Depending on the type of training data used and the form of possible user interaction with the AI algorithm, a distinction is made between different types. In large language models (LLM), text data are used for training and output; users therefore chat with the AI algorithm. In large multimodal models (LMM), image, video, and audio data are also used in training and for interaction between the AI and the user, so the AI is able to process and produce different media. Given the speed of technological progress, however, these boundaries are fluid, depending on the requirements of the field of application. The next big step that tech companies are working on is the vision of what has been dubbed artificial general intelligence (AGI): a system that is no longer optimized to solve a specific problem but is intended to be highly flexible and to match or surpass human cognitive abilities in all areas. An AGI model should be able to adapt to problems and develop solutions without having been specifically trained for them.
m
- Mass spectrometry (MS)
An analytical tool used for measuring the mass-to-charge ratio (the ratio of an ion’s mass to its charge) of one or more molecules present in a sample. It does this by ionizing the material in the ionization source and separating the resulting ions in the analyzer according to mass-to-charge ratios. Several different technologies are available for both ionization and ion analysis, resulting in many different types of mass spectrometers with different combinations of these two processes.
- Metagenomics
Metagenomic sequencing involves the sequencing of all microbial and host nucleic acids in a sample, without prior selection.
n
- Next-generation sequencing (NGS)
A method of analyzing genetic material that allows for the rapid sequencing of large amounts of DNA or RNA. Compared to traditional sequencing techniques (e.g., Sanger sequencing), NGS can simultaneously sequence millions of small fragments of DNA.
- NGS/sequencing library
Pools of DNA fragments containing adapter sequences compatible with a specific sequencing platform and indexing barcodes for individual sample identification.
- Nucleic acid panel
A diagnostic test that simultaneously examines multiple nucleic acid sequences (DNA or RNA) to detect and identify various organisms or genetic markers within a single sample. This type of test is often used in medical diagnostics to quickly identify the presence of multiple pathogens, such as viruses, bacteria, or fungi, or to assess genetic variations related to specific diseases or conditions. Nucleic acid panels provide comprehensive information that can aid in accurate diagnosis, treatment planning, and monitoring of diseases. Unlike metagenomic analysis, this approach requires prior knowledge of the (microbial) agent.
- Nucleic acids
DNA or RNA: longer molecules made up of nucleotides.
- Nucleotide
The basic building block of nucleic acids (a monomer).
p
- Pathogen agnosticism
A diagnostic sequencing or detection method that can detect any pathogen.
- Peptide mass fingerprinting (PMF) or protein fingerprinting
An analytical technique for protein identification. The protein is first cleaved into smaller peptides, whose absolute masses are then measured with a mass spectrometer such as MALDI-TOF.
- Polymerase chain reaction (PCR)
A laboratory technique for producing (or amplifying) millions to billions of copies of a specific segment of DNA, which can then be studied in greater detail. For this, PCR primers are needed. These are short, single-stranded segments of DNA that are designed to be complementary to the beginning and end of the target sequence to be amplified.
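As a rough illustration of the primer idea (using a made-up target sequence), the forward primer can be read off the start of the target, while the reverse primer is the reverse complement of its end, so that it binds the opposite strand; the 12-base primer length is arbitrary here.

```python
# Sketch of primer design for PCR, using an invented target sequence.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))

target = "ATGGCGTACCTTGACTAGCTAGGCATTGCAAGT"      # illustrative target sequence

forward_primer = target[:12]                       # matches the start of the target strand
reverse_primer = reverse_complement(target[-12:])  # binds the end on the opposite strand

print("forward:", forward_primer)
print("reverse:", reverse_primer)
```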
- Proteomics
The study of the entire set of proteins produced by the cell and their location. This can be used to create a 3D protein map of the cell, providing information about protein regulation.
r
- Red teaming (in the AI context)
An adversarial testing method used to probe AI models, with the aim of uncovering and preventing harmful behavior such as the leakage of sensitive data and the generation of toxic, biased, or factually inaccurate content.
s
- Shotgun sequencing
A method in which DNA is randomly broken up into many small pieces, each piece is sequenced individually, and the full sequence is then reassembled by looking for regions of overlap.
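A drastically simplified sketch of the reassembly step follows, using made-up fragments of a short invented sequence and a greedy merge by longest overlap; real assemblers are far more sophisticated.

```python
# Toy shotgun reassembly: overlapping fragments are merged by their regions of overlap.
fragments = ["ATGGCGTACC", "GTACCTTGAC", "TTGACTAGCT", "TAGCTAGGCA"]

def overlap(a: str, b: str, min_len: int = 4) -> int:
    """Length of the longest suffix of a that matches a prefix of b."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

assembly = fragments[0]
remaining = fragments[1:]
while remaining:
    # pick the fragment that overlaps the current assembly the most
    best = max(remaining, key=lambda f: overlap(assembly, f))
    o = overlap(assembly, best)
    assembly += best[o:]
    remaining.remove(best)

print(assembly)   # reconstructs the original stretch from its fragments
```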
- Single nucleotide polymorphism (SNP)
A variation, between individual organisms, in a single nucleotide that occurs at a specific position in a DNA sequence.
- Stroma
Part of a tissue or organ with a structural and/or connective role that is made up of connective tissue, blood vessels, lymphatic vessels, and nerves.
- Symbolic AI
Symbolic AI is a top-down approach. It can be used if a fixed and fully known set of rules is available. Using mathematical logic, new knowledge can be derived from these specifications, which is why it is also referred to as “knowledge-based AI.” However, symbolic AI reaches its limits when humans cannot feed it with correct and consistent knowledge. In addition, symbolic AI models cannot handle very large state spaces, such as the board game Go with its roughly 10^170 possible board positions, and therefore cannot be used in such cases.
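A minimal sketch of this rule-based, top-down style of inference, with invented facts and rules, is forward chaining: rules are applied repeatedly until no new knowledge can be derived.

```python
# Minimal sketch of symbolic, knowledge-based inference: all knowledge is given
# explicitly as facts and if/then rules, and new knowledge is derived from them.
facts = {"has_fever", "has_cough"}
rules = [
    ({"has_fever", "has_cough"}, "possible_respiratory_infection"),
    ({"possible_respiratory_infection"}, "recommend_test"),
]

# Forward chaining: keep applying rules until no new facts can be derived.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))   # derived knowledge follows deterministically from the given rules
```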
- Synthetic biology
A field of science that involves redesigning organisms, or the biomolecules of organisms, at the genetic level to give them new characteristics.
t
- Transgene product
A cell or organism whose genome has been modified by the introduction of foreign DNA sequence(s).
v
- Viral assembly
During the replication of the virus, proteins assemble around the viral nucleic acid, ultimately forming a capsid.
- Viral capsid library
Genomic library used to create viral capsids (the protein shell of the virus).
- Viral titer
Concentration of virus; the number of virus particles capable of invading a host cell.
- Viral vector
A modified virus designed to deliver genetic material into cells.
w
- Whole Genome Sequencing (WGS)
The process of determining (almost) the entirety of the DNA sequence of an organism’s genome.
- Wire Arc Additive Manufacturing
Wire Arc Additive Manufacturing combines the technology of Gas Metal Arc Welding with the process of Additive Manufacturing. In simple terms, layers of metal are welded on top of each other by a robot to realize the intended design.
x
- XAI – Explainable Artificial Intelligence
Explainable artificial intelligence approaches are intended to counteract the “black box” tendency of machine learning, i.e., the fact that it is not clear why an AI algorithm has reached a certain decision. Although it is technically possible to monitor the internal processing of a query within the model of an AI, no deterministic conclusions can be drawn about the actual reasoning process. It is therefore impossible to explain AI decisions in the form “for an input (a), the result (b) was generated on the basis of the learned facts (X) and (Y),” as is common in human communication. Explainable artificial intelligence approaches are intended to make these chains of reasoning visible as an extension of an AI model.