Seen above is a network built from the microbial communities of fecal samples from patients labeled "Obese" or "Lean". Many studies have measured what's called "beta diversity" to show that the microbial communities of obese individuals are dissimilar from those of lean individuals, and I was curious what conclusions one could draw from the perspective of a network.
The blue nodes are samples labeled "Lean", while the red nodes are samples labeled "Obese". The distal pink nodes are "Operational Taxonomic Units" (OTUs), which represent microorganisms by classifying groups of closely related individuals. Nodes are sized by their degree, the number of connections made to the node. The purple edges' transparency is a function of their edge weight, and the animation fades between the "Obese" samples with their first neighbors in the network and the corresponding "Lean" view.
The network is configured using an "Edge-weighted Spring-Embedded Layout"; from the Cytoscape webpage, the spring-embedded layout is based on a “force-directed” paradigm as implemented by Kamada and Kawai (1988). Network nodes are treated like physical objects that repel each other, such as electrons. The connections between nodes are treated like metal springs attached to the pair of nodes. These springs repel or attract their end points according to a force function. The layout algorithm sets the positions of the nodes in a way that minimizes the sum of forces in the network.
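The force model is easy to sketch. The toy example below is my own illustration, not Cytoscape's implementation or the exact Kamada-Kawai algorithm: four made-up nodes repel each other pairwise, edges pull their endpoints toward a rest length, and positions are nudged along the net force until things settle.

```python
# Minimal force-directed layout sketch (illustrative; constants are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # toy graph
pos = rng.random((4, 2))                    # random initial 2-D positions

for _ in range(200):
    forces = np.zeros_like(pos)
    # Repulsion between every pair of nodes (like charges).
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d) + 1e-9
            forces[i] += 0.01 * d / dist**2
    # Spring attraction along edges toward a rest length of 0.5.
    for i, j in edges:
        d = pos[j] - pos[i]
        dist = np.linalg.norm(d) + 1e-9
        pull = 0.1 * (dist - 0.5) * d / dist
        forces[i] += pull
        forces[j] -= pull
    pos += forces

print(np.round(pos, 2))
```

Minimizing the summed spring energy is what lets the layout place densely connected OTU nodes near the samples they belong to.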
The seemingly "explosive" appearance of the "Obese" view reflects connections being made to a more diverse community of microorganisms, and supports the notion that obese samples have a more diverse gut microbiome than lean samples. What's interesting is that the layout orientation is preserved between views, so shared OTU nodes are easily spotted while the large increase in nodes for "Obese" samples remains obvious. These shared OTUs represent the core shared microbiome.
Background
As part of my Computational Biology course at the University of Washington, I was tasked with creating a statistically backed visualization of a biological process or simulation. I had previously done an exploratory analysis of Jeff Gordon's A Core Gut Microbiome of Obese and Lean Twins, and I was interested in whether there were measurable and visual differences between the networks developed from the microbial communities of lean and obese twins. In that exploratory analysis, I had measured the dissimilarity between samples' operational taxonomic units using the weighted UniFrac metric, which showed a measurable dissimilarity in the beta diversity of obese samples, which is pretty interesting to me! Jeff Gordon's study drew three core conclusions, which directed this project:
- Wide array of shared genes; there exists a core microbiome at the gene level.
- Obesity is associated with phylum-level changes in the microbiota.
- Deviations from this core microbiome are associated with physiological states.
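For intuition about what a beta-diversity metric measures: weighted UniFrac requires a phylogenetic tree, so as a simpler stand-in the sketch below uses Bray-Curtis, another common beta-diversity metric, on two invented OTU count vectors. The idea is the same: a pairwise dissimilarity between samples' abundance profiles.

```python
# Bray-Curtis dissimilarity between two hypothetical samples (same OTU order).
from scipy.spatial.distance import braycurtis

lean_sample  = [10, 5, 0, 3]
obese_sample = [2, 8, 7, 1]

d = braycurtis(lean_sample, obese_sample)
print(round(d, 3))  # 0 = identical communities, 1 = no shared abundance; here 0.556
```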
Visualization
A link to the specific data can be found here: https://qiita.ucsd.edu/study/description/77 (Qiita study 77).
The following preprocessing was done using QIIME Python scripts:
- Rarefy the table to an even sampling depth across samples.
single_rarefaction.py \
    -i 'sample.biom' \
    -o 'sample_1000.biom' \
    -d 1000
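Rarefaction itself is simple to picture: subsample every sample's counts down to a common depth so that differing sequencing effort doesn't look like differing diversity. Here is a toy numpy version; single_rarefaction.py operates on whole BIOM tables, while this sketch handles a single invented count vector.

```python
# Toy rarefaction: draw a fixed number of observations without replacement.
import numpy as np

def rarefy(counts, depth, seed=0):
    """Randomly keep `depth` observations, without replacement, from counts."""
    rng = np.random.default_rng(seed)
    # Expand counts into individual observation labels, then subsample.
    pool = np.repeat(np.arange(len(counts)), counts)
    keep = rng.choice(pool, size=depth, replace=False)
    return np.bincount(keep, minlength=len(counts))

sample = [500, 300, 150, 50]       # hypothetical OTU counts summing to 1000
rare = rarefy(sample, depth=100)
print(rare, rare.sum())            # rarefied counts always sum to `depth`
```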
- Filter any samples that you are not analyzing. Here, we do not want 'Overweight' samples.
filter_samples_from_otu_table.py \
    -i 'sample_1000.biom' \
    -m 'mapping_file.txt' \
    -o 'sample1000_filtered.biom' \
    --output_mapping_fp 'mapping_file_filtered.txt' \
    -s 'obesitycat:*,!Overweight'
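Conceptually the filter is just a metadata match. A pandas sketch of what the 'obesitycat:*,!Overweight' selector does, with invented sample IDs and category values:

```python
# Drop samples whose metadata category is 'Overweight' (toy mapping file).
import pandas as pd

mapping = pd.DataFrame({
    "SampleID":   ["S1", "S2", "S3", "S4"],
    "obesitycat": ["Lean", "Obese", "Overweight", "Lean"],
})

filtered = mapping[mapping["obesitycat"] != "Overweight"]
print(filtered["obesitycat"].unique())
```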
- Make the OTU network with the filtered BIOM and mapping file; here, we want node properties based on "obesitycat".
make_otu_network.py \
    -i 'sample_1000_filtered.biom' \
    -m mapping_file_filtered.txt \
    -o otu_network_filtered \
    -b "obesitycat"
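The node and edge tables that make_otu_network.py writes boil down to a bipartite sample-OTU graph with one edge per nonzero count. A minimal pandas sketch of that construction, using an invented table rather than the script's actual code:

```python
# Build an edge list from a toy OTU count table (rows = samples, cols = OTUs).
import pandas as pd

otu_table = pd.DataFrame(
    {"OTU_1": [5, 0], "OTU_2": [3, 7]},
    index=["SampleA", "SampleB"],
)

edges = (
    otu_table.stack()               # (sample, OTU) -> count
    .rename_axis(["from", "to"])
    .reset_index(name="weight")
)
edges = edges[edges["weight"] > 0]  # keep only observed sample-OTU pairs
print(edges)
```

An OTU node's degree is then just how many samples it shares an edge with, which is the quantity the analysis script compares between categories.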
Statistics & Python Script
The following slides are a simple test case, designed to explain the basic functions of the script.
Running the script on the lean-obese data shows that the degree of OTU nodes associated with "Obese" samples only significantly exceeds that of OTU nodes associated with "Lean" samples only. This translates to a greater diversity of the obese samples' microbial communities outside the core microbiome.
Category | Min | Q1 | Mean | Median | Q3 | Max | StdDev |
---|---|---|---|---|---|---|---|
Lean | 206.103 | 248.114 | 271.906 | 271.788 | 296.252 | 344.526 | 35.574 |
Obese | 163.965 | 246.257 | 275.77 | 280.659 | 308.187 | 363.067 | 46.73 |
OTU_LeanOnly | 1.0 | 1.00 | 1.435 | 1.01 | 1.809 | 4.185 | 0.759 |
OTU_ObeseOnly | 1.0 | 1.003 | 2.732 | 1.644 | 3.016 | 16.614 | 3.154 |
OTU_Both | 2.067 | 6.385 | 24.117 | 14.049 | 31.4 | 119.475 | 26.959 |
There is a wider distribution of degree among "Obese" samples than among "Lean" samples, which is consistent with a more diverse gut microbiome. The core OTU nodes represent the band of shared OTUs.
OTU nodes associated with Lean samples only have a low mean degree compared to those associated with Obese samples only. The higher degree of Obese-only OTU nodes supports the notion that deviations from the core microbiome are associated with physiological states; in this case, obesity.
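The summary statistics above are bootstrap estimates: the script repeatedly draws 40 node degrees with replacement, computes each statistic on the draw, and averages over the iterations. A condensed sketch of the idea, with toy degree values:

```python
# Bootstrap a mean degree: resample with replacement, average the estimates.
import numpy as np

rng = np.random.default_rng(0)
degrees = np.array([1, 1, 2, 3, 5, 8, 13, 21])  # toy degree values

means = []
for _ in range(1000):
    boot = rng.choice(degrees, size=40, replace=True)
    means.append(boot.mean())

print(round(float(np.mean(means)), 3))  # converges near degrees.mean() == 6.75
```

More iterations tighten the estimate, which is why the script exposes n_iterations as its accuracy knob.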
...>python network_analysis.py -h
usage: network_analysis.py [-h] -node NODE_FILE -edge EDGE_FILE -f FEATURE -c
CATEGORIES CATEGORIES [-o OUTPUT_FILE]
[-n N_ITERATIONS] [-v] [--version]
network_analysis.py; analyze statistics of degree comparing between two categories of feature column.
Example: network_analysis.py
-node {PATH to NODE FILE}
-edge {PATH to EDGE FILE}
[-o {PATH to OUTPUT DIRECTORY}]
-f {FEATURE COLUMN for comparison}
-c {CATEGORY of FEATURE} {CATEGORY of FEATURE}
[-n {N_ITERATIONS for Monte Carlo Simulation}]
optional arguments:
-h, --help show this help message and exit
-node NODE_FILE, --node_file NODE_FILE
path to an input node file, output from
make_otu_network.py
-edge EDGE_FILE, --edge_file EDGE_FILE
path to an input edge file, output from
make_otu_network.py
-f FEATURE, --feature FEATURE
Name of the feature column for analysis
-c CATEGORIES CATEGORIES, --categories CATEGORIES CATEGORIES
Name of categories within the feature column for
analysis, two (2) required.
-o OUTPUT_FILE, --output_file OUTPUT_FILE
PATH to output data. Default:
.~\[feature]_network_analysis.txt
-n N_ITERATIONS, --n_iterations N_ITERATIONS
Number of iterations for the analysis, will take
samples for n iterations. Default:1000
-v, --verbose display verbose output while program runs.
Default:True
--version display version number and exit
This script will analyze statistics between two categories of a feature column in a node table.
Returns output text file with statistics for the degree of each category, and otus associated with the
respective categories, as well as both. Accuracy of the statistics can be controlled with n_iterations.
Rationale
---------
Comparing the degree of the different categories of a feature column can display a disparity of otu frequency
in one category, or the other. This translates to a statistically significant difference between the microbial
communities with respect to the categories analyzed.
References
----------
Qiime: http://qiime.org/
Qiita: https://qiita.ucsd.edu/
Gut Microbiome Dataset: https://qiita.ucsd.edu/study/description/77
Biom-Format: http://biom-format.org/documentation/biom_format.html
Cytoscape: http://www.cytoscape.org/documentation_users.html
Make_otu_network.py: http://qiime.org/scripts/make_otu_network.html
Notes
----------
Given a BIOM and Mapping File, the following example can be used to generate the necessary node and edge files.
Requires QIIME.
Rarefy the table to an even sampling depth
single_rarefaction.py -i 'sample.biom' -o 'sample_1000.biom' -d 1000
Filter any samples that you are not analyzing. Here, we do not want 'Overweight' samples.
filter_samples_from_otu_table.py
-i 'sample_1000.biom' -m 'mapping_file.txt' -o 'sample1000_filtered.biom'
--output_mapping_fp 'mapping_file_filtered.txt' -s 'obesitycat:*,!Overweight'
Make the otu network with the filtered biom and mapping file, here we wanted properties based on "obesitycat"
make_otu_network.py
-i 'sample_1000_filtered.biom' -m mapping_file_filtered.txt -o otu_network_filtered -b "obesitycat"
#!/usr/bin/env python
from __future__ import division
import pandas as pd
import os
__author__ = "Samuel L. Peoples"
__credits__ = ["Dr. Jesse Zaneveld"]
__version__ = "0.0.1"
__email__ = "contact@lukepeoples.com"
__status__ = "Development"
from argparse import ArgumentParser, RawDescriptionHelpFormatter, FileType
# Documentation can be found here:https://docs.python.org/2/library/argparse.html#module-argparse
def make_commandline_interface():
"""Returns a parser for the commandline"""
short_description = \
"""
network_analysis.py; analyze statistics of degree comparing between two categories of feature column.
Example: \t network_analysis.py \n\t\t
-node {PATH to NODE FILE} \n\t\t
-edge {PATH to EDGE FILE} \n\t\t
[-o {PATH to OUTPUT DIRECTORY}] \n\t\t
-f {FEATURE COLUMN for comparison} \n\t\t
-c {CATEGORY of FEATURE} {CATEGORY of FEATURE}\n\t\t
[-n {N_ITERATIONS for Monte Carlo Simulation}]
"""
long_description = \
"""
This script will analyze statistics between two categories of a feature column in a node table.
Returns output text file with statistics for the degree of each category, and otus associated with the
respective categories, as well as both. Accuracy of the statistics can be controlled with n_iterations.
Rationale
---------
Comparing the degree of the different categories of a feature column can display a disparity of otu frequency
in one category, or the other. This translates to a statistically significant difference between the microbial
communities with respect to the categories analyzed.
References
----------
Qiime: http://qiime.org/
Qiita: https://qiita.ucsd.edu/
Gut Microbiome Dataset: https://qiita.ucsd.edu/study/description/77
Biom-Format: http://biom-format.org/documentation/biom_format.html
Cytoscape: http://www.cytoscape.org/documentation_users.html
Make_otu_network.py: http://qiime.org/scripts/make_otu_network.html
Notes
----------
Given a BIOM and Mapping File, the following example can be used to generate the necessary node and edge files.
Requires QIIME.
Rarefy the table to an even sampling depth
single_rarefaction.py -i 'sample.biom' -o 'sample_1000.biom' -d 1000
Filter any samples that you are not analyzing. Here, we do not want 'Overweight' samples.
filter_samples_from_otu_table.py
-i 'sample_1000.biom' -m 'mapping_file.txt' -o 'sample1000_filtered.biom'
--output_mapping_fp 'mapping_file_filtered.txt' -s 'obesitycat:*,!Overweight'
Make the otu network with the filtered biom and mapping file, here we wanted properties based on "obesitycat"
make_otu_network.py
-i 'sample_1000_filtered.biom' -m mapping_file_filtered.txt -o otu_network_filtered -b "obesitycat"
"""
parser = ArgumentParser(description=short_description, \
epilog=long_description, formatter_class=RawDescriptionHelpFormatter)
# Required parameters
parser.add_argument('-node', '--node_file', type=str, required=True, \
help='PATH to an input NODE FILE, output from make_otu_network.py')
parser.add_argument('-edge', '--edge_file', type=str, required=True, \
help='PATH to an input EDGE FILE, output from make_otu_network.py')
parser.add_argument('-f', '--feature', type=str, required=True, \
help='Name of the FEATURE column for analysis')
parser.add_argument('-c', '--categories', type=str, nargs=2, required=True, \
help='Name of CATEGORIES within the feature column for analysis, two (2) required.')
# Optional parameters
parser.add_argument('-o', '--output_file', type=str, default='',
help='PATH to output DIRECTORY. Default: .~\[feature]_network_analysis.txt')
parser.add_argument('-n', '--n_iterations', type=int, default=1000, \
help="Number of iterations for the analysis, will take samples for n iterations. Default:%(default)s")
    # Example of a 'flag option'; note that with default=True the -v flag is
    # effectively always on. Set default=False if -v should actually toggle verbosity.
    parser.add_argument('-v', '--verbose', default=True, action='store_true', \
        help="display verbose output while program runs. Default:%(default)s")
# Add version information (from the __version__ string defined at top of script
parser.add_argument('--version', action='version', version=__version__, \
help="display version number and exit")
return parser
def parse_node_table(node_file, feature, categories, verbose):
"""
Parses the node table's user_nodes degree and feature,
returns separated DataFrames based on feature categories.
:param node_file: filepath to node file
:param feature: feature column for analysis
:param categories: categories of feature column
:param verbose: verbosity
:return: DataFrame for each category containing node_disp_name, degree, and feature
"""
if verbose:
print("Parsing "+str(node_file))
# Read the node file
df = pd.read_csv(node_file, sep="\t")
# Save just user nodes
df = df[df.ntype == "user_node"]
# Reduce the node file DataFrame
df = df[["node_disp_name", "degree", feature]]
# Separate the DataFrame into the two defined categories
cat_0_table = df[df[feature] == categories[0]]
cat_1_table = df[df[feature] == categories[1]]
# Return the tables
return cat_0_table, cat_1_table
def parse_otu_node_table(node_file, edge_file, feature, verbose):
    """
    Parses the otu nodes by joining the data in the edge file with the node file.
    Returns a single DataFrame of OTU edges covering both categories.
    :param node_file: filepath to node file
    :param edge_file: filepath to edge file
    :param feature: feature column for analysis
    :param verbose: verbosity
    :return: DataFrame containing from, to, feature, and degree columns.
    """
if verbose:
print("Parsing "+str(edge_file))
# Read the node file
node_column_list = ["node_name", "degree", feature]
    df_node = pd.read_csv(node_file, sep="\t")
    # Drop rows with a missing node name; comparing to the string "NaN" would not catch real NaN values
    df_node = df_node.dropna(subset=["node_name"])
    df_node = df_node[node_column_list]
# Read the edge file
edge_column_list = ["from", "to", feature]
df_edge = pd.read_csv(edge_file, sep="\t")
df_edge = df_edge[edge_column_list]
    # Wrangling to join the degree, from, to and feature columns
    df_edge = df_edge.sort_values(by=['to'])
    # pd.to_numeric with errors='coerce' replaces the long-deprecated convert_objects
    df_edge['to'] = pd.to_numeric(df_edge['to'], errors='coerce')
    df_node.rename(columns={'node_name': 'to'}, inplace=True)
    df_node = df_node.sort_values(by=['to'])
    df_node['to'] = pd.to_numeric(df_node['to'], errors='coerce')
# Join the tables
df_union = df_edge.merge(df_node,how='inner', on='to')
df_union = df_union.drop([str(feature)+"_y"],axis=1)
df_union.rename(columns = {(feature+'_x'):str(feature)},inplace=True)
if verbose:
print("\nUnioned DataFrame: ")
print(df_union.head(n=10))
print("\t ...")
return df_union
def split_categories(df_union, categories, feature, verbose):
"""
Splits the unioned DF into three DFs with unique OTU nodes; cat_0 only, cat_1, only, cat_both
:param df_union: Joined dataframe
:param categories: categories of feature
:param feature: Feature column for testing
:param verbose: verbostiy
:return: cat_0_table, cat_1_table, cat_both_table
"""
    # Sets of OTU node identifiers connected to each category
    cat_0_list = set(df_union.loc[df_union[feature] == categories[0], 'to'])
    cat_1_list = set(df_union.loc[df_union[feature] == categories[1], 'to'])
    # OTUs connected to both categories, then those unique to each
    to_list = cat_0_list & cat_1_list
    cat_0_list = cat_0_list - to_list
    cat_1_list = cat_1_list - to_list
# Lists for the first category's DataFrame
from_0 = []
to_0 = []
deg_0 = []
feat_0 = []
# Lists for the second category's DataFrame
from_1 = []
to_1 = []
deg_1 = []
feat_1 = []
# Lists for the otus which appear in both categories
from_b = []
to_b = []
deg_b = []
feat_b = []
u_to_list = []
u_0_list = []
u_1_list = []
    # Populate separated DataFrames, reducing each to distinct OTU nodes
    for _, row in df_union.iterrows():
        otu = row['to']
        if otu in to_list:
            if otu not in u_to_list:
                u_to_list.append(otu)
                from_b.append(row['from'])
                to_b.append(row['to'])
                deg_b.append(row['degree'])
                feat_b.append(row[feature])
        elif otu in cat_0_list:
            if otu not in u_0_list:
                u_0_list.append(otu)
                from_0.append(row['from'])
                to_0.append(row['to'])
                deg_0.append(row['degree'])
                feat_0.append(row[feature])
        elif otu in cat_1_list:
            if otu not in u_1_list:
                u_1_list.append(otu)
                from_1.append(row['from'])
                to_1.append(row['to'])
                deg_1.append(row['degree'])
                feat_1.append(row[feature])
    # Create the first category's DataFrame
    otu_0_table = pd.DataFrame({"from": from_0, "to": to_0, "degree": deg_0, feature: feat_0})
    # Create the second category's DataFrame
    otu_1_table = pd.DataFrame({"from": from_1, "to": to_1, "degree": deg_1, feature: feat_1})
    # Create the DataFrame for OTUs which appear in both categories
    otu_both_table = pd.DataFrame({"from": from_b, "to": to_b, "degree": deg_b, feature: feat_b})
if verbose:
print(categories[0] + " Only:")
print(otu_0_table.head(n=10))
print("\t\t ...")
print(categories[1] + " Only:")
print(otu_1_table.head(n=10))
print("\t\t\t ...")
print("Both " + categories[0] + " and " + categories[1] + ":")
print(otu_both_table.head(n=10))
print("\t\t\t\t ...")
return otu_0_table, otu_1_table, otu_both_table
def parse_stats(feature, categories, cat_0_table, cat_1_table, otu_0_table, otu_1_table, otu_both_table, n_iterations, output_file, verbose):
"""
Parse the statistics for each DataFrame by averaging n_iterations of random samples. Finds Min, Q1, Mean,
Median, Q3, Max, and Standard Deviation over the iterations.
:param feature: feature column for analysis
:param categories: categories of the feature column
:param cat_0_table: user_node degree DataFrame for the first category
:param cat_1_table: user_node degree DataFrame for the second category
:param otu_0_table: otu_node degree DataFrame which is associated with the first category only.
:param otu_1_table: otu_node degree DataFrame which is associated with the second category only.
:param otu_both_table: otu_node degree DataFrame which is associated with both categories.
:param n_iterations: number of iterations for the analysis
:param output_file: output file location, appends with the feature category and network_analysis.txt
ex: C:/.../data/output/feature_network_analysis.txt
:param verbose: verbosity
"""
v_string = "Processing statistics for "+categories[0]+" nodes, for "+str(n_iterations)+" iterations, with samples of 40."
# Parse the stats for the first category
stats_0 = individual_stats(cat_0_table, n_iterations, verbose, v_string)
v_string = "Processing statistics for "+categories[1]+" nodes, for "+str(n_iterations)+" iterations, with samples of 40."
# Parse the stats for the second category
stats_1 = individual_stats(cat_1_table, n_iterations, verbose, v_string)
v_string = "Processing statistics for otu nodes connected to " + categories[0] + " only, for " + str(
n_iterations) + " iterations, with samples of 40."
# Parse the stats for the otus associated with the first category only
stats_otu_0 = individual_stats(otu_0_table, n_iterations, verbose, v_string)
v_string = "Processing statistics for otu nodes connected to " + categories[1] + " only, for " + str(
n_iterations) + " iterations, with samples of 40."
# Parse the stats for the otus associated with the second category only
stats_otu_1 = individual_stats(otu_1_table, n_iterations, verbose, v_string)
v_string = "Processing statistics for otu nodes connected to both " + categories[0] + " and " + categories[
1] + ", for " + str(n_iterations) + " iterations, with samples of 40."
# Parse the stats for the otus associated with both categories
stats_otu_b = individual_stats(otu_both_table, n_iterations, verbose, v_string)
    # Save the stats to the output file location
    outfile = open(os.path.join(output_file, str(feature) + "_network_analysis.txt"), 'w')
outfile.write(categories[0] + ":\n" + stats_0+"\n")
outfile.write(categories[1] + ":\n" + stats_1+"\n")
outfile.write(categories[0] + "Only :\n" + stats_otu_0+"\n")
outfile.write(categories[1] + "Only :\n" + stats_otu_1+"\n")
outfile.write("Both " + categories[0] + " and " + categories[1] + ":\n" + stats_otu_b+"\n")
outfile.close()
#Print the stats
if verbose:
print("Statistics:")
print(categories[0] + ":\n" + stats_0)
print(categories[1] + ":\n" + stats_1)
print(categories[0] + "Only :\n" + stats_otu_0)
print(categories[1] + "Only :\n" + stats_otu_1)
print("Both "+ categories[0] + " and " + categories[1] + ":\n" + stats_otu_b)
print("Output saved to: "+output_file+"/"+str(feature)+"_network_analysis.txt")
def individual_stats(table, n_iterations, verbose, v_string):
"""
Individual stats for each table passed in
:param table: DataFrame with column labeled 'degree'
:param n_iterations: number of iterations for the test
:return: string containing statistics; Min, Q1, Mean, Median, Q3, Max, Std_Dev
"""
if verbose:
print(v_string)
# Define lists for the stats
minimum = []
q1 = []
mean_val = []
median_val = []
q3 = []
maximum = []
std_dev = []
# Save the original table for sampling
orig = table
for i in range(n_iterations):
# Take a sample
table = orig.sample(n=40,replace=True)
# Append the lists
minimum.append(table.degree.min())
q1.append(table.degree.quantile(.25))
mean_val.append(table.degree.mean())
median_val.append(table.degree.quantile(.5))
q3.append(table.degree.quantile(.75))
maximum.append(table.degree.max())
std_dev.append(table.degree.std())
# Create a DataFrame of Stats
d = {'minimum': minimum, 'q1': q1, 'mean_val': mean_val, 'median_val': median_val, 'q3': q3, 'maximum': maximum,
'std_dev': std_dev}
df = pd.DataFrame(data=d)
# Build the stats string to return
stats = ("\t Min: " + str(round(df.minimum.mean(), 3))
+ "\t 1Q: " + str(round(df.q1.mean(), 3))
+ "\t Mean: " + str(round(df.mean_val.mean(), 3))
+ "\t Median: " + str(round(df.median_val.mean(), 3))
+ "\t 3Q: " + str(round(df.q3.mean(), 3))
+ "\t Max: " + str(round(df.maximum.mean(), 3))
+ "\t Std: " + str(round(df.std_dev.mean(), 3)))
return stats
def main():
    """Main function"""
    parser = make_commandline_interface()
    args = parser.parse_args()
    node_file = args.node_file
    if not os.path.isfile(node_file):
        print(node_file + " not found. Please verify location.")
        exit(1)
    edge_file = args.edge_file
    if not os.path.isfile(edge_file):
        print(edge_file + " not found. Please verify location.")
        exit(1)
    output_file = args.output_file
    if output_file and not os.path.isdir(output_file):
        print(output_file + " not found. Please verify location.")
        exit(1)
feature = args.feature
categories = args.categories
n_iterations = args.n_iterations
verbose = args.verbose
if verbose:
print("network_analysis.py")
print("\t Node file:", node_file)
print("\t Edge file:", edge_file)
print("\t Output filepath:", output_file)
print("\t Feature: ", feature)
print("\t Categories: ", categories)
print("\t n_iterations: ", n_iterations)
cat_0_table, cat_1_table = parse_node_table(node_file, feature, categories, verbose)
df_union = parse_otu_node_table(node_file, edge_file, feature, verbose)
otu_0_table, otu_1_table, otu_both_table = split_categories(df_union, categories, feature, verbose)
parse_stats(feature, categories, cat_0_table, cat_1_table, otu_0_table,
otu_1_table, otu_both_table, n_iterations, output_file, verbose)
if __name__ == "__main__":
main()
...>python network_analysis.py -node "~./data/filtered_data/otu_network_filtered/real_node_table.txt" -edge "~./data/filtered_data/otu_network_filtered/real_edge_table.txt" -o "~./data/results" -f "obesitycat" -c "Lean" "Obese"
network_analysis.py
Node file: ~./data/filtered_data/otu_network_filtered/real_node_table.txt
Edge file: ~./data/filtered_data/otu_network_filtered/real_edge_table.txt
Output filepath: ~./data/results
Feature: obesitycat
Categories: ['Lean', 'Obese']
n_iterations: 1000
Parsing ~./data/filtered_data/otu_network_filtered/real_node_table.txt
Parsing ~./data/filtered_data/otu_network_filtered/real_edge_table.txt
Unioned DataFrame:
from to obesitycat degree
0 77.TS134 12727 Obese 2
1 77.TS126.2 12727 Obese 2
2 77.TS19 13986 Obese 3
3 77.TS127 13986 Lean 3
4 77.TS66 13986 Obese 3
5 77.TS2.2 15728 Lean 43
6 77.TS134.2 15728 Obese 43
7 77.TS27.2 15728 Obese 43
8 77.TS39.2 15728 Obese 43
9 77.TS124 15728 Lean 43
...
Lean Only:
obesitycat from degree to
0 Lean 77.TS185.2 1 16477
1 Lean 77.TS4.2 1 24162
2 Lean 77.TS155.2 1 32546
3 Lean 77.TS165.2 1 34789
4 Lean 77.TS13 1 70632
5 Lean 77.TS129 1 109587
6 Lean 77.TS109.2 1 110059
7 Lean 77.TS25 2 113278
8 Lean 77.TS2 1 113827
9 Lean 77.TS30.2 1 113919
...
Obese Only:
obesitycat from degree to
0 Obese 77.TS134 2 12727
1 Obese 77.TS119.2 2 24546
2 Obese 77.TS118.2 3 25534
3 Obese 77.TS190 1 28218
4 Obese 77.TS156 1 29566
5 Obese 77.TS169.2 1 33112
6 Obese 77.TS21 1 34139
7 Obese 77.TS87 1 35260
8 Obese 77.TS43 10 36330
9 Obese 77.TS169 3 36378
...
Both Lean and Obese:
obesitycat from degree to
0 Obese 77.TS19 3 13986
1 Lean 77.TS2.2 43 15728
2 Obese 77.TS116.2 39 16054
3 Lean 77.TS195 2 16340
4 Obese 77.TS94.2 4 17311
5 Obese 77.TS70.2 12 19611
6 Obese 77.TS74 5 31249
7 Obese 77.TS133 19 48084
8 Lean 77.TS13.2 4 49088
9 Obese 77.TS67.2 14 52624
...
Processing statistics for Lean nodes, for 1000 iterations, with samples of 40.
Processing statistics for Obese nodes, for 1000 iterations, with samples of 40.
Processing statistics for otu nodes connected to Lean only, for 1000 iterations, with samples of 40.
Processing statistics for otu nodes connected to Obese only, for 1000 iterations, with samples of 40.
Processing statistics for otu nodes connected to both Lean and Obese, for 1000 iterations, with samples of 40.
Statistics:
Lean:
Min: 206.103 1Q: 248.114 Mean: 271.906 Median: 271.788 3Q: 296.252 Max: 344.526 Std: 35.574
Obese:
Min: 163.965 1Q: 246.257 Mean: 275.77 Median: 280.659 3Q: 308.187 Max: 363.067 Std: 46.73
LeanOnly :
Min: 1.0 1Q: 1.0 Mean: 1.435 Median: 1.01 3Q: 1.809 Max: 4.185 Std: 0.759
ObeseOnly :
Min: 1.0 1Q: 1.003 Mean: 2.732 Median: 1.644 3Q: 3.016 Max: 16.614 Std: 3.154
Both Lean and Obese:
Min: 2.067 1Q: 6.385 Mean: 24.117 Median: 14.049 3Q: 31.4 Max: 119.475 Std: 26.959
Output saved to: ~./data/results/obesitycat_network_analysis.txt