A Huge List of Machine Learning And Statistics Repositories
The following is a list of machine learning, math, statistics, data visualization and deep learning repositories I have found while browsing GitHub over the past 4 years. If you’re looking for more documentation and less code, check out awesome-machine-learning.
If you’re interested in this type of content, follow me on Twitter: @josephmisiti
2012-paper-diginorm - Paper source, analysis notebook, and data generation/analysis scripts for diginorm paper
2014-talks - This is the official repository for slides and talks from GopherCon 2014
538model - 538 Election Forecasting Model
9m - 9m Unicode URL Shortener
aa228-notebook - IJulia notebooks for AA228/CS238 Decision Making Under Uncertainty course at Stanford University
aerosolve - A machine learning package built for humans.
airbnb - Data collection for Airbnb business
AL - Active Learning
alpha - Open-source web microblogging client for App.net
AlwaysRemember - Zipfian capstone project - Dan Morris
amazonaccess - Amazon Employee Access Challenge
angular-nvd3 - An AngularJS directive for NVD3 reusable charting library (based on D3). Easily customize your charts via JSON API.
annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
ark-tweet-nlp - CMU ARK Twitter Part-of-Speech Tagger
audio_fingerprinting - Exploration of using image processing algorithms in other domains
awesome-artificial-intelligence - A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers
awesome-d3 - A list of D3 libraries, plugins and utilities
awesome-go - A curated list of awesome Go frameworks, libraries and software. Inspired by awesome-python.
awesome-python - A curated list of awesome Python frameworks, libraries and software. Inspired by awesome-php.
babynames - Fun with the Social Security Administration’s baby name data
backbone.baseview - A simple base view class for Backbone.View.
backbone.radio - Messaging patterns for Backbone applications.
bank - Statsd and Metricsd frontend for UDP packets aggregation
barnes-hut-sne - Unofficial repository for the Barnes-Hut version of t-SNE by Laurens van der Maaten
bashplotlib - plotting in the terminal
Bastien-Theano-Workshop - Fred’s Theano Workshop
BayesDataAnalysisWithPyMC - Python (PyMC) adaptation of the R code from “Doing Bayesian Data Analysis”
bayesian - Utility for Bayesian reasoning
Bayesian-data-analysis-with-PyMC2 - Bayesian data analysis with PyMC(2)
BayesPy - Bayesian Inference Tools in Python
bci-challenge-ner-2015 - Code and documentation for the winning solution at the BCI Challenge @ NER 2015 : https://www.kaggle.com/c/inria-bci-challenge
biaxial-rnn-music-composition - A recurrent neural network designed to generate classical music.
bird - Pure Python implementation of the BIRD algorithm for (structured)-sparsity based denoising of multichannel array
bitcoin - Bitcoin Core integration/staging tree
blaze-scipy-2014 - Slides for scipy 2014 conference
blocks - A Theano framework for building and training neural networks
bokeh - Interactive Web Plotting for Python
bolt - Bolt Online Learning Toolbox
book - Crypto 101, the introductory book on cryptography.
bootstrap-wysiwyg - Tiny bootstrap-compatible WYSIWYG rich text editor
brainmets - Survival prediction for brain metastases
breeze - Breeze is a library for numerical processing, machine learning, and natural language processing. Its primary focus is on being generic, clean, and powerful without sacrificing (much) efficiency. Breeze is the merger of the ScalaNLP and Scalala projects, because one of the original maintainers is unable to continue development. The Scalala parts are largely rewritten.
cached-property - A decorator for caching properties in classes.
cayley - An open-source graph database
ccv - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
censored_regression - Linear regression with lower-bound labels
chainer - A flexible framework of neural networks for deep learning
chicago-atlas - View citywide information about health trends and take action near you to improve your own health.
citibike_analysis - Let’s analyze Citibike data!
cloudy-tweets - Machine Learning solution for Kaggle.com’s “Partly Sunny with a Chance of Hashtags”
CNN_sentence - CNNs for sentence classification
coins - Bitcoin value tracker
compneuro - Computational Neuroscience class materials in PyDSTool based on Hugh Wilson’s book (and some from Eugene Izhikevich’s book). DRAFT VERSION! See files_info.txt for index. Please improve and discuss!
Conjecture - Scalable Machine Learning in Scalding
cookbook-code - Recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python
copper - Fast, easy and intuitive machine learning prototyping.
CoverTree - Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree
crab - Crab - A recommendation engine library for Python
craigslist-checker - Send text when a new Craigslist posting matches a given keyword or phrase
crime - (no description provided)
cs229-project - Recommender System Project for CS229 at Stanford in Fall 2012
cudamat - Python module for performing basic dense linear algebra computations on the GPU using CUDA.
cudnn-python-wrappers - Python wrappers for the NVIDIA cuDNN libraries
curfil - CUDA Random Forest implementation for Image Labeling tasks
cython_lstm - Python LSTM library for getting things done quickly, greatly, and without waiting 50 years for compilation
d3.chart - A framework for creating reusable charts with d3.js.
daft - Render some probabilistic graphical models using matplotlib
darkmarket - the world’s fastest response to a govt takedown of digital markets
dash - Flask, JS, and CSS boilerplate for interactive, web-based visualization apps in Python
data - Data and code behind the stories and interactives at FiveThirtyEight
Data-Analysis-and-Machine-Learning-Projects - Repository of teaching materials, code, and data for my data analysis and machine learning projects.
dataanalysis - Coursera data analysis course, done in Python
datascience-anthology-pydata - PyData, The Complete Works of
datasets - Datasets of some standard computer vision / deep learning benchmarks
data_hacking - Click Security Data Hacking Project
data_hacks - Command line utilities for data analysis
db-readings - Readings in Databases
decision-weights - Homegrown analysis of Prospect Theory: Math, turkers and python =)
deep-limits - Repo for a paper about constructing priors on very deep models.
DeepANN - Theano based deep ANN learning code
DeepLearning - Deep Learning (Python, C/C++, Java, Scala)
DeepLearningTutorials - Tutorials from deeplearning.net converted for Torch
deepmat - Matlab Code for Restricted/Deep Boltzmann Machines and Autoencoders
deepnet - Implementation of some deep learning algorithms.
democracy-measurement-model - Replication materials for Bayesian measurement error model of dichotomous measures of democracy.
Diffusion-Probabilistic-Models - Reference implementation for Deep Unsupervised Learning using Nonequilibrium Thermodynamics
diffusion-segmentation - A collection of image segmentation algorithms based on diffusion methods
dimensionality-reduction-for-sparse-binary-data - convert a lot of zeros and ones to fewer real numbers
dist_lda - distributed latent dirichlet allocation
dl-machine - Scripts to setup a GPU / CUDA-enabled compute server with libraries for deep learning
docopt - Pythonic command line arguments parser, that will make you smile
Doing_bayesian_data_analysis - Python/PyMC3 versions of the programs described in Doing bayesian data analysis by John K. Kruschke
downhill - Stochastic gradient optimization routines for Theano
dpmm - Dirichlet process mixture model.
dsutils - Utilities for Python’s data science ecosystem
DWURecyclingAlert - A drop-in code snippet that dynamically detects non-recycled UI elements inside your UITableViewCells.
Elements-of-Statistical-Learning - Contains LaTeX, SciPy and R code providing solutions to exercises in Elements of Statistical Learning (Hastie, Tibshirani & Friedman)
elevator.js - Finally, a “back to top” button that behaves like a real elevator.
emcee - The Python ensemble sampling toolkit for affine-invariant MCMC
energy - Artsy Folio, The Partner iPhone / iPad app.
epoch - A general purpose, real-time visualization library.
equity - Series Seed Preferred Stock
fabric-ec2 - Some helpers to simplify running Fabric tasks on EC2 instances
faker - Faker is a Python package that generates fake data for you.
fb-mac-messenger - ⚡️ Mac app wrapping Facebook’s Messenger for desktop
featureforge - A set of tools for creating and testing machine learning features, with a scikit-learn compatible API
flow - Volumetric Particle Flow - http://david.li/flow
FM_FTRL - Hashed Factorization Machine with Follow The Regularized Leader for Kaggle Avazu Click-Through Rate Competition
freebayes - Bayesian haplotype-based polymorphism discovery and genotyping.
fuel - A data pipeline framework for machine learning
fuzzywuzzy - Fuzzy String Matching in Python
gdbn - George Dahl’s gdbn: Pre-trained deep neural networks
gender-data-pkg - A data package for R containing historical data sets about gender
gensim - Topic Modelling for Humans
getting-started-with-haskell - notes on where to find Haskell tutorials and tips to complete them
Ghost - Just a blogging platform
githut - Visualization of data from github archive.
glove-python - Toy Python implementation of http://www-nlp.stanford.edu/projects/glove/
go - The Open Source Data Science Masters
gobot - A Go framework/set of libraries for robotics and physical computing
goji - Goji is a minimalistic web framework for Golang inspired by Sinatra that’s high in antioxidants
golearn - Machine Learning for Golang
goop - A dependency manager for Go (golang), inspired by Bundler.
Grasp-and-lift-EEG-challenge - Code and documentation for the winning solution at the Grasp-and-Lift EEG Detection challenge
GroundHog - Library for implementing RNNs with Theano
gtor - A General Theory of Reactivity
h2o - h2o = fast statistical, machine learning & math runtime for bigdata
hebel - GPU-Accelerated Deep Learning Library in Python
hedgehog - Re-implementation of method in Playing Atari with Deep Reinforcement Learning paper.
higgsml - The winning solution to the Higgs Boson Machine Learning Challenge.
hillary-clinton-emails - Code to transform Hillary’s emails from raw PDF documents to a SQLite database
hiscore - HiScore makes creating sophisticated scores easy
hmmlearn - Hidden Markov Models in Python, with scikit-learn like API
hockey - Hockey analytics
How-to-Make-a-Computer-Operating-System - How to Make a Computer Operating System in C++
huginn - Build agents that monitor and act on your behalf. Your agents are standing by!
hydra - Multi-process MongoDB collection copier.
hyperopt-sklearn - Hyper-parameter optimization for sklearn
ida - An introduction to data analysis, using R. Experimental.
iir - Machine Learning / Natural Language Processing / Information Retrieval
impyla - Python client to Cloudera Impala
inception - (no description provided)
intercooler-js - A declarative, REST-ful data binding library for web applications
intro-python-scarpentry-paris-2013 - Teaching material on Python for the Software Carpentry bootcamp at Telecom ParisTech, Paris, 2013
intro2stats - Introduction to Statistics using Python
ipython - Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
ipython-notebooks - Collection of IPython Notebooks
IPython-plotly - A collection of IPython notebooks that feature Plotly graphs. Content moved to https://plot.ly/ipython-notebooks/
iTorch - IPython kernel for Torch with visualization and plotting
ivalice - Boosting and ensemble learning in Python.
iwae - Code to train Importance Weighted Autoencoders on MNIST and OMNIGLOT
jobtastic - Make your user-responsive long-running Celery jobs totally awesomer.
kaggle-avito - Winning solution to the Avito CTR competition
kaggle-blackbox - Deep learning made easy
kaggle-burn-cpu - Code for the “Burn CPU, burn” competition at Kaggle. Uses Extreme Learning Machines and hyperopt.
kaggle-cifar - Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet
kaggle-cifar10 - Code for Kaggle Cifar10 competition
Kaggle-Competitions - All Kaggle competitions
kaggle-digits - Some code for the Digits competition at Kaggle, incl. pylearn2’s maxout
kaggle-dogs-vs-cats - Code for Kaggle Dogs vs. Cats competition
kaggle-galaxies - Winning solution for the Galaxy Challenge on Kaggle (http://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge)
kaggle-gender - A Kaggle competition: discriminate gender based on handwriting
Kaggle-NDSB - doc and model for NDSB
kaggle-ndsb-1 - Winning solution for the National Data Science Bowl competition on Kaggle (plankton classification)
kaggle-stumbleupon - bag of words + sparsenn
kaggle-whales - code for the Whale Detection Challenge competition on Kaggle
kaggle_acquire-valued-shoppers-challenge - Code for the Kaggle acquire valued shoppers challenge
kaggle_diabetic_retinopathy - Fifth place solution of the Kaggle Diabetic Retinopathy competition.
kaggle_insults - Kaggle Submission for “Detecting Insults in Social Commentary”
Kalman-and-Bayesian-Filters-in-Python - Kalman Filter textbook using IPython Notebook. This book takes a minimally mathematical approach, focusing on building intuition and experience, not formal proofs. Includes Kalman filters, Extended Kalman filters, unscented filters, and more. Includes exercises with solutions.
kartograph.py - Renders beautiful SVG maps in Python.
kastnerkyle.github.io - Github blog
Kayak - Kayak is a library for automatic differentiation with applications to deep neural networks.
kbmf - Kernelized Bayesian Matrix Factorization
keras - Theano-based Deep Learning library
klaus - A show-case of a state-of-the-art image classifier on iOS devices from libccv.org
kod - My personal code
ladder - Ladder network is a deep learning algorithm that combines supervised and unsupervised learning
lanyon - markdown web server
lda - LDA topic modeling for node.js
LearnDataScience - Open Content for self-directed learning in data science
learnhaskell - Learn Haskell
learning-pymc - Code I am using to learn PyMC
learning-spark - A practical example of Apache Spark using the StackExchange dataset.
leo-senate-model - Code and data for The Upshot’s Senate model.
librosa - Python library for audio and music analysis
libtins - High-level, multiplatform C++ network packet sniffing and crafting library.
libvips - A fast image processing library with low memory needs.
lifelines - Survival analysis in Python
lightfm - A Python implementation of LightFM, a hybrid recommendation algorithm.
lightning - Large-scale linear classification and regression in Python/Cython.
LightTable - The Light Table IDE
lmkkmeans - Localized Multiple Kernel k-Means Clustering
lstm-char-cnn - LSTM language model with CNN over characters
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
machine-learning - Code & Data for Introduction to Machine Learning with Scikit-Learn
mailinabox - Mail-in-a-Box helps individuals take back control of their email by defining a one-click, easy-to-deploy SMTP+everything else server: a mail server in a box.
mailr - Webmail client with gmail like conversations
make - Source code for unix make
math-as-code - a cheat-sheet for mathematical notation in code form
matplotlib-gallery - Examples of matplotlib codes and plots
MechanicalSoup - A Python library for automating interaction with websites.
meta - A Modern C++ Data Sciences Toolkit
metacademy-application - Metacademy.org’s application code
Metronome - Suite of parallel iterative algorithms built on top of Iterative Reduce
MFTracker - MF Tracker made in python using OpenCV 2.3.1 and SimpleCV
Mining-the-Social-Web-2nd-Edition - The official online compendium for Mining the Social Web, 2nd Edition (O’Reilly, 2013)
minirank - Ranking and ordinal regression algorithms in Python
ml-ease - ADMM based large scale logistic regression
mlb_terminal - A terminal interface for streaming real-time updates for MLB games
mlp-character-recognition - Trains a multi-layer perceptron (MLP) neural network to perform optical character recognition (OCR).
ML_for_Hackers - Code accompanying the book “Machine Learning for Hackers”
mne-python-notebooks - IPython notebooks for EEG/MEG data processing using mne-python
mondrianforest - Code for “Mondrian Forests: Efficient Online Random Forests”
mongotools - Wish MongoDB Tools
morb - Modular Restricted Boltzmann Machine (RBM) implementation using Theano
morphing_faces - Repository for the Morphing Faces demo
mortar-recsys - A customizable recommendation engine for Hadoop and Pig by Mortar Data.
Motivating_and_Visualizing_Recursion_in_Python - Riffing on Gustavo Duarte’s arguments and examples about using tree traversal rather than factorial for motivating and visualizing recursion
moviepy - Script-based movie editing with python
mpld3 - D3 Renderings of Matplotlib Graphics
mxnet - An efficient, flexible distributed framework for deep learning
neupy - NeuPy is a Python library for Artificial Neural Networks.
neural-networks - Artificial Neural Networks / Python
neural-networks-and-deep-learning - Code samples for my book “Neural Networks and Deep Learning”
neural-style - Torch implementation of neural style algorithm
neuraltalkTheano - Theano implementation of neuraltalk code by karpathy (https://github.com/karpathy/neuraltalk)
neurokernel - Neurokernel Project
neurosynth - NeuroSynth core tools
nfldata - Combining datasets with MapReduce on NFL play by play data.
nfl_results - Results from NFL games since 1978 in CSV format
ngcm_pandas_course - Python data analysis course for 2015 NGCM Summer Academy
ngs-course.github.io - NGS course
ngxtop - Real-time metrics for nginx server
nilearn - Machine learning for NeuroImaging in Python
NIPS2013_sklearn - Abstract for NIPS 2013
nipspreview - Scripts that generate .html to more easily see NIPS papers
nipy - Neuroimaging in python
nntools - neural network tools for Theano
nolearn - scikit-learn compatible wrappers for neural net libraries, and other utilities.
notebooks - Some sample IPython notebooks for scikit-learn
notes-on-neural-networks - Rough working notes on neural networks
nsq - A realtime distributed messaging platform
nupic - Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms.
nut - Natural language Understanding Toolkit
nye - New Year’s Eve Cocktail Menu
ocropy - Python-based OCR package using recurrent neural networks.
office-nfl-pool - A fun introduction to Pandas and Scikit-Learn using NFL data
openfootball.github.io - Open Football Data (football.db) Web Site
openscoring - REST web service for scoring PMML models
OpenTLD - Official source code for TLD
PAC-Bayes_sample_compress_for_kernel_methods - Learning algorithms introduced in “A PAC-Bayes Sample Compression Approach to Kernel Methods” (ICML 2011)
pandas-cookbook - Recipes for using Python’s pandas library
pandashells - Bringing the python data stack to the shell prompt
panns - Python Approximate Nearest Neighbor Search in very high dimensional space with optimized indexing.
parallel_ml_tutorial - Tutorial on scikit-learn and IPython for parallel machine learning
pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
pattern_classification - A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks
PDA_Book - Code Examples
pgloader - Loading data into PostgreSQL
phraug - A set of simple Python scripts for pre-processing large files
pmc - Probabilistic Multiplicity Counting
pomegranate - Graphical models for Python, implemented in Cython for speed.
portia - Visual scraping for Scrapy
ppod - Collection of snippets and what-not to share with others
presentations - Presentations for JuliaCon
psdash - A Linux system information web dashboard using psutil and Flask
pybloomfiltermmap - Fast Python Bloom Filter using Mmap
pybo - Python package for modular Bayesian optimization
pybrain-practice - A regression example for PyBrain
pycaffe-recurrent - IPython notebook for training multilayer LSTM and RNN networks with pycaffe
pycon-pydata-sprint - Experimental work for using IPython.parallel with scikit-learn
pyconsg2013-tut - Introduction to data processing with Python
pydata-gbrt-tutorial - IPython notebook for PyData SF 2014 tutorial: “Gradient Boosted Regression Trees in scikit-learn”
PyData2014-Berlin - Networks meet Finance in Python - July 27 2014
pydata_ninja - The Path of the PyData Ninja
pydub - Manipulate audio with a simple and easy high level interface
pyGPs - pyGPs is a library containing an object-oriented python implementation for Gaussian Process (GP) regression and classification.
pylearn2 - A Machine Learning library based on Theano
pymc - PyMC: Bayesian Stochastic Modelling in Python
pymc-examples - PyMC Example Notebooks
PyPoi - “Py”thon program for “Poi”sson Image Editing
pyston - Pyston is a new, open-source Python implementation using JIT techniques, being developed by Dropbox.
pystruct - Simple structured learning framework for python
pytenn2014_tutorial - PyTennessee 2014: Statistical Data Analysis in Python
python-bloomfilter - Scalable Bloom Filter implemented in Python
python-destin - Python implementation of the DeSTIN deep learning perception system using the Theano library
Python-for-Signal-Processing - Notebooks for “Python for Signal Processing” book
python-lda - LDA in Python.
python-recsys - A python library for implementing a recommender system
PythonCounterPmf - Examples using Python’s Counter collection to implement a probability mass function (PMF)
PythonicPerambulations - A port of jakevdp.github.com to the Pelican platform
Python_for_Data_Science - A rapid on-ramp primer for programmers who want to learn Python for doing data science research and development.
python_reference - Syntax examples for useful Python functions, methods, and modules
pywt - PyWavelets - Discrete Wavelet Transform in Python
quietnet - Simple chat program using inaudible sounds and a computer’s microphone and speaker.
rcnn - R-CNN: Regions with Convolutional Neural Network Features
recsys-mapreduce-mrjob - Examples of Recommendations powered by MapReduce and mrjob
reddit-bigquery - Code + Jupyter notebook for analyzing and visualizing Reddit Data quickly and easily
rendr - Render your Backbone.js apps on the client and the server, using Node.js.
responses - A utility for mocking out the Python Requests library.
restricted-boltzmann-machines - Restricted Boltzmann Machines in Python.
rnn - Recurrent Neural Network library for Torch7’s nn
RPi-KittyCam - Raspberry Pi app using a camera and PIR motion sensor, written in Node.js with Johnny-Five and Kittydar for cat facial detection
sarah-palin-lda - Topic Modeling the Sarah Palin emails.
scientific-python-intro-45mins - An intro presentation of scientific Python in 45 minutes
scikit-fuzzy - Fuzzy Logic SciKit (Toolkit for SciPy)
scikit-image - Image Processing SciKit (Toolbox for SciPy)
scikit-learn-tutorial - Applied Machine Learning in Python with scikit-learn
scikits.cuda - CUDA SciKit
scipy - Scipy main repository
scipy-lecture-notes - Tutorial material on the scientific Python ecosystem
scipy-tutorials - SciPy tutorials. This is outdated, check out scipy-lecture-notes
scipy_2015_sklearn_tutorial - Scikit-Learn tutorial material for Scipy 2015
scipy_proceedings - Tools used to generate the SciPy conference proceedings
SCRNNs - Self-contained software accompanying the paper “Learning Longer Memory in Recurrent Neural Networks” (http://arxiv.org/abs/1412.7753)
seaborn - Statistical data visualization using matplotlib
SeattleBike - Understanding Seattle Bike Count data
secondorderdemos - second order demos
secrets-newsblur-skel - Project to get NewsBlur running on a cluster of VMs
sentence2vec - Tools for mapping a sentence with arbitrary length to vector space
sentiment-analyzer - Tweets Sentiment Analyzer
sentiment_analysis_python - Working with sentiment analysis in Python.
server-configs-nginx - Nginx HTTP server boilerplate configs
shotcut - cross-platform (Qt), open-source (GPLv3) video editor
simhash-py - Simhash and near-duplicate detection
SimpleAintEasy - A compendium of the pitfalls and problems that arise when using standard statistical methods
SimpleCV - The Open Source Framework for Machine Vision
skdata - Data sets for machine learning in Python
skimage-tutorials - Scikit-image tutorials
skip-thoughts - Sent2Vec encoder and training code from the paper “Skip-Thought Vectors”
sklearn-theano - Scikit-learn compatible tools using theano
sklearn_pycon2013 - Files for my scikit-learn tutorial at PyCon 2013
sklearn_pycon2014 - Repository containing files for my PyCon 2014 scikit-learn tutorial.
sklearn_scipy2013 - Scikit-learn tutorials for the Scipy 2013 conference
sklearn_tutorial - Intro to machine learning with sklearn
skll - SciKit-Learn Laboratory makes it easy to run machine learning experiments.
smile - Statistical Machine Intelligence & Learning Engine
snake-charmer - A self-contained Python workbench for scientific programming, data mining, maths, stats and visualization
snakebite - A pure python HDFS client
snap - Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
spaCy - Industrial strength NLP with Python and Cython
spark-movie-lens - An on-line movie recommender using Spark, Python, and the MovieLens dataset
SparseConvNet - Spatially-sparse convolutional networks. Allows processing of sparse 2, 3 and 4 dimensional data. Build CNNs on the square/cubic/hypercubic or triangular/tetrahedral/hyper-tetrahedral lattices.
spearmint - Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012
speech-language-processing - A curated list of speech and natural language processing resources
spylearn - Repo for experiments on pyspark and sklearn
Squire - HTML5 rich text editor
stardose - A recommender system for GitHub repositories
stat-learning - Notes and exercise attempts for “An Introduction to Statistical Learning”
statistical-analysis-python-tutorial - Statistical Data Analysis in Python
statlearning-notebooks - Python notebooks for exercises covered in Stanford statlearning class (where exercises were in R).
statsintro - Introduction to Statistics
statsmodels - Statsmodels: statistical modeling and econometrics in Python
streamparse - streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
streamtools - tools for working with streams of data
swift-cheat-sheet - A short guide to using Apple’s new programming language, Swift.
syncthing - Open Source Continuous Replication / Cluster Synchronization Thing
talks - IPython notebooks and slides for talks I’ve given
tan-clustering - Hierarchical word clustering, following “Brown clustering” (Brown et al., 1992)
tauCharts - D3 based data-focused charting library. Designed with passion. Flexible.
TextBlob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
textract - extract text from any document. no muss. no fuss.
Theano-Lights - Deep learning research framework based on Theano
theano-tutorial - A collection of tutorials on neural networks, using Theano
Theano-Tutorials - Bare bones introduction to machine learning from linear regression to convolutional neural networks using Theano.
ThinkBayes - Code repository for Think Bayes.
ThinkStats2 - Text and supporting code for Think Stats, 2nd Edition
thumbor - thumbor is an open-source photo thumbnail service by globo.com
toolz - A functional standard library for Python.
trackpy - Python particle tracking toolkit
trackpy-examples - sample images, examples, and speed tests for trackpy
trials - Tiny Bayesian A/B testing library
tutorial_ml_gkbionics - A Tutorial on Simple Machine Learning Methods Held for the Graduate School on Bionics, 2012
twisted - Event-driven networking engine written in Python.
ufldl_tutorial - Stanford Unsupervised Feature Learning and Deep Learning Tutorial
universal-jst - precompiling your templates into JST’s, with some sugar
us-address-parser - US address parsing
Variational-Autoencoder - Implementation of a variational Auto-encoder
video_cnn - CS231N Final Project - Andrew Giel, Ryan Diaz
visual-semantic-embedding - Implementation of the image-sentence embedding method described in “Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models”
vowpal_wabbit - John Langford’s original release of Vowpal Wabbit – a fast online learning algorithm
waifu2x - Image Super-Resolution for Anime/Fan-Art
wand - The ctypes-based simple ImageMagick binding for Python
watchy - Stats aggregation and embeddable client library - inspired from StatsD
wikichallenge - An implementation of Dell Zhang’s solution to Wikipedia’s Participation Challenge on Kaggle
wiwinwlh - What I Wish I Knew When Learning Haskell
word2vec - Python interface to Google word2vec
Word2VecExample - An example application using Word2Vec. Given a list of words, it finds the one which isn’t ‘like’ the others - a typical language understanding evaluation task.
xgboost - eXtreme Gradient Boosting (Tree) Library
xmltodict - Python module that makes working with XML feel like you are working with JSON
zipline - Zipline, a Pythonic Algorithmic Trading Library