The following is a list of machine learning, math, statistics, data visualization and deep learning repositories I have found surfing GitHub over the past 4 years. If you’re looking for more documentation and less code, check out awesome-machine-learning.

If you’re interested in this type of content, follow me on Twitter: @josephmisiti

2012-paper-diginorm - Paper source, analysis notebook, and data generation/analysis scripts for the diginorm paper

2014-talks - This is the official repository for slides and talks from GopherCon 2014

538model - 538 Election Forecasting Model

9m - 9m Unicode URL Shortener

aa228-notebook - IJulia notebooks for AA228/CS238 Decision Making Under Uncertainty course at Stanford University

aerosolve - A machine learning package built for humans.

airbnb - Data collection for Airbnb business

AL - Active Learning

alpha - Open-source web microblogging client for App.net

AlwaysRemember - Zipfian capstone project - Dan Morris

amazonaccess - Amazon Employee Access Challenge

angular-nvd3 - An AngularJS directive for NVD3 reusable charting library (based on D3). Easily customize your charts via JSON API.

annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

ark-tweet-nlp - CMU ARK Twitter Part-of-Speech Tagger

audio_fingerprinting - Exploration of using image processing algorithms in other domains

awesome-artificial-intelligence - A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers

awesome-d3 - A list of D3 libraries, plugins and utilities

awesome-go - A curated list of awesome Go frameworks, libraries and software. Inspired by awesome-python.

awesome-python - A curated list of awesome Python frameworks, libraries and software. Inspired by awesome-php.

babynames - Fun with the Social Security Administration’s baby name data

backbone.baseview - A simple base view class for Backbone.View.

backbone.radio - Messaging patterns for Backbone applications.

backgrid - Finally, an easily stylable semantic HTML data grid widget with a JavaScript API that doesn’t suck.

bank - Statsd and Metricsd frontend for UDP packets aggregation

barnes-hut-sne - Unofficial repository for the Barnes-Hut version of t-SNE by Laurens van der Maaten

bashplotlib - plotting in the terminal

Bastien-Theano-Workshop - Fred’s Theano Workshop

BayesDataAnalysisWithPyMC - Python (PyMC) adaptation of the R code from “Doing Bayesian Data Analysis”

bayesian - Utility for Bayesian reasoning

Bayesian-data-analysis-with-PyMC2 - Bayesian data analysis with PyMC(2)

BayesPy - Bayesian Inference Tools in Python

bci-challenge-ner-2015 - Code and documentation for the winning solution at the BCI Challenge @ NER 2015: https://www.kaggle.com/c/inria-bci-challenge

benchm-ml

biaxial-rnn-music-composition - A recurrent neural network designed to generate classical music.

bird - Pure Python implementation of the BIRD algorithm for (structured) sparsity-based denoising of multichannel arrays

bitcoin - Bitcoin Core integration/staging tree

blaze-scipy-2014 - Slides for scipy 2014 conference

blocks - A Theano framework for building and training neural networks

bokeh - Interactive Web Plotting for Python

bolt - Bolt Online Learning Toolbox

book - Crypto 101, the introductory book on cryptography.

bootstrap-wysiwyg - Tiny bootstrap-compatible WYSIWYG rich text editor

brain - Neural networks in JavaScript

brainmets - Survival prediction for brain metastases

breeze - Breeze is a library for numerical processing, machine learning, and natural language processing. Its primary focus is on being generic, clean, and powerful without sacrificing (much) efficiency. Breeze is the merger of the ScalaNLP and Scalala projects, because one of the original maintainers is unable to continue development. The Scalala parts are largely rewritten.

cached-property - A decorator for caching properties in classes.

cayley - An open-source graph database

ccv - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library

censored_regression - Linear regression with lower-bound labels

chainer - A flexible framework of neural networks for deep learning

chicago-atlas - View citywide information about health trends and take action near you to improve your own health.

citibike_analysis - Let’s analyze Citibike data!

cloudy-tweets - Machine Learning solution for Kaggle.com’s “Partly Sunny with a Chance of Hashtags”

CNN_sentence - CNNs for sentence classification

Coelho2015_NetsDetermination

coins - Bitcoin value tracker

compneuro - Computational Neuroscience class materials in PyDSTool based on Hugh Wilson’s book (and some from Eugene Izhikevich’s book). DRAFT VERSION! See files_info.txt for index. Please improve and discuss!

Conjecture - Scalable Machine Learning in Scalding

convnetjs - Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser.

cookbook-code - Recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python

copper - Fast, easy and intuitive machine learning prototyping.

CoverTree - Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

crab - A recommendation engine library for Python

craigslist-checker - Send text when a new Craigslist posting matches a given keyword or phrase

crime

cs229-project - Recommender System Project for CS229 at Stanford in Fall 2012

cudamat - Python module for performing basic dense linear algebra computations on the GPU using CUDA.

cudnn-python-wrappers - Python wrappers for the NVIDIA cuDNN libraries

curfil - CUDA Random Forest implementation for Image Labeling tasks

cython_lstm - Python LSTM library for getting things done quickly, greatly, and without waiting 50 years for compilation

d3.chart - A framework for creating reusable charts with d3.js.

d3js-presentation

daft - Render some probabilistic graphical models using matplotlib

darkmarket - the world’s fastest response to a govt takedown of digital markets

dash - Flask, JS, and CSS boilerplate for interactive, web-based visualization apps in Python

data - Data and code behind the stories and interactives at FiveThirtyEight

Data-Analysis-and-Machine-Learning-Projects - Repository of teaching materials, code, and data for my data analysis and machine learning projects.

dataanalysis - Coursera data analysis course, done in Python

DataGotham2013

datascience-anthology-pydata - PyData, The Complete Works of

datascientist_tutorial

datasets - Datasets of some standard computer vision / deep learning benchmarks

data_hacking - Click Security Data Hacking Project

data_hacks - Command line utilities for data analysis

db-readings - Readings in Databases

decision-weights - Homegrown analysis of Prospect Theory: Math, turkers and python =)

deep-limits - Repo for a paper about constructing priors on very deep models.

deep-pink

DeepANN - Theano based deep ANN learning code

DeepLearning - Deep Learning (Python, C/C++, Java, Scala)

DeepLearningTutorials - Tutorials from deeplearning.net converted for Torch

deepmat - Matlab Code for Restricted/Deep Boltzmann Machines and Autoencoders

deepnet - Implementation of some deep learning algorithms.

democracy-measurement-model - Replication materials for Bayesian measurement error model of dichotomous measures of democracy.

Diffusion-Probabilistic-Models - Reference implementation for Deep Unsupervised Learning using Nonequilibrium Thermodynamics

diffusion-segmentation - A collection of image segmentation algorithms based on diffusion methods

dimensionality-reduction-for-sparse-binary-data - convert a lot of zeros and ones to fewer real numbers

dist_lda - Distributed latent Dirichlet allocation

dl-machine - Scripts to setup a GPU / CUDA-enabled compute server with libraries for deep learning

docopt - Pythonic command line arguments parser, that will make you smile

Doing_bayesian_data_analysis - Python/PyMC3 versions of the programs described in “Doing Bayesian Data Analysis” by John K. Kruschke

DonorsChoose_Visualization

downhill - Stochastic gradient optimization routines for Theano

dpmm - Dirichlet process mixture model.

dsutils - Utilities for Python’s data science ecosystem

DWURecyclingAlert - A drop-in code snippet that dynamically detects non-recycled UI elements inside your UITableViewCells.

Elements-of-Statistical-Learning - Contains LaTeX, SciPy and R code providing solutions to exercises in Elements of Statistical Learning (Hastie, Tibshirani & Friedman)

elevator.js - Finally, a “back to top” button that behaves like a real elevator.

emcee - The Python ensemble sampling toolkit for affine-invariant MCMC

energy - Artsy Folio, The Partner iPhone / iPad app.

epoch - A general purpose, real-time visualization library.

equity - Series Seed Preferred Stock

euroscipy2012_document

exploratory_computing_with_python

fabric-ec2 - Some helpers to simplify running Fabric tasks on EC2 instances

faker - Faker is a Python package that generates fake data for you.

fastnet

fb-mac-messenger - ⚡️ Mac app wrapping Facebook’s Messenger for desktop

featureforge - A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

fix-macosx

flow - Volumetric Particle Flow - http://david.li/flow

FM_FTRL - Hashed Factorization Machine with Follow The Regularized Leader for Kaggle Avazu Click-Through Rate Competition

freebayes - Bayesian haplotype-based polymorphism discovery and genotyping.

fuel - A data pipeline framework for machine learning

fuzzywuzzy - Fuzzy String Matching in Python

gdbn - George Dahl’s gdbn: Pre-trained deep neural networks

gender-data-pkg - A data package for R containing historical data sets about gender

gensim - Topic Modelling for Humans

getting-started-with-haskell - notes on where to find Haskell tutorials and tips to complete them

ggplot-tutorial

Ghost - Just a blogging platform

githut - Visualization of data from the GitHub Archive.

glove-python - Toy Python implementation of http://www-nlp.stanford.edu/projects/glove/

go - The Open Source Data Science Masters

gobot - A Go framework/set of libraries for robotics and physical computing

goji - Goji is a minimalistic web framework for Golang inspired by Sinatra that’s high in antioxidants

golearn - Machine Learning for Golang

goop - A dependency manager for Go (golang), inspired by Bundler.

Grasp-and-lift-EEG-challenge - Code and documentation for the winning solution at the Grasp-and-Lift EEG Detection challenge

GroundHog - Library for implementing RNNs with Theano

group_lasso

gtor - A General Theory of Reactivity

h2o - h2o = fast statistical, machine learning & math runtime for big data

hawkes

hebel - GPU-Accelerated Deep Learning Library in Python

hedgehog - Re-implementation of method in Playing Atari with Deep Reinforcement Learning paper.

higgsml - The winning solution to the Higgs Boson Machine Learning Challenge.

hillary-clinton-emails - Code to transform Hillary’s emails from raw PDF documents to a SQLite database

hiscore - HiScore makes creating sophisticated scores easy

histogram.go

hmmlearn - Hidden Markov Models in Python, with scikit-learn like API

hockey - Hockey analytics

How-to-Make-a-Computer-Operating-System - How to Make a Computer Operating System in C++

huginn - Build agents that monitor and act on your behalf. Your agents are standing by!

hydra - Multi-process MongoDB collection copier.

hyperopt-sklearn - Hyper-parameter optimization for sklearn

HypTrails

ida - An introduction to data analysis, using R. Experimental.

iir - Machine Learning / Natural Language Processing / Information Retrieval

impyla - Python client to Cloudera Impala

inception

intercooler-js - A declarative, REST-ful data binding library for web applications

intro-python-scarpentry-paris-2013 - Teaching material on Python for the Software Carpentry bootcamp at Telecom ParisTech, Paris, 2013

intro2stats - Introduction to Statistics using Python

ipython - Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

ipython-notebooks - Collection of IPython Notebooks

IPython-plotly - A collection of IPython notebooks that feature Plotly graphs. Content moved to https://plotl.ly/ipython-notebooks/

iTorch - IPython kernel for Torch with visualization and plotting

ivalice - Boosting and ensemble learning in Python.

iwae - Code to train Importance Weighted Autoencoders on MNIST and OMNIGLOT

java-deeplearning

jliszka.github.com

jobtastic - Make your user-responsive long-running Celery jobs totally awesomer.

js-must-watch - Must-watch videos about JavaScript

JSON-js - JSON in JavaScript

kaggle-avito - Winning solution to the Avito CTR competition

kaggle-blackbox - Deep learning made easy

kaggle-burn-cpu - Code for the “Burn CPU, burn” competition at Kaggle. Uses Extreme Learning Machines and hyperopt.

kaggle-cifar - Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet

kaggle-cifar10 - Code for Kaggle Cifar10 competition

Kaggle-Competitions - All Kaggle competitions

kaggle-digits - Some code for the Digits competition at Kaggle, incl. pylearn2’s maxout

kaggle-dogs-vs-cats - Code for Kaggle Dogs vs. Cats competition

kaggle-galaxies - Winning solution for the Galaxy Challenge on Kaggle (http://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge)

kaggle-gender - A Kaggle competition: discriminate gender based on handwriting

kaggle-marinexplore

Kaggle-NDSB - doc and model for NDSB

kaggle-ndsb-1 - Winning solution for the National Data Science Bowl competition on Kaggle (plankton classification)

kaggle-solar-energy

kaggle-stumbleupon - bag of words + sparsenn

kaggle-whales - code for the Whale Detection Challenge competition on Kaggle

kaggle_acquire-valued-shoppers-challenge - Code for the Kaggle acquire valued shoppers challenge

kaggle_diabetic_retinopathy - Fifth place solution of the Kaggle Diabetic Retinopathy competition.

kaggle_insults - Kaggle Submission for “Detecting Insults in Social Commentary”

Kalman-and-Bayesian-Filters-in-Python - Kalman Filter textbook using IPython Notebook. This book takes a minimally mathematical approach, focusing on building intuition and experience, not formal proofs. Includes Kalman filters, Extended Kalman filters, unscented filters, and more. Includes exercises with solutions.

kartograph.py - Renders beautiful SVG maps in Python.

kastnerkyle.github.io - GitHub blog

Kayak - Kayak is a library for automatic differentiation with applications to deep neural networks.

kbmf - Kernelized Bayesian Matrix Factorization

keras - Theano-based Deep Learning library

klaus - A show-case of a state-of-the-art image classifier on iOS devices from libccv.org

kod - My personal code

ladder - Ladder network is a deep learning algorithm that combines supervised and unsupervised learning

lanyon - markdown web server

lda - LDA topic modeling for node.js

LearnDataScience - Open Content for self-directed learning in data science

learnhaskell - Learn Haskell

learning-pymc - Code I am using to learn PyMC

learning-spark - A practical example of Apache Spark using the StackExchange dataset.

leo-senate-model - Code and data for The Upshot’s Senate model.

librosa - Python library for audio and music analysis

libtins - High-level, multiplatform C++ network packet sniffing and crafting library.

libvips - A fast image processing library with low memory needs.

lifelines - Survival analysis in Python

lightfm - A Python implementation of LightFM, a hybrid recommendation algorithm.

lightning - Large-scale linear classification and regression in Python/Cython.

LightTable - The Light Table IDE

lmkkmeans - Localized Multiple Kernel k-Means Clustering

lstm-char-cnn - LSTM language model with CNN over characters

luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

machine-learning - Code & Data for Introduction to Machine Learning with Scikit-Learn

mailinabox - Mail-in-a-Box helps individuals take back control of their email by defining a one-click, easy-to-deploy SMTP+everything else server: a mail server in a box.

mailr - Webmail client with Gmail-like conversations

make - Source code for Unix make

math-as-code - a cheat-sheet for mathematical notation in code form

matplotlib-gallery - Examples of matplotlib codes and plots

MechanicalSoup - A Python library for automating interaction with websites.

meta - A Modern C++ Data Sciences Toolkit

metacademy-application - Metacademy.org’s application code

Metronome - Suite of parallel iterative algorithms built on top of Iterative Reduce

MFTracker - MF Tracker made in python using OpenCV 2.3.1 and SimpleCV

Mining-the-Social-Web-2nd-Edition - The official online compendium for Mining the Social Web, 2nd Edition (O’Reilly, 2013)

minirank - Ranking and ordinal regression algorithms in Python

ml-ease - ADMM based large scale logistic regression

mlb_terminal - A terminal interface for streaming real-time updates for MLB games

mlp-character-recognition - Trains a multi-layer perceptron (MLP) neural network to perform optical character recognition (OCR).

ML_for_Hackers - Code accompanying the book “Machine Learning for Hackers”

mne-python-notebooks - IPython notebooks for EEG/MEG data processing using mne-python

mocha - Simple, flexible, fun JavaScript test framework for node.js & the browser. (BDD, TDD, QUnit styles via interfaces)

mondrianforest - Code for “Mondrian Forests: Efficient Online Random Forests”

mongo-hdfs-export

mongotools - Wish MongoDB Tools

morb - Modular Restricted Boltzmann Machine (RBM) implementation using Theano

morphing_faces - Repository for the Morphing Faces demo

mortar-recsys - A customizable recommendation engine for Hadoop and Pig by Mortar Data.

Motivating_and_Visualizing_Recursion_in_Python - Riffing on Gustavo Duarte’s arguments and examples about using tree traversal rather than factorial for motivating and visualizing recursion

moviepy - Script-based movie editing with python

mpld3 - D3 Renderings of Matplotlib Graphics

mrec

Multimedia-Machine-Learning-Tutorials

mxnet - An efficient, flexible distributed framework for deep learning

neupy - NeuPy is a Python library for Artificial Neural Networks.

neural-networks - Artificial Neural Networks / Python

neural-networks-and-deep-learning - Code samples for my book “Neural Networks and Deep Learning”

neural-style - Torch implementation of neural style algorithm

neuraln

neuraltalkTheano - Theano implementation of neuraltalk code by karpathy (https://github.com/karpathy/neuraltalk)

neurokernel - Neurokernel Project

neurosynth - NeuroSynth core tools

nfldata - Combining datasets with MapReduce on NFL play by play data.

nfl_results - Results from NFL games since 1978 in CSV format

ngcm_pandas_course - Python data analysis course for 2015 NGCM Summer Academy

ngs-course.github.io - NGS course

ngxtop - Real-time metrics for nginx server

nice

nilearn - Machine learning for NeuroImaging in Python

NIPS2013_sklearn - Abstract for NIPS 2013

nipspreview - Scripts that generate .html to more easily see NIPS papers

nipy - Neuroimaging in Python

nntools - neural network tools for Theano

nolearn - scikit-learn compatible wrappers for neural net libraries, and other utilities.

notebooks - Some sample IPython notebooks for scikit-learn

notes-on-neural-networks - Rough working notes on neural networks

nsq - A realtime distributed messaging platform

nupic - Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms.

nut - Natural language Understanding Toolkit

nye - New Year’s Eve Cocktail Menu

ocropy - Python-based OCR package using recurrent neural networks.

office-nfl-pool - A fun introduction to pandas and scikit-learn using NFL data

openfootball.github.io - Open Football Data (football.db) Web Site

openscoring - REST web service for scoring PMML models

OpenTLD - Official source code for TLD

OS-X-Yosemite-Security-and-Privacy-Guide

OverFeat

PAC-Bayes_sample_compress_for_kernel_methods - Learning algorithms introduced in “A PAC-Bayes Sample Compression Approach to Kernel Methods” (ICML 2011)

pandas-cookbook - Recipes for using Python’s pandas library

pandashells - 🐼 Bringing the Python data stack to the shell prompt

panns - Python Approximate Nearest Neighbor Search in very high dimensional space with optimized indexing.

parallel_ml_tutorial - Tutorial on scikit-learn and IPython for parallel machine learning

pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

pattern_classification - A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks

PDA_Book - Code Examples

pgloader - Loading data into PostgreSQL

phraug - A set of simple Python scripts for pre-processing large files

pmc - Probabilistic Multiplicity Counting

pomegranate - Graphical models for Python, implemented in Cython for speed.

portia - Visual scraping for Scrapy

pourover

ppod - Collection of snippets and what-not to share with others

presentations - Presentations for JuliaCon

psdash - A Linux system information web dashboard using psutils and Flask

pybloomfiltermmap - Fast Python Bloom Filter using Mmap

pybo - Python package for modular Bayesian optimization

pybrain

pybrain-practice - A regression example for PyBrain

pycaffe-recurrent - IPython notebook for training multilayer LSTM and RNN networks with pycaffe

pycon-pydata-sprint - Experimental work for using IPython.parallel with scikit-learn

pyconsg2013-tut - Introduction to data processing with Python

pydata-gbrt-tutorial - IPython notebook for PyData SF 2014 tutorial: “Gradient Boosted Regression Trees in scikit-learn”

PyData2014-Berlin - Networks meet Finance in Python - July 27 2014

pydata_ninja - The Path of the PyData Ninja

pydub - Manipulate audio with a simple and easy high level interface

pyGPs - pyGPs is a library containing an object-oriented python implementation for Gaussian Process (GP) regression and classification.

pylearn2 - A Machine Learning library based on Theano

pymc - PyMC: Bayesian Stochastic Modelling in Python

pymc-examples - PyMC Example Notebooks

pyneural

PyPoi - “Py”thon program for “Poi”sson Image Editing

pyston - Pyston is a new, open-source Python implementation using JIT techniques, being developed by Dropbox.

pystruct - Simple structured learning framework for python

pytenn2014_tutorial - PyTennessee 2014: Statistical Data Analysis in Python

python-bloomfilter - Scalable Bloom Filter implemented in Python

python-destin - Python implementation of the DeSTIN deep learning perception system using the Theano library

Python-for-Signal-Processing - Notebooks for “Python for Signal Processing” book

python-lda - LDA in Python.

python-recsys - A python library for implementing a recommender system

PythonCounterPmf - Examples using Python’s Counter collection to implement a probability mass function (PMF)

PythonicPerambulations - A port of jakevdp.github.com to the Pelican platform

Python_for_Data_Science - A rapid on-ramp primer for programmers who want to learn Python for doing data science research and development.

python_reference - Syntax examples for useful Python functions, methods, and modules

pywt - PyWavelets - Discrete Wavelet Transform in Python

quietnet - Simple chat program using inaudible sounds and a computer’s microphone and speaker.

rcnn - R-CNN: Regions with Convolutional Neural Network Features

recsys-mapreduce-mrjob - Examples of Recommendations powered by MapReduce and mrjob

reddit-bigquery - Code + Jupyter notebook for analyzing and visualizing Reddit Data quickly and easily

rendr - Render your Backbone.js apps on the client and the server, using Node.js.

responses - A utility for mocking out the Python Requests library.

restricted-boltzmann-machines - Restricted Boltzmann Machines in Python.

rnn - Recurrent Neural Network library for Torch7’s nn

RPi-KittyCam - Raspberry Pi app using a camera and PIR motion sensor, written in Node.js with Johnny-Five and Kittydar for cat facial detection

sarah-palin-lda - Topic Modeling the Sarah Palin emails.

scientific-python-intro-45mins - An intro presentation of scientific Python in 45 minutes

scikit-fuzzy - Fuzzy Logic SciKit (Toolkit for SciPy)

scikit-image - Image Processing SciKit (Toolbox for SciPy)

scikit-learn-tutorial - Applied Machine Learning in Python with scikit-learn

scikits.cuda - CUDA SciKit

scipy - Scipy main repository

scipy-lecture-notes - Tutorial material on the scientific Python ecosystem

scipy-tutorials - SciPy tutorials. This is outdated, check out scipy-lecture-notes

scipy_2015_sklearn_tutorial - Scikit-Learn tutorial material for Scipy 2015

scipy_proceedings - Tools used to generate the SciPy conference proceedings

SCRNNs - Self-contained software accompanying the paper “Learning Longer Memory in Recurrent Neural Networks”: http://arxiv.org/abs/1412.7753

seaborn - Statistical data visualization using matplotlib

SeattleBike - Understanding Seattle Bike Count data

secondorderdemos - second order demos

secrets-newsblur-skel - Project to get NewsBlur running on a cluster of VMs

sentence2vec - Tools for mapping a sentence with arbitrary length to vector space

sentiment-analyzer - Tweets Sentiment Analyzer

sentiment_analysis_python - Working with sentiment analysis in Python.

server-configs-nginx - Nginx HTTP server boilerplate configs

shotcut - cross-platform (Qt), open-source (GPLv3) video editor

sigma.js - A JavaScript library dedicated to graph drawing

simhash-py - Simhash and near-duplicate detection

SimpleAintEasy - A compendium of the pitfalls and problems that arise when using standard statistical methods

SimpleCV - The Open Source Framework for Machine Vision

skdata - Data sets for machine learning in Python

skimage-tutorials - Scikit-image tutorials

skip-thoughts - Sent2Vec encoder and training code from the paper “Skip-Thought Vectors”

sklearn-theano - Scikit-learn compatible tools using theano

sklearn_pycon2013 - Files for my scikit-learn tutorial at PyCon 2013

sklearn_pycon2014 - Repository containing files for my PyCon 2014 scikit-learn tutorial.

sklearn_scipy2013 - Scikit-learn tutorials for the Scipy 2013 conference

sklearn_tutorial - Intro to machine learning with sklearn

skll - SciKit-Learn Laboratory makes it easy to run machine learning experiments.

smc-challenge

smile - Statistical Machine Intelligence & Learning Engine

snake-charmer - A self-contained Python workbench for scientific programming, data mining, maths, stats and visualization

snakebite - A pure python HDFS client

snap - Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.

spaCy - Industrial strength NLP with Python and Cython

spark-movie-lens - An on-line movie recommender using Spark, Python, and the MovieLens dataset

SparseConvNet - Spatially-sparse convolutional networks. Allows processing of sparse 2, 3 and 4 dimensional data. Build CNNs on the square/cubic/hypercubic or triangular/tetrahedral/hyper-tetrahedral lattices.

spearmint - Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper “Practical Bayesian Optimization of Machine Learning Algorithms” (Jasper Snoek, Hugo Larochelle and Ryan P. Adams, Advances in Neural Information Processing Systems, 2012)

speech-language-processing - A curated list of speech and natural language processing resources

spylearn - Repo for experiments on pyspark and sklearn

Squire - HTML5 rich text editor

sr-captcha

stardose - A recommender system for GitHub repositories

stat-learning - Notes and exercise attempts for “An Introduction to Statistical Learning”

statistical-analysis-python-tutorial - Statistical Data Analysis in Python

statlearning-notebooks - Python notebooks for exercises covered in Stanford statlearning class (where exercises were in R).

statsintro - Introduction to Statistics

statsmodels - Statsmodels: statistical modeling and econometrics in Python

streamparse - streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.

streamtools - tools for working with streams of data

swift-cheat-sheet - A short guide to using Apple’s new programming language, Swift.

syncthing - Open Source Continuous Replication / Cluster Synchronization Thing

talk-pydata2015

talks - IPython notebooks and slides for talks I’ve given

tan-clustering - Hierarchical word clustering, following “Brown clustering” (Brown et al., 1992)

tauCharts - D3 based data-focused charting library. Designed with passion. Flexible.

TextBlob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

textract - extract text from any document. no muss. no fuss.

textures - Textures.js is a JavaScript library for creating SVG patterns

Theano-Lights - Deep learning research framework based on Theano

theano-tutorial - A collection of tutorials on neural networks, using Theano

Theano-Tutorials - Bare bones introduction to machine learning from linear regression to convolutional neural networks using Theano.

ThinkBayes - Code repository for Think Bayes.

ThinkStats2 - Text and supporting code for Think Stats, 2nd Edition

thumbor - thumbor is an open-source photo thumbnail service by globo.com

toolz - A functional standard library for Python.

trackpy - Python particle tracking toolkit

trackpy-examples - sample images, examples, and speed tests for trackpy

trials - Tiny Bayesian A/B testing library

tutorial_ml_gkbionics - A Tutorial on Simple Machine Learning Methods Held for the Graduate School on Bionics, 2012

twisted - Event-driven networking engine written in Python.

ufldl_tutorial - Stanford Unsupervised Feature Learning and Deep Learning Tutorial

universal-jst - Precompiling your templates into JSTs, with some sugar

us-address-parser - US address parsing

Variational-Autoencoder - Implementation of a variational Auto-encoder

videogrep

video_cnn - CS231N Final Project - Andrew Giel, Ryan Diaz

visual-semantic-embedding - Implementation of the image-sentence embedding method described in “Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models”

vowpal_wabbit - John Langford’s original release of Vowpal Wabbit – a fast online learning algorithm

waifu2x - Image Super-Resolution for Anime/Fan-Art

wand - The ctypes-based simple ImageMagick binding for Python

watchy - Stats aggregation and embeddable client library - inspired from StatsD

wifite

wikichallenge - An implementation of Dell Zhang’s solution to Wikipedia’s Participation Challenge on Kaggle

wiwinwlh - What I Wish I Knew When Learning Haskell

word2vec - Python interface to Google word2vec

Word2VecExample - An example application using Word2Vec. Given a list of words, it finds the one which isn’t ‘like’ the others - a typical language understanding evaluation task.

xgboost - eXtreme Gradient Boosting (Tree) Library

xmltodict - Python module that makes working with XML feel like you are working with JSON

zipline - Zipline, a Pythonic Algorithmic Trading Library