The goal of this project is to generate captions for images, much as our brains do when we describe a scene. This is possible thanks to recent progress in deep learning and computer vision. Two main deep learning techniques are involved: convolutional neural networks (CNNs), typically used to extract features from the image, and long short-term memory (LSTM) networks, a type of recurrent neural network (RNN), typically used to generate the caption text.
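
To make the data flow between the two techniques concrete, here is a minimal pure-Python sketch of a CNN-style encoder feeding an LSTM decoder. It is not this project's implementation: the function names (`encode_image`, `lstm_step`, `greedy_caption`), the toy vocabulary, the layer sizes, and the choice to seed the LSTM hidden state with the image features are all assumptions made for illustration. The weights are random and untrained, so the resulting caption is meaningless; only the structure matters.

```python
import math
import random

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def encode_image(image, kernels):
    """Toy CNN encoder: one convolution per kernel, then global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        total = sum(sum(r) for r in fmap)
        features.append(total / (len(fmap) * len(fmap[0])))
    return features

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM step. W maps a gate name to (weight rows over [x; h], biases)."""
    z = x + h  # list concatenation of input and previous hidden state

    def gate(name, act):
        rows, bias = W[name]
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(rows, bias)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def greedy_caption(image, vocab, hidden=3, max_len=5, seed=0):
    """Encode the image, then pick the highest-scoring word at each step."""
    rng = random.Random(seed)
    # Hypothetical, untrained parameters (a real model would learn these).
    kernels = [random_matrix(2, 2, rng) for _ in range(hidden)]
    embed = random_matrix(len(vocab), hidden, rng)  # word id -> vector
    W = {name: (random_matrix(hidden, 2 * hidden, rng), [0.0] * hidden)
         for name in "ifog"}
    W_out = random_matrix(len(vocab), hidden, rng)  # hidden state -> word scores

    h = encode_image(image, kernels)  # assumption: features seed the hidden state
    c = [0.0] * hidden
    word = vocab.index("<start>")
    caption = []
    for _ in range(max_len):
        h, c = lstm_step(embed[word], h, c, W)
        scores = [sum(w * v for w, v in zip(row, h)) for row in W_out]
        word = scores.index(max(scores))
        if vocab[word] == "<end>":
            break
        caption.append(vocab[word])
    return caption
```

Calling `greedy_caption(image, ["<start>", "a", "dog", "<end>"])` on a small grid of pixel values returns a list of at most `max_len` words from the vocabulary. In practice the encoder would be a deep pretrained CNN and both parts would be trained together on image-caption pairs, usually with a deep learning framework rather than hand-written loops.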