This poster describes the data pipeline for a project that processes car and road sensor data to build machine learning models of traffic and weather behavior. The car data includes each car's coordinates, and the road data includes maximum and minimum temperatures, precipitation, and humidity. The data, which is simulated, is ingested by Kafka and consumed by Spark for stream processing, then persisted in Cassandra, a NoSQL datastore. These systems are deployed on NCSA hardware and Microsoft Azure, with each system running on a cluster of virtual machines. The pipeline is integrated with Python APIs.
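
The flow of records through the three stages can be illustrated with a minimal, standard-library-only Python sketch. It is not the project's code: the in-memory queues, batch loop, and dictionary of tables are stand-ins for Kafka, Spark, and Cassandra respectively, and every function name, field name, unit, and value range below is an assumption made for illustration.

```python
import json
import random
from collections import defaultdict, deque


def simulate_car(car_id, rng):
    """Hypothetical car record: an id plus coordinates (arbitrary bounding box)."""
    return {
        "kind": "car",
        "car_id": car_id,
        "lat": round(rng.uniform(40.0, 40.2), 5),
        "lon": round(rng.uniform(-88.3, -88.1), 5),
    }


def simulate_road(sensor_id, rng):
    """Hypothetical road record: min/max temperature, precipitation, humidity."""
    temp_min = round(rng.uniform(-5.0, 15.0), 1)
    return {
        "kind": "road",
        "sensor_id": sensor_id,
        "temp_min_c": temp_min,
        "temp_max_c": round(temp_min + rng.uniform(0.0, 12.0), 1),
        "precip_mm": round(rng.uniform(0.0, 20.0), 1),
        "humidity_pct": round(rng.uniform(20.0, 100.0), 1),
    }


def ingest(records):
    """Stand-in for the ingestion stage: one queue of JSON bytes per record kind."""
    topics = defaultdict(deque)
    for record in records:
        topics[record["kind"]].append(json.dumps(record).encode("utf-8"))
    return topics


def process(topics, batch_size=2):
    """Stand-in for stream processing: drain each queue in small batches."""
    for topic, queue in topics.items():
        while queue:
            size = min(batch_size, len(queue))
            batch = [json.loads(queue.popleft()) for _ in range(size)]
            yield topic, batch


def persist(batches):
    """Stand-in for the datastore: rows appended to a table named after the topic."""
    tables = defaultdict(list)
    for topic, batch in batches:
        tables[topic].extend(batch)
    return tables


def run_pipeline(n_cars=3, n_sensors=2, seed=0):
    """Generate simulated records and pass them through all three stages."""
    rng = random.Random(seed)
    records = [simulate_car(i, rng) for i in range(n_cars)]
    records += [simulate_road(i, rng) for i in range(n_sensors)]
    return persist(process(ingest(records)))


if __name__ == "__main__":
    for table, rows in run_pipeline().items():
        print(table, len(rows), "rows")
```

In a real deployment each stand-in would be replaced by the corresponding client library. Separating the stages as above keeps the record schema as the only thing they share.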