Jekyll2023-11-07T10:28:39+00:00https://billyc.github.io/feed.xmlBilly CharltonData Scientist at TU-Berlin. I create interactive data visualization and analysis tools for modeling large systems.
Creating and visualizing a model of all traffic in Switzerland2020-02-20T00:00:00+00:002020-02-20T00:00:00+00:00https://billyc.github.io/blog/2020/02/swiss-traffic-viz<p>At TU Berlin, we are building a MATSim transport simulation model of all trips occurring within the nation of Switzerland. Our goal: build a full national model <em>only from easily-obtainable data sources</em> such as census and aggregate mobile phone data. So, is it possible?</p>
<h3 id="yes">Yes!</h3>
<p><strong>Click on the image below</strong> to open a live animation of a 1% sample of 8 A.M. traffic. Be patient: the dataset is quite large, so it takes some time to load.</p>
<p><a href="https://vsp-snf.surge.sh" target="_blank">
<img src="/images/2020/snf-swiss-simulation.jpg" alt="Swiss Traffic" />
</a></p>
<p>To build a MATSim model, the two key inputs needed are a depiction of the transport system (i.e. the roadway network and transit routes and services) and the activity patterns of the population.</p>
<ul>
<li>The networks can be generated from publicly available data sources such as <a href="">OpenStreetMap</a> and <a href="">GTFS</a>. For this study we used OSM-derived networks created at ETH Zurich.</li>
<li>But the activity patterns are usually more difficult to create without specialized datasets such as regional travel surveys.</li>
</ul>
<p>Our approach was to use the Swiss Census to identify the home locations of the population, and aggregate summaries from the mobile phone provider Swisscom to estimate a matrix of municipality-to-municipality commute patterns. Much more detail will follow in the project’s final report.</p>
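<p>As a rough illustration of the idea (not the project’s actual pipeline), the sketch below gives each synthetic resident a work municipality by sampling from one row of a commute matrix. The municipality names, the counts, and the <code>assign_work_locations</code> helper are all made up for this example.</p>

```python
import random

# Hypothetical commute counts: home municipality -> {work municipality: commuters}.
# In the real project these would be derived from aggregated mobile phone data.
COMMUTE_MATRIX = {
    "Bern": {"Bern": 700, "Thun": 200, "Biel": 100},
    "Thun": {"Bern": 400, "Thun": 600},
}


def assign_work_locations(home_municipalities, matrix, seed=0):
    """Sample one work municipality per person, weighted by commute counts."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    assignments = []
    for home in home_municipalities:
        row = matrix[home]
        destinations = list(row.keys())
        weights = list(row.values())
        work = rng.choices(destinations, weights=weights, k=1)[0]
        assignments.append((home, work))
    return assignments


# Home locations would come from census data; this is a tiny invented population.
population = ["Bern"] * 5 + ["Thun"] * 5
for home, work in assign_work_locations(population, COMMUTE_MATRIX):
    print(f"home={home:5s} work={work}")
```

<p>Sampling in proportion to the matrix row keeps the aggregate commute flows roughly intact while still producing one concrete destination per person, which is what an agent-based simulation needs.</p>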
<h3 id="comparison-to-existing-simulation-model">Comparison to existing simulation model</h3>
<p>ETH Zurich has a very detailed model of Switzerland already in operation. How different are these models?</p>
<ul>
<li>The ETH model includes freight, which we have not addressed thus far</li>
<li>The ETH model is fully calibrated based on existing traffic counts and other measures</li>
</ul>
<p>After removing freight trips so the models are comparable, our simplified model had about 4.5 percent fewer activities. Since both models use census data for home locations, and since the work locations are coming from the mobile phone data, we decided to take a closer look at the other activities: nonwork trips such as social, recreational, and shopping trips for which we had less data.</p>
<p>Some differences are visible in the side-by-side comparison:</p>
<p><img src="/images/2020/snf-secondary-activities-comparison.png" alt="Non-work activity locations" /></p>
<p>We will be looking into this in more detail as the project continues. Lots of fun stuff to sink our teeth into!</p>
<h3 id="afterword">Afterword</h3>
<p>The research described here is a proof of concept, not a fully calibrated tool ready for planning or decision-making. With further development, this approach could be considered one of the simplest and most cost-effective methods of generating a usable MATSim scenario for many different regions.</p>
<p>This work is sponsored by the <a href="http://www.snf.ch/en/Pages/default.aspx">Swiss National Science Foundation</a> and performed by researchers at <a href="https://vsp.tu-berlin.de">TU Berlin</a>.</p>Exploring Uber/Lyft pickups and dropoffs in San Francisco2020-01-15T00:00:00+00:002020-01-15T00:00:00+00:00https://billyc.github.io/blog/2020/01/visualizing-uber-lyft<p>I was working with the San Francisco County Transportation Authority on some other data visualization tasks when they asked me if the platform we were building could help them explore a new dataset they had stealthily acquired on ridesharing trips…</p>
<p>For the full blog entry on this topic, <strong>check out my <a href="https://medium.com/hackernoon/visualizing-uber-and-lyft-usage-in-san-francisco-928208b1978a">Medium post</a>.</strong></p>
<p>The <a href="https://www.sfcta.org">SFCTA</a> is at the forefront of research on the impact that “ridesharing” services such as Uber and Lyft are having on our most congested cities. These services, referred to as “Transportation Network Companies” or TNCs by urban planners, often don’t share much data with public agencies. The following data was an exciting first.</p>
<h3 id="the-tncs-today-data-explorer">The “TNCs Today” data explorer</h3>
<p><strong>Click to explore more than 200,000 daily TNC (Uber and Lyft) trips in San Francisco:</strong></p>
<p><a href="https://tncstoday.sfcta.org/" target="_blank">
<img src="/images/2020/tncs-screenshot.jpg" alt="TNCs Today" />
</a></p>
<p>There is a lot to play with:</p>
<ul>
<li>Try out the <strong>2D and 3D views</strong>: 3D really shows the striking patterns of TNC activity in the city across different days of the week, while the 2D view makes it a bit easier to click and explore individual locations.</li>
<li><strong>Clicking on any block</strong> on the map will pop up a daily graph of pickups and dropoffs for that area. You can then switch to different days of the week to see how Mondays differ from Fridays, for example.</li>
<li>You can select views by day of week, and you can explore either all-day totals or focus on trips during a specific hour.</li>
</ul>
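<p>Behind the daily graphs is a simple aggregation: count trips by day of week and hour. Here is a minimal sketch of that counting step using made-up timestamps. The record format and the <code>trips_by_day_and_hour</code> name are assumptions for illustration, not the tool’s actual code.</p>

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Invented pickup timestamps; the real dataset covers 200,000+ daily trips.
PICKUPS = [
    "2016-11-14 08:15",  # a Monday
    "2016-11-14 08:40",
    "2016-11-14 17:55",
    "2016-11-18 08:05",  # a Friday
    "2016-11-18 23:30",
    "2016-11-18 23:45",
]


def trips_by_day_and_hour(timestamps):
    """Count trips per (weekday name, hour of day)."""
    counts = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        counts[(DAY_NAMES[when.weekday()], when.hour)] += 1
    return counts


counts = trips_by_day_and_hour(PICKUPS)
for (day, hour), n in sorted(counts.items()):
    print(f"{day} {hour:02d}:00  {n} trips")
```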
<h3 id="notable-nuggets-in-the-data">Notable nuggets in the data</h3>
<ul>
<li>
<p>Trips in Ubers and Lyfts go up and up as the week progresses.</p>
</li>
<li>
<p>Fridays have the most daily trips on average. You can easily see the commute “humps” during the AM and PM rush hours — when traffic is already at its worst. You can also see a lot of evening and late-night trips, which aren’t as prevalent mid-week.</p>
</li>
</ul>
<p><img src="/images/2020/tnc-gif.gif" alt="Trips by Day of Week" /></p>
<ul>
<li>
<p>Weekdays have a predictable commute pattern with two peaks, in the AM and PM rush. Fridays and Saturdays have much more evening travel than other days do, extending very late into the night.</p>
</li>
<li>
<p>Uber and Lyft trips are far more frequent in the northeast quadrant of the city, basically north of Cesar Chavez and east of Divisadero, on all days and at all times of day</p>
</li>
<li>Notable tourist attractions such as Fisherman’s Wharf, the Golden Gate Bridge, and GG Park museums are easily visible, and have very different time-of-day distributions than downtown</li>
<li>
<p>Weekend hotspots show up on Friday and Saturday nights: the Castro, Mission/Valencia, North Beach, the ballpark, and many others</p>
</li>
<li>Lots of late-night trips to and from the Castro on Friday nights: 🍸🍸</li>
</ul>
<p><img src="/images/2020/tnc-castro.png" alt="Castro" /></p>
<h3 id="leveraging-open-source-in-the-public-sector">Leveraging open source in the public sector</h3>
<p>We had already settled on a fully open-source stack of technologies as a base for the agency’s upcoming data visualization efforts. These were more than just “free tools”: the combination of these components resulted in something far more flexible than, and just as powerful as, any off-the-shelf product I could have envisioned.</p>
<ul>
<li>
<p>The back-end database is PostgreSQL with PostGIS spatial extensions. The PostGIS extension allows us to do cool stuff like geocoding, spatial buffers, paths, and offsets. PostGIS is awesome.</p>
</li>
<li>
<p>The JavaScript front-end uses <a href="https://vuejs.org">Vue.js</a> for templating and reactive elements (what a pleasure it was learning and using this framework) and <a href="https://www.mapbox.com">Mapbox GL</a> for the interactive 2D/3D map.</p>
</li>
</ul>
<p>Have fun playing around with the tool! I’m proud of this one.</p>Software Carpentry: teaching real-world software skills to planners2015-12-17T00:00:00+00:002015-12-17T00:00:00+00:00https://billyc.github.io/blog/2015/12/software-carpentry<h3 id="or-how-to-stop-using-excel-for-everything">(or, how to stop using Excel for everything)</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>This was originally posted on the PSRC I/O Data Team Blog -- https://psrc.github.io
</code></pre></div></div>
<p>By far, the most difficult challenge I faced at PSRC was finding the time and resources to sharpen our team’s technical skills, especially related to software and programming. Modeling requires more and more software development chops these days, yet few agency staff have Computer Science degrees.</p>
<p>This disconnect has real consequences: familiar tools like Excel and GIS encourage point-and-click workflows that aren’t very reproducible, and results are difficult to review for accuracy. So, how can we catch up and learn more modern approaches to building great software for data analysis?</p>
<p>It turns out we’re not the only people thinking about this.</p>
<h3 id="software-carpentry">Software Carpentry</h3>
<p><a href="http://software-carpentry.org/">Software Carpentry</a> (SWC) is a non-profit volunteer organization whose mission is to teach modern lab skills (i.e., software) to scientific computing researchers. They focus on training university graduate students in the sciences, but their materials and methods are extremely relevant to the technical staff at organizations such as ours.</p>
<p><img src="/images/blog/software-carpentry-large.png" alt="Software Carpentry Logo" /></p>
<p>SWC sponsors <a href="http://software-carpentry.org/workshops/index.html">workshops</a> for learners around the globe and also has a <a href="http://software-carpentry.org/pages/join.html">formal training program</a> for aspiring instructors like myself. The workshops focus on just a few key skills:</p>
<ul>
<li>Learning the <a href="http://swcarpentry.github.io/shell-novice/">command prompt</a> like a ninja (called the Unix “Bash Shell”);</li>
<li>Using <a href="http://swcarpentry.github.io/git-novice/">Git for version control</a>;</li>
<li><a href="http://swcarpentry.github.io/python-novice-inflammation/">Introductory programming</a> in Python or R, conveniently the two most common languages for data scientists like us;</li>
<li>Data and database management using <a href="http://swcarpentry.github.io/sql-novice-survey/">SQL</a>;</li>
<li>Workflow automation using <a href="http://swcarpentry.github.io/make-novice/">Make</a>.</li>
</ul>
<p>Their approach is proven: students often report being 10%, 20% or <a href="http://software-carpentry.org/pages/testimonials.html">even 10X</a> more productive after going through the SWC workshops. As a result, the trainings are far more popular than they can possibly handle: I’ve been on the waiting list for instructor training for well over a year now.</p>
<h3 id="going-rogue">Going Rogue</h3>
<p>This fall, I decided PSRC couldn’t wait any longer. Software Carpentry puts all of their workshop materials online in an open format that encourages reuse, remixing, and collaboration. I “went rogue” and tried to learn as much as I could about their approach, in order to roll out a home-spun workshop here at PSRC on my own.</p>
<p><img src="/images/blog/how-learning-works.png" alt="How Learning Works Book Cover" /></p>
<ul>
<li>
<p>Their workshop materials are based on actual educational theory and research. The book <a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470484101.html">How Learning Works</a> summarizes that research and I read it cover to cover. (I thought I was just going to skim it, but I couldn’t believe how much useful, relevant information was packed in there! A valuable resource for anyone who occasionally teaches but doesn’t have educational training.)</p>
</li>
<li>
<p>I spent a lot of time in the past month <a href="https://www.youtube.com/results?search_query=software+carpentry">watching recorded videos</a> of other SWC trainers teaching shell, Git, and Python. Having all that on YouTube is incredibly helpful. Great stuff to watch on the big TV while I’m chopping onions and carrots in the kitchen.</p>
</li>
<li>
<p>All of SWC’s lessons <a href="http://software-carpentry.org/lessons.html">are on GitHub</a>, as well as some high-level slideshows. Think about that: all these materials are high-quality and available for anyone with an interest. There are even user-contributed lessons on topics like <a href="http://swcarpentry.github.io/capstone-novice-spreadsheet-biblio">From a Spreadsheet to a Database</a> — particularly relevant to my group of learners.</p>
</li>
<li>
<p>Just this fall, the instructor training lesson itself went up on GitHub. I already had a good idea where things were headed, but this lesson emphasized some important teaching guidelines which I think helped ensure a successful workshop.</p>
</li>
</ul>
<h3 id="initial-workshop">Initial Workshop</h3>
<p>This all converged for our first scripting class this week. We targeted a very small set of students (just six), so that I could test out the materials (and myself) before exposing them to a wider audience. Each of the students was already well-versed in at least one of the other topics; the plan is for them to be my co-teachers for Git, Python, and SQL in the coming weeks. I wanted them to sit through a training class first, so they’d get a feel for the SWC way of doing things. I’ll give them real “trainer training” after the holidays.</p>
<p><img src="/images/blog/sw-carpentry-class.jpg" alt="Class Snapshot" /></p>
<p>Since this was a team of coworkers working from one office downtown, we used “Remote Desktop” in the classroom so that each student would remote control their primary desktop workstation from a loaner laptop. Why? Because this way, everyone could install the required software on their primary desktop ahead of time. They accessed their familiar desktop machine via a laptop during the workshop, and then had all that software waiting for them when they returned to their desks at the end of the day. This only worked because it was an all-staff in-house production; you probably couldn’t do this for a general audience, but it was like magic for us here.</p>
<p>The Software Carpentry method is big on feedback: every student puts a green or pink sticky note on the back of their laptop to signify whether they’re keeping up, getting behind, or need help. The stickies are used for written good/bad feedback at the end of the session, too. The stickies showed me right away that the chapter on scripting “loops” was confusing, while the lesson on pipes and filters went really well.</p>
<p>Hands-on quiz-like questions at the end of each lesson allowed students to test their new skills. Enthusiasm at the end of the day was really high: one pink (“needs work”) note said <em>“I want to learn more: are there homework or applications for our team specifically or other resources???”</em> — a pretty good sign. =)</p>
<h3 id="whats-to-come">What’s to Come</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2016 UPDATE: I am now a certified Software Carpentry Instructor!
</code></pre></div></div>
<p>Given the success of the shell scripting lesson, we’ve decided to move forward internally with the rest of the Software Carpentry curriculum in early 2016. Everyone on the Data team here at PSRC will go through every class; those of us who already have some of the skills will be “helpers” for those who are at earlier stages of learning. And I’m not teaching all the courses – there are plenty of people here at PSRC who know these topics well enough that I can sit back and watch them teach each other.</p>
<p>I’m going to wait for feedback from the wider Data Team rollout before committing to further work, but my hope is that things are successful enough that we’ll want to advertise some similar workshops for interested staff at our member/peer agencies in the region.</p>
<h3 id="thanks-where-thanks-are-due">Thanks where thanks are due</h3>
<p>I can’t even begin to thank those who had a hand in developing and improving the Software Carpentry approach and materials. The quality and breadth are both really stunning. Even without the formal training, with just a bit of legwork I was able to craft a successful internal workshop.</p>
<p>Let us know if you have questions about any of this! Your feedback is welcome as we consider expanding this to a wider audience in 2016.</p>
<h3 id="links-and-resources-you-can-use-right-now">Links and Resources you can use right now</h3>
<ul>
<li><a href="http://software-carpentry.org">Software Carpentry Website</a>. Free and open workshop materials, and information on how to attend a workshop or become an instructor.</li>
<li><a href="http://www.amazon.com/How-Learning-Works-Research-Based-Principles/dp/0470484101">How Learning Works</a>. The latest educational research on what motivates students to learn and how to get them to actually learn what you’re teaching.</li>
</ul>OMX Trip Tables Now Available2014-09-30T00:00:00+00:002014-09-30T00:00:00+00:00https://billyc.github.io/blog/2014/09/omx-trip-tables<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>This was originally posted on the PSRC I/O Data Team Blog -- https://psrc.github.io
</code></pre></div></div>
<p>PSRC now has OMX-format trip tables available for public use. Trip tables are two-dimensional matrices containing the estimated number of trips between any two origin/destination points in the Puget Sound region. The tables for our region are aggregated into 3,700 neighborhoods (or “zones”) for convenience, so the matrices are 3700x3700 in size. You can think of a trip table as a big “from/to” table, on a neighborhood-to-neighborhood scale.</p>
<p><strong>Download the file here: <a href="https://file.ac/G20Z7E0ezbU/">PSRC 2010 OMX Trip Tables</a></strong></p>
<p>Trips are stored separately by:</p>
<ul>
<li>Mode (auto, transit, bike, walk, etc)</li>
<li>Household income quartile, for work trips</li>
<li>Time period (A.M. peak, midday, P.M. peak, evening, late-night)</li>
</ul>
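<p>To make the “from/to” idea concrete, here is a toy three-zone trip table in plain Python. The numbers are invented and the helper names are just for this sketch; the real tables have 3,700 zones and live in OMX files.</p>

```python
# A toy trip table: each row is an origin zone, each column a destination zone.
ZONES = [1, 2, 3]
TRIPS = [
    [10.0, 4.0, 1.0],   # trips from zone 1 to zones 1, 2, 3
    [3.0, 20.0, 2.5],   # trips from zone 2
    [0.5, 6.0, 8.0],    # trips from zone 3
]


def trips_between(origin, destination):
    """Look up the estimated trips from one zone to another."""
    return TRIPS[ZONES.index(origin)][ZONES.index(destination)]


def total_from(origin):
    """Sum a row: all trips leaving the origin zone."""
    return sum(TRIPS[ZONES.index(origin)])


def total_to(destination):
    """Sum a column: all trips arriving at the destination zone."""
    col = ZONES.index(destination)
    return sum(row[col] for row in TRIPS)


print("zone 1 -> zone 2:", trips_between(1, 2))
print("all trips from zone 2:", total_from(2))
print("all trips to zone 2:", total_to(2))
```

<p>Note that trip counts are fractional: they are model estimates, not observed counts.</p>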
<h3 id="how-were-these-tables-created">How were these tables created?</h3>
<p>We use the <a href="http://www.psrc.org/data/models/trip-based-travel-model">PSRC travel model</a> to predict the travel patterns of area residents. The model combines census data, local survey data, and human behavior research to produce a reasonable estimate of travel patterns. These are just estimates and can’t possibly predict or capture every nuance of human behavior, but they’re useful for analyzing the effects of growth and investment in our transportation infrastructure.</p>
<h3 id="how-do-i-look-at-these">How do I look at these?</h3>
<p>The <a href="https://sites.google.com/site/openmodeldata/home">OMX format</a> is an open format jointly created by PSRC and several other public and private agencies in the transportation planning field. It was expressly designed to allow sharing of matrix data amongst planning agencies and with the public. You can download the <a href="https://sites.google.com/site/openmodeldata/file-cabinet/omx-viewer">OMX Viewer app for Windows</a>; other platforms should search for “vitables”. OMX is really just an <a href="http://www.hdfgroup.org/HDF5/">HDF5</a> file with a specific layout; HDF5 can be read by <a href="http://pandas.pydata.org/">pandas</a> and <a href="http://www.pytables.org/moin">pytables</a> easily.</p>
<p><img src="/images/blog/omx-screenshot.png" alt="OMX Viewer" /></p>
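<p>Because an OMX file is an HDF5 file, any HDF5 library can show you what is inside. The sketch below uses <code>h5py</code>, which is a different library from the pandas and pytables options mentioned above, and simply walks the file listing every dataset and its shape. It assumes nothing about the OMX layout, and the <code>list_matrices</code> name is made up for this example.</p>

```python
import os

import h5py


def list_matrices(path):
    """Return {dataset path: shape} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems calls this once for every group and dataset in the file
        if isinstance(obj, h5py.Dataset):
            found[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Substitute the name of whichever OMX file you downloaded.
path = "non_motorized.omx"
if os.path.exists(path):
    for name, shape in sorted(list_matrices(path).items()):
        print(name, shape)
```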
<p><strong>Zone definitions</strong></p>
<p>You’ll also need to know the lookups for the zone numbers. The trip tables are divided into “zones” of various sizes (smaller zones in dense areas, larger in more suburban and rural areas). Zones are numbered consecutively from 1 to 3700. To make heads or tails of these trip tables, you’ll need to know what areas those numbers correspond to.</p>
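<p>One practical wrinkle: zone numbers start at 1, while array positions in Python start at 0. If the matrix rows simply follow zone order (an assumption worth checking against your file before relying on it), the conversion is a one-line offset. The helper names below are hypothetical.</p>

```python
ZONE_COUNT = 3700  # zones are numbered 1 through 3700


def zone_to_index(zone, zone_count=ZONE_COUNT):
    """Convert a 1-based zone number to a 0-based row/column position."""
    if not 1 <= zone <= zone_count:
        raise ValueError(f"zone {zone} is outside 1..{zone_count}")
    return zone - 1


def index_to_zone(index, zone_count=ZONE_COUNT):
    """Convert a 0-based row/column position back to a zone number."""
    if not 0 <= index < zone_count:
        raise ValueError(f"index {index} is outside 0..{zone_count - 1}")
    return index + 1


print(zone_to_index(1))
print(index_to_zone(3699))
```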
<p>The attached zone “shape file” defines the zones. You can view this shape file using the freely available <a href="http://www.qgis.org/en/site">QGis</a> program, or you can use ArcGIS if you have access to that non-free program.</p>
<p>To view the zone boundaries and their corresponding zone numbers in QGis:</p>
<ul>
<li>Download and unzip the <a href="/attachments/2014/psrc-shapefile.zip">TAZ2010 shapefile</a> (a “shapefile” is actually a collection of related files which together define the boundaries).</li>
<li>Open QGis and choose menu “Layer > Add Vector Layer…”, and browse/open the <strong>taz2010.shp</strong> file</li>
<li>In the “Layers” panel on the left, click on the taz2010 layer to activate it</li>
<li>Choose menu “Layer > Labeling”, and select “Label this layer with” and choose “TAZ”.</li>
</ul>
<p>You can now see the zone numbers. QGis is a very advanced and feature-rich application, which is a nice way of saying it’s hard to use. It may take you a while to learn your way around it. <em>Pro-tip:</em> your mouse wheel will zoom the map in and out; press and hold your mouse wheel down, and then drag to pan the map.</p>
<p><img src="/images/blog/qgis-screenshot.jpg" alt="QGIS Zone Definition" /></p>
<h3 id="usage-example-bike-trips-from-ballard">Usage example: bike trips from Ballard</h3>
<p>If you’re savvy with GIS, mapping the data in these trip tables is fairly straightforward. Here’s a Python snippet that uses the OMX library (see above) to fetch the estimated midday bike trips from zone 242 (in Ballard) to all destinations:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">omx</span><span class="p">,</span> <span class="n">numpy</span>

<span class="c1"># Open the non-motorized trip tables and pick the midday bike matrix</span>
<span class="n">trips</span> <span class="o">=</span> <span class="n">omx</span><span class="p">.</span><span class="n">openFile</span><span class="p">(</span><span class="s">'non_motorized.omx'</span><span class="p">)</span>
<span class="n">midday_bikes</span> <span class="o">=</span> <span class="n">trips</span><span class="p">[</span><span class="s">'mbike'</span><span class="p">]</span>
<span class="c1"># Take the row for zone 242 (Ballard): trips from there to every destination</span>
<span class="n">ballard</span> <span class="o">=</span> <span class="n">midday_bikes</span><span class="p">[</span><span class="mi">242</span><span class="p">,:]</span>
<span class="c1"># Save the row as text, one value per line, then close the file</span>
<span class="n">numpy</span><span class="p">.</span><span class="n">savetxt</span><span class="p">(</span><span class="s">"ballard.csv"</span><span class="p">,</span> <span class="n">ballard</span><span class="p">)</span>
<span class="n">trips</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div></div>
<p>Once you have that row of data, you can use GIS to map it. Here’s what such a map might look like: lots of trips to downtown, plenty around north Seattle, and a few outliers further afield:</p>
<p><img src="/images/blog/bike-trips-from-ballard.jpg" alt="Bike trips from Ballard" /></p>
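<p>To get from that saved row to a map, GIS software needs a zone number next to each value so it can join the data to the shapefile’s TAZ field. Here is one way that step might look, using only the standard library. It assumes the values in <code>ballard.csv</code> are in zone order starting at zone 1, which you should confirm for your file; the function names and the output file name are made up for this sketch.</p>

```python
import csv
import os


def read_values(path):
    """Read one number per line, as numpy.savetxt writes a 1-D array."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def write_zone_csv(values, out_path, first_zone=1):
    """Write (TAZ, trips) rows so GIS software can join on the TAZ field."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["TAZ", "trips"])
        for offset, trips in enumerate(values):
            # Assumption: values are in zone order, beginning at first_zone
            writer.writerow([first_zone + offset, trips])


if os.path.exists("ballard.csv"):
    write_zone_csv(read_values("ballard.csv"), "ballard_by_zone.csv")
```

<p>In QGis you can then add the new CSV as a layer and join it to the taz2010 layer on the TAZ column.</p>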
<h3 id="what-next">What next?</h3>
<p>We’re excited to finally be able to provide this data to you in an open format, and are curious to see what you do with it. You could pull one column from a trip table to get all the morning commute trips into a block of downtown Seattle, for example. Or pull a row for your home zone to see the travel destinations of all the households in your neighborhood.</p>
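<p>As a small illustration of the column idea, this sketch ranks the origin zones that send the most trips to a chosen destination. It runs on an invented three-zone table held in plain lists, and <code>top_origins</code> is a hypothetical helper; with a real OMX matrix you would pull the column from the file instead.</p>

```python
# Invented morning trip table: rows are origins, columns are destinations.
AM_ZONES = [1, 2, 3]
AM_TRIPS = [
    [10.0, 4.0, 1.0],
    [3.0, 20.0, 2.5],
    [0.5, 6.0, 8.0],
]


def top_origins(table, zones, destination, n=3):
    """Return the n (origin zone, trips) pairs sending the most trips to destination."""
    col = zones.index(destination)
    # Pull one column: the trips arriving at the destination from every origin
    column = [(row[col], zone) for row, zone in zip(table, zones)]
    column.sort(reverse=True)
    return [(zone, trips) for trips, zone in column[:n]]


for zone, trips in top_origins(AM_TRIPS, AM_ZONES, destination=2, n=2):
    print(f"from zone {zone}: {trips} trips")
```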
<p>If you know the Python programming language, go back to the <a href="https://sites.google.com/site/openmodeldata">OMX website</a> to see if you can crack into these OMX files and start poking around the tables yourself.</p>
<p>If you don’t know Python, well… we’ll be adding further blog posts soon with some recipes of how to do some of that stuff.</p>Billy Charlton