Resources/Tools

On this page, I document and share some useful self-study resources and tools.

1. If you are on the road to applying for a PhD in communication-related fields in the U.S., I hope my experience can help you. Please check here: How to apply for a PhD in communication in the United States of America?

2. How to extract Twitter data with the Twitter Academic API?


(Unfortunately, the tips below can no longer be applied, as Twitter (X) has restricted free API access.)

Twitter data is one of the most frequently used data sources in computational social science research. Currently, there are two major ways for researchers to access Twitter data: one is the Twitter APIs (at all access levels); the other is commercial third-party platforms such as Brandwatch and Synthesio.

I have experience with both the Twitter Academic API and third-party platforms, so I list some pros and cons of each approach here.

Twitter Academic API: (Pros) free, flexible, and offers full-archive historical data. (Cons) requires programming skills, as you need to write code to access the API and scrape the data.

Third-party platforms: (Pros) 1. A more user-friendly interface/dashboard, so you do not need programming skills. 2. More convenient when collecting large-scale, cross-platform data, as most third-party platforms allow you to select multiple data sources with the same search query. (Cons) 1. Paid. 2. Download limits per day or per download action.

If you want a more systematic and scientific assessment of these different tools, you can read this paper: Chen, K., Duan, Z., & Yang, S. (2022). Twitter as research data: Tools, costs, skill sets, and lessons learned. Politics and the Life Sciences, 41(1), 114-130.

If you have a programming background, I highly suggest you try the Twitter Academic API! Check out the official Twitter Academic API documentation and tutorials. If you use R, the package “academictwitteR” can help you collect tweets from the new v2 API endpoint for the Academic Research product track. This package also has a function that converts the downloaded JSON files into a dataframe directly in R.

In my experience, this package is super simple and easy to use; all functions work well except two: resume_collection() and update_collection(). When you collect a large volume of data, it is normal for your search query to be interrupted at some point. If that happens to you and the two functions above do not work either, one solution is to run a new round of collection with the same query to collect the remaining tweets, setting the end date and time to where the previous collection stopped (you can check the last collected tweet's date and time in the “data_xxx.json” file).

Note 1: The collection proceeds from the newest dates and times back to the older ones. Note 2: The current cap of the Twitter Academic API is 10 million tweets per month, so plan your schedule when you need to collect more than 10 million.
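To make the workflow above concrete, here is a minimal sketch using academictwitteR. The query, dates, and file paths are illustrative placeholders of mine, not the settings from my actual collection:

```r
library(academictwitteR)  # v2 full-archive search for the Academic Research track

# Initial collection: raw tweets are written as data_*.json files into
# `data_path`, proceeding from the newest dates back to the oldest.
get_all_tweets(
  query        = "vote OR ballot",            # placeholder query
  start_tweets = "2020-09-01T00:00:00Z",
  end_tweets   = "2020-11-30T23:59:59Z",
  bearer_token = get_bearer(),                # reads TWITTER_BEARER from .Renviron
  data_path    = "tweet_data/",
  bind_tweets  = FALSE,                       # keep raw JSON on disk for large pulls
  n            = 1000000                      # raise the default cap of 100
)

# If the run is interrupted and resume_collection()/update_collection() fail,
# rerun the same query but move `end_tweets` back to the timestamp of the last
# tweet in the most recent data_*.json file (placeholder timestamp below).
get_all_tweets(
  query        = "vote OR ballot",
  start_tweets = "2020-09-01T00:00:00Z",
  end_tweets   = "2020-10-17T08:21:44Z",
  bearer_token = get_bearer(),
  data_path    = "tweet_data_resumed/",
  bind_tweets  = FALSE,
  n            = 1000000
)

# Convert the stored JSON files into a dataframe directly in R.
tweets <- bind_tweets(data_path = "tweet_data/", output_format = "tidy")
```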

If you use Python, there are also many Python packages for this; search on Google!

I used the Twitter Academic API v2 endpoint with the “academictwitteR” package to collect more than 10 million tweets about misinformation around the 2020 U.S. presidential election for my master's thesis (it took me more than a week, with R collecting 24 hours a day). I will upload the scraping R script along with the dataset to GitHub (and update this page) after I finish the project!

Good luck with your data collection journey!

3. How to build a personal website?


I started learning web design in Dr. Lei Guo's EM757 “User-Producers 2.0: Developing Interactivity” class at BU. Largely thanks to this class's content, I learned that there are different ways to create a website: 1. hand-coding HTML + CSS; 2. WYSIWYG web authoring tools such as Adobe Dreamweaver; 3. content management systems (CMS) such as WordPress, Blogger.com, Drupal, and Joomla.

We used WordPress combined with CSS and HTML for the course project. Later, I learned another way to build a website: Hugo + GitHub, which is exactly how I built this personal website!

Here are some pros and cons of the “Hugo + GitHub” approach, summarized from my own experience: (Pros) 1. Totally free! You can host your website on GitHub without paying any fee for a hosting site or domain name. Of course, if you are not satisfied with the “https://[username].github.io” type of domain name, you can buy a domain name separately. 2. It is faster and easier to update the content on your website; as the Hugo documentation states, “Hugo is a static HTML and CSS website generator written in Go. It is optimized for speed, ease of use, and configurability.” So, if you are a blogging enthusiast, or you need to update the content on your website frequently, go with Hugo! (Cons) 1. It is more complicated than using a content management system (CMS) such as WordPress, Blogger.com, Drupal, or Joomla, as you need to know and write basic HTML and CSS; 2. You have to get used to writing code and content in a text editor and Markdown files; 3. GitHub Pages has usage limits, so if you expect your website to receive a large volume of visitors, you'd better buy commercial hosting.
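To give a sense of the workflow, here is a minimal sketch of setting up a Hugo site locally, assuming Hugo and Git are already installed; the site name and the Ananke theme below are placeholder choices of mine, not a prescribed setup:

```bash
# Create a new Hugo site skeleton ("mysite" is a placeholder name)
hugo new site mysite
cd mysite
git init

# Add a theme as a git submodule and enable it in the site config
# (newer Hugo versions use hugo.toml; older ones use config.toml)
git submodule add https://github.com/theNewDynamic/gohugo-theme-ananke.git themes/ananke
echo 'theme = "ananke"' >> hugo.toml

# Write a first post in Markdown, preview it at http://localhost:1313,
# then build the static site into the public/ folder
hugo new posts/my-first-post.md
hugo server -D
hugo

# One common way to publish: push the generated site to a GitHub
# repository named <username>.github.io and enable GitHub Pages.
```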

If you are ready to use “Hugo + GitHub”, check out these super useful tutorials:

Hugo official documentation and resources; Hugo official tutorials

Hongtao Hao: How to Build a Personal Website from Scratch for Free (in Chinese); How to Deploy A Hugo Website Using GitHub Pages

Good luck with your website building!