
Avocado in Text

Market Correlation with Media Coverage and Topic Modeling Analytics with R

avocado price 2.gif

Tools

Team Size: 1

Time Frame: 8 weeks

Tools: Google Colab, Python, Excel, R Programming, R Studio, Topic Modelling

My Role: Data Analytics Data Visualization Engineer

Avocado Trade Illustrated by Andrew Rae

avocado trade.jpeg

from New York Times

Why This Topic

Avocado is a superstar in global trade. As Brook Larmer writes in the New York Times Magazine, "The precious commodity that drives Michoacán's economy and feeds an American obsession is not marijuana or methamphetamines but avocados, which local residents have taken to calling 'green gold'" (Larmer, 2018). The superstar brings lucrative returns to some stakeholders, but all participants in the product chain seem happy with the way it works: farmers, cartels and businesspeople within Mexico, as well as movers, packers and unpackers, restaurant owners and consumers within the US.

Data Analytics

New York Times Avocado Articles with Pub_date and Lead_paragraph

image4.png

This analysis is about the correlation between the price and volume of avocado and the article counts in the New York Times. Since the two datasets are not equal in length, I converted the weekly data into monthly sums (volume) or monthly averages (price) so that the series could be compared soundly in R. The aggregation can also be done in R, but I find it more direct to check the counts visually in Excel. The formulas used for aggregating the time data are shown in the figure "Excel Formula For Aggregating Volume and Price From Week to Month".

Excel Formula For Aggregating Volume and Price From Week to Month

image18.png
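
The same week-to-month aggregation can also be sketched in Python, one of the tools listed for this project. This is a minimal illustration, not the Excel formula shown above: the `weekly` records, their values and the function name `aggregate_monthly` are hypothetical, and the only thing taken from the analysis is the rule itself (sum the volumes, average the prices).

```python
from collections import defaultdict
from datetime import date

# Hypothetical weekly records: (date, total volume, average price).
# These numbers are made up for illustration; they are not the study's data.
weekly = [
    (date(2018, 1, 7), 100.0, 1.20),
    (date(2018, 1, 14), 120.0, 1.10),
    (date(2018, 1, 21), 110.0, 1.30),
    (date(2018, 2, 4), 150.0, 1.00),
    (date(2018, 2, 11), 130.0, 1.10),
]


def aggregate_monthly(rows):
    """Return {(year, month): (volume_sum, mean_price)} from weekly rows."""
    volumes = defaultdict(float)
    prices = defaultdict(list)
    for day, volume, price in rows:
        key = (day.year, day.month)
        volumes[key] += volume      # volume is summed per month
        prices[key].append(price)   # price is averaged per month
    return {
        key: (volumes[key], sum(prices[key]) / len(prices[key]))
        for key in sorted(volumes)
    }


monthly = aggregate_monthly(weekly)
for (year, month), (volume_sum, mean_price) in monthly.items():
    print(f"{year}-{month:02d}  volume={volume_sum:.1f}  price={mean_price:.2f}")
```

Grouping on `(year, month)` rather than on the month alone keeps the same month of different years apart, which matters when the series spans several years.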

Volume Trend Before and After Aggregation

image8.png
image2.png

Monthly Avocado Price and Counts of NYTimes Article as Keywords

image10.png
image22.png

Interestingly, the Pearson correlation coefficient between volume and price after aggregation is r = -0.4481, roughly half the magnitude of the value before aggregation (r = -0.8574). I assume the weaker correlation comes from the summing and averaging process, which merges the weekly fluctuations into a relatively smoother trend.

In addition, it is surprising to see that while there is a moderate correlation (r = 0.366) between the number of articles published in the New York Times and the monthly total volume of avocado consumption in New York, the correlation between monthly average price and article counts is fairly weak (r = -0.07683).
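
For readers who want to see how a Pearson coefficient like the ones above is obtained, here is a minimal sketch in plain Python. The two short series are hypothetical placeholders, not the avocado or New York Times data, and `pearson_r` is an illustrative helper rather than the R call used in the analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly series (made-up values for illustration only)
monthly_volume = [330.0, 280.0, 300.0, 360.0, 310.0]
monthly_articles = [12, 9, 10, 15, 11]

r = pearson_r(monthly_volume, monthly_articles)
print(f"r = {r:.4f}")
```

The coefficient always lies between -1 and 1. Its sign gives the direction of the relationship and its magnitude gives the strength.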

Topic Modelling of avocado articles in New York Times

image9.png

The method used in this text analysis is latent Dirichlet allocation (LDA), a statistical model that, according to Martin Ponweiser, is the most prominent topic model. Based on the data requested from the New York Times API, only the lead paragraphs of the avocado articles are analyzed in this model. The R code is available in the appendix.
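
To give a feel for what an LDA fit does, here is a small Python sketch that estimates topics with collapsed Gibbs sampling. It is an illustration under stated assumptions, not the R code from the appendix, and the estimation method used there may differ. The toy `docs`, the function names and the prior values `alpha` and `beta_prior` are all hypothetical.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta_prior=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, topic_term_probs)."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start from a random topic for every token
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic for this token
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta_prior)
                    / (topic_total[t] + vocab_size * beta_prior)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed probability of each term within each topic
    topic_term_probs = [
        [
            (topic_word[k][v] + beta_prior) / (topic_total[k] + vocab_size * beta_prior)
            for v in range(vocab_size)
        ]
        for k in range(num_topics)
    ]
    return vocab, topic_term_probs


def top_terms(vocab, probs, n=5):
    """Return the n most probable terms of one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: probs[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


# Hypothetical, already tokenized lead paragraphs (made up for illustration)
docs = [
    "new restaurant city menu avocado toast".split(),
    "restaurant chef menu new city brunch".split(),
    "mexico farmer export trade avocado price".split(),
    "trade price export mexico border farmer".split(),
]

vocab, topic_term_probs = fit_lda(docs, num_topics=2, iterations=100)
for k, probs in enumerate(topic_term_probs):
    print(f"topic {k + 1}:", ", ".join(top_terms(vocab, probs, n=5)))
```

On a corpus this small the topics are unstable, so the printed terms only show the shape of the output: one ranked list of terms per topic.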

 

Since there are more than 1500 articles in this dataset, I set the number of topics to 10 and displayed the top 5 terms of each topic. The following graph shows how strong each term is within each topic. Three terms stand out across topics: restaurant, new and city, which appear 4, 8 and 3 times respectively. A higher beta value means placing less weight on having each topic composed of only a few dominant words.

avocado gif 1.gif
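
Counting how often a term shows up among the top terms of the topics, as done above for restaurant, new and city, can be sketched as follows. The `topic_term_beta` values are hypothetical and use only three topics to stay short. They are not the fitted results of this study, and the helper names are illustrative.

```python
from collections import Counter

# Hypothetical term weights within each topic (made up for illustration)
topic_term_beta = {
    1: {"restaurant": 0.030, "new": 0.025, "city": 0.020,
        "food": 0.015, "chef": 0.012, "menu": 0.010},
    2: {"new": 0.040, "york": 0.030, "salad": 0.020,
        "restaurant": 0.018, "toast": 0.015, "price": 0.005},
    3: {"mexico": 0.035, "new": 0.028, "trade": 0.022,
        "farmer": 0.019, "city": 0.016, "export": 0.009},
}


def top_terms(term_beta, n=5):
    """Return the n terms with the highest weight in one topic."""
    return sorted(term_beta, key=term_beta.get, reverse=True)[:n]


def count_top_term_appearances(topics, n=5):
    """Count in how many topics each term is among the top n terms."""
    counts = Counter()
    for term_beta in topics.values():
        counts.update(top_terms(term_beta, n))
    return counts


appearances = count_top_term_appearances(topic_term_beta, n=5)
for term, count in appearances.most_common(3):
    print(term, count)
```

A term that ranks in the top 5 of many topics, like "new" in this toy data, says little about any single topic, which is worth keeping in mind when reading the graph.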

Key Takeaways

The analysis above shows that the raw weekly data yield a stronger correlation between volume and price (r = -0.8574) than the data aggregated into monthly volume sums and price averages (r = -0.4481). Within the aggregated monthly data, the correlation between the price and volume of avocado is the strongest of the three, and there is also a moderate correlation between the volume of avocado and the number of articles in the New York Times (r = 0.366). However, the correlation between price and the number of articles is weak (r = -0.07683).

 

Further studies could examine what drives avocado prices and the number of articles. If the full text of the avocado articles were available, a closer topic modeling analysis could be carried out. In addition, I suggest sorting the data in Excel rather than in RStudio, and remember that CSV files do not save formulas.

Created by De Han
