Academic – SmexyyWeby

Recently I have been intrigued by Monte Carlo simulation and its many uses in probabilistic modelling, especially in NLP and social network modelling. I wanted to learn more about the Monte Carlo process and understand some of its basic uses. An often-cited example people gave me was calculating the value of $\pi$ as 4 times the area of a circle inscribed in a square of unit side length.

The first figure below shows the randomly generated points falling inside and outside the circle. The second figure shows how the estimate converges to the actual value of $\pi$ as the number of iterations grows.

Monte Carlo method for approximating the value of pi using the ratio of the area of a circle to the area of the unit square it is inscribed in.

The intuition behind the simulation is that if we generate points uniformly at random inside a square circumscribing a circle, the probability of a point falling inside the circle equals the ratio of the circle's area to the square's area. Since the circle is inscribed in the unit square, its radius is 0.5 and its area is $\pi/4$; the square's area is 1, so the ratio is also $\pi/4$. This ratio is approximately equal to the fraction of the randomly generated points that fall inside the circle. Multiplying that fraction by 4 gives an estimate that should converge to $\pi$ as the number of points inside the square tends to infinity.
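Written out as a short derivation, the argument above looks roughly like this. The symbols $N$ (total points generated) and $N_{\text{circle}}$ (points that land inside the circle) are my own labels for illustration; they correspond to the counters `n` and `n_circle` in the code.

$$
P(\text{inside}) = \frac{\text{area of circle}}{\text{area of square}} = \frac{\pi \left(\tfrac{1}{2}\right)^2}{1^2} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{N_{\text{circle}}}{N}
$$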

The code below runs the simulation and generates the figures above:

	"""
	Using Monte Carlo to find the value of pi.
	Intuition: The ratio of the area of a circle to the area of the unit-length square it is inscribed in can be used to approximate pi.
	Area of circle = pi/4
	Area of square = 1
	Ratio = pi/4
	Let us generate random points x,y inside the square. The probability of a point being inside the circle is equal to the above ratio.
	Circle centered on origin.
	Given: -0.5 <= x,y <= 0.5
	A point is inside the circle if x**2 + y**2 <= 1/4
	"""
	import matplotlib.pyplot as plt
	from matplotlib import gridspec
	import numpy as np

	plt.clf()
	MAXN = 10000 # Max number of iterations
	n = 1 # Current value of iteration
	n_arr = [] # Saving values of iterations
	pi_arr = [] # Approx values pi for given iteration
	n_circle = 0 # Number of points inside circle

	fig = plt.figure(figsize=(10,15))
	gs = gridspec.GridSpec(2, 1, height_ratios=[2, 1])
	ax = [plt.subplot(gs[0]),plt.subplot(gs[1])]
	# Create a circle just for representation
	circle = plt.Circle((0, 0), radius=0.5, fc='y')
	ax[0].add_patch(circle)

	# Run the simulation
	while n <= MAXN:
	    p_color = 'b' # Points outside the circle are marked in blue
	    x = np.random.uniform(-0.5,0.5)
	    y = np.random.uniform(-0.5,0.5)
	    if (x**2 + y**2) <= 0.25:
	        n_circle += 1
	        p_color = 'r' # Points inside the circle are marked in red
	    ax[0].plot(x,y,p_color+'+')
	    n_arr.append(n)
	    pi_arr.append(n_circle*4.0/n) # Value of pi = 4*points in circle/points overall
	    n += 1

	print(n, n_circle, pi_arr[-1], n_arr[-1])
	ax[1].plot(n_arr,pi_arr,"b-",label="monte carlo")
	ax[1].plot(n_arr, np.pi*np.ones(len(n_arr)),"r-", label="actual")
	ax[1].legend()
	ax[1].set_xlabel("Number of iterations")
	ax[1].set_ylabel("Value of pi")
	plt.title("Monte Carlo approximation of pi")

The intriguing thing about this simulation is that with just 10000 points I get a pretty good value of pi, around 3.1392. This approximation will, however, change from run to run, as the randomly generated points change.
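To get a feel for how much the estimate moves between runs, here is a minimal sketch that repeats the 10000-point experiment with different seeds. It is not the code that produced the figures: the function name `estimate_pi`, the vectorized sampling, and the choice of 20 seeds are all assumptions made for illustration.

```python
import numpy as np

def estimate_pi(n_points, seed=None):
    """Estimate pi from n_points uniform samples in the square [-0.5, 0.5]^2.

    Hypothetical helper for illustration; not part of the original script.
    """
    rng = np.random.default_rng(seed)  # seeded generator so a run can be reproduced
    x = rng.uniform(-0.5, 0.5, size=n_points)
    y = rng.uniform(-0.5, 0.5, size=n_points)
    inside = (x**2 + y**2) <= 0.25  # inside-circle test for radius 0.5
    return 4.0 * inside.mean()  # 4 * (fraction of points inside the circle)

# Repeat the 10000-point experiment with 20 arbitrary seeds
estimates = np.array([estimate_pi(10000, seed=s) for s in range(20)])
print("min:", estimates.min(), "max:", estimates.max())
print("spread (std):", estimates.std())
```

Fixing the seed makes a single run repeatable, while the spread across seeds gives a rough idea of the run-to-run noise. The statistical error of this kind of estimate shrinks roughly like $1/\sqrt{n}$, so getting one more correct digit takes about 100 times more points.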

I am interested in learning more about such basic applications of Monte Carlo methods, so that the more complicated models become easier to understand.

As part of my course on Social Visualization (CS 467), I had the opportunity to review the following three articles about ethical research on the internet, related to the Facebook-Cornell study of emotion in the Facebook news feed:

Facebook fiasco: was Cornell’s study of ‘emotional contagion’ an ethics breach? by Chambers
Annoying Internet Users in the Name of Science by Bruckman
The Destructive Silence of Social Computing Researchers by Bernstein

Here is my review, based on the above articles, of the issue of ethics in academic-industry collaboration for social research:

Summary of Articles

The articles all discuss the ethics of research without consent on social media platforms, specifically in the context of the controversial Facebook study on the contagion of emotions on Facebook. The three articles shed light on the situation from different perspectives. The Guardian article discusses the whole situation in detail and talks about the neglect of institutional review board (IRB) review by all the parties involved in the study, i.e. Facebook, Cornell University, and the journal PNAS. The WordPress article describes another social scientist's experience and the pros of doing online research without participants' consent. The Medium article explains, from the industry's perspective, why this kind of research continues to happen at the industry level, and argues that this collaboration with academia simply exposes the need to alter IRB policies for online research.

Key ethical points

I think the Guardian article highlights how all the involved parties have tried to avoid explaining the situation, and it questions whether the research was funded by the Army. It exposes the details of the Facebook data policy that allow the company to run such experiments; the involvement of academic scientists without proper IRB approval, however, is questionable. The article also discusses one of those scary situations where the nexus between corporations and academia will be seen as a way to bypass ethical research standards, which is not a good thing.

The WordPress article is by another social scientist, who argues from her own previous research that if research doesn't cause any harm then it should be allowed. She suggests making the study non-harmful by removing its negative-sentiment aspect. In her previous research, the author entered various chat forums and, depending on the experimental design, shared her intention of doing research and allowed the system to kick her out if the chat room was unwilling to participate.

The Medium article is written from the perspective of a former data scientist and current academic researcher. He advocates differentiating IRB policies for social media from those for real-world scenarios. Sociotechnical systems allow us to run very large experiments with high efficiency, which is not possible in a physical experimental setting. He also details how making online research systems include a consent form and the other nitty-gritty of IRB requirements makes the systems unusable and reduces participation, because people fear what might happen to their data.

Conclusion

In my opinion, collaboration between industry and academia is really useful, and required if we want to do representative research. Most research by the academic community is done on very small samples of social media data because of a lack of access. Corporate partnerships, if pursued for a more academic cause, would help in getting more useful results that can be applied back to the advancement of social systems.

Also, any kind of research that may cause physical or mental pain should be highly regulated. This, however, gives an opportunity to tackle the problem from a user interface perspective as well. How can we make interfaces that don't scare people away from participating in research, and how can they still communicate how the user's data will be used?

We cannot control which experiments companies run without our consent, and we rarely even get access to their results. Most of the experimental results are used to generate future revenue. However, with a corporate-academic partnership these results can be used not just to increase the company's revenue but also to advance human science, which in a way demonstrates the company's commitment to corporate social responsibility.

Source: http://www.globalresearch.ca/wp-content/uploads/2014/07/Facebook-Emotional-Manipulation-400x300.jpg

To conclude, I agree with the research nexus between Facebook and Cornell; however, I feel the research should have been limited to positive and neutral messages only, so as not to cause any harm. To quote the WordPress article, “spreading sunshine” is not unethical.

Category: Academic

Calculating Pi using Monte Carlo Simulation

A review of research ethics of internet using the Facebook Cornell Collaboration
